Wednesday, May 18, 2022

NMRlipids databank: Quality evaluation

Defining a quantitative quality measure for lipid bilayer simulations has been one of the goals of the NMRlipids project since the beginning. Such measure is highly useful when selecting the best force field for a specific application, and for improving force field parameters, particularly with automated procedures. Based on literature review and results of the NMRlipids Project, summarized in the NMRlipids V publication, we have concluded that the C-H bond order parameters from NMR can be used to evaluate the conformational ensembles of individual lipids, and the x-ray scattering form factors can be used to evaluate the lipid bilayer dimensions. Based on the work in the NMRlipids workshops in Berlin (2019) and Prague (2021), we have now written a code that evaluates the quality of simulations in the NMRlipids Databank. The key ideas and results of the quality evaluation are described in this post. More details and results can be found from the NMRlipids Databank manuscript and from GitHub.

Results. The order parameter quality of 58 simulations and the form factor quality of 99 simulations have so far been evaluated in the NMRlipids Databank. Figure 1a shows the results for the 13 best simulations according to the overall order parameter quality; and Figure 1b shows comparison between simulations and experiments for the best simulations concerning the overall order parameters (left column), the headgroup order parameters (middle column), and the form factor (right column). Results for all ranked simulations ordered in various ways are available on GitHub.

Figure 1: a) The best 13 simulations currently in the NMRlipids Databank according to the overall order parameter quality. b) Comparison of x-ray scattering form factors and C-H bond order parameters between simulations and experiments demonstrated for the simulations giving the best qualities in the overall order parameters (left column), for the headgroup order parameters (middle), for the and x-ray scattering form factor (right).
 

Conformational ensembles evaluated against C-H bond order parameters from NMR. After the workshop in Prague, our idea was to define the poorness Š of each order parameter as Š=-log(P); here P is the probability mass within the experimental error for a normal distribution, whose mean is the order parameter from the simulation, and whose standard deviation is the standard error of the mean from the simulation. However, when testing this definition of Š on the simulations in the NMRlipids Databank, it turned out that the probability of the simulated order parameters to locate within experimental errors was often below the numerical accuracy of computers. To avoid such numerical instability, we decided to use the first order Student’s t-distribution instead, and calculate the probability from the equation\begin{equation}  P = f \left( \frac{S_{\rm CH} - (S_{\rm exp}+\Delta S_{\rm exp})}{s/\sqrt{n}} \right) - f \left( \frac{S_{\rm CH} - (S_{\rm exp}-\Delta S_{\rm exp})}{s/\sqrt{n}} \right),
\end{equation}where f(t) is the first order Student's t-distribution, s is the variance of the order parameter SCH calculated over individual lipids and n is the number of lipids in the simulation. Because Student's t-distribution has heavier tails than normal distribution, even order parameters far from experiments have distinguishable non-zero probabilities. Therefore, the logarithm used to define the poorness Š is not needed, and we report the qualities directly as probabilities. However, it should be noted that using the first order Student's t-distribution instead of the normal distribution slightly underestimates the statistical accuracy of order parameters calculated from simulations.

In order to rank simulations based on headgroup, acyl chain, or individual lipid qualities, the average probabilities can be calculated over lipid fragments and types. For more details see the NMRlipids Databank manuscript.

Lipid bilayer dimensions evaluated against x-ray scattering form factor. The qualities of form factors in simulations are evaluated as in the SIMtoEXP program \begin{equation}
    \chi^2 = \frac{\sqrt{\sum_{i=1}^{N_q}(|F_s(q_i)|-k_e|F_e(q_i)|)^2/(\Delta F_e(q_i))^2}}{\sqrt{N_q-1}},
\end{equation}
where Fs is the form factor from simulation and Fe from experiment, the summation goes over the experimentally available Nq points, and \begin{equation}
    k_e = \frac{\sum_{i=1}^{N_q} \frac{|F_s(q_i)||F_e(q_i)|}{(\Delta F_e(q_i))^2}}{\sum_{i=1}^{N_q} \frac{|F_e(q_i)|^2}{(\Delta F_e(q_i))^2}}.
\end{equation}It should be noted that in this evaluation the simulation uncertainty is not accounted for in any way.


8 comments:

  1. Looks amazing! You and Anne have done very nice job with this.

    I would still consider taking account the errors in the form factors too since they are accounted for the order parameters.
    If you definitely do not want to add them to the quality estimator, they would be nice to have at least the plots so one can visually asses the overlap with the experimental data and where the curves are most accurate.

    Error bars should be an easy addition to the code if you calculate the form factors averaging the frame-vise form factors.

    Having now found several caveats in the converge and calculation of the form factors, I have trouble trusting any curves published in the litterature. It would sense to do this with care once an for all for the databank.

    BR

    Hanne

    ReplyDelete
    Replies
    1. Additional reminder that the (relative) form factor lobe heights seem to be fairly sensitive to the system size. I just calculated a comparison of C36 and Slipids form factors from small (34 lipids) and big (200 lipids) simulations and scaled the first lobes to be identical for the different system sizes:
      https://github.com/hsantila/FormFactor_data/blob/main/Slipids_C36.png

      A quality estimator were significant amount of comparison point are on the lobes is very much affected by finite size effect. Conversely, the minima and maxima location in q-axis are fairly insensitive. A quick fix might be heuristically limiting the q-range where the comparison is performed?

      Delete
    2. The form factor quality estimator is currently taken directly from the SIMtoEXP and the simulation error is not there, so it is not in our equation either. If we would include it, would you have an idea how to do it exactly?

      Adding errors to plots is a good idea. Currently the code does not print the error, but I can take a look if we could do it. I am currently updating the code to work properly for united atom systems and systems with periodic jumps in z-direction.

      Delete
    3. About the system size, 34 lipids is quite a bit smaller than typical simulations. Have you checked whether there is size dependency when comparing only systems with more than 100 lipids for example? I have been looking correlations between form factor minima, area per lipid, thickness and membrane order, and the minima seems to correlate well with membrane properties, see https://github.com/NMRLipids/DatabankExercises/blob/master/APL/correlations.pdf and https://github.com/NMRLipids/DatabankExercises/blob/master/APL/AreaPerLipidAndThicknessExamples.ipynb

      It might be useful to try to figure out from the NMRlipids databank if lobe heights correlate with some membrane properties.

      Delete
    4. Samuli wrote: "The form factor quality estimator is currently taken directly from the SIMtoEXP and the simulation error is not there, so it is not in our equation either. If we would include it, would you have an idea how to do it exactly?"

      If I recall correctly, Hanne proposed an equation for this already in the NMRlipids2019 workshop in Berlin? Hanne, do you still remember how it was? Or Samuli, do you maybe still have the notes of what was proposed?

      Delete
    5. I do not remember such equation and do not find it from the notes that I have: https://raw.githubusercontent.com/NMRLipids/MATCH/master/scripts/NMRL3_analysis/qualitymeas.pdf

      Delete
    6. I think I included the sim2exp version to the slides in Berlin. However, including simulation error in this formula is quite easy. Now xi^2 essentially gives the difference between the simulated and experiment in the "units" of experimental error. To convert the scaling to experimental+simulation error one could just replace delta_Fe with delta_Fe+delta_Fs in the xi^2 formula.

      I have not checked how the size effect is with other system sizes and if it seems to saturate at some point. As Samuli said, it would be useful to do this analysis from the databank.

      As I said above, the minima are unaffected by the size so indeed it seems (luckily) unlikely that the strong correlation of the minima with the membrane properties is obscured by the possible simulation size effect.

      Delete
    7. I did more careful analysis on the size dependence of form factors and order parameters using the recent dataset containing simulations in different box sizes and cholesterol concentrations: https://doi.org/10.5281/zenodo.5767450

      The results can be found from here: https://raw.githubusercontent.com/NMRLipids/DataBankManuscriptText/main/Figures/SizeDependence.pdf

      Indeed, the lobe heights seem to decrease with increasing simulation box size. This seems to be related to the reduced peak heights in the electron density profiles. Nevertherless, the locations of form factor minima and order parameters are practically independent on the simulation box size.

      In conclusion, using the full form factor in the quality evaluation (as currently done) makes the quality dependent on the simulation box size. Therefore, a quality measure focusing on locations of form factor minima would be probably better. I am currently investigating different options for this.

      Delete

Please sign in before writing your comment.