The NMRlipids Project: NMRlipids databank: Quality evaluation

Wednesday, May 18, 2022

NMRlipids databank: Quality evaluation

Defining a quantitative quality measure for lipid bilayer simulations has been one of the goals of the NMRlipids project since the beginning. Such a measure is highly useful when selecting the best force field for a specific application and when improving force field parameters, particularly with automated procedures. Based on a literature review and the results of the NMRlipids Project, summarized in the NMRlipids V publication, we have concluded that C-H bond order parameters from NMR can be used to evaluate the conformational ensembles of individual lipids, and that x-ray scattering form factors can be used to evaluate lipid bilayer dimensions. Building on the work done in the NMRlipids workshops in Berlin (2019) and Prague (2021), we have now written code that evaluates the quality of simulations in the NMRlipids Databank. The key ideas and results of the quality evaluation are described in this post. More details and results can be found in the NMRlipids Databank manuscript and on GitHub.

Results. The order parameter quality of 58 simulations and the form factor quality of 99 simulations have so far been evaluated in the NMRlipids Databank. Figure 1a shows the results for the 13 best simulations according to the overall order parameter quality, and Figure 1b shows a comparison between simulations and experiments for the simulations that rank best in the overall order parameters (left column), the headgroup order parameters (middle column), and the form factor (right column). Results for all ranked simulations, ordered in various ways, are available on GitHub.

Figure 1: a) The best 13 simulations currently in the NMRlipids Databank according to the overall order parameter quality. b) Comparison of x-ray scattering form factors and C-H bond order parameters between simulations and experiments, shown for the simulations giving the best quality in the overall order parameters (left column), in the headgroup order parameters (middle column), and in the x-ray scattering form factor (right column).

Conformational ensembles evaluated against C-H bond order parameters from NMR. After the workshop in Prague, our idea was to define the poorness Š of each order parameter as Š=-log(P), where P is the probability mass within the experimental error for a normal distribution whose mean is the order parameter from the simulation and whose standard deviation is the standard error of the mean from the simulation. However, when testing this definition of Š on the simulations in the NMRlipids Databank, it turned out that the probability of a simulated order parameter lying within the experimental error was often below the numerical accuracy of computers. To avoid such numerical instability, we decided to use the first order Student's t-distribution instead, and calculate the probability from the equation \begin{equation}
P = f \left( \frac{S_{\rm CH} - (S_{\rm exp}+\Delta S_{\rm exp})}{s/\sqrt{n}} \right) - f \left( \frac{S_{\rm CH} - (S_{\rm exp}-\Delta S_{\rm exp})}{s/\sqrt{n}} \right),
\end{equation}
where f(t) is the first order Student's t-distribution, s is the standard deviation of the order parameter S_CH calculated over individual lipids, and n is the number of lipids in the simulation, so that s divided by the square root of n is the standard error of the mean. Because Student's t-distribution has heavier tails than the normal distribution, even order parameters far from experiments have distinguishable non-zero probabilities. Therefore, the logarithm used to define the poorness Š is not needed, and we report the qualities directly as probabilities. However, it should be noted that using the first order Student's t-distribution instead of the normal distribution slightly underestimates the statistical accuracy of order parameters calculated from simulations.
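
As an illustration of the probability defined above, here is a minimal Python sketch. It is not the Databank code. It assumes that "first order" means one degree of freedom, in which case the Student's t-distribution is the standard Cauchy distribution and its cumulative distribution function has the closed form 1/2 + arctan(t)/π. It also assumes that s is the standard deviation over lipids, and it computes the probability mass between the lower and upper experimental limits as CDF(upper) minus CDF(lower), which by symmetry corresponds to the expression above if f is read as the upper-tail probability. The function names and the numbers in the usage lines are hypothetical.

```python
import math


def t1_cdf(t: float) -> float:
    """CDF of Student's t-distribution with one degree of freedom
    (identical to the standard Cauchy distribution)."""
    return 0.5 + math.atan(t) / math.pi


def order_parameter_quality(s_sim: float, s_exp: float, delta_exp: float,
                            s_std: float, n_lipids: int) -> float:
    """Probability mass of a t-distribution (1 degree of freedom) centred on
    the simulated order parameter that falls inside the experimental
    interval [s_exp - delta_exp, s_exp + delta_exp].

    s_std is assumed to be the standard deviation over individual lipids,
    so s_std / sqrt(n_lipids) is the standard error of the mean.
    """
    sem = s_std / math.sqrt(n_lipids)
    upper = (s_exp + delta_exp - s_sim) / sem
    lower = (s_exp - delta_exp - s_sim) / sem
    return t1_cdf(upper) - t1_cdf(lower)


# Hypothetical numbers, for illustration only.
good = order_parameter_quality(s_sim=-0.20, s_exp=-0.20, delta_exp=0.02,
                               s_std=0.2, n_lipids=100)
poor = order_parameter_quality(s_sim=0.00, s_exp=-0.20, delta_exp=0.02,
                               s_std=0.2, n_lipids=100)
print(f"matching experiment: P = {good:.3f}")
print(f"far from experiment: P = {poor:.5f}")
```

Because of the heavy tails, the second case still gives a probability that is small but well above floating-point underflow, which is the property the text relies on.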

In order to rank simulations based on headgroup, acyl chain, or individual lipid qualities, the average probabilities can be calculated over lipid fragments and types. For more details see the NMRlipids Databank manuscript.

Lipid bilayer dimensions evaluated against x-ray scattering form factor. The qualities of form factors in simulations are evaluated as in the SIMtoEXP program \begin{equation}
\chi^2 = \frac{\sqrt{\sum_{i=1}^{N_q}(|F_s(q_i)|-k_e|F_e(q_i)|)^2/(\Delta F_e(q_i))^2}}{\sqrt{N_q-1}},
\end{equation}
where F_s is the form factor from simulation and F_e from experiment, the summation goes over the experimentally available N_q points, and \begin{equation}
k_e = \frac{\sum_{i=1}^{N_q} \frac{|F_s(q_i)||F_e(q_i)|}{(\Delta F_e(q_i))^2}}{\sum_{i=1}^{N_q} \frac{|F_e(q_i)|^2}{(\Delta F_e(q_i))^2}}.
\end{equation}
It should be noted that in this evaluation the simulation uncertainty is not accounted for in any way.
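
The two formulas above can be sketched in a few lines of Python. This is only an illustration of the equations as written in this post, not the SIMtoEXP or Databank implementation. It assumes that the simulated form factor has already been evaluated at the same q values as the experimental data; the function names and the numbers in the usage lines are hypothetical.

```python
import math


def scaling_factor(f_sim, f_exp, df_exp):
    """Scaling factor k_e that brings the experimental form factor onto
    the scale of the simulated one (error-weighted least squares)."""
    num = sum(abs(fs) * abs(fe) / de ** 2
              for fs, fe, de in zip(f_sim, f_exp, df_exp))
    den = sum(abs(fe) ** 2 / de ** 2
              for fe, de in zip(f_exp, df_exp))
    return num / den


def form_factor_quality(f_sim, f_exp, df_exp):
    """Form factor quality as defined in the text: square root of the
    error-weighted squared deviations, divided by sqrt(N_q - 1).
    All three sequences must be given on the same N_q experimental q points."""
    n_q = len(f_exp)
    k_e = scaling_factor(f_sim, f_exp, df_exp)
    total = sum((abs(fs) - k_e * abs(fe)) ** 2 / de ** 2
                for fs, fe, de in zip(f_sim, f_exp, df_exp))
    return math.sqrt(total) / math.sqrt(n_q - 1)


# Hypothetical form factor values on three shared q points.
f_exp = [1.0, 2.0, 3.0]
df_exp = [0.1, 0.1, 0.1]
f_sim = [2.1, 3.9, 6.2]
print("k_e =", scaling_factor(f_sim, f_exp, df_exp))
print("quality =", form_factor_quality(f_sim, f_exp, df_exp))
```

Since k_e absorbs any overall scale difference, a simulated curve that is an exact multiple of the experimental one gives a value of zero; only differences in shape contribute.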

8 comments:

Hanne, May 20, 2022 at 11:43 AM
Looks amazing! You and Anne have done a very nice job with this.

I would still consider taking into account the errors in the form factors too, since they are accounted for in the order parameters.
If you definitely do not want to add them to the quality estimator, it would be nice to have them at least in the plots, so one can visually assess the overlap with the experimental data and see where the curves are most accurate.

Error bars should be an easy addition to the code if you calculate the form factors by averaging the frame-wise form factors.

Having now found several caveats in the convergence and calculation of the form factors, I have trouble trusting any curves published in the literature. It would make sense to do this with care once and for all for the databank.

BR

Hanne

