Thursday, September 20, 2018

NMRlipids III: Quantitative measure for the force field quality needed

Progress in the NMRlipids III project on lipid-cholesterol interactions has been slow because the focus has recently been on improving ion binding to PC lipid bilayers and charged membranes.

After revisiting the manuscript with a serious intention to finalize the project, I think that we need to define a quantitative measure of force field quality to simplify the discussion. For example, the Berger model gives the best agreement with form factor data at high cholesterol content, but its order parameters are too large. On the other hand, Slipids gives better order parameters with and without cholesterol, but its form factor at high cholesterol concentrations agrees less well with experiment, and so on. This kind of discussion could be significantly simplified with a quantitative quality measure.

The quality measure could also be used to rank the force fields in the databank collected from the contributions to the NMRlipids project, and in further automatic force field development. The simplest measure to start with could, for example, sum up the deviation from the experimental order parameters for different segments and the deviation from the experimental form factor using equation (3) from the SIMtoEXP publication. A similar measure has recently been introduced for proteins in solution.
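As a rough illustration of such a measure, the sketch below adds a root-mean-square deviation of the order parameters to a scaled chi-type deviation of the form factor. The function names, the segment labels, the weights `w_op` and `w_ff`, and the exact form of the form factor term are assumptions made for illustration. In particular, the form factor term should be checked against equation (3) of the SIMtoEXP publication before it is used for anything.

```python
import math


def order_parameter_rmsd(sim, exp):
    """RMSD over the segments present in both dicts (label -> order parameter)."""
    shared = sorted(set(sim) & set(exp))
    if not shared:
        raise ValueError("no shared segments between simulation and experiment")
    return math.sqrt(sum((sim[k] - exp[k]) ** 2 for k in shared) / len(shared))


def form_factor_deviation(f_sim, f_exp, f_err):
    """Scaled chi-type deviation between |F(q)| curves sampled at the same q values.

    Assumption: the experimental curve is on an arbitrary scale, so it is first
    multiplied by the factor k that minimizes the error-weighted squared deviation.
    """
    rows = list(zip(f_sim, f_exp, f_err))
    if len(rows) < 2:
        raise ValueError("need at least two q points")
    k = (sum(abs(s) * abs(e) / d ** 2 for s, e, d in rows)
         / sum(abs(e) ** 2 / d ** 2 for s, e, d in rows))
    chi2 = sum(((abs(s) - k * abs(e)) / d) ** 2 for s, e, d in rows)
    return math.sqrt(chi2 / (len(rows) - 1))


def combined_quality(sim_op, exp_op, f_sim, f_exp, f_err, w_op=1.0, w_ff=1.0):
    """Weighted sum of the two deviations; smaller means closer to experiment.

    The weights are hypothetical: the two terms have different units and
    magnitudes, so their relative weighting is a choice that has to be made.
    """
    return (w_op * order_parameter_rmsd(sim_op, exp_op)
            + w_ff * form_factor_deviation(f_sim, f_exp, f_err))
```

A single number like this is convenient for ranking, but it hides which observable is responsible for a poor score, so it is probably worth reporting the two terms separately as well.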

Any ideas and contributions about measuring the force field quality are welcome.

3 comments:

  1. I have added a draft of an analysis script (https://github.com/NMRLipids/MATCH/blob/master/scripts/NMRL3_analysis/analysis_NMRL3.py) that takes the form factors and the order parameters calculated from simulation, and outputs a file with the RMSD errors with respect to the experimental data. The output (fitness.txt) contains separate OP errors for different segments of the molecule (head group, sn1-tail, sn2-tail, etc., defined manually in the code), and lines for the form factor error.

    For quantifying the error with respect to the experimental form factor, I compare the locations of the first two minima and maxima, and the height ratios of the first and second peaks and of the second and third peaks. This choice was made because these quantities are expected to carry information about the shape of the form factor, to be the least prone to experimental error, and do not require separate fitting of the simulated curve to the experimental one.
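The comparison described above can be sketched roughly as follows. This is not the code in analysis_NMRL3.py. The function names and dictionary keys are hypothetical, and the extrema are found by a naive neighbour comparison that assumes a smooth, densely sampled curve. Noisy experimental data would likely need smoothing first.

```python
def local_extrema(q, f):
    """Return (minima, maxima) of |F(q)| as lists of (q, |F|) at interior points."""
    a = [abs(v) for v in f]
    minima, maxima = [], []
    for i in range(1, len(a) - 1):
        if a[i] < a[i - 1] and a[i] < a[i + 1]:
            minima.append((q[i], a[i]))
        elif a[i] > a[i - 1] and a[i] > a[i + 1]:
            maxima.append((q[i], a[i]))
    return minima, maxima


def shape_descriptors(q, f):
    """Locations of the first two minima and maxima, plus two peak-height ratios."""
    minima, maxima = local_extrema(q, f)
    if len(minima) < 2 or len(maxima) < 3:
        raise ValueError("need at least two minima and three maxima")
    return {
        "min1_q": minima[0][0],
        "min2_q": minima[1][0],
        "max1_q": maxima[0][0],
        "max2_q": maxima[1][0],
        "ratio_12": maxima[0][1] / maxima[1][1],  # first peak / second peak
        "ratio_23": maxima[1][1] / maxima[2][1],  # second peak / third peak
    }


def shape_errors(q_sim, f_sim, q_exp, f_exp):
    """Simulation minus experiment for each descriptor (sign convention assumed)."""
    sim = shape_descriptors(q_sim, f_sim)
    exp = shape_descriptors(q_exp, f_exp)
    return {key: sim[key] - exp[key] for key in sim}
```

Because the height ratios are dimensionless, the two curves do not have to be on the same scale, which is why no fitting of the simulated curve to the experimental one is needed. How the individual errors are weighted against each other is left open here.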

    For the file formats, naming conventions, etc. expected by the script, see the example files included in the folder (https://github.com/NMRLipids/MATCH/tree/master/scripts/NMRL3_analysis).

    Feel free to modify & update the script according to your needs. I hope this is useful!

    Replies
    1. Thanks for the script. I modified it a bit to print out the difference between experimental and simulated order parameters instead of the absolute value: https://github.com/NMRLipids/MATCH/commit/f6b709c01ae9adcbb8a391383468367e11a11b8c#diff-a308ad517aac7e5c03a8afaf6b619532
      In this way we see whether the order parameters are larger or smaller in simulations than in experiments. I also plotted these differences for the Slipids and CHARMM36 models with 50% and without cholesterol: https://github.com/NMRLipids/NmrLipidsCholXray/blob/master/FIGS/OrderParametersCHOLfitness-eps-converted-to.pdf
      MacRog results are not there because we cannot calculate all order parameters due to overlapping atom names, and the script does not work if the numbers of order parameters from simulations and experiments are not equal.
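One possible way around the unequal-number problem is to match order parameters by label instead of by position, and to report what is missing. The sketch below is only an illustration of that idea, not the change made in the commit above. The function name, the labels, the values, and the sign convention (simulation minus experiment) are all assumptions.

```python
def signed_differences(sim, exp):
    """Compare order parameters keyed by segment label.

    Returns a dict of (simulation - experiment) for labels found in both
    inputs, and a sorted list of experimental labels missing from simulation.
    """
    diffs = {label: sim[label] - exp[label] for label in exp if label in sim}
    missing = sorted(label for label in exp if label not in sim)
    return diffs, missing


# Illustrative values only, not real data.
sim = {"alpha": 0.05, "beta": -0.04}
exp = {"alpha": 0.05, "beta": -0.02, "g1": -0.15}
diffs, missing = signed_differences(sim, exp)
for label, delta in diffs.items():
    print(f"{label}: {delta:+.3f}")
print("missing from simulation:", missing)
```

With this kind of matching, a force field with incomplete data could still get a partial score, although the scores would then no longer be computed over the same set of segments for every model.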

      I also plotted the fitness factor for form factors from simulations with 50% and without cholesterol: https://github.com/NMRLipids/NmrLipidsCholXray/blob/master/FIGS/FFfitness-eps-converted-to.pdf

      The form factors against experiments are in this figure: https://github.com/NMRLipids/NmrLipidsCholXray/blob/master/FIGS/FormFactors-eps-converted-to.pdf
      (only the systems with 50% cholesterol and without cholesterol were calculated with the corrected code; the others should be recalculated)

      As expected from the form factors, the fitness code gives the best quality for the Berger model with 50% cholesterol. At first glance, however, the quality of the Berger model without cholesterol should be worse, because the minima are in different locations than in experiments. This seems to have a lower weight in the fitness code than the ratios of peak heights at the maxima. I am not sure if this is reasonable.

    2. I have now created issues for further discussion about generating fitness.txt for incomplete order parameter data: https://github.com/NMRLipids/MATCH/issues/64
      and for the definition of the form factor quality: https://github.com/NMRLipids/MATCH/issues/65

