Tuesday, April 23, 2019

NMRlipids IVb: Assembling the PE & PG results

As discussed in the previous post, the NMRlipids IV project concerning PE, PG and PS lipids was divided into two parts. The first part about the results from PS lipids is approaching to the submission, while the second part about PE and PG lipids is in a more preliminary stage.

I have now started to assemble the contributed data for PE and PG lipids into a manuscript in a new GitHub repository. From now on, I will call this part of the project as NMRlipids IVb. Current status of the manuscript and most important things to todo are summarized in this post.

Headgroup and glycerol backbone order parameters from lipids with different headgroups in experiments

Experimental order parameters delivered by Tiago Ferreira with the information about the sign revealed that the š›½-carbon order parameter of PG lipids is positive, while other lipids have negative order parameter for this carbon (Fig. 1). Therefore, the structure of PG lipid seems to be distinct from PC and PE, in contrast to the conclusions from the order parameter data without sign information in the previous post.

Figure 1: Experimental order parameters of the headgroup and glycerol backbone region C-H bonds from experiments. For literature references, see the manuscript draft.


Headgroup and glycerol backbone order parameters of PE and PG lipids in simulations

The experimental information about order parameter signs enables a similar comparison between different simulation models and experiments for PE and PG lipids as done previously for PC and PS lipids in NMRlipids I and NMRlipids IV projects. This comparison (Fig. 2) and the results in NMRlipids I and NMRlipids IV projects suggest that the differences between PC, PE, PG, and PS headgroups are well captured in CHARMM36 simulations, although order parameters of all C-H bonds are not within the experimental error for any of the lipids. Therefore, the structural differences between lipid headgroups could be analyzed from the CHARMM36 simulations, perhaps in a similar way as done for the PS headgroup in the NMRlipids IV project. The contributed data for PE and PG lipids from united atom simulations is not yet analyzed, partly because of the lack of suitable programs for the automatic analysis of the data.

Figure 2: Headgroup and glycerol backbone order parameters from PE (left) and PG (right) lipid bilayers from experiments and different simulation models. For literature references, see the manuscript draft.


Cation binding to lipid bilayers containing PE and PG lipids

Sodium binding to PE lipid bilayers seems to be slightly weaker than in corresponding simulations for PC lipids, but experimental data is not available. For calcium binding to PE lipids, we do not have experimental nor simulation data yet.

As discussed in the NMRlipids IV project, the counterion binding to negatively charged lipid bilayer is complicated to assess against experimental data. However, the results from PC:PG mixtures from CHARMM36 and Slipids simulations indicate that sodium ions could be slightly overbinding to PG lipids, similarly to PS lipids in the NMRlipids IV project. For calcium binding to PG containing lipid bilayers, we have data only from CHARMM36 simulations and the conclusions are not yet fully clear.



  1. Analyze united atom simulations of PE lipids (GROMOS, CHARMM36ua, OPLS-UA, Berger, and GROMOS-CKP) listed in Table I in the manuscript with or without the new code for united atom data.
  2. Analyze structural differences between POPC, POPE, POPG and POPS headgroups from CHARMM36 simulations. The most reasonable first step would be probably to follow a similar analysis that was done for PS lipids in the NMRlipids IV project.
  3. Simulation data for PG lipids with different force fields would be useful. Currently, we have only CHARMM36 and Slipids.
  4. Simulation data from POPC:POPG (1:1) mixtures with sodium counterions and different CaCl2 concentrations (0-1M) from different force fields would be useful. Currently, we have results only from CHARMM36.


  1. I have just uploaded a python code to determine the OP of UA/AA force fields to the scratch folder. It is also possible to ignore the explicit H atoms in AA forcefields by using the option "-a ignH". The minimum input is a pdb file and the itp file of the molecule for which you want to get the OP (your lipid molecule). Of course, you can also specify a xtc trajectory in case you want the average OP over it with the standard deviations. The script does not require any specific input file with the list of C atoms but reads them directly from the itp file.

    1. As discussed in the blog in september / october 2018 (http://nmrlipids.blogspot.com/2017/03/nmrlipids-iv-headgroup-glycerol.html?showComment=1537530206534#c5570719429573890331), and also during the NMRlipids 2019 workshop, I and some colleagues have been also working on a code for rebuilding H and calculate the OP from a UA trajectory. For information, we are close to releasing a first version.
      During the workshop, we also discussed some possible ways on how to integrate it with the existing one for AA trajectories initially written by @jmelcr (https://github.com/NMRLipids/MATCH/blob/master/scripts/calcOrderParameters.py).
      I guess your code, the one from @jmelcr and ours should be complementary.

    2. Hi Patrick. Thanks for your comment! I do agree it is good to have independent-complementary codes for the same aim. This will facilitate the detection of any bug. I wrote this because we needed it to analyze our trajectories... then we decided to share it here. We did not know you were working on this. Just to complete the information I wrote yesterday, in case you want to consider it: I used the code written by @jmelcr as a reference to validate my script. That is why I implemented the "-a ignH" option (ignore explicit H atoms and get the OP from rebuilt H's even for AA forcefields). There is a line commented in my code to write the new H's in pdb format, in case you can use them for anything. The code I uploaded yesterday also adds the H atoms in the terminal Carbons, in contrast with other codes for UA forcefields, and provides the OP for them. I also tried to make it efficient in terms of memory, processing and execution. It does not require specific input files but reads the structure from the itp file. The format of the output is also very similar to that of the @jmelcr script. I just have a slightly more conservative way of determining the standard deviation of the mean (you can see it in the code).

    3. Hi Angel, thanks for your reply. Indeed I understand that you developed it on your side if you needed it! But we were not aware of this neither. Anyway, the idea of sharing those codes will make easier to get the best out of them. And also, now, we all know that NMRlipids is the good place for uploading / asking / downloading some tools for analyzing lipid simulations. We (almost) all have the same needs (such as for example centering properly a bilayer ;-) ), so it is useful to share our scripts.
      About the strategy for rebuilding H in our code, we have to provide an input file describing the carbons on which we want to build H. As this task may be tedious, I'm also developing another script which automatically creates such a file based on graphs and the known connectivity of a given lipid.
      I have a few things to finish and some checks to do, but I do my best to upload something soon to NMRlipids github repo.

    4. Dear Angel,

      thanks a lot for the code! I have couple of questions and comments below.

      1) I used the code to calculate the order parameters of the Berger POPC simulation (databank: https://github.com/NMRLipids/MATCH/tree/master/Data/Lipid_Bilayers/POPC/T298K/MODEL_Berger,
      original data: http://dx.doi.org/10.5281/zenodo.13279)
      The results given by the code are here:
      They are close to the ones using gromacs protonate and awk scripts in the databank folder (link above),
      but not exactly the same. For example, for the headgroup beta carbon order parameters the results in databank are
      0.0372539 0.0708691,
      while the new code gives
      Atom_name Hydrogen OP STD STDmean
      C5 HR 0.05052 0.46963 0.00059
      HS 0.08555 0.47931 0.00060
      Do you have some idea where the difference could originate?
      Patrick, have you analyzed this dataset (links above) with your code?

      2) Could you specify what do you mean with the more conservative way of determining the standard deviation? In NMRlipids we use the standard deviation over different lipids, not block averaging or something like that. After a quick look into the code, I was not sure if you referred to this or the programming style.

      3) In the instructions there was a "-r" flag, which probably referred to residue. However, this was not
      in the code and I just ran it without this flag. Probably, this has been removed from the code, but is still in the instructions?

      4) Reading information from itp is very useful! However, we have quite a lot of data in the databank without itp. For these, reading from tpr would be good, but that might be tedious to program. Patrick, how this information is read by your code?

    5. Hi Patrick
      it would be nice to do the analysis without any input (carbon list or itp file). I am sure this would help. Good improvement!

    6. Hi Samuli
      I will try to answer your questions:

      1) About the differences between both scripts I have no idea. We validated our script in three different ways: (i) by checking that the added Hydrogens are correct (the line to represent the H atoms in pdb format is commented in the script) (this is qualitative); (ii) comparing the OP for all-atoms force fields by considering the explicit atoms and by rebuilding them (semiquantitative validation) and (iii) comparing our results to those coming from the script of @jmelcr (quantitative analysis). All the validations were correct for different force fields and lipids. We got exactly the same results than those from @jmelcr for explicit all-atom force fields. We did not detect any fail in the rebuilding of the Hydrogens.

      2) On the more conservative way to get the standard deviation of the mean: we get the standard deviation of the average of all the lipids and frames together while the script of @jmelcr does the calculation over lipids in a single frame and then over frames (as far as I remember). Our standard deviations are a bit larger than those of @jmelcr except when you have a single frame. In that case the results are identical.

      3) you are right. Initially we put that flag for the residues but then we decided to use directly the itp file. You can just ignore it.

      4) I guess Patrick will not need the itp since the bonds can be directly read from the structure, as Patrick suggests. I used the itp since I assumed that they are always available. Actually I think they should be available. At some point they had to be available since the itp file was required to generate the tpr. I could modify my code to generate the Hydrogens without the itp file but since Patrick is already working on that I guess it is worth to wait...

    7. Ad 2) In the earlier version of the calcOrderParameter.py script, we have also used statistics from all lipids and trajectory frames. This is now done only for the mean value of the order parameter as shown here
      https://github.com/jmelcr/MATC/blob/a448e2d11a6a526ba11430e50e4df2dfd2d48c2f/scripts/calcOrderParameters.py#L159 .

      For the error estimate, however, we have decided to use the standard error of the mean (denoted as STEM in the code). We first calculate standard error (STD) for each lipid separately from the whole trajectory and after that we calculate STEM from this ensemble of measurements. So, first trajectory averaging, then averaging over lipids.

      Since the reconstruction of the hydrogen atoms is a crucial step in the accuracy of the whole script, this needs to be rather well documented. It would also make it easier to point to any possible sources of differences between different codes.

  2. Thank you for the comment Josef. We used your script in order to validate ours and the results are identical for the OP values of explicit H atoms. The only difference is in the determination of the STEM values. As you say, you do the calculation over lipids and then over frames while we get the STEM of all lipids over all frames together, this is why I said that our calculation of the STEM is a bit more conservative, our STEM values are a bit larger than yours. For a single frame our results are identical for OP, STD and STEM.

    Regarding the OP of non-explicit H atoms, we had not compared our results to those obtained from the awk scripts mentioned by Samuli. As I explained above, we validated the reconstruction of H atoms and, once we have the coordinates, we validated at quantitative level the determination of OP for explicit atoms using your script.

  3. Hello all, sorry for my late answer, I have little time now. I'll try to give a first answer to the different questions. Since my message is too big, I cut it in two.

    (Message 1/2)
    1) OP of the Berger POPC simulation: I was unsure what to analyze since there are 2 trr files. I have tried with the first one (0-25 ns). After a quick look (no real quantification), it seems that I have generally differences on the 3rd digit (more seldom to the 2nd digit). For the two beta op my script finds 0.03634 and 0.06587. Could you please confirm on what time window I should analyze? Then I can make a more quantitative estimate of the difference.

    4) My program needs a file (for now let's call it dic_lipids.py) containing this:
    POPC = { "atom_1": ("typeofH2build", "helper1", "helper2"),
    "atom_2": ("typeofH2build", "helper1", "helper2"),
    "atom_n": ("typeofH2build", "helper1", "helper2", "helper3"),
    Where "typeofH2build" is in ("CH3", "CH2", "CH", "CHdoublebond"), "helper1" to "helper3" are the names of helper atoms that are needed to build the H(s). For example on a saturated chain, if we have C10, C11 and C12 that follow each other, and if C10 is a CH2, we would have: "C11": ("CH2", "C10", "C12"). I also use (for now) the same type of file as the one needed by Josef's script (found in https://github.com/NMRLipids/MATCH/blob/master/scripts/orderParm_defs/) so that I can have the same output as Josef and it makes comparison easy. This is a tedious task to write such files, and we have to do it for each type of lipid *and* force field. However, as I told above, I'm developing another script that should be able to build automatically dic_lipids.py for any force field.

    The script is not ready yet, but I can explain the idea. First I build a graph of one lipid molecule with the nodes having a name you use in your mapping files (e.g. M_G1_M, M_G1H1_M, etc), say for example a POPC. Then I use a pdb with a single POPC (structure chosen from a given force field, with PDB atom names as indicated in itp or psf), I build another graph with the nodes having the name of each atom found in that pdb. Then I match the two graphs and get the correspondance between the generic names and the names in the pdb (as a python dictionnary).

    So what we need in the end is only a valid pdb (or gro file) with one lipid. With that script we should be able to build any of those files:
    - generic to pdb atom names (such as https://github.com/NMRLipids/MATCH/tree/master/MAPPING)
    - the one needed by my program (dic_lipid.py)
    - the one needed by the script of Josef (https://github.com/NMRLipids/MATCH/tree/master/scripts/orderParm_defs)
    One could argue that at the beginning, it'll be needed to build a file with the connectivity of a lipid using generic names. However, we have to do it only once for a given lipid (e.g. POPC), and building a new lipid will be quite trivial if the structure is not so different (e.g. building POPS or POPE from POPC). In the end, if we already know the structure (connectivity) of a lipid, e.g. POPC, we will only need a PDB file (no itp, tpr, psf, etc., needed).

    1. Thanks Patrick. In my analysis, I have combined the trajectories and analyzed 0-50ns (gmx trjcat -f popc0-25ns.trr popc25-50ns.trr in README in https://github.com/NMRLipids/MATCH/tree/master/Data/Lipid_Bilayers/POPC/T298K/MODEL_Berger).

      You results seems to be quite close to protonate/awk combination as expected from what you describe, but it might be nice to check by analyzin the same frames.

    2. Hi Samuli, thanks for the precisions. So I ran my code on the full trajectory, and the mean absolute difference between the OP calculated with my script and the awk one is 0.002. It might be even less because it is possible that some hydrogens were mixed (in the database, for the case we have 2 or 3 hydrogens, I didn't check whether my code outputs these hydrogens in the same order, that's why it's useful to have the name of each H as in Josef's script). I can put the results here if you want, let me know. I do my best to make my code available as soon as I can.

  4. (Message 2/2)
    @Angel & Josef:
    So far my strategy for validation was the following :
    i) I calculate the OP with my code from a UA trajectories.
    ii) Using my code, I generate from a UA trajectory a new xtc traj *with* hydrogens. Then using this new traj, I run Josef's script and compare the output of both programs.
    Based on this, I find very similar results (main absolute difference of 0.001). The tiny difference is due to the fact that when my code builds an H, it stores many digits for its coordinates. But when I write an xtc it's rounded to 3rd digit. If you think I should validate it some other way, please let me know.
    Concerning the STEM, I'll implement it like Josef, so comparison will be easier.
    Last, I also validated my H reconstruction against g_protonate. The RMSD of the new built Hs with my program against those from g_protonate is below 0.01 A.

    Last about making available my two codes:
    - For the script calculating OP from UA traj, I do my best to upload a first version this week. Unfortunately, I have little time since I'm attending a conference in Berlin.
    - For the other script I hope to upload something at the beginning of July.

  5. As far as I see the three codes (Josef, Patrick and mine) provide exactly the same OP values once the Hydrogens are rebuilt. There is a difference in the STEM values (my calculation is a bit more conservative than yours) but this can be corrected easily, if necessary. At the moment I do not see a reason to change my calculation. The only possible source of errors that I can imagine is in the rebuilding of Hydrogens. This is more tricky than the determination of OP. Since my code can ignore explicit H atoms (using the "-a ignH" option) I checked this for AA trajectories with explicit and rebuild Hydrogens. The results were not identical (as expected) but very similar. As explained before, I validated my rebuilding of H atoms by comparing the explicit Hydrogens in different AA trajectories with those rebuilt by using it.

    @Patrick, in case of doubts perhaps it is useful to compare the output of your code for UA with that coming from my code... although I understand that your code has already been well validated and it is robust. In the end I guess your code will be the best one since it will not need any input other than the trajectory.

    1. I am looking now the figure 6 from the recent paper by Tom Piggot (http://dx.doi.org/10.1021/acs.jctc.7b00643). If I understand it correctly, they make a similar comparison as you have done (comparison between regenerated and original hydrogens in all atom trajectory). In their results, NMRlipids protonate approach does not give exactly the same values as the original trajectory. Could you comment or send a figure about the results from your code, are they closer to originals than NMRlipids protonate in the figure by Piggot et al.? I have not read the paper very carefully, but maybe Tom could have some insight why the results from different protonation methods are slightly different?

      It may be that we just have to put larger error bars to the united atom results which include the results from different hydrogen generation methods, although my feeling is that the results closer the original all atom trajectories would be better.

  6. Hi,
    we leave here the routes of the MD trajectories we have uploaded to the respository, indicating the membrane composition of each one and the different force-fields used. All of them were carried out using a concentration of 150 mM NaCl (we are now carrying out analogous MD simulations but using just the minimum amount of ions to neutralize the systems):

    POPC-POPG 7:3









    POPG-POPE 3:1









    POPG-POPE 1:3




































  7. Sorry for taking so much time. I have done a comparison of the OP using the explicit and rebuilt H atoms for Charmm36 POPG and for slipids POPC using my code. I took two of the trajectories recently uploaded by Antonio PeĆ³n and Rebeca GarcĆ­a (see the list just above). In order to directly see the impact of the H-rebuilding I determined the OP for a single lipid, for a single frame and also the average OP over 20 ns. You can see the results here: http://smmb.usc.es/docs/OPcomparisonAAvsUA.pdf. I have also generated a pdb file of POPC with the rebuilt H atoms of the lipid head over the original explicit H atoms (http://smmb.usc.es/docs/POPC_rebuiltHs.pdb). For the rebuilt ones I used X as atomname in the pdb, to facilitate the selection and comparison with a molecular viewer (note that the the bond distance for my rebuilt H atoms is exactly 1 A, since this is irrelevant for the OP calculation). I am commenting here just the OP for the lipid heads since the results for the tails were almost indistinguishable.

    Some comments:
    As you can see in the first column of the plots (http://smmb.usc.es/docs/OPcomparisonAAvsUA.pdf) the impact of my rebuilt H atoms in the OP is significant for some positions and almost negligible for others. G1-G3 seem to be more sensitive than alpha and beta.
    If the differences are just random noise, they are expected to be diluted when averaging over many lipids. Actually this is visible in the second and third columns (average over a single frame with 500 lipid molecules and average over 20 ns for the same system).
    The results for charmm-POPG are reasonably good for all the atoms but for slipids-POPC the difference between explicit and rebuilt H atoms is significant for G3S and for G2R. The reason of the differences is not the method employed for the calculation since I used the same code. I do not know if the rest of the methods behave similarly. I will try to repeat the same analysis for other trajectories (lipids and forcefields).

  8. Hello, I have compared the head orden parameters for the different all atoms force fields (CHARMM36, SLIPID) using explicit and rebuilt H atoms. The analysis was performed over the last 100 ns of the trajectories (500 ns long with 500 lipids) with POPE, POPC and POPG bilayers in 150 mM NaCl. You can see the OP for both explicit Hs(dot) and rebuilt Hs ( line, calculated using the Ɓngel's code) giving very similar results (within the statistical uncertainties). The biggest difference is in hydrogen g2 and specifically in SLIPID. Link: https://drive.google.com/open?id=10tT0YLT4OHajEbsLNDP1XBLm71sKXtoO


Please sign in before writing your comment.