Tuesday, April 23, 2019

NMRlipids IVb: Assembling the PE & PG results

As discussed in the previous post, the NMRlipids IV project concerning PE, PG and PS lipids was divided into two parts. The first part about the results from PS lipids is approaching to the submission, while the second part about PE and PG lipids is in a more preliminary stage.

I have now started to assemble the contributed data for PE and PG lipids into a manuscript in a new GitHub repository. From now on, I will call this part of the project as NMRlipids IVb. Current status of the manuscript and most important things to todo are summarized in this post.

Headgroup and glycerol backbone order parameters from lipids with different headgroups in experiments

Experimental order parameters delivered by Tiago Ferreira with the information about the sign revealed that the š›½-carbon order parameter of PG lipids is positive, while other lipids have negative order parameter for this carbon (Fig. 1). Therefore, the structure of PG lipid seems to be distinct from PC and PE, in contrast to the conclusions from the order parameter data without sign information in the previous post.

Figure 1: Experimental order parameters of the headgroup and glycerol backbone region C-H bonds from experiments. For literature references, see the manuscript draft.


Headgroup and glycerol backbone order parameters of PE and PG lipids in simulations

The experimental information about order parameter signs enables a similar comparison between different simulation models and experiments for PE and PG lipids as done previously for PC and PS lipids in NMRlipids I and NMRlipids IV projects. This comparison (Fig. 2) and the results in NMRlipids I and NMRlipids IV projects suggest that the differences between PC, PE, PG, and PS headgroups are well captured in CHARMM36 simulations, although order parameters of all C-H bonds are not within the experimental error for any of the lipids. Therefore, the structural differences between lipid headgroups could be analyzed from the CHARMM36 simulations, perhaps in a similar way as done for the PS headgroup in the NMRlipids IV project. The contributed data for PE and PG lipids from united atom simulations is not yet analyzed, partly because of the lack of suitable programs for the automatic analysis of the data.

Figure 2: Headgroup and glycerol backbone order parameters from PE (left) and PG (right) lipid bilayers from experiments and different simulation models. For literature references, see the manuscript draft.


Cation binding to lipid bilayers containing PE and PG lipids

Sodium binding to PE lipid bilayers seems to be slightly weaker than in corresponding simulations for PC lipids, but experimental data is not available. For calcium binding to PE lipids, we do not have experimental nor simulation data yet.

As discussed in the NMRlipids IV project, the counterion binding to negatively charged lipid bilayer is complicated to assess against experimental data. However, the results from PC:PG mixtures from CHARMM36 and Slipids simulations indicate that sodium ions could be slightly overbinding to PG lipids, similarly to PS lipids in the NMRlipids IV project. For calcium binding to PG containing lipid bilayers, we have data only from CHARMM36 simulations and the conclusions are not yet fully clear.



  1. Analyze united atom simulations of PE lipids (GROMOS, CHARMM36ua, OPLS-UA, Berger, and GROMOS-CKP) listed in Table I in the manuscript with or without the new code for united atom data.
  2. Analyze structural differences between POPC, POPE, POPG and POPS headgroups from CHARMM36 simulations. The most reasonable first step would be probably to follow a similar analysis that was done for PS lipids in the NMRlipids IV project.
  3. Simulation data for PG lipids with different force fields would be useful. Currently, we have only CHARMM36 and Slipids.
  4. Simulation data from POPC:POPG (1:1) mixtures with sodium counterions and different CaCl2 concentrations (0-1M) from different force fields would be useful. Currently, we have results only from CHARMM36.


  1. I have just uploaded a python code to determine the OP of UA/AA force fields to the scratch folder. It is also possible to ignore the explicit H atoms in AA forcefields by using the option "-a ignH". The minimum input is a pdb file and the itp file of the molecule for which you want to get the OP (your lipid molecule). Of course, you can also specify a xtc trajectory in case you want the average OP over it with the standard deviations. The script does not require any specific input file with the list of C atoms but reads them directly from the itp file.

    1. As discussed in the blog in september / october 2018 (http://nmrlipids.blogspot.com/2017/03/nmrlipids-iv-headgroup-glycerol.html?showComment=1537530206534#c5570719429573890331), and also during the NMRlipids 2019 workshop, I and some colleagues have been also working on a code for rebuilding H and calculate the OP from a UA trajectory. For information, we are close to releasing a first version.
      During the workshop, we also discussed some possible ways on how to integrate it with the existing one for AA trajectories initially written by @jmelcr (https://github.com/NMRLipids/MATCH/blob/master/scripts/calcOrderParameters.py).
      I guess your code, the one from @jmelcr and ours should be complementary.

    2. Hi Patrick. Thanks for your comment! I do agree it is good to have independent-complementary codes for the same aim. This will facilitate the detection of any bug. I wrote this because we needed it to analyze our trajectories... then we decided to share it here. We did not know you were working on this. Just to complete the information I wrote yesterday, in case you want to consider it: I used the code written by @jmelcr as a reference to validate my script. That is why I implemented the "-a ignH" option (ignore explicit H atoms and get the OP from rebuilt H's even for AA forcefields). There is a line commented in my code to write the new H's in pdb format, in case you can use them for anything. The code I uploaded yesterday also adds the H atoms in the terminal Carbons, in contrast with other codes for UA forcefields, and provides the OP for them. I also tried to make it efficient in terms of memory, processing and execution. It does not require specific input files but reads the structure from the itp file. The format of the output is also very similar to that of the @jmelcr script. I just have a slightly more conservative way of determining the standard deviation of the mean (you can see it in the code).

    3. Hi Angel, thanks for your reply. Indeed I understand that you developed it on your side if you needed it! But we were not aware of this neither. Anyway, the idea of sharing those codes will make easier to get the best out of them. And also, now, we all know that NMRlipids is the good place for uploading / asking / downloading some tools for analyzing lipid simulations. We (almost) all have the same needs (such as for example centering properly a bilayer ;-) ), so it is useful to share our scripts.
      About the strategy for rebuilding H in our code, we have to provide an input file describing the carbons on which we want to build H. As this task may be tedious, I'm also developing another script which automatically creates such a file based on graphs and the known connectivity of a given lipid.
      I have a few things to finish and some checks to do, but I do my best to upload something soon to NMRlipids github repo.

    4. Dear Angel,

      thanks a lot for the code! I have couple of questions and comments below.

      1) I used the code to calculate the order parameters of the Berger POPC simulation (databank: https://github.com/NMRLipids/MATCH/tree/master/Data/Lipid_Bilayers/POPC/T298K/MODEL_Berger,
      original data: http://dx.doi.org/10.5281/zenodo.13279)
      The results given by the code are here:
      They are close to the ones using gromacs protonate and awk scripts in the databank folder (link above),
      but not exactly the same. For example, for the headgroup beta carbon order parameters the results in databank are
      0.0372539 0.0708691,
      while the new code gives
      Atom_name Hydrogen OP STD STDmean
      C5 HR 0.05052 0.46963 0.00059
      HS 0.08555 0.47931 0.00060
      Do you have some idea where the difference could originate?
      Patrick, have you analyzed this dataset (links above) with your code?

      2) Could you specify what do you mean with the more conservative way of determining the standard deviation? In NMRlipids we use the standard deviation over different lipids, not block averaging or something like that. After a quick look into the code, I was not sure if you referred to this or the programming style.

      3) In the instructions there was a "-r" flag, which probably referred to residue. However, this was not
      in the code and I just ran it without this flag. Probably, this has been removed from the code, but is still in the instructions?

      4) Reading information from itp is very useful! However, we have quite a lot of data in the databank without itp. For these, reading from tpr would be good, but that might be tedious to program. Patrick, how this information is read by your code?

    5. Hi Patrick
      it would be nice to do the analysis without any input (carbon list or itp file). I am sure this would help. Good improvement!

    6. Hi Samuli
      I will try to answer your questions:

      1) About the differences between both scripts I have no idea. We validated our script in three different ways: (i) by checking that the added Hydrogens are correct (the line to represent the H atoms in pdb format is commented in the script) (this is qualitative); (ii) comparing the OP for all-atoms force fields by considering the explicit atoms and by rebuilding them (semiquantitative validation) and (iii) comparing our results to those coming from the script of @jmelcr (quantitative analysis). All the validations were correct for different force fields and lipids. We got exactly the same results than those from @jmelcr for explicit all-atom force fields. We did not detect any fail in the rebuilding of the Hydrogens.

      2) On the more conservative way to get the standard deviation of the mean: we get the standard deviation of the average of all the lipids and frames together while the script of @jmelcr does the calculation over lipids in a single frame and then over frames (as far as I remember). Our standard deviations are a bit larger than those of @jmelcr except when you have a single frame. In that case the results are identical.

      3) you are right. Initially we put that flag for the residues but then we decided to use directly the itp file. You can just ignore it.

      4) I guess Patrick will not need the itp since the bonds can be directly read from the structure, as Patrick suggests. I used the itp since I assumed that they are always available. Actually I think they should be available. At some point they had to be available since the itp file was required to generate the tpr. I could modify my code to generate the Hydrogens without the itp file but since Patrick is already working on that I guess it is worth to wait...

    7. Ad 2) In the earlier version of the calcOrderParameter.py script, we have also used statistics from all lipids and trajectory frames. This is now done only for the mean value of the order parameter as shown here
      https://github.com/jmelcr/MATC/blob/a448e2d11a6a526ba11430e50e4df2dfd2d48c2f/scripts/calcOrderParameters.py#L159 .

      For the error estimate, however, we have decided to use the standard error of the mean (denoted as STEM in the code). We first calculate standard error (STD) for each lipid separately from the whole trajectory and after that we calculate STEM from this ensemble of measurements. So, first trajectory averaging, then averaging over lipids.

      Since the reconstruction of the hydrogen atoms is a crucial step in the accuracy of the whole script, this needs to be rather well documented. It would also make it easier to point to any possible sources of differences between different codes.

  2. Thank you for the comment Josef. We used your script in order to validate ours and the results are identical for the OP values of explicit H atoms. The only difference is in the determination of the STEM values. As you say, you do the calculation over lipids and then over frames while we get the STEM of all lipids over all frames together, this is why I said that our calculation of the STEM is a bit more conservative, our STEM values are a bit larger than yours. For a single frame our results are identical for OP, STD and STEM.

    Regarding the OP of non-explicit H atoms, we had not compared our results to those obtained from the awk scripts mentioned by Samuli. As I explained above, we validated the reconstruction of H atoms and, once we have the coordinates, we validated at quantitative level the determination of OP for explicit atoms using your script.

  3. Hello all, sorry for my late answer, I have little time now. I'll try to give a first answer to the different questions. Since my message is too big, I cut it in two.

    (Message 1/2)
    1) OP of the Berger POPC simulation: I was unsure what to analyze since there are 2 trr files. I have tried with the first one (0-25 ns). After a quick look (no real quantification), it seems that I have generally differences on the 3rd digit (more seldom to the 2nd digit). For the two beta op my script finds 0.03634 and 0.06587. Could you please confirm on what time window I should analyze? Then I can make a more quantitative estimate of the difference.

    4) My program needs a file (for now let's call it dic_lipids.py) containing this:
    POPC = { "atom_1": ("typeofH2build", "helper1", "helper2"),
    "atom_2": ("typeofH2build", "helper1", "helper2"),
    "atom_n": ("typeofH2build", "helper1", "helper2", "helper3"),
    Where "typeofH2build" is in ("CH3", "CH2", "CH", "CHdoublebond"), "helper1" to "helper3" are the names of helper atoms that are needed to build the H(s). For example on a saturated chain, if we have C10, C11 and C12 that follow each other, and if C10 is a CH2, we would have: "C11": ("CH2", "C10", "C12"). I also use (for now) the same type of file as the one needed by Josef's script (found in https://github.com/NMRLipids/MATCH/blob/master/scripts/orderParm_defs/) so that I can have the same output as Josef and it makes comparison easy. This is a tedious task to write such files, and we have to do it for each type of lipid *and* force field. However, as I told above, I'm developing another script that should be able to build automatically dic_lipids.py for any force field.

    The script is not ready yet, but I can explain the idea. First I build a graph of one lipid molecule with the nodes having a name you use in your mapping files (e.g. M_G1_M, M_G1H1_M, etc), say for example a POPC. Then I use a pdb with a single POPC (structure chosen from a given force field, with PDB atom names as indicated in itp or psf), I build another graph with the nodes having the name of each atom found in that pdb. Then I match the two graphs and get the correspondance between the generic names and the names in the pdb (as a python dictionnary).

    So what we need in the end is only a valid pdb (or gro file) with one lipid. With that script we should be able to build any of those files:
    - generic to pdb atom names (such as https://github.com/NMRLipids/MATCH/tree/master/MAPPING)
    - the one needed by my program (dic_lipid.py)
    - the one needed by the script of Josef (https://github.com/NMRLipids/MATCH/tree/master/scripts/orderParm_defs)
    One could argue that at the beginning, it'll be needed to build a file with the connectivity of a lipid using generic names. However, we have to do it only once for a given lipid (e.g. POPC), and building a new lipid will be quite trivial if the structure is not so different (e.g. building POPS or POPE from POPC). In the end, if we already know the structure (connectivity) of a lipid, e.g. POPC, we will only need a PDB file (no itp, tpr, psf, etc., needed).

    1. Thanks Patrick. In my analysis, I have combined the trajectories and analyzed 0-50ns (gmx trjcat -f popc0-25ns.trr popc25-50ns.trr in README in https://github.com/NMRLipids/MATCH/tree/master/Data/Lipid_Bilayers/POPC/T298K/MODEL_Berger).

      You results seems to be quite close to protonate/awk combination as expected from what you describe, but it might be nice to check by analyzin the same frames.

    2. Hi Samuli, thanks for the precisions. So I ran my code on the full trajectory, and the mean absolute difference between the OP calculated with my script and the awk one is 0.002. It might be even less because it is possible that some hydrogens were mixed (in the database, for the case we have 2 or 3 hydrogens, I didn't check whether my code outputs these hydrogens in the same order, that's why it's useful to have the name of each H as in Josef's script). I can put the results here if you want, let me know. I do my best to make my code available as soon as I can.

  4. (Message 2/2)
    @Angel & Josef:
    So far my strategy for validation was the following :
    i) I calculate the OP with my code from a UA trajectories.
    ii) Using my code, I generate from a UA trajectory a new xtc traj *with* hydrogens. Then using this new traj, I run Josef's script and compare the output of both programs.
    Based on this, I find very similar results (main absolute difference of 0.001). The tiny difference is due to the fact that when my code builds an H, it stores many digits for its coordinates. But when I write an xtc it's rounded to 3rd digit. If you think I should validate it some other way, please let me know.
    Concerning the STEM, I'll implement it like Josef, so comparison will be easier.
    Last, I also validated my H reconstruction against g_protonate. The RMSD of the new built Hs with my program against those from g_protonate is below 0.01 A.

    Last about making available my two codes:
    - For the script calculating OP from UA traj, I do my best to upload a first version this week. Unfortunately, I have little time since I'm attending a conference in Berlin.
    - For the other script I hope to upload something at the beginning of July.

  5. As far as I see the three codes (Josef, Patrick and mine) provide exactly the same OP values once the Hydrogens are rebuilt. There is a difference in the STEM values (my calculation is a bit more conservative than yours) but this can be corrected easily, if necessary. At the moment I do not see a reason to change my calculation. The only possible source of errors that I can imagine is in the rebuilding of Hydrogens. This is more tricky than the determination of OP. Since my code can ignore explicit H atoms (using the "-a ignH" option) I checked this for AA trajectories with explicit and rebuild Hydrogens. The results were not identical (as expected) but very similar. As explained before, I validated my rebuilding of H atoms by comparing the explicit Hydrogens in different AA trajectories with those rebuilt by using it.

    @Patrick, in case of doubts perhaps it is useful to compare the output of your code for UA with that coming from my code... although I understand that your code has already been well validated and it is robust. In the end I guess your code will be the best one since it will not need any input other than the trajectory.

    1. I am looking now the figure 6 from the recent paper by Tom Piggot (http://dx.doi.org/10.1021/acs.jctc.7b00643). If I understand it correctly, they make a similar comparison as you have done (comparison between regenerated and original hydrogens in all atom trajectory). In their results, NMRlipids protonate approach does not give exactly the same values as the original trajectory. Could you comment or send a figure about the results from your code, are they closer to originals than NMRlipids protonate in the figure by Piggot et al.? I have not read the paper very carefully, but maybe Tom could have some insight why the results from different protonation methods are slightly different?

      It may be that we just have to put larger error bars to the united atom results which include the results from different hydrogen generation methods, although my feeling is that the results closer the original all atom trajectories would be better.

    2. Sorry for the slow reply, I've just seen this now.

      In the paper you mention, I go into all this in some detail. The main gist of it is that if you are trying to reproduce all-atom results using a united-atom methodology it can be very difficult. This is because in the all-atom simulations the explicit hydrogen atoms may not just be in some idealised geometry (as the united-atom analysis methods assume in two different main ways) but deviate from these due things like the bonded force field parameters. For CH2 groups in lipid tails, this deviation from an ideal tetrahedral geometry doesn't tend to be very large and so the united-atom analysis methods are in general very good. However, for the CH groups in the unsaturated double-bonds, this is a different case. Here the positions of the explicit hydrogen atoms in the all-atom force fields isn't where either of the two different united-atom methods predict.



    3. As to different protonation methods, I didn't really see differences when looking at the tails (well, if they were doing what they claimed to and not something incorrect).

      The main differences arose in the double-bond due to the use of two fairly different approaches. One approach (e.g. as employed by protonate and g_lomepro) measures the explicit C-C=C angles and places the hydrogen atom bisecting this. The other method (e.g. as used by the fixed version of g_order) assumes the C-C=C angle is 120 degrees and builds this into the order parameter calculation. It turns out that neither of these place the hydrogen atoms where they are in the all-atom simulations.

  6. Hi,
    we leave here the routes of the MD trajectories we have uploaded to the respository, indicating the membrane composition of each one and the different force-fields used. All of them were carried out using a concentration of 150 mM NaCl (we are now carrying out analogous MD simulations but using just the minimum amount of ions to neutralize the systems):

    POPC-POPG 7:3









    POPG-POPE 3:1









    POPG-POPE 1:3




































    1. Here, I update the paths for the different dynamics both for the 150 mM NaCl and the new ones without ionic concentration:

      150 mM NaCl
      • LIPID17 https://zenodo.org/record/2577305#.XR9fUyZ7nes
      • GROMOS https://zenodo.org/record/2574491#.XR9fUiZ7nes
      • CHARMM36 https://zenodo.org/record/2577454#.XR9fOiZ7nes
      • SLIPIDS https://zenodo.org/record/2578069#.XR9fUCZ7nes
      0 nM NaCl
      • LIPID17 https://zenodo.org/record/3237701#.XR9azCZ7nes
      • GROMOS https://zenodo.org/record/3237754#.XR9axSZ7nes
      • CHARMM36 https://zenodo.org/record/3237461#.XR9fNyZ7nes
      • SLIPIDS https://zenodo.org/record/3231342#.XR9atiZ7nes
      150 mM NaCl
      • LIPID17 https://zenodo.org/record/2573905#.XR9fVyZ7nes
      • GROMOS https://zenodo.org/record/3257649#.XR9fOSZ7nes
      • CHARMM36 https://zenodo.org/record/2573531#.XR9fViZ7nes
      • SLIPIDS https://zenodo.org/record/3269320#.XR9atSZ7nes
      0 nM NaCl
      • LIPID17 https://zenodo.org/record/3247659#.XR9aziZ7nes
      • GROMOS https://zenodo.org/record/3266166#.XR9awiZ7nes
      • CHARMM36 https://zenodo.org/record/3237463#.XR9fNiZ7nes
      • SLIPIDS https://zenodo.org/record/3364460#.XV_B__x7nes
      150 mM NaCl
      • LIPID17 https://zenodo.org/record/2574959#.XR9fOyZ7nes
      • GROMOS https://zenodo.org/record/2574691#.XR9fVSZ7nes
      • CHARMM36 https://zenodo.org/record/2628335#.XR9fPSZ7nes
      • SLIPIDS https://zenodo.org/record/2574689#.XR9fUSZ7nes
      0 nM NaCl
      • LIPID17 https://zenodo.org/record/3237657#.XR9azCZ7nes
      • GROMOS https://zenodo.org/record/3247435#.XR9awyZ7nes
      • CHARMM36 https://zenodo.org/record/3247813#.XR9fNSZ7nes
      • SLIPIDS https://zenodo.org/record/3235552#.XR9atyZ7nes
      POPG-POPE 1:3
      150 mM NaCl
      • LIPID17 https://zenodo.org/record/2579061#.XR9fTyZ7nes
      • GROMOS https://zenodo.org/record/2579063#.XR9fSyZ7nes
      • CHARMM36 https://zenodo.org/record/2579108#.XR9fSiZ7nes
      • SLIPIDS https://zenodo.org/record/2579224#.XR9fSCZ7nes
      0 nM NaCl
      • LIPID17 https://zenodo.org/record/3241269#.XR9ayCZ7nes
      • GROMOS https://zenodo.org/record/3266238#.XR9awCZ7nes
      • CHARMM36 https://zenodo.org/record/3249960#.XR9fNCZ7nes
      • SLIPIDS https://zenodo.org/record/3238258#.XR9auCZ7nes
      POPG-POPE 3:1
      150 mM NaCl
      • LIPID17 https://zenodo.org/record/2579344#.XR9fRyZ7nes
      • GROMOS https://zenodo.org/record/2580158#.XR9fQyZ7nes
      • CHARMM36 https://zenodo.org/record/2580153#.XR9fRCZ7nes
      • SLIPIDS https://zenodo.org/record/2580153#.XR9fRCZ7nes
      0 nM NaCl
      • LIPID17 https://zenodo.org/record/3256490#.XR9aySZ7nes
      • GROMOS https://zenodo.org/record/3267234#.XR9avSZ7nes
      • CHARMM36 https://zenodo.org/record/3250127#.XR9atCZ7nes https://zenodo.org/record/3258112#.XR9azyZ7nes
      • SLIPIDS https://zenodo.org/record/3240403#.XR9auyZ7nes
      POPC-POPG 7:3
      150 mM NaCl
      • LIPID17 https://zenodo.org/record/2585523#.XR9fQSZ7nes
      • GROMOS https://zenodo.org/record/2582721#.XR9fPiZ7nes
      • CHARMM36 https://zenodo.org/record/2580902#.XR9fQCZ7nes
      • SLIPIDS https://zenodo.org/record/2581186#.XR9fPyZ7nes
      0 nM NaCl
      • LIPID17 https://zenodo.org/record/3241243#.XR9ayiZ7nes
      • GROMOS https://zenodo.org/record/3266240#.XR9aviZ7nes
      • CHARMM36 https://zenodo.org/record/3248689#.XR9fMyZ7nes
      • SLIPIDS https://zenodo.org/record/3240156#.XR9auSZ7nes

  7. Sorry for taking so much time. I have done a comparison of the OP using the explicit and rebuilt H atoms for Charmm36 POPG and for slipids POPC using my code. I took two of the trajectories recently uploaded by Antonio PeĆ³n and Rebeca GarcĆ­a (see the list just above). In order to directly see the impact of the H-rebuilding I determined the OP for a single lipid, for a single frame and also the average OP over 20 ns. You can see the results here: http://smmb.usc.es/docs/OPcomparisonAAvsUA.pdf. I have also generated a pdb file of POPC with the rebuilt H atoms of the lipid head over the original explicit H atoms (http://smmb.usc.es/docs/POPC_rebuiltHs.pdb). For the rebuilt ones I used X as atomname in the pdb, to facilitate the selection and comparison with a molecular viewer (note that the the bond distance for my rebuilt H atoms is exactly 1 A, since this is irrelevant for the OP calculation). I am commenting here just the OP for the lipid heads since the results for the tails were almost indistinguishable.

    Some comments:
    As you can see in the first column of the plots (http://smmb.usc.es/docs/OPcomparisonAAvsUA.pdf) the impact of my rebuilt H atoms in the OP is significant for some positions and almost negligible for others. G1-G3 seem to be more sensitive than alpha and beta.
    If the differences are just random noise, they are expected to be diluted when averaging over many lipids. Actually this is visible in the second and third columns (average over a single frame with 500 lipid molecules and average over 20 ns for the same system).
    The results for charmm-POPG are reasonably good for all the atoms but for slipids-POPC the difference between explicit and rebuilt H atoms is significant for G3S and for G2R. The reason of the differences is not the method employed for the calculation since I used the same code. I do not know if the rest of the methods behave similarly. I will try to repeat the same analysis for other trajectories (lipids and forcefields).

    1. "I am commenting here just the OP for the lipid heads since the results for the tails were almost indistinguishable."

      Let me know if I understood this correctly. Do you mean with this that the order parameters from regenerated and all-atom hydrogens are essentially identical for tails, but not for glycerol backbone and headgroup?

    2. Hi Samuli. Yes, I meant that... but we should probably do the same comparison we did with the heads. For the tails we just did a couple of plots.

  8. Hello, I have compared the head orden parameters for the different all atoms force fields (CHARMM36, SLIPID) using explicit and rebuilt H atoms. The analysis was performed over the last 100 ns of the trajectories (500 ns long with 500 lipids) with POPE, POPC and POPG bilayers in 150 mM NaCl. You can see the OP for both explicit Hs(dot) and rebuilt Hs ( line, calculated using the Ɓngel's code) giving very similar results (within the statistical uncertainties). The biggest difference is in hydrogen g2 and specifically in SLIPID. Link: https://drive.google.com/open?id=10tT0YLT4OHajEbsLNDP1XBLm71sKXtoO

    1. Thanks Antonio and Angel for the data. I think that we should do similar analysis with the code by Fuchs et al. once it is available, and then decide how we calculate the united atom order parameters for the publication and determine the error bars. The numerical data from your plots would be useful for this.

  9. Hello all, I'm very sorry for my late answer but it took me longer than expected. Especially because I wanted to make a thorough comparison / validation of my program for rebuilding hydrogens / calculating OPs vs the programs from Josef and Angel. And also because I wanted to add docstrings and an understandable documentation.

    So my program is called "buildH" and the repository is on github: https://github.com/patrickfuchs/buildH. I didn't upload any script on nmrlipids github repo for now since more than one file is required, I thought it would be easier like that. If you use it, please do some regular "git pull" since we will work on it on august / september. There are still many little things to do, but the core of the program is working.
    Importantly, I made a thorough validation of buildH which is also on github (https://github.com/patrickfuchs/buildH/tree/master/CHARMM36_POPC_validation, see report_buildH.ipynb or report_buildH.pdf). For easier comparison, buildH outputs the OPs in the format of the program from Josef *and* that from Angel. For the validation, I used the strategy like in the paper from Thomas Piggot: starting from an all-atom (POPC CHARMM36) traj, I remove the Hs, reconstruct them and calculate the OPs and compare them to the initial traj with Hs (OPs calculated with Josef's prog). You will see in the report that both Angel program and buildH agree very well (except for a few details). I also put there some links on output files with numbers in case they are useful. Don't hesitate if you have questions / comments.

    For the other script building automatically the required files for buildH, I didn't have time to complete it.

    In the near future, we plan to add some improvements to buildH (such as multithreading, unit tests, automatic topology detection using the other script, etc).

    1. Thanks a lot Patrick!

      Based on your report it seems to me that there is no essential difference between results from different codes used for united atom reconstruction, but there are differences in hydrogen locations with atomistic models.

      I think that the solution in NMRlipids IVb is to use slightly larger error bars for order parameters calculated from united atom simulations.

      I will try your code and start the analysis of the available data asap.

    2. I have now analysed the united atom simulations for NMRlipids IVb using buildH and added the results into the manuscript:

      I also forked the code into the NMRlipids organization (https://github.com/NMRLipids/buildH) and committed the files that I needed for the analysis. I did not make a pull request because I only did the modifications required for the NMRlipids IVb project, and I was not sure if you want to have such incomplete files in the main version. If you want, you can of course pull it yourself.

      Thanks again for the code, this is really useful for the project! Also the automatic topology detection sounds highly interesting. How close it would be to get that included?

    3. Hi Samuli, I'm glad that buildH is useful! I will have a look to the tiny changes you made and will update the files. In the coming months we plan to i) use threads, ii) add tests and iii) release it on PyPI (so that it's installable via pip).
      For the automatic topology detection, I have now a first draft code working for AA or UA POPC. The code reads a graph of POPC (that I built by hand) using generic atom names. Then it reads a single POPC molecule (in pdb format) and builds a new graph based on atom connectivities (calculated with distances). By using an isomorphism between the two graphs the code automatically generates a dictionnary like {'generic_name_atom1': 'pdb_name_atom1', ...}. So now, it'll be trivial to write all the files needed by our programs (such as dic_lipids.py) from that dictionnary.
      So I'm trying to clean up this draft and add a little documentation ASAP. Unfortunately, I have very little time now since my teaching duties just restarted. But I do my best to push something on github (hopefully) early october.

  10. Hi,
    In my opinion, it is noticeable the difference found in the order parameters calculated with the script from Ɓngel PiƱeiro for some of the hydrogens in carbons g2 and g3 in CHARMM and SLIPIDs (notice the graphics uploaded by Antonio PeĆ³n: https://drive.google.com/open?id=10tT0YLT4OHajEbsLNDP1XBLm71sKXtoO). It does not seem to be a problem with the reconstruction of implicit hydrogens done with the script, since the script’s reconstruction matches much better with the explicit hydrogens found in other force-fields (CHARMM, in particular). I think it is a more general problem that has the origin in the different torsion angles considered for the angles involving g2 and g3 in both force-fields (CHARMM and SLIPIDS). This difference was also observed in your previous manuscript (Figure 2 in https://pubs.acs.org/doi/pdf/10.1021/acs.jpcb.5b04878?rand=p3onqvgq). There you can notice that again Slipids is the force-field fitting worse with experimental order parameters calculated for DPPC and POPC for these carbons.
    Curiously, if you go to the original paper of Slipids (https://pubs.acs.org/doi/pdf/10.1021/jp212503e?rand=mo3qdgjs), they say they take the parameters for all covalent bonds and angles, LJ and torsional parameters for the lipid head group from the original CHARMM36 (C36) FF (https://pubs.acs.org/doi/pdf/10.1021/jp101759q?rand=19u4avp3), BUT they also say they derived new parameters “because of the known difficulties of reproducing the vdW dispersion interaction by ab initio methods, experimental heats of vaporization and densities were used during the fitting of the LJ parameters. First, the new charges were used with the original parameters from C36 (LJ and torsional) and the LJ parameters were then altered to agree with experiments experimental results. After this, the torsional potentials were fitted from ab initio computations for the model compound. After one round in the parametrization scheme, it was necessary to refit the LJ parameters again and the torsional potentials until self-consistency was obtained.” So, it seems that people from Slipids “manually” changed some torsion parameters, affecting the order parameters for atoms g2 and g3, as our results (and previous results) are showing.
    As a conclusion I would say that inconsistences between OP coming from explicit and rebuilt H atoms are due to the mis-geometrical-optimization of the explicit H atoms in the target forcefield.

    1. I think that this a good point. The glycerol backbone dihedrals are fitted also in CHARMM36 parametrization to optimize the order parameters, but in my understanding they do not fit dihedrals involving hydrogens. If hydrogens are not in ideal conformation in all atom simulations, the question is then if the ideal or twisted geometry of hydrogens is more realistic?

  11. Running a similar orden parameters analysis for tails as with heads for the different all atoms force fields (CHARMM36, SLIPID) using explicit and rebuilt H atoms (implicit, Ɓngel’s script). For both hydrogens in a tail the analyses are very similar to the implicit hydrogens.

  12. Dear Samuli, Patrick, Josef and any other interested or involved in the calculation of order parameters. I would like to know your opinion, and to reach an agreement, on a couple of points:

    It seems that there are some differences in the order parameter values depending on the calculation method and on the force field. Rebeca found that the force fields for which the OP determined using the explicit and the rebuilt H atoms (based on optimized geometry) are significantly different, match with those for which the theoretical OP differ more from the experimental values. The comparison between OP values obtained from explicit and rebuilt H atoms is useful for the validation of the scripts that determine OP values but I think we should accept the relatively small differences observed between both methods since they seem to come from the force field more than from errors in the scripts, do you agree?

    On the uncertainties for the OP values obtained for a trajectory: in my scriptI take the OP values for a given C atom over frames and lipids all together, then the average value, the standard deviation of the sample (STD), and finally the standard deviation of the mean (STD/sqrt(len(OP)) are determined. As far as I understand, the uncertainty of the final OP value with a confidence level of ~68% should be directly 2*STDmean. This value is typically very small (ridiculously small) for a reasonable trajectory (several thousand frames and hundreds of lipids). In contrast, the STD of the sample, determined as I did, is quite large. The other two scripts do the calculation in a different way: the average of the OP for each C-atom for all the lipids within each frame is first determined and then the average over these average values over all the frames and the corresponding STD are provided. A much smaller value is obtained for the standard deviation of the sample in this way but I do not know how this is justified. I guess I am missing something basic…

    1. I agree with the first point.

      When calculating the standard deviation of the mean, the samples should be independent. However, successive frames in trajectory are probably not independent and block averaging methods should be used. In NMRlipids, we have instead assumed that each lipid samples its conformations independently from other lipids, which is the case in fluid phase by definition. Therefore, we first calculated the order parameters for individual lipids averaging over time, and then calculated the standard deviation of the mean over independent lipids. The STD is smaller in this case, because individual lipid values are already averaged over time. However, the error is similar as in the other error calculation method because the number of independent samples is smaller in the latter case. Different kinds of arguments can be made to justify different kind of statistical error calculation methods, but all approaches seem to give similar and very small errors from simulations.

      As demonstrated in the discussion here, uncertainty from other sources can be larger than the statistical error. Especially for the results from united atom simulations, I think that the statistical error underestimates the accuracy of the calculation.

    2. Thank you very much for your answer and explanation. I do agree with your last comment, so for the representation of OP values obtained from reasonable trajectories (bilayers with several hundred lipids over several tens of ns) I guess we can forget the statistical error, since it is negligible. Other way of obtaining uncertainties could be applying different methods to rebuild the Hydrogens... as I stated in my previous post, different reference atoms can be employed to get the position of the same hydrogen. I guess this explains some small differences between calculations performed with different codes.

    3. Hi all, here are my (late) comments on the two questions.

      1) About the difference between reconstructed Hs and all-atom (AA) Hs. To me there are two separate things:
      - The fact that we cannot recalculate the exact same position of some Hs (oleoyl double bond, glycerol Hs, in general Hs close to a polar atom) because there are other subtleties in the FF (bonded parameters according to Thomas, see his comment of october 14) that we cannot catch with simple geometrical rules. Let's call it geometric error.
      - The fact that for an AA force field, the "lipid backbone" (I mean the succession of heavy atoms on which the Hs are attached) will be incorrectly oriented on average because of the FF. Let's call it FF error.
      Sure we have to accept a difference and indeed some AA models will be away from experimental values. But I'd like to discuss with you up to what value this difference is acceptable. From what I could see from buildH / CHARMM36 comparison on a full traj, the max absolute difference is slightly less than 0.02 (on one H of glycerol), and on average I found an error of 0.0034 on all Hs that are not CH3. To me, it is acceptable since it is on the order of the experimental uncertainty on the worst case, and almost always it is below. Does everyone agree?
      If we place ourself in the context of UA simulations, we want to evaluate the FF error but we will always have a systematic geometric error. So that it is important to evaluate the latter accurately (see below).

      2) I agree about the statistical error. In our simulations of hundred(s) of ns, each lipid samples quite well its conformational space. So with the length (> 100 ns) & nb of lipids (>= 100) we use, the error is very small (most of the time <= 0.001). I think we can forget about it for simple lipid compositions like those we studied so far.
      But, as suggested by Angel, it would be important to provide an error which reports on the accuracy of H reconstruction (the one I called reconstruction error). The only way is to compare H reconstruction to existing AA trajectories. Angel proposes to try using different reconstruction methods (different codes). I think it's indeed a good idea to make a thorough analysis. One easy way could be:
      - to agree on a set of AA trajectories (also I never tested two different trajectories for a given FF, maybe that'd be worth a try?) ;
      - then use buildH, Angel's script, g_protonate+awk script (some others could go in) on these trajectories ;
      - report on a shared document (e.g. framacalc) the OP of the AA traj and for the different methods of H reconstruction (one for each column), one sheet per traj.
      We'll likely find differences between Hs (e.g. higher error for glycerol Hs than for aliphatic CH2). So it may be doable to come up with different errors according to each H.
      What do you think?

    4. Hi -
      do you think that using kernel density estimation (KDE) could be potentially useful to capture the uncertainty in the hydrogen direction?

      If yes, I would use the uncertainty of the hydrogen reconstruction as the width of the kernel function (usually gaussian) used to estimate the distribution of orientations (i.e. also mean and standard deviation).

    5. Hi Josef,
      sorry I didn't notice your comment! I don't know about this kernel density estimation. I'll have a look into it.


Please sign in before writing your comment.