Wednesday, March 25, 2015

Mapping scheme for lipid atom names for universal analysis scripts

During project we have generated unique collection of simulation data in the Zenodo community which can be openly used for different kind analyses. Analysis of this data is straighforward for expert, however generating scripts may be tedious since the atom naming convention is different between different models. To ease this I have generated a atom name mapping convention which should allow the usage of the same analysis script for different force fields with minimum effort. This has been done bearing in mind the analysis we have to do for the current activities in this blog, but also the wider usage beyond this project. In this post I describe the idea of the mapping and show some examples where I have used it. I hope that other people try to use this as well and tell if there are suggestions for improvement.

Mapping:
The mapping consist of a file which has the universal atom names in the first column and force field specific atom names in the second column. Here is example from the beginning of the mapping file for the CHARMM36 force field:


Universal name                   force field name
M_G1_M                                     C3
M_G1H1_M                               HX
M_G1H2_M                               HY
M_G1O1_M                               O31
M_G1C2_M                               C31
M_G1C2O1_M                             O32
.
.
.


Universal atomnames always starts with "M_" flag and ends with "_M" flag to ensure that the scripts using grep command never confuses between them and force field specific names. In the actual naming convention between the flags, the first two characters define in which glycerol backbone chain the atoms attached (G1, G2 or G3). Then the naming goes like this:

M_G1_M               ;glycerol backbone C1
M_G1H1_M         ;hydrogen attached to glycerol backbone C1
M_G1H2_M         ;hydrogen attached to glycerol backbone C1
M_G1O1_M         ;first atom (oxygen) in sn-1 chain
M_G1C2_M         ;second atom (carbon) in sn-1 chain
M_G1C2O1_M    ;oxygen attached to second atom (carbon) in sn-1 chain
M_G1C3_M         ;third atom (carbon)  in sn-1 chain
M_G1C3H1_M    ;first hydrogen attached to third atom (carbon)  in sn-1 chain
M_G1C3H2_M    ;second hydrogen attached to third atom (carbon)  in sn-1 chain
.
.
.

So the third character tells the atom type and fourth character tells the counting number from the glycerol backbone carbon. If there are hydrogens or other atoms attached to the main chain, those will be added to the end of the naming, for example, in these cases:

M_G1C2O1_M    ;oxygen attached to second atom (carbon) in sn-1 chain
M_G1C3H1_M    ;first hydrogen attached to third atom (carbon)  in sn-1 chain

The G3 atoms are the headgroup atoms, thus they are little bit more complicated. Examples of complete mapping files for CHARMM36 and MacRog models can be found from GitHub.

Usage:
I have used this mapping for couple of cases now and I think that it works.
Below is example script which calculates the order parameters for acyl chains for CHARMM36 model in low hydrated conditions. The main point is that only changes required to use the script for different systems are the file names for simulation files, output files, mapping file and the order parameter analysis script location. These are 7 lines after "#Define file names" comment. The gro_OP.awk and mapping files can be found from GitHub. It is tedius to generate the mapping files, however, once done the analysis of different properties becomes very straightforward.

Example script:

#!/bin/bash
#Dowload files from Zenodo
wget https://zenodo.org/record/13945/files/popcRUN4.tpr  
wget https://zenodo.org/record/13945/files/popcRUN4.trr
#Define file names
tprname=popcRUN4.tpr
trajname=popcRUN4.trr
trajgroname=analTMP.gro
sn1outname=OrderParamSN1lowHYD.dat
sn2outname=OrderParamSN2lowHYD.dat
mappingFILE=../MAPPING/mappingPOPCcharmm.txt
analFILE=../../nmrlipids.blogspot.fi/scripts/gro_OP.awk
#Make gro file which can be used to calculate the order parameters
echo System | /home/ollilas1/gromacs/gromacs465/bin/trjconv -f $trajname -s $tprname -o $trajgroname -pbc res -b 0
#This is loop over sn-1 carbon segments
for((  j = 3 ;  j <= 16;  j=j+1  ))
do
    #This greps the force field specific atom names using the mapping file
    Cname=$(grep M_G1C"$j"_M $mappingFILE | awk '{printf "%5s\n",$2}')
    H1name=$(grep M_G1C"$j"H1_M $mappingFILE | awk '{printf "%5s\n",$2}')
    H2name=$(grep M_G1C"$j"H2_M $mappingFILE | awk '{printf "%5s\n",$2}')
    #Calculate order parameters
    H1op=$(awk -v Cname="$Cname" -v Hname="$H1name" -f $analFILE $trajgroname)
    H2op=$(awk -v Cname="$Cname" -v Hname="$H2name" -f $analFILE $trajgroname)
    #Print results to file
    echo $j $H1op $H2op >> $sn1outname
done
#This is loop over sn-2 carbon segments
for((  j = 3 ;  j <= 18;  j=j+1  ))
do
    #This greps the force field specific atom names using the mapping file
    Cname=$(grep M_G2C"$j"_M $mappingFILE | awk '{printf "%5s\n",$2}')
    H1name=$(grep M_G2C"$j"H1_M $mappingFILE | awk '{printf "%5s\n",$2}')
    H2name=$(grep M_G2C"$j"H2_M $mappingFILE | awk '{printf "%5s\n",$2}')
    #Calculate order parameters
    H1op=$(awk -v Cname="$Cname" -v Hname="$H1name" -f $analFILE $trajgroname)
    H2op=$(awk -v Cname="$Cname" -v Hname="$H2name" -f $analFILE $trajgroname)
   #Print results to file
    echo $j $H1op $H2op >> $sn2outname
done



The script and output files (OrderParamSN1lowHYD.datOrderParamSN2lowHYD.dat ) can be found also from github.

The results from the above script together with the full hydration results are also shown in Fig. 1

Fig 1. Order parameters for acyl chains calculated from trajectories availabe in Zenodo collection using the mapping file and similar scripts as above. Also experimental results by Ferreira et al. for full hydration are shown. Dehydration induced ordering of acyl chains qualitatively agrees with experiments for DMPC [Dvinskikh et al., Mallikarjunaiah et al.].

I have also ran similar script to calculate acyl chain order parameters from different simulation data with different cholesterol concentrations available in the Zenodo collection. These scripts and produced results are available in GitHub (CHARMM36, MacRog). The results are summarized in Fig. 2
Fig 2. Order parameters for acyl chains calculated from some trajectories with cholesterol availabe in the Zenodo collection using the mapping file and similar scripts as above. The experimental data is from Ferreira et al. For more discussion about CHARMM36 and MacRog in GitHub.

This is quite technical issue, and I am not sure if this presentation is understandable, so please do not hesitate ask further clarifications.


Tuesday, March 17, 2015

Towards first submission to journal (2)

I have now added the new version of the "Towards atomistic resolution structure of phosphatidylcholine glycerol backbone and choline headgroup at different ambient conditions" manuscript in the GitHub:
https://github.com/NMRLipids/nmrlipids.blogspot.fi/tree/master/HGmodelMANUSCRIPT

The filenames for this version are HGmodel_ACStemplate.tex and
HGmodel_ACStemplate.pdf. The latter can be found also from the Dropbox link.

The manuscript has been now put in the ACS template and the current idea is to submit it first to the Journal of American Chemical Society (JACS), for the reasons discussed in the Towards first submission to journal post. I have now modified the abstract and conclusions to be more suitable for this journal. The journal for the first submission can be still discussed if there are objections from the authors.

Otherwise, I hope that the current version will become the final version for the first submission once the issues written with red in the manuscript are fixed.

If you want to proofread, or otherwise comment the manuscript, the most convenient way is that you modify directly the HGmodel_ACStemplate.tex file and make a pull request to update the changes also to the GitHub. More traditional options for commenting/modifying can be used, however the changes should be made somehow visible.

We have also started to use the issue tracker in GitHub and it seems to be quite convenient, at least from my point of view:
https://github.com/NMRLipids/nmrlipids.blogspot.fi/issues

There are issues related to this version as well, and do not be shy to add more. I think that especially more detailed and technical issues are convenient to deal with the issue tracker.

I am now also writing the draft of the cover letter, and I will add this to the GitHub also in the near future.

Monday, March 9, 2015

Current and future activity

The project has expanded, and many things are going on. Here I go through the status of currently active topics, and present some new ideas on which to focus in the near future.

Two publications are currently being written based on results discussed in this blog:

One of the original goals of the project is still in progress:
  • Finding an atomistic (preferably united-atom) force field that correctly describes the glycerol backbone and choline headgroup structure in different conditions. This line of research was discussed mainly by me and Antti Lamberg until July 2014. After that, I have focused on writing the manuscripts on results already obtained. During this time I have learned some relevant details about the (dihedral angle distributions of the) structures, see especially Figs. 5 and 6, and related discussion, in the publication 1. Also, a new united atom model by Tj√∂rnhammar and Edholm, which seems to provide a better structure to start with, was published. I think that this line of research should be continued using the Tj√∂rnhammar14 model as a starting point.

There are several possibilities for the future directions of this project, and I believe that there is a huge potential also on a longer time span. The project will run for at least one more year from now. Further continuation of myself and Markus Miettinen as editors, as well as other long term future directions, depend on the success of our grant applications. For now, I will focus on these two topics that go beyond the original goals of the project:
  • Quantitative quality of lipid–cholesterol interactions in simulations based on x-ray scattering and NMR data. Lipid–cholesterol interactions have been widely studied with MD simulations in the recent years. Qualitatively, simulations reproduce the condensing effect, but it is not clear how accurate, quantitatively, the models are. This is crucial information to judge the credibility of the various simulation predictions. Directly comparing the simulations with different models (made available in Zenodo in this project) against x-ray scattering and NMR results, we can get a pretty clear picture of the quantitative quality of the models. The discussion on this is already going on in the blog with Peter Heftberger and Georg Pabst. I will soon write a blog post discussing this project in more detail.
  • Glycerol backbone and headgroup order parameters for other than phoshatidylcholine lipids (e.g., phophatidylethanolamine (PE), phosphatidylglycerol (PG), phosphatidylserine (PS), glycolipids etc.). I have privately discussed the existence of order parameter data for these and other headgroups at many occasions. The bottom line is that a lot of published data exists, however, they seem to be quite poorly known and difficult to find. To my knowledge, these data have not been reviewed or collected anywhere (not even by Derek Marsh). I think we should, at minimum, collect the available data sets in an easily accessible format. This, with potentially some comparison to simulations, would teach us interesting lessons about the structural differences between headgroups, allowing us to expand on what has been discussed in the literature so far.

In addition, several issues have been shortly touched in the discussions of the blog, but are beyond both the original and the extended scopes of the project. I would be more than happy too see someone to progress, for example, these topics:
  • Temperature dependence of the glycerol backbone and choline order parameters. For example, Andrea Catte has reported interesting data on this and they have been discussed a bit already. There is also some temperature dependence in the data delivered by Fernando Favela.
  • Peter Heftberger shared a collection of x-ray scattering data for different systems. In addition to the lipid–cholesterol interactions mentioned above, there may be many other interesting issues to study with moderate effort by combining the simulation data now available in Zenodo with the scattering data shared in the blog.