Wednesday, March 25, 2015

Mapping scheme for lipid atom names for universal analysis scripts

During project we have generated unique collection of simulation data in the Zenodo community which can be openly used for different kind analyses. Analysis of this data is straighforward for expert, however generating scripts may be tedious since the atom naming convention is different between different models. To ease this I have generated a atom name mapping convention which should allow the usage of the same analysis script for different force fields with minimum effort. This has been done bearing in mind the analysis we have to do for the current activities in this blog, but also the wider usage beyond this project. In this post I describe the idea of the mapping and show some examples where I have used it. I hope that other people try to use this as well and tell if there are suggestions for improvement.

Mapping:
The mapping consist of a file which has the universal atom names in the first column and force field specific atom names in the second column. Here is example from the beginning of the mapping file for the CHARMM36 force field:


Universal name                   force field name
M_G1_M                                     C3
M_G1H1_M                               HX
M_G1H2_M                               HY
M_G1O1_M                               O31
M_G1C2_M                               C31
M_G1C2O1_M                             O32
.
.
.


Universal atomnames always starts with "M_" flag and ends with "_M" flag to ensure that the scripts using grep command never confuses between them and force field specific names. In the actual naming convention between the flags, the first two characters define in which glycerol backbone chain the atoms attached (G1, G2 or G3). Then the naming goes like this:

M_G1_M               ;glycerol backbone C1
M_G1H1_M         ;hydrogen attached to glycerol backbone C1
M_G1H2_M         ;hydrogen attached to glycerol backbone C1
M_G1O1_M         ;first atom (oxygen) in sn-1 chain
M_G1C2_M         ;second atom (carbon) in sn-1 chain
M_G1C2O1_M    ;oxygen attached to second atom (carbon) in sn-1 chain
M_G1C3_M         ;third atom (carbon)  in sn-1 chain
M_G1C3H1_M    ;first hydrogen attached to third atom (carbon)  in sn-1 chain
M_G1C3H2_M    ;second hydrogen attached to third atom (carbon)  in sn-1 chain
.
.
.

So the third character tells the atom type and fourth character tells the counting number from the glycerol backbone carbon. If there are hydrogens or other atoms attached to the main chain, those will be added to the end of the naming, for example, in these cases:

M_G1C2O1_M    ;oxygen attached to second atom (carbon) in sn-1 chain
M_G1C3H1_M    ;first hydrogen attached to third atom (carbon)  in sn-1 chain

The G3 atoms are the headgroup atoms, thus they are little bit more complicated. Examples of complete mapping files for CHARMM36 and MacRog models can be found from GitHub.

Usage:
I have used this mapping for couple of cases now and I think that it works.
Below is example script which calculates the order parameters for acyl chains for CHARMM36 model in low hydrated conditions. The main point is that only changes required to use the script for different systems are the file names for simulation files, output files, mapping file and the order parameter analysis script location. These are 7 lines after "#Define file names" comment. The gro_OP.awk and mapping files can be found from GitHub. It is tedius to generate the mapping files, however, once done the analysis of different properties becomes very straightforward.

Example script:

#!/bin/bash
#Dowload files from Zenodo
wget https://zenodo.org/record/13945/files/popcRUN4.tpr  
wget https://zenodo.org/record/13945/files/popcRUN4.trr
#Define file names
tprname=popcRUN4.tpr
trajname=popcRUN4.trr
trajgroname=analTMP.gro
sn1outname=OrderParamSN1lowHYD.dat
sn2outname=OrderParamSN2lowHYD.dat
mappingFILE=../MAPPING/mappingPOPCcharmm.txt
analFILE=../../nmrlipids.blogspot.fi/scripts/gro_OP.awk
#Make gro file which can be used to calculate the order parameters
echo System | /home/ollilas1/gromacs/gromacs465/bin/trjconv -f $trajname -s $tprname -o $trajgroname -pbc res -b 0
#This is loop over sn-1 carbon segments
for((  j = 3 ;  j <= 16;  j=j+1  ))
do
    #This greps the force field specific atom names using the mapping file
    Cname=$(grep M_G1C"$j"_M $mappingFILE | awk '{printf "%5s\n",$2}')
    H1name=$(grep M_G1C"$j"H1_M $mappingFILE | awk '{printf "%5s\n",$2}')
    H2name=$(grep M_G1C"$j"H2_M $mappingFILE | awk '{printf "%5s\n",$2}')
    #Calculate order parameters
    H1op=$(awk -v Cname="$Cname" -v Hname="$H1name" -f $analFILE $trajgroname)
    H2op=$(awk -v Cname="$Cname" -v Hname="$H2name" -f $analFILE $trajgroname)
    #Print results to file
    echo $j $H1op $H2op >> $sn1outname
done
#This is loop over sn-2 carbon segments
for((  j = 3 ;  j <= 18;  j=j+1  ))
do
    #This greps the force field specific atom names using the mapping file
    Cname=$(grep M_G2C"$j"_M $mappingFILE | awk '{printf "%5s\n",$2}')
    H1name=$(grep M_G2C"$j"H1_M $mappingFILE | awk '{printf "%5s\n",$2}')
    H2name=$(grep M_G2C"$j"H2_M $mappingFILE | awk '{printf "%5s\n",$2}')
    #Calculate order parameters
    H1op=$(awk -v Cname="$Cname" -v Hname="$H1name" -f $analFILE $trajgroname)
    H2op=$(awk -v Cname="$Cname" -v Hname="$H2name" -f $analFILE $trajgroname)
   #Print results to file
    echo $j $H1op $H2op >> $sn2outname
done



The script and output files (OrderParamSN1lowHYD.datOrderParamSN2lowHYD.dat ) can be found also from github.

The results from the above script together with the full hydration results are also shown in Fig. 1

Fig 1. Order parameters for acyl chains calculated from trajectories availabe in Zenodo collection using the mapping file and similar scripts as above. Also experimental results by Ferreira et al. for full hydration are shown. Dehydration induced ordering of acyl chains qualitatively agrees with experiments for DMPC [Dvinskikh et al., Mallikarjunaiah et al.].

I have also ran similar script to calculate acyl chain order parameters from different simulation data with different cholesterol concentrations available in the Zenodo collection. These scripts and produced results are available in GitHub (CHARMM36, MacRog). The results are summarized in Fig. 2
Fig 2. Order parameters for acyl chains calculated from some trajectories with cholesterol availabe in the Zenodo collection using the mapping file and similar scripts as above. The experimental data is from Ferreira et al. For more discussion about CHARMM36 and MacRog in GitHub.

This is quite technical issue, and I am not sure if this presentation is understandable, so please do not hesitate ask further clarifications.


No comments:

Post a Comment

Please sign in before writing your comment.