Wednesday, April 20, 2022

New yaml format of mapping files

Mapping files for lipid atom names were introduced in the NMRlipids project to enable flexible analyses over simulations with different force fields and atom naming conventions. In the mapping file format described in the original post, the first column defines the universal atom name based on its attachment to the glycerol backbone carbons of the lipid, and the second column gives the topology-dependent name used in a specific simulation. This format has been highly useful in many projects, including the NMRlipids databank development.

However, classifications other than the name can also be useful for a given atom in some applications. For example, the NMRlipids quality ranking can be made separately for lipid headgroups and tails, and it might be interesting to monitor flip-flops of lipid headgroups from one leaflet to another. Such applications need to know whether a specific atom belongs to the headgroup or the acyl chain region.

To enable such analyses, we have now updated the format of the mapping files to YAML in the NMRlipids databank. These files can be read directly as dictionaries in Python, which makes it easier to add new information to the mapping files. The universal atom names are defined as in the original format, but they are now the keys of a dictionary. The value of each key is another dictionary containing the topology-specific atom name and the position in the lipid (headgroup, tails, etc.). The mapping file for CHARMM36 POPC exemplified in the original post looks like this in the new format:

M_G1_M:
  ATOMNAME: C3
  FRAGMENT: glycerol backbone
M_G1H1_M:
  ATOMNAME: HX
  FRAGMENT: glycerol backbone
M_G1H2_M:
  ATOMNAME: HY
  FRAGMENT: glycerol backbone
M_G1O1_M:
  ATOMNAME: O31
  FRAGMENT: glycerol backbone
M_G1C2_M:
  ATOMNAME: C31
  FRAGMENT: sn-1
M_G1C2O1_M:
  ATOMNAME: O32
  FRAGMENT: sn-1
...
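As a minimal sketch of how such a file can be used, the snippet below loads a few of the entries above into a Python dictionary and picks out the atoms that belong to one fragment. It assumes the third-party PyYAML package is installed; the helper `atoms_in_fragment` is a hypothetical name introduced here for illustration and is not part of the NMRlipids databank code. The YAML text is inlined to keep the example self-contained, whereas in practice it would be read from a mapping file.

```python
import yaml  # PyYAML; assumed to be installed (e.g. pip install pyyaml)

# A few entries of the CHARMM36 POPC mapping shown above, inlined here
# so that the example runs without any external file.
MAPPING_TEXT = """
M_G1_M:
  ATOMNAME: C3
  FRAGMENT: glycerol backbone
M_G1H1_M:
  ATOMNAME: HX
  FRAGMENT: glycerol backbone
M_G1C2_M:
  ATOMNAME: C31
  FRAGMENT: sn-1
M_G1C2O1_M:
  ATOMNAME: O32
  FRAGMENT: sn-1
"""

# safe_load turns the YAML mapping into nested Python dictionaries.
mapping = yaml.safe_load(MAPPING_TEXT)


def atoms_in_fragment(mapping, fragment):
    """Return the topology-specific atom names that belong to one fragment.

    Hypothetical helper for illustration only.
    """
    return [
        entry["ATOMNAME"]
        for entry in mapping.values()
        if entry["FRAGMENT"] == fragment
    ]


# Look up the simulation-specific name of one universal atom name.
print(mapping["M_G1_M"]["ATOMNAME"])

# Collect all atoms that the mapping assigns to the sn-1 chain.
print(atoms_in_fragment(mapping, "sn-1"))
```

The same pattern could be used to separate headgroup atoms from acyl chain atoms, provided the FRAGMENT values in the mapping file at hand distinguish them.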

This format makes it possible to extend the information given in the mapping files by adding new entries to the dictionary stored under each universal atom name.
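To illustrate this kind of extension, the sketch below adds one more key to every atom entry and writes the result back as YAML. The key `ELEMENT`, the helper `add_field`, and the rule of taking the first letter of the atom name are all assumptions made up for this example; they are not part of the databank format. PyYAML is again assumed to be installed.

```python
import yaml  # PyYAML; assumed to be installed

# Two entries in the new format, written directly as Python dictionaries.
mapping = {
    "M_G1_M": {"ATOMNAME": "C3", "FRAGMENT": "glycerol backbone"},
    "M_G1O1_M": {"ATOMNAME": "O31", "FRAGMENT": "glycerol backbone"},
}


def add_field(mapping, key, value_for):
    """Return a copy of the mapping with one extra key in every atom entry.

    Hypothetical helper: value_for(entry) computes the value to store.
    The input mapping is left unchanged.
    """
    return {
        name: {**entry, key: value_for(entry)}
        for name, entry in mapping.items()
    }


# ELEMENT is a made-up key; taking the first letter of the atom name is a
# crude guess used only to have something to store.
extended = add_field(mapping, "ELEMENT", lambda entry: entry["ATOMNAME"][0])

# Serialize back to YAML, keeping the original order of the keys.
text = yaml.safe_dump(extended, sort_keys=False)
print(text)
```

Code that only reads ATOMNAME and FRAGMENT keeps working on the extended file, because the existing keys are untouched.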

While the original universal naming convention was designed for glycerolipids, which have a glycerol backbone, mapping files have also been generated for other types of molecules, such as cholesterol and dihexadecyldimethylammonium. The mapping file format is applicable to such molecules as well, but a universal atom naming convention has to be defined for them. For cholesterol, for example, the universal atom naming convention was based on Fig. 4 in this publication.