Figure 1: Schematic structure of the new databank. Beta versions of the Databank, Databank builder and Databank analyzer codes are available.
The presentation was followed by a very useful discussion among more than 20 participants. The discussion focused mainly on the urgent issues brought up in the presentation, complemented by additional points raised by the participants. The outcomes of the meeting and some decisions based on the discussion are listed here.
- What information will be stored in the dictionary files composing the databank? The current plan is to include information requested from the contributor that cannot be read from the files afterwards (force field information, trajectory length, etc.), and information necessary for using the data (file names and sources) and for searching it (number of molecules and temperature). Note, however, that the tpr (or corresponding) and trajectory files are accessible through the databank. All the information about each simulation is therefore available, even though not everything is written directly into the dictionary. For detailed discussion, see the GitHub issue.
- How will molecules be named? When writing data to and searching data from the databank, we need unique machine-readable names for molecules. There will be a list of molecule names (for example, POPC, POT, TIP3P, etc.) that will be used by default. If the uploaded simulation uses different names, the user has to specify them. For detailed discussion, see the GitHub issue.
- Unique convention for the atoms within the molecules. For now, we will use the idea of mapping files, extended with a third column that gives the residue name for each atom. This should be useful in situations where parts of one lipid are named with different residue names, such as in the current Amber force field convention. For detailed discussion, see the GitHub issue.
- File format for the dictionary. If practically feasible, we will consider saving the dictionary in YAML format instead of JSON. For detailed discussion, see the GitHub issue.
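
To make the points above more concrete, here is a minimal Python sketch of what a dictionary entry and a three-column mapping file could look like. It is only an illustration under stated assumptions, not the actual databank code: the function names (`parse_mapping`, `build_entry`), all field names (`forcefield`, `trajectory_length_ns`, `molecule_names`, and so on), the file names, and the atom names in the example mapping are hypothetical. Only the default molecule names POPC, POT and TIP3P are taken from the text. JSON from the standard library is used for output; the same nested structure could equally be written as YAML.

```python
import json

# Default machine-readable molecule names (examples taken from the list above).
DEFAULT_NAMES = {"POPC", "POT", "TIP3P"}

# Hypothetical mapping file: universal atom name, atom name in the simulation,
# and a third column with the residue name (all values are illustrative only).
MAPPING_TEXT = """\
# universal_name  atom_name  residue_name
M_G1_M    C1   PC
M_G1C2_M  C12  PA
"""


def parse_mapping(text):
    """Parse mapping lines; the third (residue) column is optional."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        if len(fields) < 2:
            raise ValueError(f"malformed mapping line: {line!r}")
        residue = fields[2] if len(fields) > 2 else None
        mapping[fields[0]] = {"atom": fields[1], "residue": residue}
    return mapping


def build_entry(contributed, name_overrides):
    """Combine contributor-supplied fields with simulation-specific names.

    name_overrides maps a default databank name to the name used in the
    uploaded simulation (assumed layout, for illustration only).
    """
    unknown = set(name_overrides) - DEFAULT_NAMES
    if unknown:
        raise ValueError(f"not in the default name list: {sorted(unknown)}")
    entry = dict(contributed)
    entry["molecule_names"] = dict(name_overrides)
    return entry


entry = build_entry(
    {
        "forcefield": "example-ff",          # cannot be read from files later
        "trajectory_length_ns": 200,         # requested from the contributor
        "temperature_K": 298,                # used for searching
        "n_molecules": {"POPC": 128},        # used for searching
        "files": {"tpr": "run.tpr", "trajectory": "run.xtc"},
    },
    {"POPC": "POPC", "POT": "K", "TIP3P": "SOL"},
)
entry["mapping"] = {"POPC": parse_mapping(MAPPING_TEXT)}

print(json.dumps(entry, indent=2))
```

Keeping the residue name next to each atom in the mapping means a lipid whose parts carry different residue names can still be described by a single mapping, which is the situation the third column is intended to handle.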