Dear collaborators,
I am Alexey Nesterenko, a postdoc working with Markus at UiB. I would like to announce a new NMRlipids project, NMRlipids Databank 2.0, that will result in a paper targeting the journal Digital Discovery (RSC) and reporting both code improvements and a repository extension. For both of these, we already have quite some material from my contributions over the last two years, and from collaborators' contributions triggered by the NMRlipids2025 workshop that we hosted in Bergen recently.
Current state
Since the inaugural publication in 2024, the NMRlipids Databank has undergone a number of important changes. Below I list the changes that are fully completed, some with the GitHub handle of the contributor who was mainly responsible:
- We have isolated a package called DatabankLib. It can now be installed, imported, etc.
- We have moved all data into a separate repo, BilayerData. We no longer track the history of code and data together.
- We have added unit and regression tests.
- We have built a portal for automatic system addition (thanks to @MagnusSletten).
- We have removed the form-factor code completely and integrated MAICoS instead, which now handles both electron densities and form factors (thanks to @picocentauri). It also lets us calculate water orientation profiles for all systems and dielectric profiles for neutral systems.
- The C-H bond order parameter code has become many times faster (thanks to @batukav).
- We developed a schema and added metadata to all lipid molecules (thanks to @korbinib).
- We made a branch that allows Databank to handle lipid monolayers and compare their isotherms and reflectivity profiles (thanks to @fsuarezleston).
This set of changes has already made the project significantly stronger. I would say it is now close to becoming a "beta" version, and thus worth a new publication.
Contribution rules
The spirit of the original NMRlipids authorship rules will be followed, with the following revisions:
- I will be the last and corresponding author.
- Otherwise, the author list will be alphabetical. However, the first-author position can be earmarked for someone willing to commit to solving a major share of the issues; if you would like to do this, please let me know!
- Everyone who contributes by committing
  - to NMRlipids/Databank starting from v1.0.1, or
  - to NMRlipids/BilayerData starting from v1.0.1, or
  - to NMRlipids/databank-templates starting from 7.02.2024 (Nat. Com. paper release), or
  - to NMRlipids/BilayerGUI_laravel, or
  - to the manuscript repository or the connected Overleaf project (will be shared soon)

  will be invited to become a co-author.
Each co-author shall detail their contribution in the "Author contribution" paragraph, in line with the authorship definition described by ICMJE, as recommended by RSC.
All co-authors get access to the Databank-v2 manuscript repository and/or the corresponding Overleaf project.
All contributors are invited to biweekly sprint sessions to monitor the progress, agree on tasks, and redistribute remaining issues. The first meeting will be on 15th of August 2025 at 16:30 CEST (Zoom ID: 627 3757 9559, password "NMRlipids2"). The last meeting will be in December, after which I will submit the manuscript to the journal and to ChemRxiv, so the contribution window is Aug–Dec 2025.
Outcome list
To make the Databank interesting for the scientific community, we need to demonstrate what it can provide them. However, instead of demonstrating scientific results in membrane biophysics, I would like this paper to be a guide for "scientific librarians"; this is also something that our target journal allows.
I see the following four workpackages as potential outcomes:
- [we are on the way] Community project -> public project that has the potential to become a standard in the structural chemistry of biomembranes and is easy to use in various modes. The previous Databank version was used only by people from a narrow community, and one reason was probably that the implementation was a bit raw. We now offer a product that targets a broader audience through increased usability, cleaner structure, and a polished workflow. This workpackage refers mostly to the GitHub projects, and assumes that we present a mature package published on pypi.org, supplemented with clean policies and data schemes, regular releases, sufficient documentation, automated system addition, and finally a well-indexed website whose pages are properly cross-linked with other lipid projects. It institutionalizes the work process around lipid bilayer structural data so that the entry threshold becomes low.
- [almost ready] The architecture for repositories of experiment-driven simulations. From the beginning, the NMRlipids Databank has had a very nice data flow from the architectural point of view. The flow was so good that it survived our recent large refactoring without being changed at the core. We will describe the flow in the manuscript in such a way that it can become a seed for analogous projects for various molecular systems. It is nice to have clear schemes, figures, and code examples of how the engine works so that people can adopt them for their own objects. For this workpackage, it is important to have clear code for the core of DatabankLib. We have a clear example of "expandability": lipid monolayers. This was done by Fabian, who added the functionality to include monolayer simulations and two types of experiments: p–A isotherms and X-ray reflectometry. I want his branch to be rebased onto the current version and the monolayer data to be isolated in a separate repository. That will create a ready-to-go NMRlipids Monolayer Databank forked from the main trunk.
- [contribution required] Good coverage of experimental data and clear force field ranking. Currently, only POPC is covered well, which is not good. There are also experiments that are not connected to simulations. For example, only 19 of 45 registered form-factor experiments have a paired simulation, and 5 of these have only one. With order parameters, only 8 of 50 molecules have at least one simulation connected. The first and easiest way to improve the coverage is to add simulations for uncovered experiments, and to add existing experiments from the literature. Getting experiments well covered with different force fields is also rather straightforward work. So many modes of contribution are open here! And it will be very helpful for us to recruit people into real-time testing.
- [contribution required] Clear, purified, annotated and curated datasets are very important if our project is to find a place among other chemistry datasets (see, e.g., awesome-chemistry-datasets and Kaggle). In our original paper, we demonstrated that we have an API and that datasets can be extracted. But now we should actually extract datasets, clean and purify them, and demonstrate what kind of models could be learned from them. What I currently have in mind: (i) bilayer geometry (area / thickness), (ii) electron-density / form-factor, (iii) atom-aligned SMILES / OP values, (iv) carbonyl-aligned water profiles (water orientation, dielectric profiles). I am personally aiming for a clear cheminformatics dimension in these datasets so that we could, for example, learn order parameters from the chemical environment (DS#iii).
I am very open to modifying the outcome list during late August – early September if some contributors express a strong commitment to including a particular topic there.
Best regards,
Alexey