GitHub - llambda/Lipid2MD: Lipid2MD modify

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.idea		.idea
__pycache__		__pycache__
CONFIGCONFIG.xlsx		CONFIGCONFIG.xlsx
README.txt		README.txt
TEMP.txt		TEMP.txt
TEMPFILE.csv		TEMPFILE.csv
cg2_trial.csv		cg2_trial.csv
cg2_trial_class.csv		cg2_trial_class.csv
cg3_trial.csv		cg3_trial.csv
cg3_trial_class.csv		cg3_trial_class.csv
codici motore.txt		codici motore.txt
config.csv		config.csv
config2MD.yaml		config2MD.yaml
dataset_species_ER_median.csv		dataset_species_ER_median.csv
fa_trial.csv		fa_trial.csv
fa_trial_class.csv		fa_trial_class.csv
fa_trial_grouped.csv		fa_trial_grouped.csv
lipid2MD.log		lipid2MD.log
m_codes.txt		m_codes.txt
message		message
trial.csv		trial.csv
trial_cg.csv		trial_cg.csv
trial_cg_grouped.csv		trial_cg_grouped.csv
trial_class.csv		trial_class.csv
trial_grouped.csv		trial_grouped.csv
x.py		x.py

Repository files navigation

The lipid2MD tools was designed with the aim to connect experimental lipidomics data
to in silico data. With this tool we are able to connect categories of lipids from 
lipidomics data to the definitions of the lipids available in the all-atom force field charmm36
as well as selected lipids in the coarse-grain force fields Martini v2. and v3. 
The first input file should be in the following format:

"species","ER"
"Cer 32:0;2",0
"Cer 32:1;2",0.0176653725206329
"Cer 34:0;2",0.00176727834288462
"Cer 34:1;2",0.158168168483941
"Cer 34:2;2",0.0287423986260666
"Cer 34:3;2",0
....

In one column we have the category of the lipid while on the other its concentration.
The second input file is the config2MD.yaml available in the current folder in which
the definitions of the lipids of charmm36 and Martini are stored.
To run the tool:

lipid2MD -i dataset_species_ER_median.csv -c config2MD.yaml -fa -cg -cg2 -o trial -og trial -l 600

The flag "-o" and "-og" specify the name of the output files. Only one or both
flag can be selected.

The "-o" flag specify an output in the following format :

---------------------------------------
Species,abbrev,lm_id,name,Exceptions,Charmm36,concentration
CL 68:2,CL 68:2,LMGP12010081,"CL(1'-[16:0/18:1(9Z)],3'-[16:0/18:1(9Z)])",False,POCL2,0.0
CL 68:2,CL 68:2,LMGP12010081,"CL(1'-[16:0/18:1(9Z)],3'-[16:0/18:1(9Z)])",False,POCL1,0.0
Cer 42:1;2,Cer 42:1;O2,LMSP02010012,Cer(d18:1/24:0),True,CER240,0.0675962090464608
Cer 42:2;2,Cer 42:2;O2,LMSP02010009,Cer(d18:1/24:1(15Z)),True,CER241,0.134158391912655
PA 32:2,PA 32:2,LMGP10010969,PA(16:1(9Z)/16:1(9Z)),False,DYPA,0.0283727439444567
PA 34:1,PA 34:1,LMGP10010032,PA(16:0/18:1(9Z)),False,POPA,0.0526526227726728
....
CL 68:5,NA,NA,NA,False,NA,0.0
---------------------------------------

Here are stored the information coming from the original lipidomics dataset i.e column "Species",the abbreviations of the chains, the lipidmaps ID of the lipid,the name, the category of the lipid, the force field name and ultimately the concentration from
the original dataset.

The "-og" group all the results from above and organized them in the following manner:


---------------------------------------
Species,ID,charmm36,concentration
CL 64:3,NA,['NA'],['NA'],0.0
CL 64:4,"CL(1'-[16:1(9Z)/16:1(9Z)],3'-[16:1(9Z)/16:1(9Z)])",['NA'],"['TYCL2', 'TYCL1']",0.0
CL 66:2,CL 66:2,"['LMGP12010655', 'LMGP12010025', 'LMGP12010115', 'LMGP12010010']",['NA'],0.0
.....
---------------------------------------

Here there are stored the information from the original dataset i.e column "Species", the lipid ID of the lipids, the force field
name of the lipid and the concentration.
In case of more than one species more name will be stored in the list i.e è ['DUPC', 'DLiPC].

In the case of not matching species values are replaced with NA both in the non-grouped and grouped output.

The flag "-fa", "-cg" and "-cg2" specify the force field. Only one can be selected.

The flag "-l" is optional, and takes an integer as argument, specifying the desired number of lipids in the membrane. 

An output file "filename_class.csv" is created, calculating for each membrane class the number of lipids to be selected for membrane simulation,
based on the relative abundance of each lipid class. 
The file has the following format:

---------------------------------------
Class, relative abundance %, adjusted relative, number of lipids, rounded number of lipid, most common lipid, Match in force field
Cer, 0.418, 0.506, 2.025, 2.0, Cer 34:1;2, CER160
Chol, 7.374, 8.929, 35.717, 36.0, Chol, Cholesterol
Hexcer, 0.042, 0.051, 0.207, 0.0, HexCer 34:1;2, 
.....
---------------------------------------

Here each membrane class is related to its relative abundance, adjusted abundance after removing non-membrane lipids.
A number of lipids to match the relative abundance is calculated to reach 1000 lipids in total or the amount chosen by the user. 
For each class, the most common lipid is selected from the original concentrations. 
Finally, the most common lipid is matched to its name in the force field. For coarse grain force fields, there can be more than one name.