add new tool: msp_split #234

Merged
merged 19 commits into from
Mar 25, 2022
Changes from 3 commits
13 changes: 13 additions & 0 deletions tools/msp_split/.shed.yml
@@ -0,0 +1,13 @@
owner: recetox
remote_repository_url: "https://github.com/RECETOX/galaxytools/tree/master/tools/msp_split"
homepage_url: "https://github.com/RECETOX/msp_split"
categories:
  - Metabolomics
repositories:
  msp_split:
    description: "split msp spectra."

    include:
      - splitMSP.xml
      - splitMSP.py
      - test-data
45 changes: 45 additions & 0 deletions tools/msp_split/splitMSP.py
@@ -0,0 +1,45 @@
import argparse
import os

from matchms.exporting import save_as_msp
from matchms.importing import load_from_msp


def read_spectra(filename):
    return list(load_from_msp(filename, False))


def get_spectra_names(spectra):
    return [x.get("compound_name") for x in spectra]


def make_outdir(outdir):
    return os.mkdir(outdir)


def write_spectra(filename, outdir):
    spectra = read_spectra(filename)
    names = get_spectra_names(spectra)
    for i in range(len(spectra)):
        outfile = str(names[i]) + ".msp"
        outpath = os.path.join(outdir, outfile)
        save_as_msp(spectra[i], outpath)


def split_spectra(filename, outdir):
    make_outdir(outdir)
    return write_spectra(filename, outdir)


listarg = argparse.ArgumentParser()
listarg.add_argument('--filename', type=str)
listarg.add_argument('--outdir', type=str)
args = listarg.parse_args()
outdir = args.outdir
filename = args.filename


if __name__ == "__main__":
    split_spectra(filename, outdir)
else:
    print('Do nothing')
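
Review note on `write_spectra`: the output file name is built directly from `compound_name` via `str(names[i]) + ".msp"`, so a spectrum without a name would end up as `None.msp`, and two spectra sharing a name would resolve to the same path. Below is a minimal sketch of one way to derive unique, filesystem-safe names using only the standard library. `safe_msp_filename` and its `seen` dictionary are hypothetical helpers for illustration; they are not part of this PR or of matchms.

```python
import re


def safe_msp_filename(name, index, seen):
    """Return a unique '.msp' file name for a spectrum (hypothetical helper)."""
    # Fall back to a positional name when the compound name is missing or empty.
    base = str(name) if name else f"spectrum_{index}"
    # Replace path separators, whitespace and other awkward characters with "_".
    base = re.sub(r'[\\/:*?"<>|\s]+', "_", base).strip("_") or f"spectrum_{index}"
    # Disambiguate repeated names with a numeric suffix.
    count = seen.get(base, 0)
    seen[base] = count + 1
    return f"{base}.msp" if count == 0 else f"{base}_{count}.msp"


seen = {}
names = ["2,4-DINITROPHENOL", "2,4-DINITROPHENOL", None, "a/b"]
print([safe_msp_filename(n, i, seen) for i, n in enumerate(names)])
```

Commas are deliberately left untouched so that names such as `2,4-DINITROPHENOL` still match the element names used in the tool test. The suffix scheme is only a sketch: a compound literally named `X_1` could still collide with a generated suffix.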
39 changes: 39 additions & 0 deletions tools/msp_split/splitMSP.xml
@@ -0,0 +1,39 @@
<tool id="splitMSP" name="split spectra" version="0.1.0" python_template_version="3.5">
<description>Split MSP spectra file</description>
<requirements>
<requirement type="package" version="0.14.0">matchms</requirement>
</requirements>
<command detect_errors="exit_code"><![CDATA[
python3 $__tool_directory__/splitMSP.py
--filename '$msp_input'
--outdir 'output'
]]></command>
<inputs>
<param type="data" name="msp_input" format="msp" />
</inputs>
<outputs>
<collection format="msp" name="sample" type="list">
<discover_datasets pattern="__designation_and_ext__" ext="msp" directory="output"/>
</collection >
</outputs>
<tests>
<test>
<param name="msp_input" value="sample_input.msp" />
<output_collection name="sample" type="list">
<element name="1-NITROPYRENE" file="1-NITROPYRENE.msp" ftype="msp" compare="contains"/>
<element name="3,5-DICHLOROPHENOL" file="3,5-DICHLOROPHENOL.msp" ftype="msp" compare="contains"/>
<element name="3,4-DICHLOROPHENOL" file="3,4-DICHLOROPHENOL.msp" ftype="msp" compare="contains"/>
<element name="2,6-DICHLOROPHENOL" file="2,6-DICHLOROPHENOL.msp" ftype="msp" compare="contains"/>
<element name="2,5-DICHLOROPHENOL" file="2,5-DICHLOROPHENOL.msp" ftype="msp" compare="contains"/>
<element name="2,4-DINITROPHENOL" file="2,4-DINITROPHENOL.msp" ftype="msp" compare="contains"/>
<element name="2,4-DICHLOROPHENOL" file="2,4-DICHLOROPHENOL.msp" ftype="msp" compare="contains"/>
<element name="2,4,6-TRICHLOROPHENOL" file="2,4,6-TRICHLOROPHENOL.msp" ftype="msp" compare="contains"/>
<element name="2,4,5-TRICHLOROPHENOL" file="2,4,5-TRICHLOROPHENOL.msp" ftype="msp" compare="contains"/>
<element name="2,3-DICHLOROPHENOL" file="2,3-DICHLOROPHENOL.msp" ftype="msp" compare="contains"/>
</output_collection>
</test>
</tests>
<help><![CDATA[
All good
]]></help>
</tool>
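
Review note on `discover_datasets`: the pattern decides how files written to `output` become collection elements. The sketch below illustrates the matching in plain Python. It assumes `__designation_and_ext__` expands to the regular expression shown, which is reproduced from memory of the Galaxy tool XML documentation and should be checked against it; `element_identifier` is a hypothetical helper, not a Galaxy function.

```python
import re

# Assumed expansion of Galaxy's named pattern __designation_and_ext__.
DESIGNATION_AND_EXT = re.compile(r"(?P<designation>.*)\.(?P<ext>[^\._]+)?")


def element_identifier(filename):
    """Return (designation, ext) for a discovered file, or None if it does not match."""
    match = DESIGNATION_AND_EXT.match(filename)
    if match is None:
        return None
    return match.group("designation"), match.group("ext")


print(element_identifier("2,4-DINITROPHENOL.msp"))
```

Under that assumption the designation is everything before the last dot, which would explain why the test elements are named `2,4-DINITROPHENOL` rather than `2,4-DINITROPHENOL.msp`.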
100 changes: 100 additions & 0 deletions tools/msp_split/test-data/1-NITROPYRENE.msp
@@ -0,0 +1,100 @@
SYNONYM: 1-NITROPYRENE
DB#: JP000001
INCHIKEY: ALRLPDGCPYIVHP-UHFFFAOYSA-N
MW: 247.063328528
FORMULA: C16H9NO2
ACCESSION: JP000001
AUTHOR: KOGA M, UNIV. OF OCCUPATIONAL AND ENVIRONMENTAL HEALTH
LICENSE: CC BY-NC-SA
INSTRUMENT: VARIAN MAT-44
SMILES: [O-1][N+1](=O)c(c4)c(c1)c(c3c4)c(c2cc3)c(ccc2)c1
INCHI: InChI=1S/C16H9NO2/c18-17(19)14-9-7-12-5-4-10-2-1-3-11-6-8-13(14)16(12)15(10)11/h1-9H
SMILES_2: [H]C=1C([H])=C2C([H])=C([H])C3=C([H])C([H])=C(C=4C([H])=C([H])C(C1[H])=C2C34)N(=O)=O
INSTRUMENT_TYPE: EI-B
MS_LEVEL: MS1
IONIZATION_ENERGY: 70 eV
ION_TYPE: [M]+*
IONIZATION_MODE: positive
LAST_AUTO-CURATION: 1495210335755
MOLECULAR_FORMULA: C16H9NO2
TOTAL_EXACT_MASS: 247.063328528
COMPOUND_NAME: 1-NITROPYRENE
PRECURSOR_MZ: 0
PARENT_MASS: 247.06333
NUM PEAKS: 75
51.0 2.66
55.0 8.0
57.0 7.33
58.0 1.33
59.0 1.33
60.0 14.0
61.0 1.33
62.0 3.33
63.0 3.33
66.0 1.33
68.0 8.66
70.0 2.0
72.0 5.33
73.0 7.33
74.0 3.33
75.0 2.66
76.0 2.0
78.0 1.33
80.0 4.0
81.0 2.0
82.0 1.33
83.0 3.33
86.0 12.66
87.0 8.66
92.0 2.0
93.0 10.0
94.0 6.0
98.0 14.66
99.0 83.33
100.0 60.66
104.0 4.0
107.0 1.33
108.0 1.33
110.0 3.33
112.0 1.33
113.0 1.33
115.0 1.33
116.0 1.33
120.0 1.33
122.0 4.0
123.0 2.66
124.0 2.66
125.0 2.0
126.0 1.33
134.0 1.33
135.0 2.0
137.0 1.33
147.0 1.33
149.0 2.0
150.0 4.66
151.0 3.33
159.0 2.0
162.0 2.0
163.0 2.66
173.0 2.0
174.0 8.66
175.0 4.66
177.0 2.0
187.0 5.33
188.0 4.66
189.0 56.66
190.0 12.0
191.0 16.66
198.0 10.66
199.0 9.33
200.0 72.66
201.0 99.99
202.0 16.0
203.0 1.33
207.0 1.33
214.0 1.33
217.0 25.33
218.0 5.33
247.0 52.66
248.0 10.16
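
Review note on the test data layout: each record is a run of `KEY: value` metadata lines, followed by `NUM PEAKS: n` and then `n` lines of `m/z intensity` pairs. A minimal standard-library sketch for checking that the declared peak count matches the listed peaks is given below. `parse_msp_record` is a hypothetical helper and does not reflect how matchms parses MSP files; the embedded record is a shortened illustration that borrows the first three peak lines from the file above.

```python
def parse_msp_record(text):
    """Parse one MSP record into (metadata dict, list of (mz, intensity) peaks)."""
    metadata, peaks = {}, []
    in_peaks = False
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if in_peaks:
            # Peak lines hold whitespace-separated m/z and intensity values.
            mz, intensity = line.split()[:2]
            peaks.append((float(mz), float(intensity)))
        else:
            # Split on the first colon only, so values such as InChI strings survive.
            key, _, value = line.partition(":")
            key = key.strip().upper()
            metadata[key] = value.strip()
            # Everything after the NUM PEAKS line is treated as peak data.
            in_peaks = key == "NUM PEAKS"
    return metadata, peaks


# Shortened illustrative record (not a complete spectrum).
record = """COMPOUND_NAME: EXAMPLE
PRECURSOR_MZ: 0
NUM PEAKS: 3
51.0 2.66
55.0 8.0
57.0 7.33
"""
metadata, peaks = parse_msp_record(record)
assert int(metadata["NUM PEAKS"]) == len(peaks)
print(metadata["COMPOUND_NAME"], len(peaks))
```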

67 changes: 67 additions & 0 deletions tools/msp_split/test-data/2,3-DICHLOROPHENOL.msp
@@ -0,0 +1,67 @@
SYNONYM: 2,3-DICHLOROPHENOL
DB#: JP000006
INCHIKEY: UMPSXRYVXUPCOS-UHFFFAOYSA-N
MW: 161.963920108
FORMULA: C6H4Cl2O
ACCESSION: JP000006
AUTHOR: KOGA M, UNIV. OF OCCUPATIONAL AND ENVIRONMENTAL HEALTH
LICENSE: CC BY-NC-SA
INSTRUMENT: VARIAN MAT-44
SMILES: Oc(c1)c(Cl)c(Cl)cc1
INCHI: InChI=1S/C6H4Cl2O/c7-4-2-1-3-5(9)6(4)8/h1-3,9H
SMILES_2: [H]OC=1C([H])=C([H])C([H])=C(Cl)C1Cl
INSTRUMENT_TYPE: EI-B
MS_LEVEL: MS1
IONIZATION_ENERGY: 70 eV
ION_TYPE: [M]+*
IONIZATION_MODE: positive
LAST_AUTO-CURATION: 1495210335870
MOLECULAR_FORMULA: C6H4Cl2O
TOTAL_EXACT_MASS: 161.963920108
COMPOUND_NAME: 2,3-DICHLOROPHENOL
PRECURSOR_MZ: 0
PARENT_MASS: 161.96392
NUM PEAKS: 42
51.0 4.43
53.0 10.39
60.0 9.21
61.0 24.93
62.0 43.19
63.0 99.99
64.0 12.57
65.0 4.81
66.0 3.39
71.0 3.67
72.0 15.34
73.0 25.07
74.0 11.84
75.0 8.79
81.0 4.78
82.0 3.25
83.0 2.63
84.0 3.87
85.0 2.49
87.0 5.09
89.0 2.21
91.0 6.02
96.0 3.11
97.0 12.05
98.0 35.88
99.0 22.09
100.0 13.5
101.0 6.26
107.0 3.33
109.0 2.73
125.0 3.11
126.0 59.16
127.0 5.61
128.0 19.32
133.0 5.33
135.0 2.84
161.0 2.52
162.0 68.96
163.0 6.51
164.0 51.64
165.0 2.9
166.0 7.58

90 changes: 90 additions & 0 deletions tools/msp_split/test-data/2,4,5-TRICHLOROPHENOL.msp
@@ -0,0 +1,90 @@
SYNONYM: 2,4,5-TRICHLOROPHENOL
DB#: JP000009
INCHIKEY: LHJGJYXLEPZJPM-UHFFFAOYSA-N
MW: 195.924947756
FORMULA: C6H3Cl3O
ACCESSION: JP000009
AUTHOR: KOGA M, UNIV. OF OCCUPATIONAL AND ENVIRONMENTAL HEALTH
LICENSE: CC BY-NC-SA
INSTRUMENT: VARIAN MAT-44
SMILES: Oc(c1)c(Cl)cc(Cl)c(Cl)1
INCHI: InChI=1S/C6H3Cl3O/c7-3-1-5(9)6(10)2-4(3)8/h1-2,10H
SMILES_2: [H]OC1=C([H])C(Cl)=C(Cl)C([H])=C1Cl
INSTRUMENT_TYPE: EI-B
MS_LEVEL: MS1
IONIZATION_ENERGY: 70 eV
ION_TYPE: [M]+*
IONIZATION_MODE: positive
LAST_AUTO-CURATION: 1495210336033
MOLECULAR_FORMULA: C6H3Cl3O
TOTAL_EXACT_MASS: 195.924947756
COMPOUND_NAME: 2,4,5-TRICHLOROPHENOL
PRECURSOR_MZ: 0
PARENT_MASS: 195.92495
NUM PEAKS: 65
51.0 2.58
53.0 14.73
59.0 2.03
60.0 12.75
61.0 30.62
62.0 36.79
63.0 19.11
64.0 2.15
65.0 5.23
66.0 13.42
67.0 7.46
69.0 2.46
71.0 6.55
72.0 13.85
73.0 16.02
74.0 7.55
75.0 4.47
79.0 2.34
80.0 8.06
81.0 5.21
82.0 3.22
83.0 7.1
84.0 6.05
85.0 6.38
86.0 2.53
87.0 3.44
89.0 1.93
95.0 3.8
96.0 33.63
97.0 67.27
98.0 25.02
99.0 31.7
100.0 5.86
106.0 2.03
107.0 8.66
108.0 3.94
109.0 6.55
131.0 12.51
132.0 48.06
133.0 32.0
134.0 33.42
135.0 18.37
136.0 6.55
137.0 2.96
149.0 6.48
151.0 3.39
160.0 10.69
161.0 4.76
162.0 10.76
163.0 3.58
164.0 3.61
167.0 4.06
169.0 3.89
177.0 4.76
179.0 2.94
192.0 6.69
194.0 4.64
195.0 6.79
196.0 99.99
197.0 11.45
198.0 92.58
199.0 7.82
200.0 29.54
201.0 2.08
202.0 3.15
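
Review note on the expected outputs: each file above holds one record keyed by its `COMPOUND_NAME`. As a rough cross-check that does not depend on matchms, the splitting step can be sketched with the standard library, assuming records in the input file are separated by blank lines as the trailing blank line in these test files suggests. `split_msp_text` is a hypothetical helper, and its output is the raw record text rather than what `save_as_msp` would write.

```python
import re


def split_msp_text(text):
    """Split multi-record MSP text into {compound name: record text}."""
    records = {}
    # Assumption: one or more blank lines separate consecutive records.
    for index, block in enumerate(re.split(r"\n\s*\n", text.strip())):
        if not block.strip():
            continue
        # Use a positional fallback when no COMPOUND_NAME line is present.
        name = f"record_{index}"
        for line in block.splitlines():
            key, _, value = line.partition(":")
            if key.strip().upper() == "COMPOUND_NAME":
                name = value.strip()
                break
        records[name] = block.strip() + "\n"
    return records


# Two tiny illustrative records (not real spectra).
sample = """COMPOUND_NAME: A
NUM PEAKS: 1
51.0 2.66

COMPOUND_NAME: B
NUM PEAKS: 1
53.0 10.39
"""
print(list(split_msp_text(sample)))
```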
