Added tool: matchms_metadata_merge #428

Merged 6 commits on Nov 23, 2023
92 changes: 92 additions & 0 deletions tools/matchms/matchms_metadata_merge.xml
@@ -0,0 +1,92 @@
<tool id="matchms_metadata_merge" name="matchms metadata merge" version="@TOOL_VERSION@+galaxy0" profile="21.09">
<description>Merge a metadata CSV into an MSP file on a specified column</description>

<macros>
<import>macros.xml</import>
<import>help.xml</import>
</macros>

<expand macro="creator"/>

<edam_operations>
<edam_operation>operation_2409</edam_operation>
</edam_operations>
<expand macro="bio.tools"/>

<requirements>
<requirement type="package" version="@TOOL_VERSION@">matchms</requirement>
</requirements>

<command detect_errors='aggressive'><![CDATA[
python '${matchms_python_cli}'
]]></command>

<configfiles>
<configfile name="matchms_python_cli">
import pandas
import matchms
import numpy as np

matchms.set_matchms_logger_level('ERROR')
matchms.Metadata.set_key_replacements({})

spectra = list(matchms.importing.load_from_msp('${spectral_library}', False))

metadata_table = pandas.read_csv('${metadata_table_file}', dtype=object)
metadata_table.columns = map(str.lower, metadata_table.columns)

metadata_table.drop_duplicates(subset='${user_specified_column}'.lower(), inplace=True)

spectra_metadata = pandas.DataFrame.from_dict([x.metadata for x in spectra])
spectra_metadata.dropna(axis=1, inplace=True)

merged = metadata_table.merge(spectra_metadata, on='${user_specified_column}'.lower(), how='right')

spectra_arr = np.asarray(spectra, dtype=object)

def update_metadata(spectrum: matchms.Spectrum, row):
    metadata = spectrum.metadata
    metadata.update(row)
    spectrum.metadata = metadata
    return spectrum

vec_update_metadata = np.vectorize(update_metadata)
merged_array = vec_update_metadata(spectra_arr, merged.to_dict(orient='records'))

matchms.exporting.save_as_msp(merged_array.tolist(), '${output}')
</configfile>
</configfiles>

<inputs>
<param label="Spectra file" name="spectral_library" type="data" format="msp"
help="Mass spectral library file." />
<param label="Metadata csv file" name="metadata_table_file" type="data" format="csv"
help="csv file containing the metadata." />

<param label="Column/metadata key" name="user_specified_column" type="text" value="compound_name" help="Name of the column (metadata key) to merge on. The column name is lowercased before matching." />
</inputs>

<outputs>
<data label="${tool.name} on ${on_string}" name="output" format="msp">
</data>
</outputs>

<tests>
<test>
<param name="spectral_library" value="metadata_merge/input.msp" ftype="msp"/>
<param name="metadata_table_file" value="metadata_merge/metadata.csv" ftype="csv"/>
<param name="user_specified_column" value="name"/>
<output name="output" file="metadata_merge/output.msp" ftype="msp"/>
</test>
</tests>

<help>
**Description**

The tool takes an MSP file and a metadata CSV file and merges the metadata from the CSV
file into the metadata of the MSP file, joining on a user-specified column.
</help>

<citations>
<citation type="doi">10.5281/zenodo.8083373</citation>
</citations>
</tool>
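The merge step in the configfile above can be illustrated without matchms or pandas. The following is only a rough sketch using the standard library `csv` module in place of `pandas.read_csv` and `DataFrame.merge`; the function name `merge_metadata` and the plain-dict spectra are hypothetical stand-ins, not part of the tool. It mirrors three things the tool does: lowercasing the CSV header, keeping the first row per key (the default of `drop_duplicates`), and keeping every spectrum even when no CSV row matches (the `how='right'` join). Unlike the tool, the sketch skips empty CSV cells instead of writing a placeholder value; that is a deliberate deviation, labeled in the code.

```python
import csv
import io


def merge_metadata(spectra_metadata, csv_text, key):
    """Join CSV rows onto spectrum metadata dicts by a lowercased key column.

    Hypothetical helper for illustration; the tool itself uses pandas.
    """
    key = key.lower()
    rows_by_key = {}
    for raw in csv.DictReader(io.StringIO(csv_text)):
        # Lowercase the header names, as the tool does with the CSV columns.
        row = {name.lower(): value for name, value in raw.items()}
        # Keep only the first row per key value.
        rows_by_key.setdefault(row[key], row)

    merged = []
    for metadata in spectra_metadata:
        updated = dict(metadata)  # do not mutate the caller's dict
        row = rows_by_key.get(metadata.get(key), {})
        # Assumption of this sketch only: empty cells are skipped rather than
        # being carried over as a missing-value placeholder.
        updated.update({k: v for k, v in row.items() if v not in ("", None)})
        merged.append(updated)
    return merged


# Two records and rows taken from the test data of this PR.
spectra = [
    {"name": "ADP", "ionmode": "positive"},
    {"name": "Cyclizine", "ionmode": "positive"},
]
table = (
    "Name,inchikey,SMILES\n"
    "ADP,XTWYTFMLZFPYCI-UHFFFAOYSA-N,"
    "C1=NC2=C(C(=N1)N)N=CN2C3C(C(C(O3)COP(=O)(O)OP(=O)(O)O)O)O\n"
    "Cyclizine,,CN1CCN(CC1)C(C2=CC=CC=C2)C3=CC=CC=C3\n"
)
result = merge_metadata(spectra, table, "Name")
```

With this input, the ADP record gains both `inchikey` and `smiles`, while the Cyclizine record gains only `smiles` because its `inchikey` cell is empty.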
35 changes: 35 additions & 0 deletions tools/matchms/test-data/metadata_merge/input.msp
@@ -0,0 +1,35 @@
INCHI: InChI=1S/C10H15N5O10P2/c11-8-5-9(13-2-12-8)15(3-14-5)10-7(17)6(16)4(24-10)1-23-27(21,22)25-26(18,19)20/h2-4,6-7,10,16-17H,1H2,(H,21,22)(H2,11,12,13)(H2,18,19,20)
INSTRUMENTTYPE: LC-ESI-QQ
COLLISIONENERGY: 40
FORMULA: C10H15N5O10P2
NAME: ADP
PRECURSORMZ: 428.31
IONMODE: positive
NUM PEAKS: 2
135.0 83.0
136.0 999.0

INCHIKEY: BEJNERDRQOWKJM-UHFFFAOYSA-N
INCHI: InChI=1S/C6H6O4/c7-2-4-1-5(8)6(9)3-10-4/h1,3,7,9H,2H2
INSTRUMENTTYPE: LC-ESI-ITFT
COLLISIONENERGY: 60 % (nominal)
FORMULA: C6H6O4
NAME: Kojic acid
PRECURSORTYPE: [M-H]-
PRECURSORMZ: 141.0193
IONMODE: negative
NUM PEAKS: 1
141.0194 999.0

INCHI: InChI=1S/C18H22N2/c1-19-12-14-20(15-13-19)18(16-8-4-2-5-9-16)17-10-6-3-7-11-17/h2-11,18H,12-15H2,1H3
INSTRUMENTTYPE: LC-ESI-ITFT
COLLISIONENERGY: 85% (nominal)
FORMULA: C18H22N2
NAME: Cyclizine
PRECURSORTYPE: [M+H]+
PRECURSORMZ: 267.1856
IONMODE: positive
NUM PEAKS: 3
99.0917 6.0
165.0698 2.0
167.0856 999.0
4 changes: 4 additions & 0 deletions tools/matchms/test-data/metadata_merge/metadata.csv
@@ -0,0 +1,4 @@
Name,inchikey,SMILES
ADP,XTWYTFMLZFPYCI-UHFFFAOYSA-N,C1=NC2=C(C(=N1)N)N=CN2C3C(C(C(O3)COP(=O)(O)OP(=O)(O)O)O)O
Kojic acid,BEJNERDRQOWKJM-UHFFFAOYSA-N,C1=C(OC=C(C1=O)O)CO
Cyclizine,,CN1CCN(CC1)C(C2=CC=CC=C2)C3=CC=CC=C3
41 changes: 41 additions & 0 deletions tools/matchms/test-data/metadata_merge/output.msp
@@ -0,0 +1,41 @@
INCHI: InChI=1S/C10H15N5O10P2/c11-8-5-9(13-2-12-8)15(3-14-5)10-7(17)6(16)4(24-10)1-23-27(21,22)25-26(18,19)20/h2-4,6-7,10,16-17H,1H2,(H,21,22)(H2,11,12,13)(H2,18,19,20)
INSTRUMENTTYPE: LC-ESI-QQ
COLLISIONENERGY: 40
FORMULA: C10H15N5O10P2
NAME: ADP
PRECURSORMZ: 428.31
IONMODE: positive
INCHIKEY: XTWYTFMLZFPYCI-UHFFFAOYSA-N
SMILES: C1=NC2=C(C(=N1)N)N=CN2C3C(C(C(O3)COP(=O)(O)OP(=O)(O)O)O)O
NUM PEAKS: 2
135.0 83.0
136.0 999.0

INCHIKEY: BEJNERDRQOWKJM-UHFFFAOYSA-N
INCHI: InChI=1S/C6H6O4/c7-2-4-1-5(8)6(9)3-10-4/h1,3,7,9H,2H2
INSTRUMENTTYPE: LC-ESI-ITFT
COLLISIONENERGY: 60 % (nominal)
FORMULA: C6H6O4
NAME: Kojic acid
PRECURSORTYPE: [M-H]-
PRECURSORMZ: 141.0193
IONMODE: negative
SMILES: C1=C(OC=C(C1=O)O)CO
NUM PEAKS: 1
141.0194 999.0

INCHI: InChI=1S/C18H22N2/c1-19-12-14-20(15-13-19)18(16-8-4-2-5-9-16)17-10-6-3-7-11-17/h2-11,18H,12-15H2,1H3
INSTRUMENTTYPE: LC-ESI-ITFT
COLLISIONENERGY: 85% (nominal)
FORMULA: C18H22N2
NAME: Cyclizine
PRECURSORTYPE: [M+H]+
PRECURSORMZ: 267.1856
IONMODE: positive
INCHIKEY: nan
SMILES: CN1CCN(CC1)C(C2=CC=CC=C2)C3=CC=CC=C3
NUM PEAKS: 3
99.0917 6.0
165.0698 2.0
167.0856 999.0
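The expected output above contains `INCHIKEY: nan` for Cyclizine, which comes from the empty `inchikey` cell in `metadata.csv`. A reviewer who wants to spot such fields in a merged file could use something like the sketch below. It is a simplified, hypothetical line-based scan (the function name `find_nan_fields` is made up here), not the matchms MSP reader, and it assumes records are separated by blank lines and fields are written as `KEY: value`.

```python
def find_nan_fields(msp_text):
    """Return (record name, field) pairs whose value is the literal 'nan'."""
    hits = []
    for block in msp_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            # Peak lines such as "99.0917 6.0" contain no ": " and are skipped.
            if ": " in line:
                field, value = line.split(": ", 1)
                fields[field] = value
        name = fields.get("NAME")
        for field, value in fields.items():
            if value.strip().lower() == "nan":
                hits.append((name, field))
    return hits


# Excerpt of the Cyclizine record from the expected output of this PR.
sample = (
    "NAME: Cyclizine\n"
    "PRECURSORTYPE: [M+H]+\n"
    "IONMODE: positive\n"
    "INCHIKEY: nan\n"
    "SMILES: CN1CCN(CC1)C(C2=CC=CC=C2)C3=CC=CC=C3\n"
    "NUM PEAKS: 3\n"
    "99.0917 6.0\n"
    "165.0698 2.0\n"
    "167.0856 999.0\n"
)
nan_fields = find_nan_fields(sample)
```

For the excerpt, the scan reports the `INCHIKEY` field of the Cyclizine record.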