Topic: Using MetFrag for compound identification with MS/MS data and additional information.
An online training presentation is available here.
In this hands-on session you will learn how to use MetFrag to annotate MS/MS spectra as a first step towards identifying a molecular structure from MS and MS/MS information. Furthermore, we will use additional experimental data and metadata to support a putative identification.
In this example we have extracted a feature from an LC-MS/MS measurement of a river water sample, with a precursor m/z of 230.1162 at a retention time of 10.1 minutes. The data were acquired on an LTQ Orbitrap XL with high mass accuracy (< 5 ppm) in positive ion mode. The adduct type of the selected precursor ion is known to be [M+H]⁺.
Please download the prepared data:
- visit the MetFragWeb tool in your browser https://msbi.ipb-halle.de/MetFrag
- define database settings to retrieve candidates given the MS1 information:
- use the precursor m/z value and type to calculate the neutral monoisotopic mass
- check mass accuracy
- select PubChemLite in the "Local Databases" section as compound database
- start a candidate retrieval by clicking "Retrieve Candidates"
- MetFrag searches candidates matching the information given by the "Database settings" (here: Neutral Mass and 5 ppm deviation)
- after the retrieval you can download the candidate list as CSV or XLS to get a first overview of the retrieved data set
- use the "Fragmentation settings" tab to add the given MS2 peak list
- you can visualize the peak list by clicking on the "Show Spectrum" button
- keep the settings for the in silico fragmentation and start the processing by clicking "Process Candidates"
- MetFrag now generates fragments for each candidate up to the specified tree depth
- the fragments are mapped to the MS/MS peak list by mass, and the matched peaks are used to calculate a score for each candidate
- after the processing has finished, the ranked candidate list is shown in the "Results" tab
- here you have different possibilities:
- you can filter candidates by explained peaks
- investigate explained fragments and calculated scores for each candidate
- download ranked candidate list as CSV or XLS file
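The neutral mass and search window derived from the "Database settings" can be reproduced by hand. The sketch below assumes that the [M+H]⁺ adduct is handled by subtracting one proton mass (1.007276 Da) and that the 5 ppm deviation is applied symmetrically around the neutral mass; the variable names are illustrative and not part of MetFrag.

```shell
#!/usr/bin/env bash
export LC_ALL=C            # force "." as decimal separator in awk

# Values of the example feature in this tutorial.
precursor_mz=230.1162      # measured m/z of the [M+H]+ ion
ppm=5                      # relative mass deviation for the database search
proton_mass=1.007276       # assumption: mass in Da removed for an [M+H]+ adduct

# Neutral monoisotopic mass: remove the proton added by the adduct.
neutral_mass=$(awk -v mz="$precursor_mz" -v p="$proton_mass" \
  'BEGIN { printf "%.6f", mz - p }')

# Search window: neutral mass minus/plus the ppm deviation.
lower=$(awk -v m="$neutral_mass" -v ppm="$ppm" \
  'BEGIN { printf "%.6f", m * (1 - ppm / 1e6) }')
upper=$(awk -v m="$neutral_mass" -v ppm="$ppm" \
  'BEGIN { printf "%.6f", m * (1 + ppm / 1e6) }')

echo "neutral mass: $neutral_mass Da"
echo "search window: $lower to $upper Da"
```

The neutral mass comes out at about 229.1089 Da, and the window is roughly 0.0011 Da wide on each side, which gives a feeling for how narrow a 5 ppm filter is at this mass.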
Questions:
Q1: How many different molecular formulas are present?
Q2: What do you think is the correct molecular formula?
Q3: What else could you do to verify the molecular formula besides using the given MetFrag results?
--
Visit http://www.envipat.eawag.ch/index.php and verify your molecular formula.
- use the same settings as in 1 a) but add the molecular formula
- also select "Include references" when using PubChem
- use the same settings as in 1 b) and process the candidates
Questions:
Q4: Looking at the results, what has changed compared to using the monoisotopic mass as candidate filter?
Q5: Is the molecular formula helpful here?
- adding additional information available from the experimental context is often helpful to verify a putative identification
- we want to add retention time as additional experimental information
- MetFrag includes a retention time model
- linear correlation of the n-octanol/water partition coefficient (logP) and retention time
- candidate logP is either the XLogP3 value retrieved from PubChem or calculated with CDK's XLogP
- rt_XlogP.csv contains a data set of measured retention times (rt) and XLogP3 values of 254 Eawag standards:
- upload the data set to the MetFragWeb tool in the "Candidate Filter & Score Settings" tab using the "Retention Time" panel on the right side (direct download: rt_XlogP.csv)
- after the file upload, set the retention time of the precursor and select XLogP3 as the partition coefficient used for the correlation
- this results in an additional scoring term in the scoring function of MetFrag
- use the same settings as in 2 b) and process the candidates
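The idea behind the retention time term can be sketched as an ordinary least-squares line through (rt, logP) pairs, which then predicts a logP for the retention time of the query. This is a minimal illustration and not MetFrag's implementation; the column names and all training values are invented stand-ins, since the actual layout of rt_XlogP.csv is not shown here.

```shell
#!/usr/bin/env bash
export LC_ALL=C

# Toy stand-in for rt_XlogP.csv (column names and values are made up).
training=$(mktemp)
cat > "$training" <<'EOF'
RetentionTime,XLogP3
2.0,0.0
4.0,1.0
6.0,2.0
8.0,3.0
10.0,4.0
EOF

query_rt=10.1   # retention time of the example precursor in minutes

# Least-squares fit of logP = slope * rt + intercept, then prediction for query_rt.
fit=$(awk -F, -v q="$query_rt" '
  NR > 1 { n++; sx += $1; sy += $2; sxx += $1 * $1; sxy += $1 * $2 }
  END {
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    printf "%.3f %.3f %.3f\n", slope, intercept, slope * q + intercept
  }' "$training")
rm -f "$training"

read -r slope intercept predicted_logp <<EOF
$fit
EOF

echo "logP = $slope * rt + $intercept"
echo "predicted logP at rt $query_rt min: $predicted_logp"
```

A candidate whose logP lies close to the predicted value would fit the observed retention time better than one far away from it; how that distance is turned into a score is left out of this sketch.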
Questions:
Q6: What has changed compared to the previous run?
Q7: Use the weight sliders in the "Results" tab. Does it change anything?
Q8: Is the retention time information helpful here?
- meta information can help to verify putative identifications depending on the experimental context
- however, be careful when using this information, because it is not derived from your acquired data
- in the "Candidate Filter & Score Settings" tab select the additional "Database Scoring Terms"
- PubChemNumberPubMedReferences
- PubChemNumberPatents
- use the same settings as in 3 b) and process the candidates
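How several scoring terms and their weights interact can be tried out on a toy candidate list. The sketch assumes that each term is scaled by its maximum over all candidates and that the final score is the weighted sum of the scaled terms; candidate names, scores, and reference counts are invented for illustration and the function name is hypothetical.

```shell
#!/usr/bin/env bash
export LC_ALL=C

# Invented candidate list: identifier, fragmenter score, number of references.
candidates=$(mktemp)
cat > "$candidates" <<'EOF'
Candidate,FragmenterScore,References
A,120,0
B,100,50
C,60,10
EOF

# Print "candidate,score" lines sorted by descending combined score.
# $1 = weight of the fragmenter score, $2 = weight of the reference count
rank_candidates() {
  awk -F, -v wf="$1" -v wr="$2" '
    NR > 1 {
      id[NR] = $1; frag[NR] = $2 + 0; refs[NR] = $3 + 0
      if (frag[NR] > maxf) maxf = frag[NR]
      if (refs[NR] > maxr) maxr = refs[NR]
    }
    END {
      for (i = 2; i <= NR; i++) {
        # scale each term by its maximum, then combine as a weighted sum
        f = (maxf > 0) ? frag[i] / maxf : 0
        r = (maxr > 0) ? refs[i] / maxr : 0
        printf "%s,%.3f\n", id[i], wf * f + wr * r
      }
    }' "$candidates" | sort -t, -k2,2nr
}

top_both=$(rank_candidates 1 1 | head -n1 | cut -d, -f1)
top_fragments_only=$(rank_candidates 1 0 | head -n1 | cut -d, -f1)
rm -f "$candidates"

echo "best candidate with both terms:      $top_both"
echo "best candidate with fragments only:  $top_fragments_only"
```

In this toy list the well-referenced candidate B overtakes A as soon as the reference term is switched on, similar in spirit to moving a weight slider in the "Results" tab.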
Questions:
Q9: What has changed compared to the previous run?
Q10: Would the number of references and patents have helped for a metabolomics experiment?
Q11: Investigate the high intensity fragments of the first ranked candidate. Are they plausible compared to fragment structures of other candidates?
- visit MassBank EU (https://massbank.eu)
- select "Peak Search" and add the most intense explained peaks
- click "Search" to find spectra with matching peaks in the database
Questions:
Q12: Investigate the results and compare them to your MetFrag result list. Any conclusions?
- in the "Candidate Filter & Score Settings" tab enable "Spectral Similarity"
- MetFrag will now query the MS/MS peak list against a spectral library mirror to search for similar spectra of known compounds
- use the same settings as in 4 b) and process the candidates
Questions:
Q13: Discard the meta information scores to just use the results based on experimental data. Any conclusions?
- visit the CASMI contest site (http://www.casmi-contest.org/2017/challenges_1-45.shtml)
- try to identify some of the compounds
- check your results here
- MetFrag can be used on the command line to process batches of annotation tasks; this version is called MetFragCLI
- parameter files for MetFragCLI can be created with the web interface
- get your copy of MetFragCLI from http://ipb-halle.github.io/MetFrag/
- set up one example calculation to retrieve a set of valid parameters
~/course$ ls *
MetFrag2.4.2-CL.jar MetFragWeb_Parameters.zip
data:
challenge-001-msms.txt challenge-003-msms.txt challenge-005-msms.txt challenge-007-msms.txt challenge-009-msms.txt
challenge-001-ms.txt challenge-003-ms.txt challenge-005-ms.txt challenge-007-ms.txt challenge-009-ms.txt
challenge-002-msms.txt challenge-004-msms.txt challenge-006-msms.txt challenge-008-msms.txt
challenge-002-ms.txt challenge-004-ms.txt challenge-006-ms.txt challenge-008-ms.txt
MetFragWeb_Parameters:
MetFragWeb_Parameters.cfg MetFragWeb_Peaklist.txt README.txt
- slightly adjust MetFragWeb_Parameters.cfg to use the ionized precursor mass
- this works well in conjunction with the "PrecursorIonMode" option
# 1 for M+H and -1 for M-H
PrecursorIonMode = 1
FragmentPeakMatchRelativeMassDeviation = 5.0
SampleName = MetFragWeb_Sample
MetFragCandidateWriter = XLS
DatabaseSearchRelativeMassDeviation = 5.0
FragmentPeakMatchAbsoluteMassDeviation = 0.001
MetFragDatabaseType = PubChem
ResultsPath = .
#NeutralPrecursorMass = 272.068624
IonizedPrecursorMass = 272.068624
MetFragScoreTypes = FragmenterScore
MetFragScoreWeights = 1.0
MetFragPreProcessingCandidateFilter = UnconnectedCompoundFilter,IsotopeFilter
IsPositiveIonMode = true
MaximumTreeDepth = 2
NumberThreads = 1
UseSmiles = true
PeakListPath = MetFragWeb_Peaklist.txt
- create the directories and populate them with files
for x in $(seq -f %03g 1 9); do
    mkdir -p "challenge-${x}"
    cp data/challenge-${x}* "challenge-${x}"
    cp MetFragWeb_Parameters/MetFragWeb_Parameters.cfg "challenge-${x}"
    # relative link: resolves to the copied MS/MS file inside challenge-${x}
    ln -sf "challenge-${x}-msms.txt" "challenge-${x}/MetFragWeb_Peaklist.txt"
done
- inject the precursor mass
for x in $(seq -f %03g 1 9); do
    # first tab-separated field of the first line of the MS file
    mass=$(head -n1 "challenge-${x}/challenge-${x}-ms.txt" | cut -f1)
    sed -i "s|IonizedPrecursorMass =.*|IonizedPrecursorMass = ${mass}|" "challenge-${x}/MetFragWeb_Parameters.cfg"
done
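To check the substitution before touching the real challenge directories, the same pattern can be tried on throwaway files. The sketch below is self-contained; the file contents are made up, and the layout of the MS file (m/z in the first tab-separated field of the first line) is an assumption implied by the `cut -f1` call. It writes to a temporary file instead of using `sed -i`, whose syntax differs between GNU and BSD sed.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# Hypothetical stand-ins for one challenge's MS file and parameter file.
printf '230.1162\t100\n231.1195\t12\n' > "$workdir/challenge-001-ms.txt"
printf '%s\n' '#NeutralPrecursorMass = 272.068624' \
              'IonizedPrecursorMass = 272.068624' \
              'PrecursorIonMode = 1' > "$workdir/params.cfg"

# Same extraction and substitution as in the loop above.
mass=$(head -n1 "$workdir/challenge-001-ms.txt" | cut -f1)
sed "s|IonizedPrecursorMass =.*|IonizedPrecursorMass = ${mass}|" \
    "$workdir/params.cfg" > "$workdir/params.cfg.new"
mv "$workdir/params.cfg.new" "$workdir/params.cfg"

# The active line should carry the new mass; the commented line should be unchanged.
injected=$(grep '^IonizedPrecursorMass' "$workdir/params.cfg")
untouched=$(grep -c '^#NeutralPrecursorMass = 272.068624$' "$workdir/params.cfg")
echo "$injected"
rm -rf "$workdir"
```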
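- run MetFragCLI in each challenge directory
for x in $(seq -f %03g 1 9); do
    # subshell keeps the working directory of the calling shell unchanged
    (cd "challenge-${x}" && java -jar ../MetFrag2.4.2-CL.jar MetFragWeb_Parameters.cfg)
done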