Generates quality metrics and msAlign files from FreeStyle deconvolutions

liv-acollins/TDAuditor

TDAuditor

"Inclusion List" experiments are frequently used in top-down proteomics to capture tandem mass spectra of proteoforms that are challenging to detect, either because their ions are less intense or because they carry relatively high charge (for example, above 30+). These experiments, however, are notoriously difficult to handle in conventional bioinformatics pipelines for identification. The TopFD deconvolution engine in TopPIC, for example, will generally produce an error when presented with mzML files produced from these experiments. While ProSight PC could handle these spectra if they were first deconvoluted (manually for each precursor) by Thermo's Xtract engine in Xcalibur, the contemporary ProSight PD (Proteome Discoverer) software cannot be used on these Xtract-analyzed RAW files. A new "Xtract All" feature in Thermo FreeStyle 1.8 can automatically deconvolve all precursor ions in a RAW file, but the compact RAW files it generates differ enough from instrument-produced ones that conventional tools like ProteoWizard msConvert encounter errors when attempting to read them. Helpfully, the CompOmics ThermoRawFileParser gained the ability in a 1.4.1 test release to process these files despite their missing "trailers."

Since FreeStyle "Xtract All" outputs RAW files, we need to reformat the data to text to support its use in PrSM visualization tools like ProSight Lite and ClipsMS and to enable its search in TopPIC. The visualization tools require text columns of deconvolved masses and (optionally) intensities, while TopPIC needs input files in msAlign format, a close relative of Mascot Generic Format. In addition to managing these outputs from an mzML produced from an Xtract All RAW file, TDAuditor seeks to produce quality metrics to characterize the RAW data, with emphasis on precursor charge state distributions and a summary table giving brief statistics for each MS/MS scan.
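To make the msAlign target more concrete, here is a minimal Python sketch of writing one deconvolved MS/MS spectrum as an MGF-like text block. It is not TDAuditor's C# code. The function name `format_msalign_entry` is hypothetical, and the header keys shown are an assumed subset based on typical TopPIC msAlign files; the exact set of headers that TopPIC requires should be checked against its documentation.

```python
def format_msalign_entry(spectrum_id, scan, activation, precursor_mz,
                         precursor_charge, precursor_mass, peaks):
    """Format one deconvolved MS/MS spectrum as an msAlign-style text block.

    peaks is an iterable of (neutral_mass, intensity, charge) tuples.
    The header keys below are an assumed subset, not a complete specification.
    """
    lines = [
        "BEGIN IONS",
        f"ID={spectrum_id}",
        f"SCANS={scan}",
        f"ACTIVATION={activation}",
        f"PRECURSOR_MZ={precursor_mz:.5f}",
        f"PRECURSOR_CHARGE={precursor_charge}",
        f"PRECURSOR_MASS={precursor_mass:.5f}",
    ]
    # One tab-separated line per deconvolved fragment: mass, intensity, charge.
    for mass, intensity, charge in peaks:
        lines.append(f"{mass:.5f}\t{intensity:.2f}\t{charge}")
    lines.append("END IONS")
    return "\n".join(lines) + "\n"


# Illustrative values only; they do not come from a real RAW file.
entry = format_msalign_entry(0, 1523, "ETD", 1001.00728, 12, 12000.0,
                             [(3000.0, 150.0, 3), (9000.0, 420.5, 8)])
print(entry)
```

The visualization tools mentioned above need only the mass and (optionally) intensity columns, so the same peak list could be written without the header lines for that purpose.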

Most of the source code addresses a single challenge: determining the charge and precursor mass for each MS/MS precursor. TDAuditor tries three routes in order. The simplest is to find a mass in the preceding MS scans that can be matched to one of the allowable charge states (currently up to 50+) for the precursor m/z value. By default, TDAuditor reports this observed mass and inferred precursor Z for each MS/MS. If no precursor mass can be found in the preceding MS scan, TDAuditor instead reports the observed mass and inferred precursor Z that can be observed in the MS/MS itself (this works more reliably in ETD data, when breakage is not followed by dissociation of the fragments: "ETnoD"). The third route, "Complements," is quite unconventional: it seeks the precursor mass that matches the most complementary pairs of fragment masses. It is followed only if no precursor can be detected in either the preceding MS scan or the MS/MS scan. If no reasonable sums exist, the software defaults to a +1 precursor (this is the only route for which outputting +1 is a possibility).
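The MS1 route and the fallback order can be sketched in a few lines of Python. This is an illustration under stated assumptions, not TDAuditor's C# implementation: the function names, the 10 ppm matching tolerance for this route, and the use of `precursor_mz - PROTON` as the mass of the default +1 precursor are all assumptions made here.

```python
PROTON = 1.007276466  # mass of a proton in Da


def infer_charge_from_ms1(precursor_mz, ms1_masses, max_charge=50, tol_ppm=10.0):
    """Return (mass, charge) for the best-matching MS1 mass, or None.

    Each allowable charge z implies a neutral mass z * (m/z - proton);
    the deconvolved MS1 mass closest to one of these (within tol_ppm) wins.
    """
    best = None
    for z in range(1, max_charge + 1):
        expected = z * (precursor_mz - PROTON)
        for mass in ms1_masses:
            ppm = abs(mass - expected) / expected * 1e6
            if ppm <= tol_ppm and (best is None or ppm < best[0]):
                best = (ppm, mass, z)
    return None if best is None else (best[1], best[2])


def assign_precursor(ms1_hit, ms2_hit, complement_hit, precursor_mz):
    """Apply the fallback order: MS1, then MS2, then Complements, then +1."""
    for route, hit in (("MS1", ms1_hit), ("MS2", ms2_hit),
                       ("Complements", complement_hit)):
        if hit is not None:
            return route, hit
    # Assumed form of the last resort: a singly charged precursor.
    return "Default", (precursor_mz - PROTON, 1)


# Illustrative values: a 12000 Da proteoform observed at 12+.
mz = 12000.0 / 12 + PROTON
hit = infer_charge_from_ms1(mz, [8000.5, 12000.0, 15000.2])
print(assign_precursor(hit, None, None, mz))
```

A real implementation would also need to account for the isolation window width and for isotopic offsets between the selected m/z and the deconvolved monoisotopic mass, which this sketch ignores.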

The Complements algorithm begins by summing all possible pairs of fragments and then sorting these sums by mass. Each peak pair is then considered as the first peak pair in a range defined by the allowable ppm mass tolerance (10 ppm) multiplied by the square root of 2 (because the sums of two peaks have a wider variance than individual peak measurements). When a group of peak pairs falls within ~14 ppm, then, they buttress each other as evidence for a precursor mass. These mass sums are weighted by the summed natural logs of their intensities to make a weighted average precursor mass. The "score" of this precursor mass is the product of two probabilities. First, the Poisson distribution is used to estimate the probability that this many peak sums would appear in so narrow a space of the total mass sum range. Second, the mass sum is divided by the precursor m/z to estimate the precursor charge. We subtract the rounded ratio from the ratio itself to produce what we might call the "ratio excess." If the excess is 0.5, we have a problem, since a predicted charge such as 7.5 cannot be explained by a realistic amount of measurement error. If the excess is 0.01, we feel better. We compute the probability that a random ratio would be closer to 0.0 than this value. The product of the Poisson probability and the excess probability is the score for this mass sum. We sort by these probability products and consider only the highest ranked.
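The scoring described above can be sketched in Python as follows. This is a hedged illustration rather than the project's C# code. Several details are assumptions made here: the Poisson term is taken as the upper tail P(X >= k) with the expected count derived from a uniform spread of sums over the total range, the excess probability is taken as 2 × excess (uniform on [0, 0.5]), smaller products are treated as better, groups of fewer than two sums are skipped, and intensities are assumed to be greater than 1 so that the log weights are positive.

```python
import math
from itertools import combinations


def poisson_tail(k, lam, terms=60):
    """P(X >= k) for X ~ Poisson(lam), summed directly so tiny values stay accurate."""
    return sum(math.exp(-lam) * lam ** j / math.factorial(j)
               for j in range(k, k + terms))


def complement_candidates(peaks, precursor_mz, tol_ppm=10.0):
    """Rank candidate precursor masses from (mass, intensity) fragment peaks."""
    # Sum every pair of fragments; weight each sum by ln(i1) + ln(i2).
    sums = sorted((m1 + m2, math.log(i1) + math.log(i2))
                  for (m1, i1), (m2, i2) in combinations(peaks, 2))
    if len(sums) < 2:
        return []
    window_ppm = tol_ppm * math.sqrt(2)         # ~14 ppm for sums of two peaks
    span = max(sums[-1][0] - sums[0][0], 1e-9)  # total mass-sum range
    candidates = []
    for start, (first, _) in enumerate(sums):
        width = first * window_ppm * 1e-6
        group = [s for s in sums[start:] if s[0] - first <= width]
        if len(group) < 2:
            continue
        total_weight = sum(w for _, w in group)
        mass = sum(m * w for m, w in group) / total_weight
        # Expected number of sums in a window this narrow under a uniform spread.
        lam = len(sums) * width / span
        p_count = poisson_tail(len(group), lam)
        ratio = mass / precursor_mz              # as described in the text
        excess = abs(ratio - round(ratio))       # lies in [0, 0.5]
        p_excess = 2.0 * excess
        candidates.append({"mass": mass, "charge": round(ratio),
                           "count": len(group), "score": p_count * p_excess})
    return sorted(candidates, key=lambda c: c["score"])


# Illustrative fragments: three complementary pairs that each sum to ~12000 Da.
peaks = [(3000.0, 100.0), (9000.0, 100.0), (4500.01, 100.0),
         (7499.99, 100.0), (5000.0, 100.0), (7000.02, 100.0)]
best = complement_candidates(peaks, 1001.007276466)[0]
print(best)
```

Because every pair of fragments is summed, the number of sums grows quadratically with the number of peaks, so a production version would likely restrict the input to the more intense fragments or cap the peak count.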

In practice, the software is able to assign 70-80% of precursors by the MS1 route. A very small number of precursors are detectable in the MS2 if they are not in the MS1. Cleaning up much of the rest requires the Complements route, and a very slim percentage are assigned as singly-charged precursors.

Languages

  • C# 100.0%