Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spectral decon #675

Open
wants to merge 48 commits into
base: master
Choose a base branch
from

Conversation

Alexander-Sol
Copy link
Contributor

@Alexander-Sol Alexander-Sol commented Nov 11, 2022

This is the minimal viable implementation of a new deconvolution method, spectral deconvolution.

Spectral Deconvolution works by generating theoretical isotopic distributions for each peptide/proteoform in a given library, generating isotopic envelopes for each charge state, then comparing the envelopes in each spectra to those in the library.

Note that this PR includes a significant refactor (see PR #671) that moves the Proteomics project into the MassSpectrometry project. Due to this change, loading this PR necessitates a clean and rebuild and/or a restart of visual studio.

I also added a Scorer class that implements the Strategy design pattern to score experimental spectra against theoretical spectra. I still need to write tests for this class.

@codecov
Copy link

codecov bot commented Dec 1, 2022

Codecov Report

Merging #675 (b744f49) into master (623dbc4) will decrease coverage by 0.14%.
The diff coverage is 94.25%.

❗ Current head b744f49 differs from pull request most recent head f1a4f28. Consider uploading reports for the commit f1a4f28 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #675      +/-   ##
==========================================
- Coverage   81.12%   80.98%   -0.14%     
==========================================
  Files         147      148       +1     
  Lines       25690    25762      +72     
==========================================
+ Hits        20840    20863      +23     
- Misses       4850     4899      +49     
Impacted Files Coverage Δ
...b/MassSpectrometry/MzSpectra/SpectralSimilarity.cs 100.00% <ø> (ø)
...teomics/ProteolyticDigestion/ProductTypeMethods.cs 0.00% <0.00%> (ø)
mzLib/MzLibUtil/DoubleRange.cs 100.00% <ø> (ø)
mzLib/Test/TestIsotopicEnvelope.cs 0.00% <0.00%> (ø)
...lution/Algorithms/ClassicDeconvolutionAlgorithm.cs 96.29% <50.00%> (-0.89%) ⬇️
...Lib/MassSpectrometry/Deconvolution/Deconvoluter.cs 85.18% <60.00%> (+1.85%) ⬆️
.../MassSpectrometry/Deconvolution/MinimalSpectrum.cs 60.60% <60.60%> (ø)
...etry/Proteomics/Modifications/ModificationMotif.cs 66.66% <66.66%> (ø)
...b/MassSpectrometry/Deconvolution/Scoring/Scorer.cs 69.23% <69.23%> (ø)
...ectrometry/Deconvolution/DeconvoluterExtensions.cs 75.00% <75.00%> (ø)
... and 109 more

Copy link
Collaborator

@acesnik acesnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Looks like there are a few places to add tests for default value returns and exception throws.
  • Was there a dependency problem that led to needing to change where Proteomics is located?

mzLib/mzLib.nuspec Outdated Show resolved Hide resolved
@Alexander-Sol
Copy link
Contributor Author

  • Looks like there are a few places to add tests for default value returns and exception throws.

    • Was there a dependency problem that led to needing to change where Proteomics is located?

The new deconvolution method relies on generating a library of theoretical isotopic distributions for all species in a given protein database. To do so, I need access to classes inside of Proteomics (e.g. PeptideWithSetModifications)

@nbollis
Copy link
Member

nbollis commented Dec 5, 2022

Within Proteomics folder, it appears as through some things have specific namespaces while others do not:
AminoAcidPolymer -> MassSpectrometry.Proteomics.AminoAcidPolymer
Fragmentaiton -> MassSpectrometry.Proteomics.Fragmentation
Modification -> MassSpectrometry.Proteomics
Protein -> MassSpectrometry.Proteomics
ProteolyticDigestion -> MassSpectrometry.Proteomics.ProteolyticDigestion
RetentionTimePrediction -> MassSpectrometry.Proteomics.RetentionTimePrediction

Would it make more sense to keep the namespaces of each file the same as they were before the refactoring? This would enable continuous integration as we would not need to change any namespace references in programs that implement MzLib, but would still allow for them to be in the same project, as was required for your PR.

@acesnik
Copy link
Collaborator

acesnik commented Dec 6, 2022

Historically, this comes from borrowing code from https://github.com/dbaileychess/CSMSL, which has a bit more specific namespace usage.

avcarr2
avcarr2 previously approved these changes Dec 6, 2022
Copy link
Contributor

@avcarr2 avcarr2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alex and I conducted an in-person review over the basics of the algorithm and checked the outputs. The outputs make sense, the deconvolution added is very extendable. It doesn't affect how MetaMorpheus currently deconvolutes thanks to @nbollis's refactoring of the deconvolution code to support a well-executed design pattern.

I think it's a good time to merge this new set of code into master so we can continue working on deconvolution while maintaining a linear-ish commit history.

avcarr2
avcarr2 previously approved these changes Jan 20, 2023
acesnik
acesnik previously approved these changes Jan 23, 2023
Copy link
Collaborator

@acesnik acesnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. You could consider going through your comments one more time and removing ones that aren't meant for perpetuity. There are also quite a few TODOs to review; should these be completed before merging this? Or are they more for future note (could use something other than TODO)?

There's also a pretty big refactor of the Proteomics namespace. Were there any changes in there that are hard to review because of moving those files?

}
else
{
//TODO: Add some averagine bullshit here
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha this one's almost worth framing

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still need to add something here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not on this PR. That would come in subsequent iterations

public List<Modification> VariableModifications { get; }
public DigestionParams DigestionParams { get; }
public List<SilacLabel> SilacLabels { get; }
// TODO: convert double range to MzRange
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still need to do this?

@@ -79,14 +79,6 @@ public virtual string ToString(string format)
return $"[{Minimum.ToString(format, System.Globalization.CultureInfo.InvariantCulture)};{Maximum.ToString(format, System.Globalization.CultureInfo.InvariantCulture)}]";
}

/// <summary>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intended deletion?

<file src="ThermoRawFileReader\bin\x64\Release\net6.0\ThermoFisher.CommonCore.Data.dll" target="lib\net6.0" />
<file src="ThermoRawFileReader\bin\x64\Release\net6.0\ThermoFisher.CommonCore.MassPrecisionEstimator.dll" target="lib\net6.0" />
<file src="ThermoRawFileReader\bin\x64\Release\net6.0\ThermoFisher.CommonCore.RawFileReader.dll" target="lib\net6.0" />
<file src="UsefulProteomicsDatabases\bin\x64\Release\net6.0\UsefulProteomicsDatabases.dll" target="lib\net6.0" />
<!--net6.0-windows target with mzPlot-->
<file src="mzPlot\bin\x64\Release\net6.0-windows\mzPlot.dll" target="lib\net6.0-windows7.0" />
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This net6.0-windows section has some net6.0 stuff in it

@Alexander-Sol Alexander-Sol dismissed stale reviews from acesnik and avcarr2 via f1a4f28 January 24, 2023 19:48
@acesnik acesnik self-requested a review January 24, 2023 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants