Merge pull request #137 from compomics/feature/non-tryptic
Add non-tryptic peptide and immunopeptide prediction models
RalfG authored Nov 12, 2021
2 parents bdf15d8 + 4af4b6c commit ea18527
Showing 33 changed files with 384 additions and 3,520,664 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
ms2pip/models/**/*.c filter=lfs diff=lfs merge=lfs -text
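The `.gitattributes` rule stores every `.c` file under `ms2pip/models/` in Git LFS, which is why the workflows check out with `lfs: 'true'`. A checkout without LFS leaves small pointer text files where the compiled-in model sources should be. As a minimal sketch, assuming the standard Git LFS v1 pointer format (first line `version https://git-lfs.github.com/spec/v1`, file smaller than 1024 bytes), unfetched model files could be detected before building. `is_lfs_pointer` and `find_unfetched_models` are hypothetical helpers, not part of MS²PIP.

```python
from pathlib import Path

# First line of a Git LFS v1 pointer file (assumption: standard LFS spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"
# The LFS spec requires pointer files to be smaller than 1024 bytes.
MAX_POINTER_SIZE = 1024


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` looks like an unfetched Git LFS pointer file."""
    if path.stat().st_size >= MAX_POINTER_SIZE:
        return False
    with path.open("rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_unfetched_models(model_dir: Path) -> list[Path]:
    """List .c files under `model_dir` (recursively) that are still LFS pointers."""
    return sorted(p for p in model_dir.glob("**/*.c") if is_lfs_pointer(p))
```

Calling `find_unfetched_models(Path("ms2pip/models"))` from the repository root should return an empty list once `git lfs pull` has replaced the pointers with the real files.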
4 changes: 4 additions & 0 deletions .github/workflows/build_and_publish.yml
@@ -10,6 +10,8 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
with:
lfs: 'true'
- name: Set up Python
uses: actions/setup-python@v2
with:
@@ -36,6 +38,8 @@ jobs:
os: [ubuntu-18.04, macos-latest]
steps:
- uses: actions/checkout@v2
with:
lfs: 'true'
- uses: actions/setup-python@v2
name: Install Python
with:
2 changes: 2 additions & 0 deletions .github/workflows/test.yml
@@ -13,6 +13,8 @@ jobs:

steps:
- uses: actions/checkout@v2
with:
lfs: 'true'
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
1 change: 1 addition & 0 deletions .gitignore
@@ -4,6 +4,7 @@
data/
*_pyx.c
*_pyx_*.c
ms2pip/models_xgboost/*.xgboost

# Pytest
.pytest_cache/
8 changes: 6 additions & 2 deletions README.md
@@ -286,26 +286,30 @@ next to the predictions for singly charged b- and y-ions.

| Model | Fragmentation method | MS² mass analyzer | Peptide properties |
| - | - | - | - |
| HCD | HCD | Orbitrap | Tryptic digest |
| HCD2019 | HCD | Orbitrap | Tryptic digest |
| HCD2021 | HCD | Orbitrap | Tryptic/chymotryptic digest |
| CID | CID | Linear ion trap | Tryptic digest |
| iTRAQ | HCD | Orbitrap | Tryptic digest, iTRAQ-labeled |
| iTRAQphospho | HCD | Orbitrap | Tryptic digest, iTRAQ-labeled, enriched for phosphorylation |
| TMT | HCD | Orbitrap | Tryptic digest, TMT-labeled |
| TTOF5600 | CID | Quadrupole Time-of-Flight | Tryptic digest |
| HCDch2 | HCD | Orbitrap | Tryptic digest |
| CIDch2 | CID | Linear ion trap | Tryptic digest |
| Immuno-HCD | HCD | Orbitrap | Immunopeptides |

### Models, version numbers, and the train and test datasets used to create each model

| Model | Current version | Train-test dataset (unique peptides) | Evaluation dataset (unique peptides) | Median Pearson correlation on evaluation dataset |
| - | - | - | - | - |
| HCD | v20190107 | [MassIVE-KB](https://doi.org/10.1016/j.cels.2018.08.004) (1 623 712) | [PXD008034](https://doi.org/10.1016/j.jprot.2017.12.006) (35 269) | 0.903786 |
| HCD2019 | v20190107 | [MassIVE-KB](https://doi.org/10.1016/j.cels.2018.08.004) (1 623 712) | [PXD008034](https://doi.org/10.1016/j.jprot.2017.12.006) (35 269) | 0.903786 |
| CID | v20190107 | [NIST CID Human](https://chemdata.nist.gov/) (340 356) | [NIST CID Yeast](https://chemdata.nist.gov/) (92 609) | 0.904947 |
| iTRAQ | v20190107 | [NIST iTRAQ](https://chemdata.nist.gov/) (704 041) | [PXD001189](https://doi.org/10.1182/blood-2016-05-714048) (41 502) | 0.905870 |
| iTRAQphospho | v20190107 | [NIST iTRAQ phospho](https://chemdata.nist.gov/) (183 383) | [PXD001189](https://doi.org/10.1182/blood-2016-05-714048) (9 088) | 0.843898 |
| TMT | v20190107 | [Peng Lab TMT Spectral Library](https://doi.org/10.1021/acs.jproteome.8b00594) (1 185 547) | [PXD009495](https://doi.org/10.15252/msb.20188242) (36 137) | 0.950460 |
| TTOF5600 | v20190107 | [PXD000954](https://doi.org/10.1038/sdata.2014.31) (215 713) | [PXD001587](https://doi.org/10.1038/nmeth.3255) (15 111) | 0.746823 |
| HCDch2 | v20190107 | [MassIVE-KB](https://doi.org/10.1016/j.cels.2018.08.004) (1 623 712) | [PXD008034](https://doi.org/10.1016/j.jprot.2017.12.006) (35 269) | 0.903786 (+) and 0.644162 (++) |
| CIDch2 | v20190107 | [NIST CID Human](https://chemdata.nist.gov/) (340 356) | [NIST CID Yeast](https://chemdata.nist.gov/) (92 609) | 0.904947 (+) and 0.813342 (++) |
| HCD2021 | v20210416 | [Combined dataset] (520 579) | [PXD008034](https://doi.org/10.1016/j.jprot.2017.12.006) (35 269) | 0.932361 |
| Immuno-HCD | v20210316 | [Combined dataset] (460 191) | [PXD005231 (HLA-I)](https://doi.org/10.1101/098780) (46 753) <br>[PXD020011 (HLA-II)](https://doi.org/10.3389/fimmu.2020.01981) (23 941) | 0.963736<br>0.942383 |

To train custom MS²PIP models, please refer to [Training new MS²PIP models](http://compomics.github.io/projects/ms2pip_c/wiki/Training-new-MS2PIP-models.html) on our Wiki pages.
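The last column of the table reports a median, over the evaluation peptides, of the Pearson correlation between predicted and observed fragment intensities. A minimal sketch of that summary statistic, assuming the predicted and observed vectors are already aligned ion by ion and on the same intensity scale (the table does not state which transform was applied). The function names and toy data are illustrative only.

```python
import math
from statistics import median


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length vectors (non-constant inputs)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # A constant vector has zero variance and would raise ZeroDivisionError here.
    return cov / math.sqrt(var_x * var_y)


def median_pearson(spectra: dict[str, tuple[list[float], list[float]]]) -> float:
    """Median over peptides of pearson(predicted, observed)."""
    return median(pearson(pred, obs) for pred, obs in spectra.values())


# Toy example: correlations of 1.0, -1.0 and 0.5 give a median of 0.5.
toy = {
    "PEPTIDEK": ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
    "ACDEFK": ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),
    "LLGGK": ([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]),
}
print(median_pearson(toy))
```

Using the median rather than the mean keeps a handful of poorly predicted spectra from dominating the summary.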
24 changes: 16 additions & 8 deletions ms2pip/__main__.py
@@ -6,21 +6,23 @@
from ms2pip.config_parser import ConfigParser
from ms2pip.exceptions import (FragmentationModelRequiredError,
InvalidModificationFormattingError,
InvalidPEPRECError,
InvalidPEPRECError, InvalidXGBoostModelError,
NoValidPeptideSequencesError,
UnknownFragmentationMethodError,
UnknownModificationError,
UnknownOutputFormatError)
UnknownOutputFormatError,
InvalidXGBoostModelError,
EmptySpectrumError)
from ms2pip.ms2pipC import MODELS, MS2PIP, SUPPORTED_OUT_FORMATS


def print_logo():
logo = """
__ __ ___ __ ___ ___ ___
| \/ / __||_ ) _ \_ _| _ \
| |\/| \__ \/__| _/| || _/
|_| |_|___/ |_| |___|_|
logo = r"""
__ __ ___ __ ___ ___ ___
| \/ / __||_ ) _ \_ _| _ \
| |\/| \__ \/__| _/| || _/
|_| |_|___/ |_| |___|_|
by CompOmics
sven.degroeve@ugent.be
ralf.gabriels@ugent.be
@@ -155,6 +157,12 @@ def main():
except FragmentationModelRequiredError:
root_logger.error("Please specify model in config file.")
sys.exit(1)
except InvalidXGBoostModelError:
root_logger.error(f"Could not download XGBoost model properly\nTry manual download")
sys.exit(1)
except EmptySpectrumError:
root_logger.error("Provided MGF file cannot contain empty spectra")
sys.exit(1)


if __name__ == "__main__":
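The new `except` branches log a message and exit with status 1. A table-driven variant of the same pattern is sketched below. The exception classes are local stand-ins for those in `ms2pip.exceptions`, `run` is a hypothetical placeholder for the prediction step, and returning the status code (to be passed to `sys.exit(main(...))`) is a choice made here for testability, not how `ms2pip/__main__.py` is written.

```python
import logging


class EmptySpectrumError(Exception):
    """Local stand-in for ms2pip.exceptions.EmptySpectrumError."""


class InvalidXGBoostModelError(Exception):
    """Local stand-in for ms2pip.exceptions.InvalidXGBoostModelError."""


# Each expected failure maps to the message logged before exiting with status 1.
ERROR_MESSAGES = {
    InvalidXGBoostModelError: "Could not download XGBoost model properly\nTry manual download",
    EmptySpectrumError: "Provided MGF file cannot contain empty spectra",
}


def run(spectra):
    """Hypothetical placeholder for the prediction step."""
    if any(len(peaks) == 0 for peaks in spectra):
        raise EmptySpectrumError()
    return len(spectra)


def main(spectra) -> int:
    """Return a process exit status: 0 on success, 1 on a known failure."""
    try:
        run(spectra)
    except tuple(ERROR_MESSAGES) as err:
        logging.getLogger().error(ERROR_MESSAGES[type(err)])
        return 1
    return 0
```

Keeping the messages in one dictionary means a new exception needs one entry instead of another three-line `except` branch.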
18 changes: 9 additions & 9 deletions ms2pip/cython_modules/ms2pip_peaks_c.c
@@ -9,23 +9,23 @@
#include "ms2pip_features_c_catboost.c"

#include "../models/CID.h"
#include "../models/HCD.h"
#include "../models/HCD-2019.h"
#include "../models/TTOF5600.h"
#include "../models/TMT.h"
#include "../models/iTRAQ.h"
#include "../models/iTRAQphospho.h"

float membuffer[10000];
float ions[2000];
float mzs[2000];
float ions[2000];
float mzs[2000];
float predictions[1000];

struct annotations{
float* peaks;
float* msms;
};
typedef struct annotations annotations;

//compute feature vector from peptide + predict intensities
float* get_p_ms2pip(int peplen, unsigned short* peptide, unsigned short* modpeptide, int charge, int model_id, int ce)
{
@@ -80,7 +80,7 @@ float* get_p_ms2pip(int peplen, unsigned short* peptide, unsigned short* modpept
predictions[2*(peplen-1)-i-1] = score_iTRAQphospho_Y(v+1+(i*fnum))+0.5;
}
}

// EThcD
// else if (model_id == 6) {
// for (i=0; i < peplen-1; i++) {
@@ -270,7 +270,7 @@ annotations get_t_ms2pip_all(int peplen, unsigned short* modpeptide, int numpeak
// fprintf(stderr,"m %f\n",msms[i]);
//}

for (i=0; i < 18*(peplen-1); i++) { // 2*9 iontypes: b: a -H2O -NH3 b c y: -H2O z y x
for (i=0; i < 18*(peplen-1); i++) { // 2*9 iontypes: b: a -H2O -NH3 b c y: -H2O z y x
ions[i] = -9.96578428466; //HARD CODED!!
mzs[i] = 0; //HARD CODED!!
}
@@ -489,14 +489,14 @@ annotations get_t_ms2pip_all(int peplen, unsigned short* modpeptide, int numpeak
mem_pos += 1;
}
}
//for (i=0; i < 18*(peplen-1); i++) { // 2*9 iontypes: b: a -H2O -NH3 b c y: -H2O z y x

//for (i=0; i < 18*(peplen-1); i++) { // 2*9 iontypes: b: a -H2O -NH3 b c y: -H2O z y x
// fprintf(stderr,"%f ",ions[i]); //HARD CODED!!
//}
//fprintf(stderr,"\n");

struct annotations r = {ions,mzs};

return r;
}

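The constant used to initialize every `ions[i]` slot in `get_t_ms2pip_all` equals log2(0.001) to the printed precision, which suggests that ion positions without a matching peak default to a 0.001 pseudo-count in log2 space. That reading is inferred from the number alone; the diff does not document it. A quick Python check:

```python
import math

# Value hard-coded in get_t_ms2pip_all for ions with no matching peak.
HARD_CODED_FLOOR = -9.96578428466

# log2 of a 0.001 pseudo-count, for comparison with the C literal.
log2_floor = math.log2(0.001)
print(f"log2(0.001) = {log2_floor!r}")
print(f"difference  = {abs(log2_floor - HARD_CODED_FLOOR):.1e}")

# Leaving log2 space recovers (approximately) the pseudo-count itself.
print(f"2 ** floor  = {2 ** HARD_CODED_FLOOR:.6f}")
```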
6 changes: 3 additions & 3 deletions ms2pip/cython_modules/ms2pip_pyx.pyx
@@ -14,7 +14,7 @@ cdef extern from "ms2pip_peaks_c.c":
float* msms

void init_ms2pip(char* amino_masses_fname, char* modifications_fname, char* modifications_fname_sptm)

unsigned int* get_v_ms2pip(int peplen, unsigned short* peptide, unsigned short* modpeptide, int charge)
unsigned int* get_v_ms2pip_ce(int peplen, unsigned short* peptide, unsigned short* modpeptide, int charge, int ce)
unsigned int* get_v_ms2pip_old(int peplen, unsigned short* peptide, unsigned short* modpeptide, int charge)
@@ -134,10 +134,10 @@ def get_targets_all(np.ndarray[unsigned short, ndim=1, mode="c"] modpeptide,
results = get_t_ms2pip_all(len(modpeptide)-2, &modpeptide[0], len(peaks), &msms[0], &peaks[0], fragerror)
result_peaks = []
for i in range(NUM_ION_TYPES_MAPPING[peaks_version]*(len(modpeptide)-3)):
result_peaks.append(results.peaks[i])
result_peaks.append(results.peaks[i])
result_mzs = []
for i in range(NUM_ION_TYPES_MAPPING[peaks_version]*(len(modpeptide)-3)):
result_mzs.append(results.msms[i])
result_mzs.append(results.msms[i])
#print(result_parsed)
#print()
return (result_mzs,result_peaks)
8 changes: 8 additions & 0 deletions ms2pip/exceptions.py
@@ -40,3 +40,11 @@ class InvalidModificationFormattingError(Exception):

class InvalidAminoAcidError(Exception):
pass


class EmptySpectrumError(Exception):
pass


class InvalidXGBoostModelError(Exception):
pass
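`EmptySpectrumError` is reported for MGF input containing empty spectra (see the new handler in `ms2pip/__main__.py`). The diff does not show where MS²PIP performs that check, so the reader below is a hypothetical sketch. It assumes the usual MGF layout of `BEGIN IONS`/`END IONS` blocks holding `KEY=value` headers and whitespace-separated `m/z intensity` peak lines, and it uses a local stand-in for the exception class.

```python
class EmptySpectrumError(Exception):
    """Local stand-in mirroring ms2pip.exceptions.EmptySpectrumError."""


def read_mgf_peaks(lines):
    """Parse MGF lines into {title: [(mz, intensity), ...]}.

    Raises EmptySpectrumError if any BEGIN IONS/END IONS block has no peaks.
    """
    spectra = {}
    title, peaks, in_block = None, [], False
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and MGF comment lines.
        if not line or line[0] in "#;!/":
            continue
        if line == "BEGIN IONS":
            title, peaks, in_block = None, [], True
        elif line == "END IONS":
            if not peaks:
                raise EmptySpectrumError(f"Spectrum {title!r} has no peaks")
            spectra[title] = peaks
            in_block = False
        elif in_block and "=" in line:
            # Header line such as TITLE=..., PEPMASS=..., CHARGE=...
            key, _, value = line.partition("=")
            if key == "TITLE":
                title = value
        elif in_block:
            # Peak line; an optional third column (fragment charge) is ignored.
            mz, intensity = line.split()[:2]
            peaks.append((float(mz), float(intensity)))
    return spectra
```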
File renamed without changes.
3 changes: 3 additions & 0 deletions ms2pip/models/HCD-2019/model_20190107_HCD_train_B.c
3 changes: 3 additions & 0 deletions ms2pip/models/HCD-2019/model_20190107_HCD_train_B2.c
3 changes: 3 additions & 0 deletions ms2pip/models/HCD-2019/model_20190107_HCD_train_Y.c
3 changes: 3 additions & 0 deletions ms2pip/models/HCD-2019/model_20190107_HCD_train_Y2.c