mzTab for DIA-NN #119

Closed
ypriverol opened this issue Mar 15, 2022 · 10 comments · Fixed by #205
Labels
enhancement New feature or request high-priority

Comments

@ypriverol
Member

Description of feature

DIA-NN results should not only be exported to MSstats; we also need to be able to export them to mzTab.

@WangHong007
Contributor

When converting DIA-NN results to mzTab, some columns are missing in the following sections.

  1. MTD
  • protein_search_engine_score[1]
  • peptide_search_engine_score[1]
  • psm_search_engine_score[1]
  • software[1]
  2. PRH
  • protein_coverage
  3. PEH
  • retention_time_window
  • spectra_ref
  • opt_global_feature_id
  4. PSH
  • calc_mass_to_charge
  • pre
  • post
  • start
  • end
  • spectra_ref
  • opt_global_spectrum_reference
  • opt_global_feature_id
  • opt_global_map_index

@ypriverol
Member Author

ypriverol commented Apr 11, 2022

When converting DIA-NN results to mzTab, some columns are missing in the following sections.

  1. MTD
  • protein_search_engine_score[1]
  • peptide_search_engine_score[1]
  • psm_search_engine_score[1]
  • software[1]

This is related to the following issue: vdemichev/DiaNN#362. We first need to decide which DIA-NN scores are the best ones to use, add them to PSI-MS, and then use them in the mzTab export.

  1. PRH
  • protein_coverage

For now, this value can be null. @timosachsenberg, is null allowed here? Another option, @WangHong007, is to take the protein database as input and compute the protein coverage from the identified peptides and the protein sequence.
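
The coverage computation suggested above could be sketched roughly as follows. This is only an illustration, not the converter's code: `protein_coverage` is a hypothetical helper, and it assumes the peptides are plain sequences with modifications already stripped and that every exact occurrence of a peptide in the protein counts as covered.

```python
def protein_coverage(protein_sequence: str, peptides: list[str]) -> float:
    """Return the fraction of residues covered by at least one peptide.

    Hypothetical helper: assumes unmodified peptide strings and exact
    substring matching against the protein sequence.
    """
    if not protein_sequence:
        return 0.0

    covered = [False] * len(protein_sequence)
    for peptide in peptides:
        if not peptide:
            continue
        # Mark every occurrence of the peptide, including overlapping ones.
        start = protein_sequence.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein_sequence.find(peptide, start + 1)

    return sum(covered) / len(protein_sequence)
```

Overlapping peptides are counted once per residue, so the result stays between 0 and 1. Peptides that do not occur in the sequence are silently ignored.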

  1. PEH
  • retention_time_window

@timosachsenberg, how do you pick this number in proteomicsLFQ?

  • spectra_ref
  • opt_global_feature_id

These are not needed, @WangHong007.

  1. PSH
  • calc_mass_to_charge
  • pre
  • post
  • start
  • end

Again, all of them can be null.
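
As a rough sketch of what "can be null" means in practice: mzTab writes missing values as the literal string `null`. The column list and the `fill_missing_with_null` helper below are hypothetical names used only for illustration.

```python
# PSH columns that, per the discussion above, may be left empty for now.
PSH_NULLABLE_COLUMNS = ["calc_mass_to_charge", "pre", "post", "start", "end"]


def fill_missing_with_null(row: dict, columns: list[str]) -> dict:
    """Return a copy of row in which absent or empty columns hold 'null'."""
    filled = dict(row)
    for column in columns:
        value = filled.get(column)
        if value is None or value == "":
            filled[column] = "null"
    return filled
```

For example, `fill_missing_with_null({"sequence": "PEPTIDE"}, PSH_NULLABLE_COLUMNS)` keeps `sequence` and adds the five columns with the value `null`.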

  • spectra_ref

Spectra reference is the combination of the file index of the mzML in the mzTab and the scan reference for the spectrum in the mzML. For example, in one of the label-free experiments:

ms_run[8]:controllerType=0 controllerNumber=1 scan=17

The first part, ms_run[8], corresponds to the file in the metadata that contains the spectrum. You can have multiple mzML files; the index [8] means that file 8 contains the spectrum that was used to identify the PSM.

The second part, controllerType=0 controllerNumber=1 scan=17, corresponds to the ID of the spectrum used for the identification in the mzML. I guess DIA-NN also keeps track of the scan corresponding to the peptide. @vdemichev can probably tell us which field that is in the output.

  • opt_global_spectrum_reference

This one is only the second part of the ID, as described above.
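
The two-part structure described above can be sketched like this. The function names are hypothetical, and the sketch assumes the two parts are joined by a single colon, as in the example value.

```python
def make_spectra_ref(ms_run_index: int, native_id: str) -> str:
    """Combine the ms_run index and the spectrum ID into a spectra_ref value."""
    return f"ms_run[{ms_run_index}]:{native_id}"


def split_spectra_ref(spectra_ref: str) -> tuple[int, str]:
    """Split a spectra_ref value into (ms_run index, spectrum ID)."""
    run_part, native_id = spectra_ref.split(":", 1)
    if not (run_part.startswith("ms_run[") and run_part.endswith("]")):
        raise ValueError(f"unexpected spectra_ref: {spectra_ref!r}")
    return int(run_part[len("ms_run["):-1]), native_id
```

The second element returned by `split_spectra_ref` is what would go into opt_global_spectrum_reference.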

  • opt_global_feature_id
  • opt_global_map_index

These two are not needed.

@timosachsenberg

I think null is fine here

@WangHong007
Contributor

Remaining issues

  1. MTD
    Waiting for the following data to be added to OLS; see "Add software information of DIANN into OLS" (vdemichev/DiaNN#362).
    protein_search_engine_score[1]
    peptide_search_engine_score[1]
    psm_search_engine_score[1]
    software[1]

  2. PEH
    This is related to the following issue: "Mapping between peptides in main report file and input mzml spectra" (vdemichev/DiaNN#350).
    retention_time_window
    spectra_ref

  3. PSH
    spectra_ref
    opt_global_spectrum_reference

@vdemichev

DIA-NN stores a scan number for each precursor. Note that DIA-NN counts MS1 and MS2 scans separately, and each count starts at 0. The respective output column for MS2 is MS2.Scan.

@ypriverol
Member Author

Thanks for your quick response, @vdemichev.

Do you mean that MS2.Scan is basically an index corresponding to the order of the scan in the RAW/mzML file?

@vdemichev

Yes, but it indexes MS2 scans only. If there's an MS2 scan with index, say, 1000, followed by an MS1 scan and then another MS2 scan, this second MS2 scan will have index 1001, not 1002.

@WangHong007
Contributor

So DIA-NN counts the scanned precursors and fragments itself, and MS2.Scan does not refer to the scan number or index in the mzML file? E.g., MS2.Scan=73091 in the main report doesn't refer to <spectrum id="controllerType=0 controllerNumber=1 scan=73091" index="73090" defaultArrayLength="441"> in the mzML file.

@vdemichev

vdemichev commented Apr 25, 2022

Yes, it does not refer to the mzML scan number. Is that a significant problem here? If the mzML is accessible, one ID can simply be mapped to the other by counting only the MS2 scans.

Vadim
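
The mapping suggested above could look roughly like the sketch below. It assumes that MS2.Scan is a 0-based count of MS2 spectra in file order, as described in this thread, and that the mzML has already been read into a sequence of (spectrum ID, MS level) pairs. Reading the mzML itself is left out, and `ms2_index_to_native_id` is a hypothetical name.

```python
from typing import Iterable


def ms2_index_to_native_id(spectra: Iterable[tuple[str, int]]) -> dict[int, str]:
    """Map a 0-based MS2-only index to the spectrum ID in the mzML.

    spectra: (spectrum ID, MS level) pairs in the order they appear in the file.
    """
    mapping: dict[int, str] = {}
    ms2_count = 0
    for native_id, ms_level in spectra:
        if ms_level == 2:
            # Only MS2 spectra advance the counter; MS1 spectra are skipped.
            mapping[ms2_count] = native_id
            ms2_count += 1
    return mapping
```

With such a mapping, the MS2.Scan value of a precursor could be looked up to obtain the spectrum ID needed for spectra_ref, provided the counting assumption holds for the file in question.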

@ypriverol
Member Author

For the 1.1 release, we will export only the protein and peptide sections, as agreed with @WangHong007. Then, once DIA-NN exports the original scan number from the mzML, we will export the PSM table.

@ypriverol ypriverol linked a pull request Aug 3, 2022 that will close this issue

6 participants