enable spectra_ref in mzTab #240
@WangHong007 the |
got it @ypriverol |
In general, I saw that you are using chunking to load the mzML info CSV.
Is it really that big or slow? I think we should:
- Speed up the code: use fewer appends, fewer for loops, and less string data.
- Write one file per mzML, or even one per mzML plus MS level.
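The second suggestion could look roughly like the sketch below. It is not the code from this PR: the function name `write_per_file`, the tuple layout of `records`, and the reduced column list are assumptions made for illustration, and it uses the standard-library `csv` module instead of pandas. Rows are grouped by source file and each group is written in bulk, so the file name no longer has to be repeated as a string column in every row.

```python
import csv
import os
from collections import defaultdict

# Reduced column set, assumed for illustration only.
COLUMNS = ["SpectrumID", "MSLevel", "Retention_Time"]


def write_per_file(records, out_dir):
    """Write one TSV per mzML file.

    `records` is an iterable of (file_name, spectrum_id, ms_level, rt) tuples
    (a hypothetical layout). Returns the list of paths that were written.
    """
    # Group rows by their source file in a single pass.
    grouped = defaultdict(list)
    for file_name, spectrum_id, ms_level, rt in records:
        grouped[file_name].append((spectrum_id, ms_level, rt))

    written = []
    for file_name, rows in grouped.items():
        stem = os.path.splitext(os.path.basename(file_name))[0]
        path = os.path.join(out_dir, f"{stem}_mzml_info.tsv")
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle, delimiter="\t")
            writer.writerow(COLUMNS)
            writer.writerows(rows)  # bulk write instead of row-by-row appends
        written.append(path)
    return written
```

With one small file per mzML, a reader such as pmultiqc may not need chunked loading at all, although that depends on how large the individual files actually are.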
bin/mzml_statistics.py
Outdated
@@ -20,17 +21,19 @@ def parse_mzml(file_name, file_columns):
name = os.path.split(file_name)[1]
id = i.getNativeID()
MSLevel = i.getMSLevel()
rt = i.getRT() if i.getRT() else np.nan
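A side note on the `x if x else np.nan` pattern in the last line of this hunk: in Python, `0.0` is falsy, so a retention time of exactly zero would be replaced by NaN. Whether `getRT()` can return `0.0` or `None` in practice is not established in this thread, so the sketch below only illustrates the language behaviour. The helper names are hypothetical, and `math.nan` stands in for `np.nan` to keep the example free of dependencies.

```python
import math


def rt_truthy(rt):
    """Mirror of the reviewed pattern: any falsy value, including 0.0, becomes NaN."""
    return rt if rt else math.nan


def rt_or_nan(rt):
    """Alternative sketch: only a missing value (None) becomes NaN."""
    return math.nan if rt is None else float(rt)
```

The second form keeps a legitimate `0.0` intact, at the cost of assuming that a missing retention time is reported as `None`.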
What about MS3?
* speed up the code. Use fewer appends, fewer for loops, less string data
Is this about mzml_statistics.py or diann_convert.py?
* write one file per mzml or even per mzml+level
Will do.
* What about MS3?
Does MS3 have the same attributes as MS2? If so, it can be handled in the same block as MS2 when checking the MSLevel.
The first comment was about the reading of mzml_info in pmultiqc.
You will usually only have MS3 in TMT. It is similar to MS2 but often has multiple precursors. Not sure if there are important QC metrics on them. We can do it later.
Now we only count the spectrum id and retention time of MS3 (same as MS1). It will also be recorded in the corresponding *_mzml_info.tsv
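The behaviour described here can be sketched as follows. This is not the code in `bin/mzml_statistics.py`: the `Spectrum` dataclass is a hypothetical stand-in for the pyOpenMS spectrum object, and `spectrum_row` is an illustrative name. The column names come from the `file_columns` list in this PR. MS1 and MS3 spectra get only an ID and a retention time, while MS2 spectra also fill the remaining columns.

```python
import math
from dataclasses import dataclass

COLUMNS = ["File_Name", "SpectrumID", "MSLevel", "Charge", "MS2_peaks",
           "Base_Peak_Intensity", "Retention_Time", "Exp_Mass_To_Charge"]


@dataclass
class Spectrum:
    """Hypothetical stand-in for a parsed spectrum (not the pyOpenMS class)."""
    native_id: str
    ms_level: int
    rt: float
    charge: int = 0
    peak_count: int = 0
    base_peak_intensity: float = math.nan
    precursor_mz: float = math.nan


def spectrum_row(file_name, spec):
    """Build one output row; MS2-only columns stay None for MS1 and MS3."""
    row = dict.fromkeys(COLUMNS)  # every column starts as None
    row.update(File_Name=file_name, SpectrumID=spec.native_id,
               MSLevel=spec.ms_level, Retention_Time=spec.rt)
    if spec.ms_level == 2:
        row.update(Charge=spec.charge, MS2_peaks=spec.peak_count,
                   Base_Peak_Intensity=spec.base_peak_intensity,
                   Exp_Mass_To_Charge=spec.precursor_mz)
    return row
```

If MS3-specific QC metrics are added later, as suggested above, they would need their own branch, because an MS3 spectrum can have multiple precursors.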
bin/mzml_statistics.py
Outdated
 def mzml_dataframe(mzml_folder):
-    file_columns = ["File_Name", "SpectrumID", "MSLevel", "Charge", "MS2_peaks", "Base_Peak_Intensity"]
+    file_columns = ["File_Name", "SpectrumID", "MSLevel", "Charge", "MS2_peaks", "Base_Peak_Intensity", "Retention_Time", "Exp_Mass_To_Charge"]
Not sure if that works without an update in pmultiqc.
Comments are basically addressed, but I think pmultiqc needs to be updated now to make the tests pass?
#241 I guess it is better that I close my PR, @WangHong007, and you update yours. I made the changes in your branch, @WangHong007.
0.0.16 -> 0.0.17
@ypriverol got it.