fixed bugs #234
Update DIANNCONVERT
Fix/conf warnings
@@ -108,6 +110,24 @@ workflow DIA {
    }

    // remove meta.id to make sure cache identical HashCode
Can you explain why we cannot set an ID that would be unique instead (of removing the ID completely)?
The `id` of each `meta` is different, but we only need to get the parameter values once. When `unique` or `first` is used, the first run may return meta=[id:1,...], but when the pipeline is run again with `resume`, it may return meta=[id:2,...], possibly because the files are processed in parallel. The cache lookup then fails, and Nextflow treats the step as a new run.
I don't know if I explained it clearly 😂
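To make the caching argument concrete, here is a toy model in Python. It is not Nextflow's actual hashing; `task_hash`, `strip_id`, and the meta fields `enzyme` and `precursor_tol` are hypothetical names chosen for illustration. It only shows that a key computed from a `meta` map changes when the `id` of whichever file arrived first changes, and stays stable once `id` is dropped.

```python
import hashlib
import json


def task_hash(meta: dict) -> str:
    """Toy stand-in for a cache key: a hash over the task's input values."""
    payload = json.dumps(meta, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def strip_id(meta: dict) -> dict:
    """Return a copy of meta without the per-file 'id' entry."""
    return {key: value for key, value in meta.items() if key != "id"}


# Same (hypothetical) search parameters in both runs; only the file that
# happened to be emitted first differs between the run and the resumed run.
run1_first = {"id": "1", "enzyme": "Trypsin", "precursor_tol": 10}
run2_first = {"id": "2", "enzyme": "Trypsin", "precursor_tol": 10}

print(task_hash(run1_first) == task_hash(run2_first))  # False: cache miss
print(task_hash(strip_id(run1_first)) == task_hash(strip_id(run2_first)))  # True
```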
Where is meta.id set? Do we have control? Then we can just set it to a unique value per experiment.
quantms/subworkflows/local/create_input_channel.nf
Lines 76 to 78 in 1ace2fb
meta.id = file(filestr).name.take(file(filestr).name.lastIndexOf('.'))
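For readers less familiar with Groovy, the line above takes the file name and keeps everything before the last dot. A rough Python equivalent, assuming the name contains at least one dot, is sketched below; `derive_meta_id` and the example paths are hypothetical.

```python
from pathlib import Path


def derive_meta_id(filestr: str) -> str:
    """File name without its last extension, e.g. 'run_01.mzML' -> 'run_01'."""
    return Path(filestr).stem


print(derive_meta_id("/data/run_01.mzML"))  # run_01
print(derive_meta_id("/data/run_02.raw"))   # run_02
```

The id is therefore fixed per input file; what varies between runs is which file's `meta` is picked up first.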
But why does it change then??
But we still need to remove the id for DIA, even if an experiment id is added, because different meta channels may be generated at runtime, resulting in cache failure.
On the first run, the channel emits meta [id: 1,], but on a re-run the channel may emit meta [id: 2,], because the files are processed in parallel.
Yes I think meta should look like this:
[
{mzml_ref: foo, exp_ref: designname, param_a: xyz, param_b:xyz }
{mzml_ref: bar, exp_ref: designname, param_a: xyz, param_b:xyz }
]
Or we just use param.input as tag for those steps.
Maybe we can rename id to mzml_id or mzml_ref, to make it clear what it is for
bin/diann_convert.py
    :rtype: float or NoneType
    """
    if "X" in seq:
        seq = seq.replace("X", "")
There are several possible non-standard symbols here; off the top of my head I remember X, B, and Z. I think it is better to create a whitelist of the 22 amino acid letters (ARNDBCEQZGHILKMFPSTWYV).
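A minimal sketch of that whitelist check, using the letter set exactly as given in the comment above. The names `ALLOWED_RESIDUES` and `unknown_residues` are hypothetical and not taken from `bin/diann_convert.py`.

```python
# Whitelist exactly as proposed in the review comment (22 letters).
ALLOWED_RESIDUES = frozenset("ARNDBCEQZGHILKMFPSTWYV")


def unknown_residues(seq: str) -> list:
    """Return the sorted symbols in seq that are not on the whitelist."""
    return sorted(set(seq) - ALLOWED_RESIDUES)


print(unknown_residues("PEPTIDE"))   # []
print(unknown_residues("PEPTIDEX"))  # ['X']
```

Checking against a whitelist catches any unexpected symbol, whereas testing only for "X" silently lets other symbols through.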
The question is also: is this really completely unknown, or does it have a mass associated with it, for example as a modification? Just counting 0 for an unknown amino acid might be a bit too simple.
In a UniProt FASTA there is nothing associated with it. So we have two options: either we put `null` in the value, or we assume 0 (which is not ideal, but is the only way to compute this in mass spec terms).
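The two options can be compared side by side in a small sketch. This is not the code from `bin/diann_convert.py`; `peptide_mass` and its `unknown` parameter are hypothetical, and the table holds standard monoisotopic residue masses for the 20 common amino acids only.

```python
from typing import Optional

# Monoisotopic residue masses in Da for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.010565  # one H2O per peptide (terminal H and OH)


def peptide_mass(seq: str, unknown: str = "null") -> Optional[float]:
    """Sum residue masses plus water.

    unknown="null": return None if any residue is not in the table.
    unknown="zero": count residues missing from the table as mass 0.
    """
    if unknown not in ("null", "zero"):
        raise ValueError("unknown must be 'null' or 'zero'")
    total = WATER_MASS
    for residue in seq:
        mass = RESIDUE_MASS.get(residue)
        if mass is None:
            if unknown == "null":
                return None
            mass = 0.0
        total += mass
    return total


print(peptide_mass("GX"))          # None
print(peptide_mass("GX", "zero"))  # same value as peptide_mass("G")
```

Returning `None` keeps the missing information visible downstream, while the zero option always yields a number that may understate the true mass.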
But how can DIANN identify it then?
I have no idea. @vdemichev, how does DIANN handle unknown residues in UniProt sequences, for example peptides containing `X`? I mean when computing the theoretical mass, for example.
X is interpreted as selenomethionine. An unknown symbol like '*' would be interpreted as either zero mass or glycine; I am not sure which. In general, behaviour on nonsense sequences is undefined.
@daichengxin, can you change this accordingly?
PR checklist
Make sure your code lints (`nf-core lint`).
Ensure the test suite passes (`nextflow run . -profile test,docker --outdir <OUTDIR>`).
`docs/usage.md` is updated.
`docs/output.md` is updated.
`CHANGELOG.md` is updated.
`README.md` is updated (including new tool citations and authors/contributors).