Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed bugs #234

Merged
merged 18 commits into from
Oct 24, 2022
Merged

fixed bugs #234

merged 18 commits into from
Oct 24, 2022

Conversation

daichengxin
Copy link
Collaborator

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs- [ ] If necessary, also make a PR on the nf-core/quantms branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@github-actions
Copy link

github-actions bot commented Oct 20, 2022

nf-core lint overall result: Passed ✅

Posted for pipeline commit 12884d4

+| ✅ 155 tests passed       |+
#| ❔   3 tests were ignored |#

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 2.6
  • Run at 2022-10-24 03:30:32

workflows/dia.nf Outdated Show resolved Hide resolved
@@ -108,6 +110,24 @@ workflow DIA {

}


// remove meta.id to make sure cache identical HashCode
Copy link
Collaborator

@jpfeuffer jpfeuffer Oct 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why we cannot set an ID that would be unique instead (of removing the ID completely)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The id of each meta is different. But we only get the parameter values once. When unique or first is used, the first run may return meta=[id:1,.. ] but when run again with resume, it may return meta=[id:2,...], possibly due to parallel mechanism. So there is no cache success and it is recognized by nextflow as a new run.
I don't know if I explained it clearly 😂

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is meta.id set? Do we have control? Then we can just set it to a unique value per experiment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meta.id = file(filestr).name.take(file(filestr).name.lastIndexOf('.'))
. It is file name

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why does it change then??

Copy link
Collaborator Author

@daichengxin daichengxin Oct 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we still need to remove id for DIA, even if the experience id is added. Because different meta channels may be generated at runtime, resulting in cache failure.

Copy link
Collaborator Author

@daichengxin daichengxin Oct 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I first run, the channel emits meta [id: 1,], but when re-run again, the channel maybe emit meta [id: 2,]. Because they are parallel

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think meta should look like this:

[
{mzml_ref: foo, exp_ref: designname, param_a: xyz, param_b:xyz }
{mzml_ref: bar, exp_ref: designname, param_a: xyz, param_b:xyz }
]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we just use param.input as tag for those steps.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can rename id to mzml_id or mzml_ref, to make it clear what it is for

:rtype: float or NoneType
"""
if "X" in seq:
seq = seq.replace("X", "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have here multiple sequence possibilities, from the top of my head I remember X, B, Z. I think it is better to create a white list including the 22 amino acids. (ARNDBCEQZGHILKMFPSTWYV).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is also: Is this really completely unknown? Or does it have a mass associated with it as modification for example. Because just counting 0 for an unknown amino acid might be a bit too simple.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In uniprot fasta you don't have anything associated to it. Then we have two options, or we put null in the value, or we assume 0 (which is not ideal but is the only way to compute this in mass spec terms).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But how can DIANN identify it then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea. @vdemichev How DIANN handles the unknown sequences in Uniprot, for example peptides with X. I mean to compute the theoretical mass for example.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

X is interpreted as selenomethionine, an unknown symbol like '*' would be interpreted as either zero mass or glycine, not sure. In general, behaviour on nonsense sequences is undefined.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daichengxin can you change this accordingly.

@ypriverol ypriverol self-requested a review October 24, 2022 12:34
@ypriverol ypriverol merged commit bd3a3ad into bigbio:dev Oct 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DIANN issue to convert to mzTab when peptides contain X as AA Reuse parameters from other sections in DIA
5 participants