Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

denoise-paired: expose matchIDs parameter #118

Closed
colinbrislawn opened this issue Jun 11, 2019 · 3 comments
Closed

denoise-paired: expose matchIDs parameter #118

colinbrislawn opened this issue Jun 11, 2019 · 3 comments

Comments

@colinbrislawn
Copy link

Improvement Description
Expose the matchIDs argument of the filterAndTrim() function

Current Behavior
The matchIDs argument of the filterAndTrim() function is False by default.

Proposed Behavior
A flag would be added to qiime2 so users could optionally set the matchIDs argument of the filterAndTrim() to True

Questions
Should we just change this to True by default and not expose the setting?

References
User story:
https://forum.qiime2.org/t/mismatched-forwward-and-reverse-sequence-files-sequences-imported-by-casava-1-8-paired-end-demultiplexed-fastq-format/10181/2

@colinbrislawn
Copy link
Author

I'm gamely interested in working on this issue.

@benjjneb
Copy link
Collaborator

Should we just change this to True by default and not expose the setting?

I would not recommend this because the behavior is non-robust and will probably lead to errors if the id lines are in non-standard-illumina format, even slight modifications can break the parsing.

This is definitely a useful flag when paired Illumina fastqs have been inconsistently modified upstream, but it is a bit of a niche case. I wonder, is there a Q2 plugin focused on filtering fastq files at this point? Something like this functionality would fit there. But adding a flag to denoise-paired would probably be OK too.

@ebolyen
Copy link
Member

ebolyen commented Oct 10, 2019

We are currently thinking that qiime2/q2-types#207 is perhaps a better way to address this issue in 99% of the cases where it will come up (we're not aware of anything that shuffles the reads, so we're assuming that it's a result of filtering in a non-paired-end aware way). These read mismatches cause issues elsewhere, so it would be best to just disallow them in the format.

Now that importing validates the entire contents of the archive, we think we can realistically prevent this situation.

Feel free to re-open if there's other reasons that matchIDs would be useful.

@ebolyen ebolyen closed this as completed Oct 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants