Add check for unique lanes in sample when parsing TSV files from FASTQ samples #93

Closed
maxulysse opened this issue Jan 27, 2020 · 3 comments
Labels: enhancement (New feature or request), input validation


@maxulysse
Member

maxulysse commented Jan 27, 2020

Issue by @maxulysse, moved from SciLifeLab#626

We should emit a warning when at least one sample has more than 2 FASTQ files with the same lane.
Right now, the run fails at the BAM merging step instead.

Describe the solution you'd like

Add checks for this when parsing the TSV file.
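A minimal sketch of such a check, written in Python for illustration rather than in the pipeline's own code. It assumes the FASTQ TSV has the columns patient, sex, status, sample, lane, fastq_1 and fastq_2 in that order (an assumption, as is the function name duplicate_lane_warnings), and warns whenever the same patient/sample/lane combination appears on more than one row, i.e. more than one FASTQ pair for a lane:

```python
import csv
import io
from collections import Counter

# Assumed column order of the FASTQ TSV (illustrative only)
COLUMNS = ["patient", "sex", "status", "sample", "lane", "fastq_1", "fastq_2"]


def duplicate_lane_warnings(tsv_text):
    """Return one warning per (patient, sample, lane) seen on more than one row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    # Ignore blank lines and map each row onto the assumed column names
    rows = [dict(zip(COLUMNS, row)) for row in reader if row]
    counts = Counter(
        (r.get("patient", ""), r.get("sample", ""), r.get("lane", "")) for r in rows
    )
    return [
        f"WARNING: sample {sample} (patient {patient}) has {n} FASTQ pairs for lane {lane}"
        for (patient, sample, lane), n in counts.items()
        if n > 1
    ]


if __name__ == "__main__":
    tsv = (
        "P1\tXX\t0\tS1\tL001\tS1_a_R1.fastq.gz\tS1_a_R2.fastq.gz\n"
        "P1\tXX\t0\tS1\tL001\tS1_b_R1.fastq.gz\tS1_b_R2.fastq.gz\n"
        "P1\tXX\t0\tS1\tL002\tS1_c_R1.fastq.gz\tS1_c_R2.fastq.gz\n"
    )
    for warning in duplicate_lane_warnings(tsv):
        print(warning)  # warns about lane L001 only
```

Emitting a warning rather than failing matches what this issue asks for; turning it into a hard error would instead stop the run before the BAM merging step is ever reached.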

Describe alternatives you've considered

No alternatives considered at the moment

Additional context

Issue discovered by Katja when running a WES project.

Issue by @nicweb, moved from SciLifeLab#714

When no lane information is given in the sample.tsv, MarkDuplicates fails with

htsjdk.samtools.SAMFormatException: Error parsing SAM header. Problem parsing @rg key:value pair.

because the read group ID is empty.

It would be nice to check the TSV before execution, or to automatically set an empty lane to the sample ID (or similar) so that execution does not halt downstream, or at least to update the documentation to state that this field must not be empty.
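A sketch of the first two options suggested above, in Python and under stated assumptions: rows are dicts with 'sample' and 'lane' keys, and the read group ID is taken from the lane, which is what the empty-ID error suggests. The names resolve_lane and read_group, and the exact @RG fields, are hypothetical and for illustration only:

```python
def resolve_lane(row, fill_with_sample=False):
    """Return the lane for a samplesheet row, or fail early if it is empty.

    `row` is assumed to be a dict with 'sample' and 'lane' keys.
    """
    lane = (row.get("lane") or "").strip()
    if lane:
        return lane
    if fill_with_sample:
        # Option suggested in the issue: fall back to the sample ID so the
        # read group ID can never be empty
        return row["sample"]
    # Option suggested in the issue: stop before any process runs
    raise ValueError(
        f"Empty lane for sample {row.get('sample', '?')}; "
        "the lane field must not be empty"
    )


def read_group(sample, lane):
    # Hypothetical @RG header line that uses the lane as the read group ID;
    # with an empty lane it would contain "ID:" with no value
    return f"@RG\tID:{lane}\tSM:{sample}\tPL:illumina"


if __name__ == "__main__":
    row = {"sample": "S1", "lane": ""}
    print(read_group(row["sample"], resolve_lane(row, fill_with_sample=True)))
```

Failing early gives the user a clear message at parse time, while the fallback keeps the run going at the cost of silently inventing a lane name; which one is preferable is a design decision for the pipeline.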

maxulysse added the enhancement (New feature or request) label Jan 27, 2020
maxulysse self-assigned this Jan 27, 2020
maxulysse added this to the 2.6 milestone Feb 4, 2020
maxulysse modified the milestones: 2.6, 3.0 Feb 28, 2020
@FriederikeHanssen
Contributor

I'll reopen it, this was not actually closed.

@asp8200
Contributor

asp8200 commented Jul 6, 2022

Message from @maxulysse in the Slack thread Sarek-3.0 (https://nfcore.slack.com/archives/C02MDBZAYJK/p1656668331195689) about the uniqueness of lanes and samples:
For a single patient, the same sample can appear several times, as long as each occurrence has a different lane.
And I think a sample should be unique (we should not have the same sample in different patients).
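The two rules in this message could be checked together roughly as follows. This is a Python sketch, assuming the parsed samplesheet is available as a list of dicts with 'patient', 'sample' and 'lane' keys; the function name check_sample_uniqueness is hypothetical:

```python
from collections import defaultdict


def check_sample_uniqueness(rows):
    """Return error messages for rows that break either uniqueness rule.

    `rows` is assumed to be a list of dicts with 'patient', 'sample' and 'lane'.
    """
    errors = []
    patients_per_sample = defaultdict(set)
    seen = set()
    for r in rows:
        key = (r["patient"], r["sample"], r["lane"])
        # Rule 1: a sample may repeat within a patient only with a different lane
        if key in seen:
            errors.append(
                f"Sample {r['sample']} of patient {r['patient']} repeats lane {r['lane']}"
            )
        seen.add(key)
        patients_per_sample[r["sample"]].add(r["patient"])
    # Rule 2: a sample must belong to exactly one patient
    for sample, patients in patients_per_sample.items():
        if len(patients) > 1:
            errors.append(
                f"Sample {sample} appears under several patients: {', '.join(sorted(patients))}"
            )
    return errors
```

Collecting all errors before reporting them lets a user fix the whole samplesheet in one pass instead of discovering one problem per run.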

@asp8200
Contributor

asp8200 commented Jul 6, 2022

This GitHub issue seems to consist of two parts (SciLifeLab#626 and SciLifeLab#714). The latter part may already have been fixed. I tried running dev-Sarek with an input CSV like this:

patient,sex,sample,lane,fastq_1,fastq_2
NA12878,XX,NA12878,,/faststorage/home/aspe/test_data/85gnb0878-NA12878-DNA_Blood-WGS_v2_SUB_R1.fastq.gz,/faststorage/home/aspe/test_data/85gnb0878-NA12878-DNA_Blood-WGS_v2_SUB_R2.fastq.gz

that is, one where the lane field is just empty, and Sarek crashed with the error message:

WARN  nextflow.Nextflow - Missing or unknown field in csv file header. Please check your samplesheet

I got the same error message when removing the lane column from the CSV file:

patient,sex,sample,fastq_1,fastq_2
NA12878,XX,NA12878,/faststorage/home/aspe/test_data/85gnb0878-NA12878-DNA_Blood-WGS_v2_SUB_R1.fastq.gz,/faststorage/home/aspe/test_data/85gnb0878-NA12878-DNA_Blood-WGS_v2_SUB_R2.fastq.gz

Isn't this behaviour okay?
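The warning above is identical for both cases, so it does not say which field is wrong. As a sketch only (Python, not Sarek's actual validation code), a check could report a missing lane column and an empty lane value as separate, specific errors. The list of required columns and the function name samplesheet_problems are assumptions based on the example samplesheet above:

```python
import csv
import io

# Assumed required columns for FASTQ input (illustrative, based on the example above)
REQUIRED = ["patient", "sample", "lane", "fastq_1", "fastq_2"]


def samplesheet_problems(csv_text):
    """List specific problems instead of a single generic header warning."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED if c not in header]
    if missing:
        return [f"Missing column(s) in header: {', '.join(missing)}"]
    problems = []
    # Assuming one record per line, data rows start on line 2 (line 1 is the header)
    for line_no, row in enumerate(reader, start=2):
        for col in REQUIRED:
            # Short rows give None for trailing fields, so treat None as empty
            if not (row[col] or "").strip():
                problems.append(f"Line {line_no}: empty value for '{col}'")
    return problems
```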


4 participants