sniff_format is inconsistent. #1539
Comments
I also can't find an example of a pipeline running the |
Dear all, I've found a similar problem in a pipeline I wrote starting from the nf-core template. As far as I understand, the problem is raised by this call: if not sniffer.has_header(peek): and, according to the csv.Sniffer.has_header documentation: This method is a rough heuristic and may produce both false positives and negatives. This is the simplest example I can produce:
import io
import csv
test = """sample,fastq_1,fastq_2
200-1-5,1_ID2101_200-1-5-H9H05KWZ-H1_S1_L001_R1_001.fastq.gz,1_ID2101_200-1-5-H9H05KWZ-H1_S1_L001_R2_001.fastq.gz
201-1-9,1_ID2101_201-1-9-H9H05KWZ-A2_S1_L001_R1_001.fastq.gz,1_ID2101_201-1-9-H9H05KWZ-A2_S1_L001_R2_001.fastq.gz
202-1-10,1_ID2101_202-1-10-H9H05KWZ-B2_S1_L001_R1_001.fastq.gz,1_ID2101_202-1-10-H9H05KWZ-B2_S1_L001_R2_001.fastq.gz
203-1-12,1_ID2101_203-1-12-H9H05KWZ-C2_S1_L001_R1_001.fastq.gz,1_ID2101_203-1-12-H9H05KWZ-C2_S1_L001_R2_001.fastq.gz"""
# read the data into a list of lines to test different line combinations
handle = io.StringIO(test)
lines = handle.readlines()
sniffer = csv.Sniffer()
# the full sample returns False
sniffer.has_header("".join(lines))  # False
# however, the first three lines alone return True
sniffer.has_header("".join(lines[:3]))  # True
# adding one more line breaks the test
sniffer.has_header("".join(lines[:4]))  # False
# however, the 4th line itself is not the problem
sniffer.has_header("".join(lines[:1] + lines[3:]))  # True
The Python documentation describes the heuristic behind this function. As far as I understand, renaming the samples with numbers solves this problem. I understand that this issue is unpredictable and occurs in very few cases, so I would rather not propose adopting a standard for the header format; however, would it be possible to add a parameter to |
Solved in #2194.
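For readers hitting the same behaviour, here is a minimal sketch of a deterministic header check that avoids the csv.Sniffer.has_header heuristic altogether by comparing the first row against the column names the samplesheet is expected to have. This is only an illustration, not necessarily what #2194 does: the function name has_expected_header and the EXPECTED_COLUMNS tuple are hypothetical, with the column names taken from the example samplesheet in the comment above.

```python
import csv
import io

# Hypothetical list of required columns, taken from the example samplesheet
# in this thread; a real pipeline would define its own.
EXPECTED_COLUMNS = ("sample", "fastq_1", "fastq_2")


def has_expected_header(text, expected=EXPECTED_COLUMNS):
    """Return True if the first CSV row contains every expected column name."""
    reader = csv.reader(io.StringIO(text))
    first_row = next(reader, [])  # empty input yields an empty first row
    found = {cell.strip() for cell in first_row}
    return set(expected).issubset(found)


with_header = "sample,fastq_1,fastq_2\n200-1-5,a_R1.fastq.gz,a_R2.fastq.gz\n"
without_header = "200-1-5,a_R1.fastq.gz,a_R2.fastq.gz\n"

print(has_expected_header(with_header))     # True
print(has_expected_header(without_header))  # False
```

Unlike the heuristic, the result here depends only on the first row, so adding or removing samples cannot change it. The trade-off is that the expected column names have to be known in advance.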
Description of the bug
I'm getting a "no header" error on this csv? It's the default check samplesheet.
CI: https://github.com/nf-osi/viralintegration/runs/6267396130?check_suite_focus=true
Samplesheet: https://github.com/nf-core/test-datasets/blob/viralintegration/samplesheet/samplesheet.csv
https://github.com/nf-osi/viralintegration/blob/dev/bin/check_samplesheet.py
It works with this samplesheet https://github.com/nf-core/test-datasets/blob/rnaseq/samplesheet/v3.4/samplesheet_test.csv
It also works if you remove either of the samples, but not if you have both.
Command used and terminal output
System information
No response