Certain sample names produce a missing header error on INPUT_CHECK:SAMPLESHEET_CHECK step #163

Closed
cnluzon opened this issue May 9, 2023 · 2 comments
Labels
bug Something isn't working


cnluzon commented May 9, 2023

Description of the bug

Hi! First of all thanks for developing and maintaining the nf-core/hic pipeline!

I have got a very strange issue that has driven me crazy for a while. Maybe there is something very obvious I am not seeing, so if that is the case, please disregard all of this 😅

After a lot of trial and error I have isolated how to reproduce it, but I am still puzzled as to why exactly it happens. It seems to be related to similar group names. I realise such names are not ideal, but I can think of many circumstances where they would occur, which is why I thought it would be useful to report. If there is a reason not to allow names like this in the downstream processes, then I would hope for a more informative error message.

My minimal reproducible design table (design_error.csv in the attached zip) looks like this:

sample,fastq_1,fastq_2
group_1_mES,./fastq/group_1_1.fastq.gz,./fastq/group_1_2.fastq.gz
group_10_mES,./fastq/group_10_1.fastq.gz,./fastq/group_10_2.fastq.gz

My ./fastq directory does contain those files:

➜ ls fastq/ -1
group_10_1.fastq.gz
group_10_2.fastq.gz
group_1_1.fastq.gz
group_1_2.fastq.gz

I run the pipeline and I get an error in the NFCORE_HIC:HIC:INPUT_CHECK:SAMPLESHEET_CHECK process that reads:

[CRITICAL] The given sample sheet does not appear to contain a header. 

Now the interesting part is that it works if I change the naming of the groups ever so slightly (design_success.csv in the attached zip):

sample,fastq_1,fastq_2
group_01_mES,./fastq/group_1_1.fastq.gz,./fastq/group_1_2.fastq.gz
group_10_mES,./fastq/group_10_1.fastq.gz,./fastq/group_10_2.fastq.gz

Note that I only added a leading zero in the first line, so sample name group_1_mES now is group_01_mES.

Success! It is running:

executor >  local (3)
[55/5cb224] process > NFCORE_HIC:HIC:INPUT_CHECK:SAMPLESHEET_CHECK (design.csv)  [100%] 1 of 1 ✔

I have reproduced this locally with Docker on my computer, and I got exactly the same error on UPPMAX with the -profile uppmax option.

Command used and terminal output

nextflow run nf-core/hic -profile docker --outdir ./mydata -r 2.0.0 --input design_error.csv --digestion mboi --genome mm10

ERROR ~ Error executing process > 'NFCORE_HIC:HIC:INPUT_CHECK:SAMPLESHEET_CHECK (design.csv)'

Caused by:
  Process `NFCORE_HIC:HIC:INPUT_CHECK:SAMPLESHEET_CHECK (design.csv)` terminated with an error exit status (1)

Command executed:

  check_samplesheet.py \
      design.csv \
      samplesheet.valid.csv

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_HIC:HIC:INPUT_CHECK:SAMPLESHEET_CHECK":
      python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  [CRITICAL] The given sample sheet does not appear to contain a header.

Work dir:
  /home/carmen/work/experiments/230509_reproduce_error/work/b7/09d1541d09cf52115ea54ccb0ff0f2

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details

Relevant files

mre_data.zip
nextflow.log

System information

Nextflow version 23.04.1
Desktop Dell Precision 5820 Tower
Executor: local
Container engine: Docker
OS: Ubuntu 22.04.2 LTS
(but also HPC - Uppmax + Singularity)
nf-core/hic version 2.0.0

cnluzon added the bug label May 9, 2023

cnluzon commented May 9, 2023

Sorry, I just realised this seems to be the same issue as #152; feel free to close it if it is redundant.

nservant (Collaborator) commented

Yes, this is related to the nf-core template and has been fixed recently.
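
For anyone landing here with the same error, a possible explanation is sketched below. It assumes that the template's check_samplesheet.py decides whether a header is present with Python's `csv.Sniffer().has_header`; this thread does not confirm that, so treat it as an assumption. That heuristic compares per-column value lengths across data rows and drops any column whose lengths differ. With `group_1_mES` and `group_10_mES` every column differs in length, so no column is left to vote for a header. The function `explicit_header` is a hypothetical alternative written for illustration, not the fix that went into the template.

```python
import csv

# The two design tables from this issue, inlined as strings.
ERROR_SHEET = (
    "sample,fastq_1,fastq_2\n"
    "group_1_mES,./fastq/group_1_1.fastq.gz,./fastq/group_1_2.fastq.gz\n"
    "group_10_mES,./fastq/group_10_1.fastq.gz,./fastq/group_10_2.fastq.gz\n"
)
# Only the first sample name changes: group_1_mES -> group_01_mES.
SUCCESS_SHEET = ERROR_SHEET.replace("group_1_mES", "group_01_mES")


def sniffed_header(text):
    """Heuristic check: csv.Sniffer votes on whether row one is a header."""
    return csv.Sniffer().has_header(text)


def explicit_header(text, required=("sample", "fastq_1", "fastq_2")):
    """Hypothetical alternative: compare the first row with expected names."""
    first_row = next(csv.reader(text.splitlines()))
    return all(name in first_row for name in required)


if __name__ == "__main__":
    sheets = (
        ("design_error.csv", ERROR_SHEET),
        ("design_success.csv", SUCCESS_SHEET),
    )
    for label, sheet in sheets:
        print(label, sniffed_header(sheet), explicit_header(sheet))
```

Under these assumptions the heuristic rejects the first sheet and accepts the second, while the explicit comparison accepts both because it does not depend on the lengths of the data values.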
