A special samplesheet can delete all my sequences #395

Closed
Microbion opened this issue Jul 5, 2024 · 3 comments
Labels: bug, major

Comments

@Microbion

Description of the bug

If the input FASTA file has the .fa suffix and the sample name is identical to the file name without that suffix, the BIOAWK module empties my input FASTA file. I think this is a serious problem, because the lost input files cannot be recovered.

Command used and terminal output

nextflow run nf-core-funcscan_1.1.5/1_1_5 \
		 -profile singularity \
		 --outdir funcscan \
		 -params-file funcscan_params.yaml \
		 -with-tower

The input samplesheet (first 5 lines, for brevity):
sample,fasta
R1_1_genome.7,dereplicated_genomes/R1_1_genome.7.fa
R1_2_genome.3,dereplicated_genomes/R1_2_genome.3.fa
R1_2_genome.7,dereplicated_genomes/R1_2_genome.7.fa
R1_3_genome.4,dereplicated_genomes/R1_3_genome.4.fa
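A quick way to see which rows of a samplesheet meet the condition described above (file name equals sample name plus `.fa`) might look like the sketch below. The helper name `find_clashes` and the temporary demo file are hypothetical and not part of the pipeline; the third demo row is invented to show a row that does not clash.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print sample names whose FASTA file name is
# exactly "<sample>.fa", i.e. the rows affected by this issue.
find_clashes() {
    # $1: samplesheet CSV with a "sample,fasta" header line
    awk -F',' 'NR > 1 && NF >= 2 {
        n = split($2, parts, "/")           # parts[n] is the file name
        if (parts[n] == ($1 ".fa")) print $1
    }' "$1"
}

# Demo samplesheet: two rows from this issue plus one invented safe row.
demo_sheet=$(mktemp)
cat > "$demo_sheet" <<'EOF'
sample,fasta
R1_1_genome.7,dereplicated_genomes/R1_1_genome.7.fa
R1_2_genome.3,dereplicated_genomes/R1_2_genome.3.fa
renamed_sample,dereplicated_genomes/R1_2_genome.7.fa
EOF

find_clashes "$demo_sheet"
```

Any sample name printed here would need to be renamed (or its file given a different name) before running the affected version.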

Relevant files

The exit status of the task that caused the workflow execution to fail was: 139

Error executing process > 'NFCORE_FUNCSCAN:FUNCSCAN:BIOAWK (R21_3_genome.2)'

Caused by:
Process NFCORE_FUNCSCAN:FUNCSCAN:BIOAWK (R21_3_genome.2) terminated with an error exit status (139)

Command executed:

bioawk -c fastx '{print ">" $name ORS length($seq)}' R21_3_genome.2.fa > R21_3_genome.2.fa

gzip R21_3_genome.2.fa

LONGEST=$(zcat R21_3_genome.2.fa.gz | grep -v '>' | sort -n | tail -n 1)

cat <<-END_VERSIONS > versions.yml
"NFCORE_FUNCSCAN:FUNCSCAN:BIOAWK":
bioawk: 1.0
END_VERSIONS

Command exit status:
139

Command output:
(empty)

Command error:
INFO: Converting SIF file to temporary sandbox...
.command.sh: line 5: 41 Segmentation fault (core dumped) bioawk -c fastx '{print ">" $name ORS length($seq)}' R21_3_genome.2.fa > R21_3_genome.2.fa
INFO: Cleaning up image...

Work dir:
/home/jovyan/work/.nextflow/work/10/d1db29084d8efba80fb88abe75fb1d

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
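The data loss can be reproduced without bioawk. The sketch below assumes the input is staged into the work directory as a symlink (Nextflow's default stage-in mode, as far as I know) and uses plain `awk` as a stand-in for bioawk; the directory and file names are made up for the demonstration. The shell opens the redirect target for writing, which truncates it, before the command starts reading, so the original file behind the symlink ends up empty.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directories standing in for the original data and a task work dir.
workdir=$(mktemp -d)
mkdir "$workdir/source" "$workdir/task"

# A tiny FASTA file playing the role of the user's original input.
printf '>contig1\nACGT\n' > "$workdir/source/genome.fa"

# Assumption: the input is staged into the task directory as a symlink.
ln -s "$workdir/source/genome.fa" "$workdir/task/genome.fa"
cd "$workdir/task"

# Same shape as the failing command: read and write the same path.
# The redirect truncates genome.fa (through the symlink) before awk reads it.
awk '{ print }' genome.fa > genome.fa

if [ ! -s "$workdir/source/genome.fa" ]; then
    echo "original input is now empty"
fi
```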

System information

Nextflow 23.10.0 build 5889
executor: local
container engine: singularity
OS ubuntu20
nf-core/funcscan 1.1.5

Microbion added the bug label Jul 5, 2024
@jfy133
Member

jfy133 commented Jul 5, 2024

Thank you @Microbion !

Indeed that is a major bug and I'm sorry about that!

Bioawk has been removed from the upcoming 2.0 release and replaced with seqkit. I will double check that this does not happen.

I will investigate if I can do a very fast patch release on Monday as 2.0 will still be 2-3 weeks away.

jfy133 self-assigned this Jul 5, 2024
jfy133 added the major label Jul 5, 2024
@jfy133
Member

jfy133 commented Jul 5, 2024

So indeed we are safe in the upcoming 2.0, as there is a check for such a clash:

https://github.com/nf-core/modules/blob/27e170816808aedbbac23f9a1f2c7488d4b6d342/modules/nf-core/seqkit/seq/main.nf#L31

I'll try to prepare a patch release for the existing v1.

@jfy133
Member

jfy133 commented Jul 8, 2024

Fixed and released in https://github.com/nf-core/funcscan/releases/tag/1.1.6 :)

The pipeline will now stop with an error if the input and output files have the same name, so the overwriting can't occur.
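For illustration only, a guard of this kind could be sketched in shell as below. This is not the code from the 1.1.6 release; the function name `check_no_clash` and the output file name in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical guard: refuse to continue when the output file name is the
# same as the input file name, so a redirect can never clobber the input.
check_no_clash() {
    local input="$1" output="$2"
    if [ "$input" = "$output" ]; then
        echo "ERROR: input and output files have the same name: $input" >&2
        return 1
    fi
    return 0
}

# Usage: only run the command when the names differ.
# "R21_3_genome.2_lengths.txt" is an invented output name for the demo.
if check_no_clash "R21_3_genome.2.fa" "R21_3_genome.2_lengths.txt"; then
    echo "names differ, safe to write output"
fi
```

Comparing names before running the command means the failure happens before any redirect is opened, so the input is never touched.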

jfy133 closed this as completed Jul 8, 2024