When an input FASTX file name includes a dot (`.`) that is not part of the file extension (example: `testfile.1.fastq.gz`), `split_on_adapter` reads `.1.fastq.gz` as the whole suffix instead of `.fastq.gz`. As a result, the output file is called `testfile.fastq.gz` instead of `testfile.1_split.fastq.gz`. This can break downstream steps in a pipeline, because the output file name no longer matches what those steps expect.
This is due to lines 123-126 in `split_on_adapter.py`.
Essentially, the current code overwrites its own addition of `'_split'` when an unexpected "suffix" occurs.
Accounting for these unexpected suffixes with the `--pattern` flag can be quite difficult (what pattern would work for this case, assuming the folder also contains files named `.2.fastq.gz`, ..., `.600.fastq.gz`?), so this seems a pertinent change.
File names can contain non-suffix dots for a variety of reasons. For example, FASTQ files are sometimes split into multiple files of N reads each for better memory management.
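To illustrate the naming behaviour requested above, here is a minimal sketch of one way to build the output name. It is not the code from `split_on_adapter.py`: the function name `split_output_name` and the `KNOWN_EXTENSIONS` set are assumptions made for the example. The idea is to peel off only recognised extensions from the right-hand side, so that any other dots stay in the stem.

```python
from pathlib import Path

# Assumption: the extensions that count as "real" FASTX suffixes.
# This set is illustrative and not taken from split_on_adapter.py.
KNOWN_EXTENSIONS = {".fastq", ".fq", ".fasta", ".fa", ".gz"}


def split_output_name(path, tag="_split"):
    """Insert `tag` before the recognised extensions of `path`.

    Hypothetical helper: dots that are not part of a known extension
    (e.g. the ".1" in "testfile.1.fastq.gz") are kept in the stem.
    """
    p = Path(path)
    name = p.name
    extensions = []
    # Peel recognised extensions off the right-hand side only.
    while True:
        ext = Path(name).suffix
        if ext.lower() not in KNOWN_EXTENSIONS:
            break
        extensions.insert(0, ext)
        name = name[: -len(ext)]
    return str(p.with_name(name + tag + "".join(extensions)))


print(split_output_name("testfile.1.fastq.gz"))
```

With this approach, files such as `testfile.2.fastq.gz` through `testfile.600.fastq.gz` keep their chunk number without needing a dedicated `--pattern`.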
Thanks @groodri, I would agree with you that it would be sensible to fix this.
In the meantime, if this is an issue that needs an immediate workaround (and for the benefit of other people who may need a fix), please feel free to rename the files like below:
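The snippet below is a minimal sketch of such a rename, under stated assumptions, and not necessarily the exact commands originally posted. It replaces the non-suffix dots with underscores, so `testfile.1.fastq.gz` becomes `testfile_1.fastq.gz`. The helper names `dot_free_name` and `rename_inputs` and the list of extensions are hypothetical; adjust them to your own files.

```python
from pathlib import Path

# Assumption: the extensions to leave untouched; extend as needed.
FASTX_EXTENSIONS = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def dot_free_name(filename):
    """Return `filename` with dots in the stem replaced by underscores.

    Hypothetical helper: names without a listed extension are returned as-is.
    """
    for ext in FASTX_EXTENSIONS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            return stem.replace(".", "_") + ext
    return filename


def rename_inputs(folder, dry_run=True):
    """Rename every affected file in `folder`; only print the plan if dry_run."""
    for path in sorted(Path(folder).iterdir()):
        new_name = dot_free_name(path.name)
        if new_name != path.name:
            print(f"{path.name} -> {new_name}")
            if not dry_run:
                path.rename(path.with_name(new_name))
```

Calling `rename_inputs("my_fastq_folder")` first prints the planned renames without touching anything; pass `dry_run=False` once the list looks right.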