Description of the bug
In its current form, `fetchngs` does not download the relevant files required for re-processing single-cell experiments from the 10X Genomics platforms.
As discussed on the Slack channel, 10X data currently gets downloaded only as a single FastQ file. However, 10X data typically contains the cell ID and UMI in Read 1 (~28 bp), while Read 2 is the RNA insert (~91 bp). Read 3 tends to be the Illumina multiplexing index (mostly irrelevant, as all reads should belong to a single sample anyway). Read 1 is flagged as technical, so it is currently not included when using `fasterq-dump`, which turns the single-cell experiment into one big bulk RNA-seq dataset.
Note:
It is also worth noting that the ENA does not serve technical reads at all, so 10X raw data can only be obtained via the SRA (`prefetch`, or `fasterq-dump` + accession).
Here is a description of the bug:
This is the command run by fetchngs with a 10X sample accession SRR9320616:
It gives the following output:
This output is arguably useless for single-cell (re-)analysis.
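One way to check which reads a downloaded file actually contains is to tabulate its read lengths: a file holding only the RNA insert should show reads of roughly 91 bp and none of roughly 28 bp. The following is a minimal sketch of such a check; the function name `read_length_table` and the demo reads are made up for illustration and are not part of fetchngs or sra-tools.

```shell
#!/usr/bin/env bash
set -eu

# Tabulate read lengths in a FASTQ stream: prints "<length> <count>" per line.
# In FASTQ, the sequence is line 2 of every 4-line record.
read_length_table() {
    awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' | sort -n
}

# Tiny synthetic example (made-up reads, NOT taken from SRR9320616):
# one 28 bp barcode+UMI-like read and one 91 bp insert-like read.
demo_fastq="$(mktemp)"
{
    printf '@r1\n%s\n+\n%s\n' "$(printf 'A%.0s' $(seq 28))" "$(printf 'I%.0s' $(seq 28))"
    printf '@r2\n%s\n+\n%s\n' "$(printf 'C%.0s' $(seq 91))" "$(printf 'I%.0s' $(seq 91))"
} > "$demo_fastq"

# Prints one line per distinct length: "28 1" then "91 1".
read_length_table < "$demo_fastq"
rm -f "$demo_fastq"
```

For a real compressed file, the same function can be fed with `gzip -dc file.fastq.gz | read_length_table`.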
Proposal:
This is the command required for 10X data. It uses both `--split-files` and `--include-technical`:
It gives the following output:
Read 1 is the cell barcode + UMI:
Read 2 is the RNA insert read:
Read 3 is the multiplexing index read (not strictly required but doesn’t hurt, can always be deleted afterwards if desired):
Adding these options to the pipeline, either via a config file or directly within the fasterq-dump process, works fine.
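For reference, the kind of invocation described here can be sketched as follows. This is an illustration under assumptions, not the actual fetchngs module code: the function name `fasterq_dump_10x_cmd`, the default of 4 threads, and the `RUN` switch are hypothetical. The `--split-files`, `--include-technical` and `--threads` options are existing `fasterq-dump` flags, and `--no-name` and `--processes` are existing `pigz` flags.

```shell
#!/usr/bin/env bash
set -eu

# Build a fasterq-dump command line for one run accession.
# --split-files writes each read of a spot to its own file;
# --include-technical keeps reads flagged as technical
# (for 10X data: the cell barcode + UMI read).
fasterq_dump_10x_cmd() {
    local accession="$1"
    local threads="${2:-4}"   # hypothetical default, not taken from fetchngs
    printf 'fasterq-dump --split-files --include-technical --threads %s %s\n' \
        "$threads" "$accession"
}

# Dry run by default: only print the command.
cmd="$(fasterq_dump_10x_cmd SRR9320616)"
echo "$cmd"

# Set RUN=1 to actually execute it (needs sra-tools and pigz on the PATH).
if [ "${RUN:-0}" = "1" ]; then
    $cmd                                   # intentional word splitting of the command string
    pigz --no-name --processes 4 ./*.fastq # compress the extracted FastQ files
fi
```

Keeping the flags in one small function makes it easy to see exactly what differs from a default `fasterq-dump` call.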
Download, extraction into 3 files as well as the `pigz` compression appear to have worked well:
I have changed the file pattern recognition to:
However, the files then never get published, and I suspect it has to do with how the read names are extracted afterwards:
https://github.com/FelixKrueger/fetchngs/blob/62b2bc840b14465a0ff551f614d613a15fdef582/workflows/sra.nf#L120-L132
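This is the error message that brings the whole process down:
Unknown method invocation `getName` on ArrayList type
-- Check script '.nextflow/assets/FelixKrueger/fetchngs/./workflows/sra.nf' at line: 128 or see 'nf-62eTOEybyloWFq.log' file for more details
WARN: Failed to publish file: s3://altos-lab-nextflow/scratch/5c32VUHOyVZskM/aa/b062914e17b4b9d68ae187ffb920a7/SRX6088086_SRR9320616_2.fastq.gz; to: s3://testbucket/results/fastq/SRX6088086_SRR9320616_2.fastq.gz [copy] -- See log file for details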
It may well be trivial to get the getName() method to work with the new data structure, but I am currently at a loss as to how to fix it.
Many thanks for your kind attention!