Enable download of files from 10X Genomics experiments #145

FelixKrueger · 2023-04-25T12:40:18Z

PR checklist

This comment contains a description of changes (with reason).
If you've fixed a bug or added code that should be tested, add tests!
If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
If necessary, also make a PR on the nf-core/fetchngs branch on the nf-core/test-datasets repository.
Make sure your code lints (nf-core lint).
Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).

We previously opened an issue (#144) because fetchngs currently fails to download data from 10X Genomics experiments. That issue describes in much more detail what goes wrong and how to fix it.

Changes

In essence, we have changed the following things (for the PREFETCH_FASTERQDUMP_SRATOOLS workflow only):

increased the number of files that can be downloaded from 1 (single-end) or 2 (paired-end) to potentially 3 or 4 FastQ files (e.g. with single or dual indexing, when the index reads are marked as 'technical reads')
added the options --split-files and --include-technical to fasterq-dump
changed the way file names are recognised in the main workflow (the data structure changes to a list object if 3 or more files are present)
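
The list handling described in the last item can be illustrated with a small sketch. It is written in Python rather than the pipeline's Nextflow/Groovy, and the names `as_list`, `label_reads` and the `fastq_N` keys are hypothetical, not taken from this PR. It assumes that multi-file output follows the `<prefix>_<N>.fastq.gz` naming matched by the module's glob, and that a single-file result arrives as a bare path while a multi-file result arrives as a list.

```python
import re
from typing import Dict, List, Union

# Matches names such as run_3.fastq.gz and captures the read index (1-4).
READ_INDEX = re.compile(r"_([1-4])\.fastq\.gz$")


def as_list(fastq: Union[str, List[str]]) -> List[str]:
    """Normalise a single path or a list of paths to a list of paths."""
    return list(fastq) if isinstance(fastq, (list, tuple)) else [fastq]


def label_reads(fastq: Union[str, List[str]]) -> Dict[str, str]:
    """Map each file to a hypothetical key fastq_1 ... fastq_4."""
    files = as_list(fastq)
    if len(files) == 1:
        # Single-end data: one file, no _N suffix required.
        return {"fastq_1": files[0]}
    labelled = {}
    for path in sorted(files):
        match = READ_INDEX.search(path)
        if match is None:
            raise ValueError(f"unexpected FastQ name: {path}")
        labelled[f"fastq_{match.group(1)}"] = path
    return labelled
```

With three input files, `label_reads(["run_1.fastq.gz", "run_2.fastq.gz", "run_3.fastq.gz"])` returns one entry per file, while a single-end result passed as a bare string still yields a one-entry mapping. The sketch deliberately does not decide which file holds index reads and which holds biological reads.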

Tests

We have carried out tests with single-end as well as paired-end files, using both the ENA (default) and SRATOOLS options; the pipelines and resulting files are all in working order, as before.

For a 10X Genomics test dataset, the new version produces 3 output files via the SRATOOLS route (see #144 for additional details). In ENA mode, only a single bulk (and meaningless) file is produced, as before.

I am afraid I am not able to add any meaningful CI tests (don't really know how to), but maybe you would be able to find a minimal test case that works?

NOTE:

We have not changed anything in the workflow that downloads data from the ENA (which is the default of fetchngs). The ENA does not serve reads that are marked as 'technical' at all, so all 10X Genomics data will appear as a single FastQ file, which means that the cell-ID and UMI read is missing. Thus, for 10X data you have to force downloads via the sratoolkit route, or end up with one single, bulk file.
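
As a rough illustration of what the sratoolkit route amounts to after this change, the sketch below assembles a fasterq-dump argument list with the two added flags. It is Python rather than the module's Nextflow script, the function name `fasterq_dump_command` is hypothetical, and `target` simply stands in for whatever the module actually passes to fasterq-dump. Only `--split-files` and `--include-technical` come from this PR; `--threads` is used purely as an example of a user-supplied extra argument.

```python
import shlex
from typing import List, Optional


def fasterq_dump_command(target: str, extra_args: Optional[List[str]] = None) -> List[str]:
    """Build an argv list for fasterq-dump that keeps technical reads.

    target     -- placeholder for the run to convert (an assumption of this sketch)
    extra_args -- optional user-supplied arguments, placed before the fixed flags
    """
    argv = ["fasterq-dump"]
    argv += list(extra_args or [])
    # The two options added in this PR: write each read to its own file
    # and do not drop reads marked as technical.
    argv += ["--split-files", "--include-technical"]
    argv.append(target)
    return argv


def as_shell(argv: List[str]) -> str:
    """Render the argv list as a single, safely quoted shell command."""
    return shlex.join(argv)
```

For example, `as_shell(fasterq_dump_command("SRRXXXXXXX", ["--threads", "4"]))` gives a one-line command that can be compared against the command the pipeline actually runs; `SRRXXXXXXX` is a placeholder, not a real accession.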

Many thanks to @wzheng0520 for figuring this out, and the nf-core community for their constant support!

maxulysse · 2023-04-25T12:44:46Z

modules/nf-core/sratools/fasterqdump/main.nf

+    fastq = meta.single_end ? '*.fastq.gz' : '*_{1,2,3,4}.fastq.gz'
    def outfile = meta.single_end ? "${prefix}.fastq" : prefix
    """
    export NCBI_SETTINGS="\$PWD/${ncbi_settings}"

    fasterq-dump \\
        $args \\
+        --split-files \\
+        --include-technical \\


modules should be modified in nf-core modules, or patched in the pipeline.

maxulysse · 2023-04-25T12:45:31Z

how is your PR coming from your master and not your dev branch?

FelixKrueger · 2023-04-25T14:35:32Z

Hmm, it seems I only have a master branch in my private fork....

drpatelh · 2023-04-26T01:17:30Z

Will be fixed in #146

drpatelh and others added 14 commits November 9, 2021 15:26

Merge pull request nf-core#54 from nf-core/dev

0c43cc7

Dev -> Master for 1.4 release

Merge pull request nf-core#59 from nf-core/dev

c318ae1

Dev -> Master for 1.5 release

Merge pull request nf-core#92 from nf-core/dev

7b7ab2f

Dev -> Master for 1.6 release

Merge pull request nf-core#103 from nf-core/dev

b79cde2

Dev -> Master for v1.7 release

Merge pull request nf-core#126 from nf-core/dev

249210f

Dev -> Master for v1.8 release

Merge pull request nf-core#135 from nf-core/dev

084e5ef

Dev -> Master for v1.9 release

added 3 and 4 to pattern match

b76749d

added required options for fasterq-dump

62b2bc8

add one more file

e3ced4f

test

44bc4b1

Include Winnie's changes instanceoff List

5b256a4

change

b259c61

update

72f9ea4

Added 10X comment to CHANGELOG

71a8753

FelixKrueger requested a review from drpatelh April 25, 2023 12:42

maxulysse reviewed Apr 25, 2023

drpatelh closed this Apr 26, 2023
