PR for Release 2.0.0 #105

fmalmeida · 2022-06-09T18:46:01Z

As discussed with Alex, I am opening a PR from dev to master so we can do our final checks and run full_test to make sure it works before the first release.

We will probably still have to fix some of the tests that run on PRs to master. The CI seems to be running some tests that are no longer available or relevant (e.g. -profile test_kallisto).

Feat dsl2

DSL2 version with working alevin workflow

Use filtered gtf

fmalmeida · 2022-06-13T06:18:24Z

The error saying that genomic coordinates are not available in the DNA fasta still persists with the new files.

I was able to work around this problem by using the complete fasta ("Genome sequence (GRCh38.p13)") instead of the primary assembly. Both are available at: https://www.gencodegenes.org/human/release_32.html
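Editorial note: one plausible reading of this error (an assumption, not confirmed in this thread) is that the GTF annotates sequences that are missing from the FASTA, which would explain why switching from the primary assembly to the complete genome sequence helped. A minimal sketch for checking this is below. The function names are hypothetical and are not part of the pipeline.

```python
import gzip
from pathlib import Path


def _open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_contigs(path):
    """Collect sequence names: the first word of every FASTA header line."""
    names = set()
    with _open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gtf_contigs(path):
    """Collect the first column (seqname) of every non-comment GTF line."""
    names = set()
    with _open_text(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def missing_contigs(fasta_path, gtf_path):
    """Return GTF sequence names that have no matching FASTA record."""
    return sorted(gtf_contigs(gtf_path) - fasta_contigs(fasta_path))
```

An empty list from `missing_contigs(fasta, gtf)` would rule this explanation out. A non-empty list names the sequences that the annotation references but the FASTA lacks.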

I've been able to test already:

Alevin
Star
Kallisto
Cellranger

fmalmeida · 2022-06-14T08:53:13Z

Also, when I try it with singularity I get the following:

-[nf-core/scrnaseq] Pipeline completed with errors-
Error executing process > 'NFCORE_SCRNASEQ:SCRNASEQ:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet_2.0_full.csv)'

Caused by:
  Process `NFCORE_SCRNASEQ:SCRNASEQ:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet_2.0_full.csv)` terminated with an error exit status (255)

Command executed:

  check_samplesheet.py \
      samplesheet_2.0_full.csv \
      samplesheet.valid.csv
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SCRNASEQ:SCRNASEQ:INPUT_CHECK:SAMPLESHEET_CHECK":
      python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  255

Command output:
  (empty)

Command error:
  INFO:    Converting SIF file to temporary sandbox...
  INFO:    Cleaning up image...
  FATAL:   container creation failed: failed to add  as session directory: path . is not an absolute path

Work dir:
  /opt/projects/1357_BIP/2022-09_gODMSinglecell/scratch.NOBACKUP/work/56/cb96a32ae75de25046f368e1fe5f10

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

Any idea what might be causing it?

grst · 2022-06-14T11:03:07Z

I tested the workflow on one of my datasets (8 samples of unpublished 10x v3 data from mouse).

cellranger with precomputed reference ✔️
cellranger with gtf/fasta ✔️
kallisto with gtf/fasta ✔️
alevin with gtf/fasta ✔️
starsolo ❌

Starsolo fails with

Command output:
  Jun 14 11:04:54 ..... started STAR run
  Jun 14 11:04:56 ..... loading genome
  Jun 14 11:05:09 ..... processing annotations GTF
  Jun 14 11:05:26 ..... inserting junctions into the genome indices
  Jun 14 11:06:55 ..... started mapping

Command error:
  
  ReadAlignChunk_processChunks.cpp:202:processChunks EXITING because of FATAL ERROR in input reads: unknown file format: the read ID should start with @ or > 
  
  Jun 14 11:06:57 ...... FATAL ERROR, exiting

I suspect that it might not handle the gzipped fastq files (--readFilesCommand gzip -cdf is missing)? However, how is it possible that the CI tests with compressed fastq files pass?
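Editorial note: this hypothesis is easy to check on the input files themselves. A gzip stream starts with the bytes 0x1f 0x8b, so a compressed FASTQ that is read without decompression cannot begin with `@` or `>`, which matches the message STAR prints. The sketch below uses hypothetical helper names and only inspects the first bytes of a file. It does not reproduce STAR's own parsing.

```python
import gzip

# Every gzip stream begins with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def starts_like_read(path, decompress=False):
    """Return True if the first byte is '@' (FASTQ) or '>' (FASTA).

    With decompress=False the raw bytes are inspected, which mimics a
    reader that was not told to decompress its input.
    """
    opener = gzip.open if decompress else open
    with opener(path, "rb") as handle:
        return handle.read(1) in (b"@", b">")
```

For a `.fastq.gz` input, `starts_like_read(path)` is False while `starts_like_read(path, decompress=True)` is True. That difference is what a decompression command such as the `--readFilesCommand` option quoted above is meant to remove.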

grst · 2022-06-14T11:06:41Z

no idea about the singularity issue, but it seems like a problem with your system rather than the pipeline.
Are you able to run singularity shell docker://<path to image>?

grst · 2022-06-14T11:21:39Z

Also (re-)discovered this issue:
#60

grst · 2022-06-14T11:35:59Z

I suspect that it might not handle the gzipped fastq files (--readFilesCommand gzip -cdf is missing)? However, how is it possible that the CI tests with compressed fastq files pass?

It seems that my custom configuration (though not for STAR_ALIGN) was overriding the default settings from modules.config. Possibly related to nextflow-io/nextflow#2422, but I don't think it's an issue with this pipeline.

fmalmeida · 2022-06-14T11:51:10Z

no idea about the singularity issue, but it seems like a problem with your system rather than the pipeline. Are you able to run singularity shell docker://<path to image>?

Yeah, I think that is the issue as well. I am not able to start the image manually either.

fmalmeida · 2022-06-15T06:29:53Z

Hi @grst and @apeltzer,

Regarding the error you faced using your data with STARSOLO, I didn't face it using the test_full profile (see screenshot). Maybe something with your data?

Now that this has passed as well, I was able to run test_full with all aligners, as shown in this comment. For it to work properly, however, we have to change the genome available in the S3 bucket: it only succeeded once I changed the genome fasta, as discussed in the same comment.

Cheers.

ggabernet · 2022-06-15T09:02:06Z

Hi, I can add this reference to the S3 bucket. I was just reading the cellranger docs on references, though (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/tutorial_mr), and they recommend using Ensembl references. Did you check whether the Ensembl reference they recommend works? If not, does this gencode reference produce results comparable to the provided cellranger indices?
Just wondering if this is the way to go...

fmalmeida · 2022-06-15T10:15:43Z

Hi @ggabernet,
I haven't tried with the Ensembl files, but I can. I will try them with all aligners again so we can decide which reference files to keep.

But at least with these gencode files I was able to test the pipeline on a full-size dataset, and it ran successfully with all 4 aligners, which is great news 🥳

As soon as I have checked with Ensembl I will get back to you:

Alevin
Star
Kallisto
Cellranger

grst · 2022-06-15T11:31:15Z

The problem with STARsolo and the compressed files was a problem with my configuration (and, ultimately, a bug in nextflow, that should be resolved soon).

However #60 is still an issue, I think.

removing repeated definitions of parameters

apeltzer · 2022-06-16T10:40:54Z

Hi! Back from a short leave. Can we align on which reference files to use for running the full tests? I was unable to use the ones that 10x says we should use in general (2020-A from here: https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest). I always got weird failures, with the same type of error whether I used the files they provide or created them with the script they provide. @grst which ones did you use?

fmalmeida · 2022-06-16T12:22:23Z

Hi Alex, for now I am using the Ensembl ones, as Gisela suggested, and so far it is working.

ggabernet · 2022-06-16T12:29:29Z

I'm also running some tests at the moment, and I am already uploading those two files (Ensembl fasta and GTF) to the S3 bucket so we can run the tests directly on S3. @apeltzer, I think the reason it was not working for you might be that, since we run the mkgtf command in the pipeline, the modifications to the gtf file provided here might not be needed any more. Let's see.

ggabernet · 2022-06-16T12:46:36Z

Updates in this PR: #112

update CI and full size test

Update nextflow.config

apeltzer · 2022-06-16T18:05:54Z

Guess just #60 needs a bugfix, will try to find out what is wrong there and fix it

grst · 2022-06-16T19:34:26Z

@grst which ones did you use?

FWIW, I tried with both

refdata-gex-mm10-2020-A (as downloaded from 10x website) and
GRCh38 primary assembly (fasta) with GENCODE v38 (gtf)

But it seems we have a working solution now anyway?

fmalmeida · 2022-06-17T06:19:37Z

Hi @ggabernet,

As promised, an update: using the Ensembl files, the pipeline ran successfully for all tools.

:)

ggabernet · 2022-06-17T06:39:38Z

Yes, the tests also passed for all tools on AWS with the Ensembl references 🎉

Fix for STAR chemistry issue #60

apeltzer

Looks all good to me 🥳

grst

I admit that I haven't checked all 110 files again, but I'd say we are good to go!
This version is definitely already better than the 1.1.0 release.

apeltzer · 2022-06-17T11:46:25Z

alexthiery1 and others added 30 commits February 18, 2021 09:25

init dsl2 branch with cellranger modules and subworkflow

7822d1f

add test data for cellranger subworkflow

ba4f5b1

add config for module params and workflow testing

1c640c3

init cellranger subworkflow tests

a5f6e98

edit fastq input into cellranger count

bbe48dd

init csv metadata module

e3eec29

add versions and emit statements to modules

528977a

initial refactoring

552e79c

initial refactoring

9e8eea1

started adding modules and building pipeline

4cf0282

added more modules

107acff

move star to local modules

4793ddb

refactored into different workflows

a8f33ba

Merge remote-tracking branch 'origin/dev' into feat-dsl2

cd0a61b

Merge pull request #1 from KevinMenden/feat-dsl2

e125b91

Feat dsl2

added local alevin modules

e0a5b07

half-working alevin dsl2 version

271d07d

more updates

404b1a9

added multiqc, get software versions

2ff52d1

added workflow groovy class

4924218

working alevin version

d14c7a7

removed 'type' parameter

d017531

added whitelist to salmon_alevin

92f045f

added completion and utils classes

19cc78f

updated schema; started with star

fa8009f

working STARsolo pipeline

01bb273

added multiqc to starsolo; fixed bug

9c9c649

Merge pull request #55 from KevinMenden/dev

5e86a46

DSL2 version with working alevin workflow

started with kb pipeline

3782c25

working kallistobustools version

7a71690

Update test_full.config

27fdb8b

Use filtered gtf

Update nextflow.config

8766c4a

removing repeated definitions of parameters

update CI and full size test

ec6628f

ggabernet and others added 2 commits June 16, 2022 16:49

Merge pull request #112 from ggabernet/dev

3698303

update CI and full size test

Merge pull request #111 from nf-core/removing-repeatitions

e4447fd

Update nextflow.config

apeltzer and others added 2 commits June 17, 2022 08:29

Fix for #60

c73f5cd

Merge pull request #113 from nf-core/fix-for-60

b2072d5

Fix for STAR chemistry issue #60

apeltzer approved these changes Jun 17, 2022

grst approved these changes Jun 17, 2022

apeltzer merged commit 0bf83a8 into master Jun 17, 2022
