STAR always uses V2 chemistry #60

Closed
2 tasks done
jeremyadamsfisher opened this issue Jun 29, 2021 · 5 comments
Labels
bug Something isn't working

Comments


jeremyadamsfisher commented Jun 29, 2021

Check Documentation

I have checked the following places for your error:

Description of the bug

STARsolo uses 10X-V2 chemistry, regardless of what is specified.

Steps to reproduce

Steps to reproduce the behaviour:

  1. Run nextflow run nf-core/scrnaseq -r 1.1.0 -params-file nf-params.json

nf-params.json

{
    "chemistry": "V3",
    "input": "./data/*_{1,2}.fastq.gz",
    "fasta": "./data/genome.fa",
    "gtf": "./data/genes.gtf",
    "aligner": "star"
}

Where: genome.fa is from http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M27/GRCm39.genome.fa.gz; genes.gtf is from http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M27/gencode.vM27.annotation.gtf.gz; and the fastq files are from https://www.ncbi.nlm.nih.gov/sra/?term=SRR14597268

[13/99028a] process > get_software_versions     [100%] 1 of 1 ✔
[a5/5fcfb4] process > unzip_10x_barcodes (V3)   [100%] 1 of 1 ✔
[-        ] process > extract_transcriptome     -
[-        ] process > build_salmon_index        -
[8e/00790d] process > makeSTARindex (genome.fa) [100%] 1 of 1 ✔
[-        ] process > build_kallisto_index      -
[-        ] process > build_gene_map            -
[-        ] process > build_txp2gene            -
[-        ] process > alevin                    -
[-        ] process > alevin_qc                 -
[bd/676904] process > star (SRR14597268_1)      [100%] 2 of 2, failed: 2, retri..
[-        ] process > kallisto                  -
[-        ] process > bustools_correct_sort     -
[-        ] process > bustools_count            -
[-        ] process > bustools_inspect          -
[-        ] process > multiqc                   -
[ae/de0382] process > output_documentation      [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/scrnaseq] Pipeline completed with errors-
[8a/4103a7] NOTE: Process `star (SRR14597268_1)` terminated with an error exit status (104) -- Execution is retried (1)
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Error executing process > 'star (SRR14597268_1)'

Caused by:
  Process requirement exceed available memory -- req: 128 GB; avail: 124.4 GB

Command executed:

  STAR --genomeDir star \
        --sjdbGTFfile genes.gtf \
        --readFilesIn SRR14597268_2.fastq.gz SRR14597268_1.fastq.gz  \
        --runThreadN 10 \
        --twopassMode Basic \
        --outWigType bedGraph \
        --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 137338953472 \
        --readFilesCommand zcat \
        --runDirPerm All_RWX \
        --outFileNamePrefix SRR14597268_1  \
        --soloType Droplet \
        --soloCBwhitelist 10x_V3_barcode_whitelist
  
  samtools index SRR14597268_1Aligned.sortedByCoord.out.bam

Command exit status:
  -

Command output:
  Jun 29 21:52:14 ..... started STAR run
  Jun 29 21:52:15 ..... loading genome
  Jun 29 21:52:30 ..... processing annotations GTF
  Jun 29 21:52:39 ..... inserting junctions into the genome indices
  Jun 29 21:54:04 ..... started 1st pass mapping
  Jun 29 21:54:05 ..... finished 1st pass mapping
  Jun 29 21:54:05 ..... inserting junctions into the genome indices
  Jun 29 21:55:34 ..... started mapping

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  
  EXITING because of FATAL ERROR in input read file: the total length of barcode sequence is 28 not equal to expected 26
  Read ID=@SRR14597268.1 1 N 0   Sequence=CAGGCNAGTCCAACGCCCTTCTGCCTTT
  SOLUTION: make sure that the barcode read is the second in --readFilesIn and check that is has the correct formatting
            If UMI+CB length is not equal to the barcode read length, specify barcode read length with --soloBarcodeReadLength
  
  Jun 29 21:55:35 ...... FATAL ERROR, exiting

Expected behaviour

According to the STAR readme,

The default barcode lengths (CB=16b, UMI=10b) work for 10X Chromium V2. For V3, specify:

--soloUMIlen 12

The pipeline does not pass this option, so STAR falls back to the V2 defaults even though the V3 whitelist is supplied. The STAR command should vary with the chemistry, as per https://www.biostars.org/p/462568/

10x v1

Whitelist, 737K-april-2014_rc.txt
CB length, 14
UMI start, 15
UMI length, 10 (courtesy ATpoint)

10x v2

Whitelist, 737K-august-2016.txt
CB length, 16
UMI start, 17
UMI length, 10

10x v3

Whitelist, 3M-Feb_2018_V3.txt
CB length, 16
UMI start, 17
UMI length, 12
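
To illustrate what "differ by the chemistry" could look like, here is a minimal Python sketch that turns the values listed above into STARsolo barcode arguments. The names `CHEMISTRY_LAYOUT` and `starsolo_barcode_args` are hypothetical and are not part of nf-core/scrnaseq; the only things taken from STAR are the `--soloCBlen`, `--soloUMIstart` and `--soloUMIlen` options. The pipeline itself is written in Nextflow, so this only shows the lookup logic, not how a fix would actually be wired in.

```python
from typing import Dict, List

# Barcode geometry per 10x chemistry, copied from the list above.
# (Hypothetical table name; not taken from the pipeline.)
CHEMISTRY_LAYOUT: Dict[str, Dict[str, int]] = {
    "V1": {"cb_len": 14, "umi_start": 15, "umi_len": 10},
    "V2": {"cb_len": 16, "umi_start": 17, "umi_len": 10},
    "V3": {"cb_len": 16, "umi_start": 17, "umi_len": 12},
}


def starsolo_barcode_args(chemistry: str) -> List[str]:
    """Return the STARsolo barcode-geometry arguments for a chemistry name."""
    key = chemistry.upper()
    if key not in CHEMISTRY_LAYOUT:
        raise ValueError(f"unknown chemistry: {chemistry!r}")
    layout = CHEMISTRY_LAYOUT[key]
    return [
        "--soloCBlen", str(layout["cb_len"]),
        "--soloUMIstart", str(layout["umi_start"]),
        "--soloUMIlen", str(layout["umi_len"]),
    ]


if __name__ == "__main__":
    # Arguments that would be appended to the STAR command for a V3 run.
    print(" ".join(starsolo_barcode_args("V3")))
```

Passing all three options explicitly, rather than only `--soloUMIlen` for V3, keeps the command self-describing and does not rely on STAR's defaults matching V2.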

Log files

nextflow.log

System

  • Hardware: AWS r5.4xlarge
  • Executor: local
  • OS: ubuntu
  • Version 20.04

Nextflow Installation

  • Version: 21.04.1.5556

Container engine

  • Engine: Docker
  • version: Docker version 20.10.7, build f0df350
  • Image tag: nfcore/scrnaseq:1.1.0

Additional context

Would be happy to write a PR

jeremyadamsfisher added the bug label Jun 29, 2021

grst commented Mar 7, 2022

This should be fixed in the latest dev version.

apeltzer closed this as completed Jun 8, 2022
grst reopened this Jun 14, 2022

grst commented Jun 14, 2022

This is actually NOT fixed in the latest dev version, as I get

EXITING because of FATAL ERROR in input read file: the total length of barcode sequence is 28 not equal to expected 26
Read ID=@A01174:218:HKWM7DSX2:4:1101:1036:1063 ;  Sequence=CTCATTACACGTACATGCGGGTTTGCCG
SOLUTION: check the formatting of input read files.
If UMI+CB length is not equal to the barcode read length, specify barcode read length with --soloBarcodeReadLength
To avoid checking of barcode read length, specify --soloBarcodeReadLength 0
Jun 14 13:07:28 ...... FATAL ERROR, exiting

with a v3 library.

(v3 has 28nt barcode+umi, compared to 26 in v2)
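
The 28 vs. 26 arithmetic can be checked directly against the read in the error message. The sketch below is only an illustration: `BARCODE_LAYOUT`, `expected_barcode_read_length` and `matching_chemistries` are hypothetical names, and the CB/UMI lengths (16+10 for v2, 16+12 for v3) are the ones quoted earlier in this issue. It is a rough heuristic, since a barcode read can be sequenced longer than CB+UMI.

```python
from typing import List

# CB and UMI lengths (nt) for the two chemistries discussed in this issue.
BARCODE_LAYOUT = {
    "V2": {"cb_len": 16, "umi_len": 10},
    "V3": {"cb_len": 16, "umi_len": 12},
}


def expected_barcode_read_length(chemistry: str) -> int:
    """CB length plus UMI length, i.e. what STARsolo expects by default."""
    layout = BARCODE_LAYOUT[chemistry]
    return layout["cb_len"] + layout["umi_len"]


def matching_chemistries(barcode_read: str) -> List[str]:
    """Chemistries whose CB+UMI length equals the length of this read."""
    return [
        name
        for name in BARCODE_LAYOUT
        if expected_barcode_read_length(name) == len(barcode_read)
    ]


if __name__ == "__main__":
    # Barcode read copied from the STAR error message above.
    read = "CTCATTACACGTACATGCGGGTTTGCCG"
    print(len(read), matching_chemistries(read))
```

The read from the log is 28 nt long, which matches the v3 layout and not the 26 nt that STAR expects when `--soloUMIlen` is left at its default.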

apeltzer commented

We should get this fix into 2.0.0 too in my opinion :-(


grst commented Jun 16, 2022

The default barcode lengths (CB=16b, UMI=10b) work for 10X Chromium V2. For V3, specify:

--soloUMIlen 12

(https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#running-starsolo-for-10x-chromium-scrna-seq-data)

I think this parameter is missing from the generated STAR command. It should be generated somehow by the Java code in the lib folder.

apeltzer added a commit that referenced this issue Jun 17, 2022
Fix for STAR chemistry issue #60
apeltzer commented

Should be fixed in #113 now
