Releases: lilab-bcb/cumulus
2.1.0
New Features
- Add CellBender workflow for ambient RNA removal.
- CellRanger:
  - For ATAC-Seq data, add `ARC-v1` chemistry keyword for analyzing only the ATAC part of 10x multiome data. See CellRanger scATAC-seq sample sheet section for details.
  - For antibody/hashing/citeseq/crispr data, add `multiome` chemistry keyword for the feature barcoding on 10x multiome data.
- STARsolo:
  - In workflow output, besides `mtx` format gene-count matrices, the workflow also generates matrices in 10x-compatible `hdf5` format.
Improvements
- CellRanger: For antibody/hashing/citeseq/crispr data,
  - `cumulus_feature_barcoding` v0.9.0+ now supports multi-threading and faster gzip file I/O.
- Workflows check if `output_directory` is a valid Cloud URI based on the given `backend` value before execution. (Feature request #322)
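
The URI check above can be pictured roughly as follows. This is a minimal sketch, not the workflows' actual code: the function name `is_valid_output_directory` is hypothetical, and the mapping of `gcp` to `gs://` URIs and `aws` to `s3://` URIs is an assumption based on the usual URI schemes of those platforms.

```python
from urllib.parse import urlparse

# Assumed mapping from backend value to the expected URI scheme.
# An empty scheme stands for a plain local path.
EXPECTED_SCHEME = {"gcp": "gs", "aws": "s3", "local": ""}


def is_valid_output_directory(output_directory: str, backend: str) -> bool:
    """Return True if output_directory looks valid for the given backend."""
    if backend not in EXPECTED_SCHEME:
        raise ValueError(f"unknown backend: {backend!r}")
    parsed = urlparse(output_directory.strip())
    if parsed.scheme != EXPECTED_SCHEME[backend]:
        return False
    if backend == "local":
        # Local paths only need to be non-empty.
        return bool(parsed.path)
    # Cloud URIs need a bucket name after the scheme.
    return bool(parsed.netloc)
```

Under these assumptions, `gs://my-bucket/out` passes for `gcp` but is rejected for `aws`, so a mismatch is caught before any job is submitted.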
Updates
- Genome Reference:
  - Add Cellranger VDJ v7.0.0 genome references: `GRCh38_vdj_v7.0.0` and `GRCm38_vdj_v7.0.0` in CellRanger scIR-seq sample sheet section.
- Default version upgrade:
2.0.0
Overall:
- Cumulus workflows are now released on Dockstore:
  - Add the tutorial on importing Cumulus workflows to Terra.
  - Archive the legacy versions on Broad Method Registry.
- Add support on multiple platforms via backend input: `gcp` for Google Cloud, `aws` for Amazon AWS, `local` for local machine. Enable Google Cloud support by default.
- For Amazon AWS backend, add awsMaxRetries input to set the maximum retries allowed for job execution at runtime. By default, use `5`.
- Update the command-line job submission tutorial to work with Altocumulus v2.0.0 or later.
- On Examples:
  - Update gene expression, hashing and CITE-Seq example tutorial.
  - Add tutorial on 10x CellPlex analysis using Cumulus workflows on Cloud.
Workflow-specific:
- Add STARsolo_create_reference workflow to build genome references for STARsolo counting. See its documentation for details.
- On Cellranger workflow:
  - Add support for 10x Cell Ranger version `6.1.1` and `6.1.2`, and use `6.1.2` by default. See Cell Ranger v6.1 release notes.
  - Add support for 10x Cell Ranger ARC version `2.0.1`, and use it by default. See Cell Ranger ARC v2.0 release notes.
  - Upgrade cumulus_feature_barcoding to version `0.7.0` to allow manually setting the barcode starting position (via input crispr_barcode_pos).
  - Add support for non 10x CRISPR assays. See the description of `crispr` DataType value in this section for details.
  - For input data consisting of FASTQ files, handle both flat (all FASTQ files in one folder) and nested (one subfolder per sample listed in the input sample sheet) folder structures.
  - Add fastq_outputs to workflow output, which contains mkfastq step output folders for samples listed in the input sample sheet.
  - Add count_outputs to workflow output, which contains count step output folders for samples listed in the input sample sheet.
- On Spaceranger workflow:
  - Add support for 10x Space Ranger version `1.3.0` and `1.3.1`, and use `1.3.1` by default. See Space Ranger v1.3 release notes.
  - For input data consisting of FASTQ files, handle both flat (all FASTQ files in one folder) and nested (one subfolder per library) folder structures.
  - Add output section for the workflow. See here for details.
  - Retire old genome references:
    - Keep `GRCh38-2020-A` and `mm10-2020-A`.
    - Retire `GRCh38`, `mm10`, `GRCh38-2020-A-premrna` and `mm10-2020-A-premrna`. Users can still reach out to Cumulus team to ask for URIs to these old references, but they are not provided by default.
  - In the description of ReorientImages field of input sample sheet, add the information on its valid values.
- On STARsolo workflow:
  - Add support for STAR version `2.7.9a`, and use it by default. See STAR v2.7.9a release notes.
  - Reorganize the workflow by exposing more inputs to users.
  - Add support for more protocols: 10x multiome, 10x 5' (both SC5P-R2 and SC5P-PE), Slide-Seq and Share-Seq. See here for details.
  - Use input read1_fastq_pattern and read2_fastq_pattern to support FASTQ files generated by Cell Ranger or SeqWell, as well as Sequence Read Archive (SRA) data.
  - For input data consisting of FASTQ files, handle both flat (all FASTQ files in one folder) and nested (one subfolder per library) folder structures.
  - Do not attach a filename prefix to output files. This avoids the incorrect SJ raw feature.tsv symlink error, which would cause folder delocalization to fail. (See discussion with STAR team.)
  - Add STAR log file to workflow output. This is the Log.out file if running STAR locally, which can be used for tracking the process and sharing with STAR team when opening an issue there.
  - Retire old genome references:
    - Keep `GRCh38-2020-A`, `mm10-2020-A`, and `GRCh38-and-mm10-2020-A`.
    - Retire old references listed here. Users can still reach out to Cumulus team to ask for URIs to them, but they are not provided by default.
- On Demultiplexing workflow:
  - Upgrade demuxEM to version `0.1.7` for bug fix.
- On Cellranger_create_reference workflow:
  - Add the generated reference file to the workflow output.
  - Bug fix in using input memory.
  - Update documentation to suggest only using Cell Ranger version `6.1.1` or later for building reference, as v6.0.1 has issues which leave the job running without terminating.
- On Cellranger_atac_create_reference workflow:
  - Add the generated reference file to the workflow output.
- On Cellranger_vdj_create_reference workflow:
  - Add the generated reference file to the workflow output.
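
Several 2.0.0 workflows (Cellranger, Spaceranger, STARsolo) accept FASTQ input in either a flat or a nested folder layout. The following is a minimal sketch of how the two layouts could be told apart, not the workflows' actual logic: the helper `find_fastqs` is hypothetical, and the `<sample>_*.fastq.gz` naming pattern for the flat layout is an assumption.

```python
from pathlib import Path
from typing import List


def find_fastqs(root: Path, sample: str) -> List[Path]:
    """Collect FASTQ files for one sample from a flat or nested folder."""
    nested_dir = root / sample
    if nested_dir.is_dir():
        # Nested layout: one subfolder per sample.
        candidates = nested_dir.glob("*.fastq.gz")
    else:
        # Flat layout: all files in one folder; file names are
        # assumed to start with the sample name.
        candidates = root.glob(f"{sample}_*.fastq.gz")
    return sorted(candidates)
```

Checking for the subfolder first means a nested layout takes precedence whenever both forms happen to be present.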
1.5.0
- On demultiplexing workflow
  - Update demuxEM to `v0.1.6`.
- On cumulus workflow
  - Add Nonnegative Matrix Factorization (NMF) feature: `run_nmf` and `nmf_n` inputs.
  - Add integrative NMF (iNMF) data integration method: `inmf` option in `correction_method` input; the number of expected factors is also specified by `nmf_n` input.
  - When NMF or iNMF is enabled, word cloud plots and gene program UMAP plots of NMF/iNMF results will be generated.
  - Update Pegasus to `v1.4.2`.
1.4.0
- On cellranger workflow
  - Add support for multiomics analysis using linked samples. `cellranger-arc count`, `cellranger multi` and `cellranger count` will be automatically triggered based on the sample sheet.
  - Add support for `cellranger` version 6.0.1 and 6.0.0.
  - Add support for `cellranger-arc` version 2.0.0, 1.0.1, 1.0.0.
  - Add support for `cellranger-atac` version 2.0.0.
  - Add support for `cumulus_feature_barcoding` version 0.6.0, which handles CellPlex CMO tags.
  - Add `GRCh38-2020-A_arc_v2.0.0`, `mm10-2020-A_arc_v2.0.0`, `GRCh38-2020-A_arc_v1.0.0` and `mm10-2020-A_arc_v1.0.0` references for `cellranger-arc`.
  - Fixed bugs in `cellranger_atac_create_reference`.
  - Add delete undetermined FASTQs option for `mkfastq`.
- On demultiplexing workflow
  - Replace `demuxlet` with `popscle`, which includes both demuxlet and freemuxlet.
- On cumulus workflow
  - Fix bug that `remap_singlets` and `subset_singlets` don't work when input is in sample sheet format.
- Modified workflows to remove trailing spaces and support spaces within output_directory
1.3.0
1.2.0
- Add spaceranger workflow:
  - Wrap up spaceranger version 1.2.1
- On cellranger workflow:
  - Fix workflow WDL to support both single index and dual index
  - Add support for cellranger version 5.0.0 and 5.0.1
  - Add support for targeted gene expression analysis
  - Add support for `--include-introns` and `--no-bam` options for cellranger count
  - Remove `--force-cells` option for cellranger vdj as noted in cellranger 5.0.0 release note
  - Add `GRCh38_vdj_v5.0.0` and `GRCm38_vdj_v5.0.0` references
- Bug fix on cumulus workflow.
- Reorganize the sidebar of Cumulus documentation website.
1.1.0
- On cumulus workflow:
  - Add CITE-Seq data analysis back. (See section Run CITE-Seq analysis for details)
  - Add doublet detection. (See `infer_doublets`, `expected_doublet_rate`, and `doublet_cluster_attribute` input fields)
  - For tSNE visualization, only support FIt-SNE algorithm. (See `run_tsne` and `plot_tsne` input fields)
  - Improve efficiency on log-normalization and DE tests.
  - Support multiple marker JSON files used in cell type annotation. (See `organism` input field)
  - More preset gene sets provided in gene score calculation. (See `calc_signature_scores` input field)
- Add star_solo workflow (see STARsolo section for details):
  - Use STARsolo to generate count matrices from FASTQ files.
  - Support chemistry protocols such as 10X-V3, 10X-V2, DropSeq, and SeqWell.
- Update the example of analyzing hashing and CITE-Seq data (see Example section) with the new workflows.
- Bug fix.
1.0.0
- Add `demultiplexing` workflow for cell-hashing/nucleus-hashing/genetic-pooling analysis.
- Add support for CellRanger version 4.0.0.
- Update cumulus workflow with Pegasus version 1.0:
  - Use `zarr` file format to handle data, which has a better I/O performance in general.
  - Support focus analysis on Unimodal data, and appending other Unimodal data to it. (`focus` and `append` inputs in cluster step)
  - Quality-Control: Change `percent_mito` default from 10.0 to 20.0; by default remove bounds on UMIs (`min_umis` and `max_umis` inputs in cluster step).
  - Quality-Control: Automatically figure out name prefix of mitochondrial genes for `GRCh38` and `mm10` genome reference data.
  - Support signature / gene module score calculation. (`calc_signature_scores` input in cluster step)
  - Add `Scanorama` method to batch correction. (`correction_method` input in cluster step)
  - Cell embeddings: by default calculate UMAP embedding, instead of FIt-SNE.
  - Differential Expression (DE) analysis: remove inputs `mwu` and `auc` as they are calculated by default. Cell-type annotation also uses the MWU test result by default.
- Stop supporting `cumulus_subcluster` workflow.
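
To illustrate the 1.0.0 quality-control defaults (`percent_mito` of 20.0, no bounds on UMIs unless `min_umis` or `max_umis` is set), here is a minimal sketch of a per-cell filter. It is not Pegasus code: the function `passes_qc` is hypothetical, and the exact boundary conventions (strictly below `percent_mito`, at least `min_umis`, strictly below `max_umis`) are assumptions.

```python
from typing import Optional


def passes_qc(
    n_umis: int,
    pct_mito: float,
    percent_mito: float = 20.0,      # default raised from 10.0 in 1.0.0
    min_umis: Optional[int] = None,  # None means no lower bound
    max_umis: Optional[int] = None,  # None means no upper bound
) -> bool:
    """Return True if a cell passes the illustrative QC thresholds."""
    if pct_mito >= percent_mito:
        return False
    if min_umis is not None and n_umis < min_umis:
        return False
    if max_umis is not None and n_umis >= max_umis:
        return False
    return True
```

With the defaults, a cell with 15% mitochondrial reads is kept regardless of its UMI count, whereas under the earlier 10.0 threshold it would have been dropped.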
0.15.0
- Update all workflows to OpenWDL version 1.0.
- Cumulus now supports multi-job execution from Terra data table input.
- Cumulus generates Cirrocumulus input in `.cirro` folder, instead of a huge `.parquet` file.
- Fix bug in Cumulus WDL on using user-specified marker JSON file for annotating cell types.
- Fix bugs.