Lk add starsolo barcode metrics #1044

Merged 29 commits on Jul 21, 2023
Changes from 24 commits
Commits
b50dc6f
Update StarAlign.wdl
ekiernan Jul 7, 2023
f6f68cc
Update StarAlign.wdl
ekiernan Jul 10, 2023
a46feeb
Update StarAlign.wdl
ekiernan Jul 10, 2023
7669519
Update StarAlign.wdl
ekiernan Jul 14, 2023
a38dd02
Update StarAlign.wdl
ekiernan Jul 15, 2023
6fa2a02
fixed duplicate variable name
ekiernan Jul 17, 2023
3414772
Gathering cell_reads
ekiernan Jul 17, 2023
ff95806
Update Optimus.wdl
ekiernan Jul 17, 2023
1619773
added commas for cell_reads
ekiernan Jul 17, 2023
351fdf4
Update StarAlign.wdl
ekiernan Jul 17, 2023
017c839
Update StarAlign.wdl
ekiernan Jul 17, 2023
32b27ec
Update StarAlign.wdl
ekiernan Jul 17, 2023
16d005f
added aligner metrics output to Optimus
ekiernan Jul 17, 2023
cacadd9
Update StarAlign.wdl
ekiernan Jul 17, 2023
8143147
Update StarAlign.wdl
ekiernan Jul 17, 2023
d9196a7
Update StarAlign.wdl
ekiernan Jul 17, 2023
8b3baab
Update Optimus.wdl
ekiernan Jul 17, 2023
9930db0
Update StarAlign.wdl
ekiernan Jul 17, 2023
792607e
Update StarAlign.wdl
ekiernan Jul 17, 2023
98af472
Update StarAlign.wdl
ekiernan Jul 17, 2023
507ddbb
Merge branch 'develop' into lk-add-starsolo-barcode-metrics
ekiernan Jul 18, 2023
7902a40
made changelog updates
ekiernan Jul 18, 2023
c1b7a05
Merge branch 'lk-add-starsolo-barcode-metrics' of https://github.com/…
ekiernan Jul 18, 2023
45d75dc
Merge branch 'develop' into lk-add-starsolo-barcode-metrics
ekiernan Jul 18, 2023
34ffa95
deleted TAR of metrics
ekiernan Jul 19, 2023
e564494
Updated Optimus Readme to include aligner_metrics description in outputs
ekiernan Jul 20, 2023
8e70715
Merge branch 'develop' into lk-add-starsolo-barcode-metrics
ekiernan Jul 20, 2023
0cb153a
Merge branch 'develop' into lk-add-starsolo-barcode-metrics
ekiernan Jul 20, 2023
d3d7f5e
Added all aligner metrics to a final TAR
ekiernan Jul 21, 2023
6 changes: 4 additions & 2 deletions pipelines/skylab/multiome/Multiome.changelog.md
@@ -1,5 +1,7 @@
# 1.0.1
2023-07-11 (Date of Last Commit)
# 1.0.1
2023-07-23 (Date of Last Commit)

* Added STARsolo v2.7.10b metric outputs as an optional pipeline output and an output of the STARalign and MergeSTAR tasks

* Updated the CountAlignments task in the FeatureCounts.wdl to use a new docker image. This change does not affect the Multiome pipeline

5 changes: 5 additions & 0 deletions pipelines/skylab/optimus/Optimus.changelog.md
@@ -1,4 +1,9 @@

# 5.8.4
2023-07-18 (Date of Last Commit)

* Added STARsolo v2.7.10b metric outputs as an optional pipeline output and an output of the STARalign and MergeSTAR tasks

# 5.8.3
2023-06-23 (Date of Last Commit)

5 changes: 4 additions & 1 deletion pipelines/skylab/optimus/Optimus.wdl
@@ -67,7 +67,7 @@ workflow Optimus {

# version of this pipeline

String pipeline_version = "5.8.3"
String pipeline_version = "5.8.4"

# this is used to scatter matched [r1_fastq, r2_fastq, i1_fastq] arrays
Array[Int] indices = range(length(r1_fastq))
@@ -173,6 +173,7 @@ workflow Optimus {
barcodes = STARsoloFastq.barcodes,
features = STARsoloFastq.features,
matrix = STARsoloFastq.matrix,
cell_reads = STARsoloFastq.cell_reads,
input_id = input_id
}
if (counting_mode == "sc_rna"){
@@ -209,6 +210,7 @@ workflow Optimus {
barcodes = STARsoloFastq.barcodes_sn_rna,
features = STARsoloFastq.features_sn_rna,
matrix = STARsoloFastq.matrix_sn_rna,
cell_reads = STARsoloFastq.cell_reads_sn_rna,
input_id = input_id
}
call H5adUtils.SingleNucleusOptimusH5adOutput as OptimusH5adGenerationWithExons{
@@ -245,6 +247,7 @@ workflow Optimus {
File gene_metrics = GeneMetrics.gene_metrics
File? cell_calls = RunEmptyDrops.empty_drops_result
File? picard_metrics = DropseqMetrics.metric_output
File? aligner_metrics = MergeStarOutputs.cell_reads_out
kayleemathews marked this conversation as resolved.
# h5ad
File h5ad_output_file = final_h5ad_output
}
5 changes: 5 additions & 0 deletions pipelines/skylab/slideseq/SlideSeq.changelog.md
@@ -1,3 +1,8 @@
# 1.0.10
2023-07-18 (Date of Last Commit)

* Added STARsolo v2.7.10b metric outputs as an optional pipeline output and an output of the STARalign and MergeSTAR tasks. This does not impact the Slideseq pipeline

# 1.0.9
2023-06-14 (Date of Last Commit)

2 changes: 1 addition & 1 deletion pipelines/skylab/slideseq/SlideSeq.wdl
@@ -23,7 +23,7 @@ import "../../../tasks/skylab/MergeSortBam.wdl" as Merge

workflow SlideSeq {

String pipeline_version = "1.0.9"
String pipeline_version = "1.0.10"

input {
Array[File] r1_fastq
@@ -1,8 +1,10 @@
# 1.2.25
2023-07-11 (Date of Last Commit)
2023-07-18 (Date of Last Commit)

* Added STARsolo v2.7.10b metric outputs as an optional pipeline output and an output of the STARalign and MergeSTAR tasks. This does not impact the snSS2 pipeline
* Updated the CountAlignments task in the FeatureCounts.wdl to use a new docker image. This change does not affect the MultiSampleSmartSeq2SingleNucleus pipeline


# 1.2.24
2023-06-23 (Date of Last Commit)

47 changes: 45 additions & 2 deletions tasks/skylab/StarAlign.wdl
@@ -318,7 +318,8 @@ task STARsoloFastq {
--soloUMIdedup 1MM_Directional_UMItools \
--outSAMtype BAM SortedByCoordinate \
--outSAMattributes UB UR UY CR CB CY NH GX GN \
--soloBarcodeReadLength 0
--soloBarcodeReadLength 0 \
--soloCellReadStats Standard
fi

STAR \
@@ -337,36 +338,58 @@ task STARsoloFastq {
--soloUMIdedup 1MM_Directional_UMItools \
--outSAMtype BAM SortedByCoordinate \
--outSAMattributes UB UR UY CR CB CY NH GX GN \
--soloBarcodeReadLength 0
Member

Do we typically run star twice? Is it possible to run it just once?

Contributor Author

We run it twice for the count_exons = true. With the latest version of STAR, we should be able to run both parameters in a single STAR run. We haven't yet played with that functionality.

--soloBarcodeReadLength 0 \
--soloCellReadStats Standard

touch barcodes_sn_rna.tsv
touch features_sn_rna.tsv
touch matrix_sn_rna.mtx
touch CellReads_sn_rna.stats
touch Features_sn_rna.stats
touch Summary_sn_rna.csv
touch UMIperCellSorted_sn_rna.txt

if [[ "~{counting_mode}" == "sc_rna" ]]
then
mv "Solo.out/Gene/raw/barcodes.tsv" barcodes.tsv
mv "Solo.out/Gene/raw/features.tsv" features.tsv
mv "Solo.out/Gene/raw/matrix.mtx" matrix.mtx
mv "Solo.out/Gene/CellReads.stats" CellReads.stats
mv "Solo.out/Gene/Features.stats" Features.stats
mv "Solo.out/Gene/Summary.csv" Summary.csv
mv "Solo.out/Gene/UMIperCellSorted.txt" UMIperCellSorted.txt
elif [[ "~{counting_mode}" == "sn_rna" ]]
then
if ! [[ ~{count_exons} ]]
then
mv "Solo.out/GeneFull_Ex50pAS/raw/barcodes.tsv" barcodes.tsv
mv "Solo.out/GeneFull_Ex50pAS/raw/features.tsv" features.tsv
mv "Solo.out/GeneFull_Ex50pAS/raw/matrix.mtx" matrix.mtx
mv "Solo.out/GeneFull_Ex50pAS/CellReads.stats" CellReads.stats
mv "Solo.out/GeneFull_Ex50pAS/Features.stats" Features.stats
mv "Solo.out/GeneFull_Ex50pAS/Summary.csv" Summary.csv
mv "Solo.out/GeneFull_Ex50pAS/UMIperCellSorted.txt" UMIperCellSorted.txt
else
mv "Solo.out/GeneFull_Ex50pAS/raw/barcodes.tsv" barcodes.tsv
mv "Solo.out/GeneFull_Ex50pAS/raw/features.tsv" features.tsv
mv "Solo.out/GeneFull_Ex50pAS/raw/matrix.mtx" matrix.mtx
mv "Solo.out/GeneFull_Ex50pAS/CellReads.stats" CellReads.stats
mv "Solo.out/GeneFull_Ex50pAS/Features.stats" Features.stats
mv "Solo.out/GeneFull_Ex50pAS/Summary.csv" Summary.csv
mv "Solo.out/GeneFull_Ex50pAS/UMIperCellSorted.txt" UMIperCellSorted.txt
mv "Solo.out/Gene/raw/barcodes.tsv" barcodes_sn_rna.tsv
mv "Solo.out/Gene/raw/features.tsv" features_sn_rna.tsv
mv "Solo.out/Gene/raw/matrix.mtx" matrix_sn_rna.mtx
mv "Solo.out/Gene/CellReads.stats" CellReads_sn_rna.stats
mv "Solo.out/Gene/Features.stats" Features_sn_rna.stats
mv "Solo.out/Gene/Summary.csv" Summary_sn_rna.csv
mv "Solo.out/Gene/UMIperCellSorted.txt" UMIperCellSorted_sn_rna.txt
fi
else
echo Error: unknown counting mode: "$counting_mode". Should be either sn_rna or sc_rna.
fi
mv Aligned.sortedByCoord.out.bam ~{output_bam_basename}.bam
#tar -zcvf ~{output_bam_basename}.star_metrics.tar *.stats *.txt *.csv

>>>
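
A note on the branch selection in the command block above: WDL renders a Boolean placeholder as the literal word `true` or `false`, and a bare `[[ word ]]` (like `[ word ]`) only checks that the word is non-empty. Under that reading, `! [[ ~{count_exons} ]]` appears never to be true, so the `else` branch would run for both values. Similarly, `$counting_mode` in the error message is a shell variable rather than the WDL input, and the error branch does not return a failing status. The sketch below shows one way to make both checks explicit. It is a minimal illustration, not the task's actual code: `pick_features` and the labels it prints are hypothetical, and its arguments stand in for the interpolated `~{counting_mode}` and `~{count_exons}` values.

```shell
# Stand-ins for the values WDL would interpolate for ~{counting_mode} and
# ~{count_exons}; the function name and its output labels are hypothetical.
pick_features() {
  counting_mode="$1"
  count_exons="$2"
  case "$counting_mode" in
    sc_rna)
      echo "Gene"
      ;;
    sn_rna)
      # Compare against the literal string; a bare [ "$count_exons" ] would
      # succeed for both "true" and "false" because both are non-empty.
      if [ "$count_exons" = "true" ]; then
        echo "GeneFull_Ex50pAS Gene"
      else
        echo "GeneFull_Ex50pAS"
      fi
      ;;
    *)
      echo "Error: unknown counting mode: $counting_mode. Should be either sn_rna or sc_rna." >&2
      return 1
      ;;
  esac
}

# A bare test treats the string "false" as true because it is non-empty.
if [ "false" ]; then bare_test="passes"; else bare_test="fails"; fi

with_exons="$(pick_features sn_rna true)"
without_exons="$(pick_features sn_rna false)"

# An unknown mode reports the error on stderr and returns a non-zero status.
if pick_features bulk true 2>/dev/null; then unknown_status=0; else unknown_status=1; fi

echo "bare_test=$bare_test"
echo "with_exons=$with_exons"
echo "without_exons=$without_exons"
echo "unknown_status=$unknown_status"
```

Returning a non-zero status on an unknown mode lets the workflow engine mark the task as failed instead of continuing to the `mv` of the BAM file.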

@@ -389,6 +412,15 @@ task STARsoloFastq {
File barcodes_sn_rna = "barcodes_sn_rna.tsv"
File features_sn_rna = "features_sn_rna.tsv"
File matrix_sn_rna = "matrix_sn_rna.mtx"
File cell_reads = "CellReads.stats"
File align_features = "Features.stats"
File summary = "Summary.csv"
File umipercell = "UMIperCellSorted.txt"
File cell_reads_sn_rna = "CellReads_sn_rna.stats"
File align_features_sn_rna = "Features_sn_rna.stats"
File summary_sn_rna = "Summary_sn_rna.csv"
File umipercell_sn_rna = "UMIperCellSorted_sn_rna.txt"
#File aligner_metrics = "~{output_bam_basename}.star_metrics.tar"
Contributor


do we still want this commented out?

Contributor Author


Thanks for catching it. I'm going to delete it entirely.

}
}

@@ -398,6 +430,8 @@ task MergeStarOutput {
Array[File] barcodes
Array[File] features
Array[File] matrix
Array[File]? cell_reads

String input_id

#runtime values
@@ -424,6 +458,14 @@ task MergeStarOutput {
declare -a barcodes_files=(~{sep=' ' barcodes})
declare -a features_files=(~{sep=' ' features})
declare -a matrix_files=(~{sep=' ' matrix})
declare -a cell_reads_files=(~{sep=' ' cell_reads})

for cell_read in "${cell_reads_files[@]}"; do
if [ -f "$cell_read" ]; then
cat "$cell_read" >> "~{input_id}_cell_reads.txt"
fi
done


# create the compressed raw count matrix with the counts, gene names and the barcodes
python3 /usr/gitc/create-merged-npz-output.py \
@@ -446,6 +488,7 @@ task MergeStarOutput {
File row_index = "~{input_id}_sparse_counts_row_index.npy"
File col_index = "~{input_id}_sparse_counts_col_index.npy"
File sparse_counts = "~{input_id}_sparse_counts.npz"
File? cell_reads_out = "~{input_id}_cell_reads.txt"
}
}
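
The loop added to MergeStarOutput appends each shard's `CellReads.stats` file to `~{input_id}_cell_reads.txt` with `cat`. Assuming each of those files begins with a single header row (an assumption about the STARsolo output, not something this diff states), plain concatenation would repeat the header once per shard. A minimal sketch of a header-aware merge is below; `merge_stats`, the file names, and the sample rows are hypothetical. It only stacks rows and does not sum counts for a barcode that appears in more than one shard.

```shell
# Hypothetical helper: concatenate per-shard stats tables, keeping the header
# row from the first existing file only.
merge_stats() {
  out="$1"
  shift
  first=1
  : > "$out"                      # start from an empty output file
  for f in "$@"; do
    [ -f "$f" ] || continue       # skip missing shards, as the original loop does
    if [ "$first" -eq 1 ]; then
      cat "$f" >> "$out"          # first file: keep header and rows
      first=0
    else
      tail -n +2 "$f" >> "$out"   # later files: drop the header row
    fi
  done
}

# Usage with two made-up shards (column names are illustrative only).
tmp="$(mktemp -d)"
printf 'CB\tcbMatch\nAAAC\t10\n' > "$tmp/shard0.stats"
printf 'CB\tcbMatch\nGGGT\t7\n' > "$tmp/shard1.stats"
merge_stats "$tmp/merged.txt" "$tmp/shard0.stats" "$tmp/shard1.stats"

merged_lines="$(wc -l < "$tmp/merged.txt" | tr -d ' ')"
header_count="$(grep -c '^CB' "$tmp/merged.txt")"
echo "lines=$merged_lines headers=$header_count"
```

Because the output file is created even when no shard exists, a task using this helper would always produce the file; the original loop only creates it when at least one input is present, which is what makes the `File?` output meaningful.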
