Feat Robinson2023 #10

Merged 54 commits on Apr 27, 2023
f72ee41
skeleton for running Fu2022
wir963 Jul 12, 2022
4101bee
forgot to save run snakemake
wir963 Jul 12, 2022
cc89b8b
fixes for running Fu2022
wir963 Jul 13, 2022
a6658be
add Pelka2021 to datasets to be processed
wir963 Jul 22, 2022
ec9fa28
Merge branch 'feat-Fu2022' of github.com:ruppinlab/scRNA-seq-microbe-…
wir963 Jul 22, 2022
793e4c2
add scripts for running
wir963 Jul 22, 2022
98e6c5f
add scripts for running
wir963 Jul 22, 2022
a00a462
update sratoolkit module and use key for dbgap
wir963 Jul 22, 2022
db61629
Fu2022 bug fixes form biowulf
wir963 Jul 22, 2022
f28d25e
use sample names from publication
wir963 Jul 25, 2022
1631cae
fixed which sra files corresponds to which cellranger file
wir963 Jul 25, 2022
2d9f82a
fix lane issue
wir963 Jul 25, 2022
60c70bd
add -1 for barcode for compatibility
wir963 Jul 28, 2022
d776d70
first attempt to run Ma2021
wir963 Aug 19, 2022
641fd0e
bug fixes to run on biowulf
wir963 Aug 22, 2022
b2b39c3
add sample prefix
wir963 Aug 22, 2022
a5518fb
use -1 for all barcodes
wir963 Aug 22, 2022
cfa92e2
run on biowulf
wir963 Aug 26, 2022
3492030
run SRPRISM for MW
wir963 Aug 26, 2022
af89a94
added sample level information
wir963 Sep 6, 2022
011d61b
add 10x data from Robinson2022 to process
wir963 Sep 16, 2022
fa33e77
run SRPRISM for viruses on Ma2021
wir963 Sep 16, 2022
dd2c21d
bug fixes to run on biowulf
wir963 Sep 16, 2022
6caa880
fix path for S4
wir963 Sep 19, 2022
4b42842
Merge branch 'feat-Fu2022' of github.com:ruppinlab/scRNA-seq-microbe-…
wir963 Sep 19, 2022
071d463
add units for one sample
wir963 Sep 20, 2022
230da7a
Merge branch 'feat-Fu2022' of github.com:ruppinlab/scRNA-seq-microbe-…
wir963 Sep 20, 2022
623242a
add missing column
wir963 Sep 20, 2022
38bad5b
Merge branch 'feat-Fu2022' of github.com:ruppinlab/scRNA-seq-microbe-…
wir963 Sep 20, 2022
57e4983
add cell-types for all samples
wir963 Sep 20, 2022
bf5880e
changes made on biowulf
wir963 Sep 20, 2022
3864b76
Merge branch 'feat-Fu2022' of github.com:ruppinlab/scRNA-seq-microbe-…
wir963 Sep 20, 2022
51b5e9d
run Bukavina2022
wir963 Sep 20, 2022
52092e4
split Fu bam by cell barcodes
wir963 Sep 21, 2022
5b7810a
add correct cellbarcodes for CTpos sample
wir963 Sep 22, 2022
8ca3982
added rules for analyzing each sample
wir963 Sep 23, 2022
9a6b1c3
add rule for featurecounts
wir963 Sep 23, 2022
8c9f0d6
add rules for counting mouse reads
wir963 Sep 23, 2022
6f88548
initial commit for Zhang2021
wir963 Sep 23, 2022
fe4f9d8
replace plus sign to avoid regex hell
wir963 Sep 23, 2022
90c24cd
use pos and neg for units and samples
wir963 Sep 24, 2022
d75fecf
add WIP for Fu2022
wir963 Sep 27, 2022
8d01c86
add Robinson2022-SS2 directory
wir963 Oct 4, 2022
1b08513
update path to raw files
wir963 Oct 4, 2022
69e22b3
updates made to Zhang to run on biowulf
wir963 Oct 13, 2022
909cd8d
run qc on zhang2021
wir963 Oct 13, 2022
23bae2b
use SRPRISM on Robinson
wir963 Oct 31, 2022
c3cc241
run SRPRISM mapping for more samples
wir963 Nov 10, 2022
dbb9306
add Stone2023
wir963 Dec 1, 2022
3935619
fix temp path
wir963 Dec 2, 2022
a6292e2
add units and run CR
wir963 Dec 2, 2022
e01b270
use mouse reference
wir963 Dec 2, 2022
226a092
changes for biowulf for publication
wir963 Feb 15, 2023
da289d2
add code for processing Robinson2023, Pelka2021 and Zhang2021
wir963 Apr 27, 2023
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
data/*
**/.snakemake
**/.DS_Store
**/output
2 changes: 1 addition & 1 deletion Ben-Moshe2019/run-SRPRISM.smk
@@ -41,7 +41,7 @@ rule add_CR_tags_SRPRISM:
SRPRISM_CB_UMI_TABLE,
SRPRISM_CB_UMI_COUNT
script:
-"src/add_CR_tags_to_SRPRISM_bam.py"
+"../src/add_CR_tags_to_SRPRISM_bam.py"

# get a read count per gene per sample file
rule intersect_BAM_GFF:
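
Note on the path change in this hunk: Snakemake resolves a `script:` path relative to the file that defines the rule, so the `../` prefix points one directory above `Ben-Moshe2019/`. The sketch below only illustrates that resolution. `resolve_script` is a hypothetical helper (not Snakemake API), and it assumes the shared `src/` directory sits at the repository root.

```python
import posixpath


def resolve_script(rule_file: str, script: str) -> str:
    """Resolve a script path relative to the file that defines the rule."""
    rule_dir = posixpath.dirname(rule_file)
    return posixpath.normpath(posixpath.join(rule_dir, script))


# Path of the rule file as shown in the diff header above.
RULE_FILE = "Ben-Moshe2019/run-SRPRISM.smk"

old = resolve_script(RULE_FILE, "src/add_CR_tags_to_SRPRISM_bam.py")
new = resolve_script(RULE_FILE, "../src/add_CR_tags_to_SRPRISM_bam.py")

print(old)  # Ben-Moshe2019/src/add_CR_tags_to_SRPRISM_bam.py
print(new)  # src/add_CR_tags_to_SRPRISM_bam.py
```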
61 changes: 0 additions & 61 deletions Lee2020/Snakefile

This file was deleted.

10 changes: 0 additions & 10 deletions Lee2020/data/patients.tsv

This file was deleted.

173 changes: 0 additions & 173 deletions Lee2020/data/samples.tsv

This file was deleted.

27,415 changes: 0 additions & 27,415 deletions Lee2020/data/units.tsv

This file was deleted.

85 changes: 85 additions & 0 deletions Ma2021/Snakefile
@@ -0,0 +1,85 @@
from os.path import join
import pandas as pd
from snakemake.utils import min_version
##### set minimum snakemake version #####
min_version("5.1.2")

##### load config and sample sheets #####
configfile: "config/PathSeq-config.yaml"



cells = pd.read_csv(config["units"], sep="\t").set_index(["sample", "barcode", "patient"], drop=False)

samples = pd.read_csv(config["samples"], sep="\t").set_index(["patient", "sample"], drop=False)
# samples = samples.loc[samples["Maximum read length"] == "57bp"]
cells = cells.loc[cells["sample"].isin(samples["sample"])]
# samples = cells[["patient", "sample", "library_id", "sample_prefix"]].reset_index(drop=True).drop_duplicates().set_index(["patient", "sample"], drop=False)
# samples = samples.iloc[0:2]

# cells = cells.loc[cells.patient.isin(["H08", "H37"])]
wildcard_constraints:
    patient="|".join(samples["patient"]),
    sample="|".join(samples["sample"])

# Snakemake includes
# include: "../RNA-snakemake-rules/rules/cellranger.smk"
include: "../pathogen-discovery-rules/rules/PathSeq-10x.smk"

# Directories
CR_SAMPLE_ODIR = "{patient}-{sample}"
FASTQ_DIR = "/data/Robinson-SB/scRNAseq_46samples_fastq/{library}"

# CellRanger Files
CR_BAM_FILE = join(CR_SAMPLE_ODIR, "outs", "possorted_genome_bam.bam")

PATIENT_FASTQ_DIR = join("FASTQ", "raw", "{patient}")

# cellranger complains when you pass directory as --id
CR_SAMPLE_ODIR = "{patient}-{sample}"




# PathSeq files
PATHSEQ_BAM = join("output", "PathSeq", "{patient}-{sample}", "pathseq.bam")
PATHSEQ_CELL_SCORE = join("output", "PathSeq", "{patient}-{sample}-{cell}", "pathseq.txt")

rule all:
    input:
        # expand(PATHSEQ_BAM, zip, patient=samples["patient"], sample=samples["sample"]),
        expand(PATHSEQ_CELL_SCORE, zip, patient=cells["patient"], sample=cells["sample"], cell=cells["barcode"])
        #expand(SRPRISM_TAG_BAM, zip, patient=samples["patient"], sample=samples["sample"], genome=samples["genome"])

def get_cellranger_input_directory(wildcards):
    library_id = samples.loc[(wildcards.patient, wildcards.sample), "library_id"]
    return {
        "dir": expand(FASTQ_DIR, library=library_id)
    }

def get_library_id(wildcards):
    return samples.loc[(wildcards.patient, wildcards.sample), "sample_prefix"]

# expected input format for FASTQ file
rule cellranger_count:
    input:
        unpack(get_cellranger_input_directory)
    params:
        PATIENT_FASTQ_DIR,
        CR_SAMPLE_ODIR,
        config["CellRanger"]["genome_dir"],
        config["CellRanger"]["chemistry"],
        get_library_id
    output:
        CR_BAM_FILE
    shell:
        "module load cellranger/5.0.1 && "
        # snakemake auto creates directories for output files but cellranger expects existing directories to pipestance directory
        "rm -rf {params[1]} && "
        "cellranger count --id={params[1]} "
        "--fastqs={input[0]}/ " # this is the path to the directory containing the FASTQ files
        "--sample={params[4]} " # this is the sample to use
        "--transcriptome={params[2]} "
        "--localcores=$SLURM_CPUS_PER_TASK "
        "--chemistry={params[3]} "
        "--localmem=60"
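
Note on `wildcard_constraints` in this Snakefile: the names are joined with `|` as-is, so a name containing a regex metacharacter (for example `+`) would change the meaning of the pattern. A hedged sketch of an escaped variant follows. `build_constraint` and the name `CT+` are hypothetical and not part of this PR.

```python
import re


def build_constraint(names):
    """Join names into a regex alternation, escaping metacharacters."""
    # Deduplicate and put longer names first.
    unique = sorted(set(names), key=len, reverse=True)
    return "|".join(re.escape(name) for name in unique)


names = ["C26a", "C26b", "CT+"]  # "CT+" is a made-up name with a metacharacter
escaped = build_constraint(names)
unescaped = "|".join(names)

print(bool(re.fullmatch(escaped, "CT+")))    # True: the plus sign is literal
print(bool(re.fullmatch(escaped, "CTT")))    # False
print(bool(re.fullmatch(unescaped, "CTT")))  # True: "+" acts as a quantifier
```

With the names in `data/samples.tsv` (letters and digits only) the escaped and unescaped patterns behave the same, so this matters only for datasets whose sample names carry special characters.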

@@ -3,9 +3,6 @@ patients: data/patients.tsv
samples: data/samples.tsv
units: data/units.tsv

-trimming:
-skip: True
-
PathSeq:
bam_file: "output/BAM/{patient}-{sample}-unaligned.bam"
microbe_fasta: "../data/microbev1.fa"
@@ -20,7 +17,7 @@ VecScreen:
contaminant_hits: "../data/microbev1-vecscreen-combined-matches.bed"

params:
-PathSeq: "--min-clipped-read-length 50 --min-base-quality 1 --max-masked-bases 10 --dust-t 24 "
+PathSeq: "--min-clipped-read-length 38 --min-base-quality 1 --max-masked-bases 10 --dust-t 24"
PathSeqScore: ""

CellRanger:
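
Note on the `params` change in this hunk: apart from the dropped trailing space, the two `PathSeq` strings differ only in the `--min-clipped-read-length` value. The sketch below tokenizes both strings to confirm that. `parse_flags` is a hypothetical helper and assumes every flag takes exactly one value, as in the strings shown here.

```python
import shlex


def parse_flags(arg_string):
    """Turn '--flag value --flag value' into a dict; assumes one value per flag."""
    tokens = shlex.split(arg_string)
    return dict(zip(tokens[0::2], tokens[1::2]))


old = parse_flags("--min-clipped-read-length 50 --min-base-quality 1 --max-masked-bases 10 --dust-t 24 ")
new = parse_flags("--min-clipped-read-length 38 --min-base-quality 1 --max-masked-bases 10 --dust-t 24")

# Collect every flag whose value differs between the old and new strings.
changed = {flag: (old.get(flag), new[flag]) for flag in new if old.get(flag) != new[flag]}
print(changed)  # {'--min-clipped-read-length': ('50', '38')}
```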
12 changes: 11 additions & 1 deletion Lee2020/config/cluster.json → Ma2021/config/cluster.json
@@ -35,6 +35,11 @@
"mem": "150g",
"time": "4:00:00"
},
+"add_CR_tags_SRPRISM":
+{
+"mem": "150g",
+"time": "4:00:00"
+},
"split_PathSeq_BAM_by_CB_UB":
{
"mem": "8g",
@@ -45,6 +50,11 @@
{
"mem": "32g",
"time": "4:00:00",
-"nthreads": 16
+"nthreads": 1
+},
+"run_CAMMiQ_species_long_reads":
+{
+"mem": "650g",
+"partition": "largemem"
}
}
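
Note on how per-rule entries like these are typically consumed: the submitter looks up the rule name and falls back to defaults for any key the rule does not set. The sketch below is an illustration under assumptions. The `__default__` block and its values are placeholders (the default entry is not visible in this diff), and `resources_for` is a hypothetical helper.

```python
import json

# "__default__" values are placeholders; the two rule entries are copied from the diff.
CLUSTER_JSON = """
{
    "__default__": {"mem": "8g", "time": "4:00:00", "partition": "norm"},
    "add_CR_tags_SRPRISM": {"mem": "150g", "time": "4:00:00"},
    "run_CAMMiQ_species_long_reads": {"mem": "650g", "partition": "largemem"}
}
"""


def resources_for(rule, cluster):
    """Merge a rule's entry over the defaults; rule-level keys win."""
    merged = dict(cluster.get("__default__", {}))
    merged.update(cluster.get(rule, {}))
    return merged


cluster = json.loads(CLUSTER_JSON)
print(resources_for("add_CR_tags_SRPRISM", cluster))
print(resources_for("run_CAMMiQ_species_long_reads", cluster))
```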
47 changes: 47 additions & 0 deletions Ma2021/data/samples.tsv
@@ -0,0 +1,47 @@
patient	sample	S_ID	library_id	sample_prefix	Maximum read length	10x_chemistry	cancer_type
C25	C25	S038	9_XW_SC_1-26-2017	9_XW_SC_1-26-2017	57bp	v2	CCA
C26	C26a	S020	Sample_CS023169_Wang_0823_1626-12-23_SCAF350	CS023169_Wang_0823_1626-12-23_SCAF350	98bp	v2	CCA
C26	C26b	S037	8_XW_SC_1-26-2017	8_XW_SC_1-26-2017	57bp	v2	CCA
C29	C29	S040	11_XW_SC_3-1-2017	11_XW_SC_3-1-2017	57bp	v2	CCA
C35	C35	S045	20_XW_SC_7-26-2017	20_XW_SC_7-26-2017	57bp	v2	CCA
C39	C39	S044	19_XW_SC_7-27-2017	19_XW_SC_7-27-2017	57bp	v2	CCA
C42	C42	S025	Sample_CS023169_Wang_0823_1642_SCAF355	CS023169_Wang_0823_1642_SCAF355	98bp	v2	CCA
C46	C46a	S027	Sample_CS023169_Wang_0823_1646-12-1_SCAF357	CS023169_Wang_0823_1646-12-1_SCAF357	98bp	v2	CCA
C46	C46b	S028	Sample_CS023169_Wang_0824_1646-1-17_SCAF358	CS023169_Wang_0824_1646-1-17_SCAF358	98bp	v2	CCA
C52	C52	S016	Sample_CS023169_Wang_0731_1652_SCAF303	CS023169_Wang_0731_1652_SCAF303	98bp	v2	CCA
C56	C56	S018	Sample_CS023169_Wang_0731_1656_SCAF305	CS023169_Wang_0731_1656_SCAF305	98bp	v2	CCA
C60	C60	S015	Sample_CS023169_Wang_0730_1660r_SCAF300	CS023169_Wang_0730_1660r_SCAF300	98bp	v2	CCA
C66	C66	S034	Sample_CS023169_Wang_0824_1666_SCAF365	CS023169_Wang_0824_1666_SCAF365	98bp	v2	CCA
C76	C76	S010	CS025253_SCAF880_LCS1676	CS025253_SCAF880_LCS1676	98bp	v2	CCA
H08	H08	S001	CS024371_SCAF529_LCS1608	CS024371_SCAF529_LCS1608	98bp	v2	HCC
H18	H18	S043	16_XW_SC_6-13-2017	16_XW_SC_6-13-2017	57bp	v2	HCC
H21	H21	S035	2_XW_SC_10-25-2016	2_XW_SC_10-25-2016	57bp	v2	HCC
H23	H23	S039	10_XW_SC_2-2-2017	10_XW_SC_2-2-2017	57bp	v2	HCC
H28	H28	S036	7_XW_SC_1-12-2017	7_XW_SC_1-12-2017	57bp	v2	HCC
H30	H30	S041	12_XW_SC_3-2-2017	12_XW_SC_3-2-2017	57bp	v2	HCC
H34	H34b	S021	Sample_CS023169_Wang_0823_1634-1-10_SCAF352	CS023169_Wang_0823_1634-1-10_SCAF352	98bp	v2	HCC
H34	H34a	S022	Sample_CS023169_Wang_0823_1634-12-1_SCAF351	CS023169_Wang_0823_1634-12-1_SCAF351	98bp	v2	HCC
H34	H34c	S023	Sample_CS023169_Wang_0823_1634-4-15_SCAF353	CS023169_Wang_0823_1634-4-15_SCAF353	98bp	v2	HCC
H37	H37	S046	21_XW_SC_8-1-2017	21_XW_SC_8-1-2017	57bp	v2	HCC
H38	H38	S042	15_XW_SC_6-8-2017	15_XW_SC_6-8-2017	57bp	v2	HCC
H41	H41	S024	Sample_CS023169_Wang_0823_1641_SCAF354	CS023169_Wang_0823_1641_SCAF354	98bp	v2	HCC
H43	H43	S026	Sample_CS023169_Wang_0823_1643_SCAF356	CS023169_Wang_0823_1643_SCAF356	98bp	v2	HCC
H49	H49b	S008	CS025253_SCAF872_LCS1649	CS025253_SCAF872_LCS1649	98bp	v2	HCC
H49	H49a	S029	Sample_CS023169_Wang_0824_1649_SCAF359	CS023169_Wang_0824_1649_SCAF359	98bp	v2	HCC
H54	H54	S014	Sample_CS023169_Wang_0730_1654_SCAF301	CS023169_Wang_0730_1654_SCAF301	98bp	v2	HCC
H55	H55	S017	Sample_CS023169_Wang_0731_1655_SCAF304	CS023169_Wang_0731_1655_SCAF304	98bp	v2	HCC
H58	H58c	S012	SCAF637	SCAF637	98bp	v2	HCC
H58	H58a	S030	Sample_CS023169_Wang_0824_1658-5-31_SCAF361	CS023169_Wang_0824_1658-5-31_SCAF361	98bp	v2	HCC
H58	H58b	S031	Sample_CS023169_Wang_0824_1658-7-11_SCAF362	CS023169_Wang_0824_1658-7-11_SCAF362	98bp	v2	HCC
H62	H62	S019	Sample_CS023169_Wang_0731_1662r_SCAF307	CS023169_Wang_0731_1662r_SCAF307	98bp	v2	HCC
H63	H63	S032	Sample_CS023169_Wang_0824_1663_SCAF363	CS023169_Wang_0824_1663_SCAF363	98bp	v2	HCC
H65	H65	S033	Sample_CS023169_Wang_0824_1665_SCAF364	CS023169_Wang_0824_1665_SCAF364	98bp	v2	HCC
H68	H68a	S011	SCAF372	SCAF372	98bp	v2	HCC
H68	H68b	S013	SCAF589	SCAF589	98bp	v2	HCC
H70	H70	S002	CS023169_SCAF592	CS023169_SCAF592	98bp	v2	HCC
H72	H72	S003	CS023169_SCAF672	CS023169_SCAF672	98bp	v3	HCC
H73	H73a	S004	CS023169_SCAF694	CS023169_SCAF694	98bp	v3	HCC
H73	H73b	S006	CS025253_SCAF765_LCS1673_2	CS025253_SCAF765_LCS1673_2	98bp	v2	HCC
H74	H74	S005	CS025253_SCAF764_LCS1674	CS025253_SCAF764_LCS1674	98bp	v2	HCC
H75	H75	S007	CS025253_SCAF850_LCS1675	CS025253_SCAF850_LCS1675	98bp	v2	HCC
H77	H77	S009	CS025253_SCAF873_LCS1677	CS025253_SCAF873_LCS1677	98bp	v2	HCC
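
Note on how this sheet is used: the Ma2021 Snakefile indexes it by `(patient, sample)` and reads `library_id` and `sample_prefix` per sample. The sketch below is a dependency-free analogue of that lookup using the standard `csv` module rather than pandas. It embeds three rows copied from the sheet, and `sample_prefix_for` is a hypothetical name.

```python
import csv
import io

# Three rows copied from samples.tsv; fields are tab-separated.
SAMPLES_TSV = (
    "patient\tsample\tS_ID\tlibrary_id\tsample_prefix\tMaximum read length\t10x_chemistry\tcancer_type\n"
    "C26\tC26a\tS020\tSample_CS023169_Wang_0823_1626-12-23_SCAF350\tCS023169_Wang_0823_1626-12-23_SCAF350\t98bp\tv2\tCCA\n"
    "C26\tC26b\tS037\t8_XW_SC_1-26-2017\t8_XW_SC_1-26-2017\t57bp\tv2\tCCA\n"
    "H72\tH72\tS003\tCS023169_SCAF672\tCS023169_SCAF672\t98bp\tv3\tHCC\n"
)

rows = csv.DictReader(io.StringIO(SAMPLES_TSV), delimiter="\t")
# Key each row by (patient, sample), mirroring the two-level index in the Snakefile.
by_key = {(row["patient"], row["sample"]): row for row in rows}


def sample_prefix_for(patient, sample):
    """Stdlib analogue of samples.loc[(patient, sample), "sample_prefix"]."""
    return by_key[(patient, sample)]["sample_prefix"]


print(sample_prefix_for("C26", "C26a"))      # CS023169_Wang_0823_1626-12-23_SCAF350
print(by_key[("H72", "H72")]["10x_chemistry"])  # v3
```

Keying on both columns matters here because one patient can have several samples (for example C26a and C26b).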