Update pipeline to download and use SRAs from GEO (#1)

* rework from sra-download
* linting
* through sras and trimming
* through making rsem-star reference
* adding velocity calculations
* working through making a_obs
* copying new results
* finish pipeline through making dataframes
* comment out testing
* update input link
* add kinase results to all to prompt full run
CellProfiling · Jun 9, 2021 · cb4a070
1 parent 5c77ce1

Showing 25 changed files with 1,892 additions and 204 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,5 +1,5 @@
 .snakemake
-ensembl
-output
+resources/ensembl
+results
 input
-ESCG_data
+**/.DS_Store
diff --git a/.gitmodules b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "SingleCellProteogenomics"]
+	path = SingleCellProteogenomics
+	url = https://github.com/CellProfiling/SingleCellProteogenomics.git
diff --git a/README.md b/README.md
@@ -6,19 +6,21 @@ This repository contains the _snakemake_ pipeline for analyzing the RNA sequenci
 
 ## Single-cell sequencing files
 
-The single-cell RNA-Seq data is available at GEO SRA under project number [GSE146773](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE146773). 
+The single-cell RNA-Seq data is available at GEO SRA under project number [GSE146773](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE146773).
+
+This data is downloaded automatically in this pipeline.
 
 ## Updating the Ensembl version
 
-The genome and ensembl versions are located at the top of the file `Snakefile`.
+The genome and Ensembl versions are located at the top of the file `Snakefile`.
 These can be updated, and the references will be downloaded automatically.
 
 ## Usage
 
+1) Clone repository and initialize submodules: `git clone --recurse-submodules https://github.com/CellProfiling/FucciSingleCellSeqPipeline.git && cd FucciSingleCellSeqPipeline`
 1) Install conda: https://docs.conda.io/en/latest/miniconda.html
-2) Create the conda environment: `conda env create --file environment.yaml --name cellquant`
-3) Activate the conda environment: `conda activate cellquant`
-4) Run the workflow: recommended command is `snakemake --cores 24 --resources mem_mb=100000`, where you can subsitute the max number of cores and max memory allocation. The memory allocation should be at least 50000 MB if possible. It might work with 32000 MB, but no guarantees.
+2) Install snakemake using conda: `conda install -c conda-forge snakemake-minimal`
+4) Run the workflow: `snakemake --use-conda --cores 24 --resources mem_mb=100000`, where you can subsitute the max number of cores and max memory allocation. At least 54 GB of free memory should be available.
 
 ## Citation
 

diff --git a/SingleCellProteogenomics b/SingleCellProteogenomics
diff --git a/Snakefile b/Snakefile
diff --git a/environment.yaml b/environment.yaml
diff --git a/data/ERCC.fa b/resources/ERCC.fa