Pipelines and comparison of clustering approaches for single-cell RNA-seq data
In total, 6 approaches and 7 protocols are compared on the single-cell RNA sequencing benchmark dataset GSE118767:
- Mixture of H2228, H1975 and HCC827 human lung cancer cell lines: SRR6782112
- Mixture of H2228, H1975, A549, H838 and HCC827 human lung cancer cell lines: SRR8606521
From Cell Ranger to count matrix
Raw SRA data -> FASTQ files -> count matrices
Follow the instructions in cell_ranger_pipelines to convert the raw .SRA files into count matrices with the Cell Ranger pipelines.
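The SRA -> FASTQ -> count matrix flow can be sketched as a small Python helper that only assembles the two command lines for one run. This is a minimal sketch and not the contents of cell_ranger_pipelines: it assumes the SRA Toolkit's `fasterq-dump` and 10x Genomics' `cellranger count` are installed, and the helper name `build_commands`, the `fastq` output directory, and the reference path are hypothetical placeholders.

```python
def build_commands(run_id, fastq_dir, transcriptome):
    """Return the two command lines (as argument lists) for one SRA run.

    All paths are placeholders; nothing is executed here.
    """
    # Step 1: raw SRA run -> FASTQ files, one file per read.
    dump = ["fasterq-dump", run_id, "--split-files", "--outdir", fastq_dir]
    # Step 2: FASTQ files -> count matrix.
    # Assumption: the FASTQ files have already been renamed to the
    # bcl2fastq-style names Cell Ranger expects, using run_id as the sample name.
    count = [
        "cellranger", "count",
        f"--id={run_id}_count",
        f"--transcriptome={transcriptome}",
        f"--fastqs={fastq_dir}",
        f"--sample={run_id}",
    ]
    return dump, count


# The two runs listed for GSE118767; the reference path is hypothetical.
for run in ("SRR6782112", "SRR8606521"):
    for cmd in build_commands(run, "fastq", "refdata/hypothetical-GRCh38"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the tools are on the `PATH`. Cell Ranger expects FASTQ names of the form `<sample>_S1_L001_R1_001.fastq.gz`, so the files written by `fasterq-dump` usually need renaming first, and some Cell Ranger releases may require additional options.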
Performing clustering analysis on count matrices:
- Data preprocessing and benchmarking: after obtaining the raw count matrices, use the following files to preprocess the data and run the clustering algorithms:
-> 3 cell lines
-> 5 cell lines
-> subsampling (references for the individual methods can be found in methods/)
- Comparison: compare the clustering results in the following Jupyter notebooks:
-> sc10x-3c
-> sc10x-5c
-> sc10x-3c-subsampling
-> sc10x-5c-subsampling
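Because the cell line of each cell is known in this benchmark, a clustering result can be scored against those labels. The source does not state which metric the notebooks use; the adjusted Rand index (ARI) below is one common choice and is shown only as an illustration. The function name and the toy labels are made up for this sketch and are not results from the dataset.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same cells")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no cell pairs to disagree on
    # Count cell pairs that share a contingency-table cell, a row, or a column.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put every cell in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy example (not real data): six cells, three cell lines, perfect clustering.
truth = ["H2228", "H2228", "H1975", "H1975", "HCC827", "HCC827"]
clusters = [0, 0, 1, 1, 2, 2]
print(adjusted_rand_index(truth, clusters))
```

An ARI of 1 means the clusters match the cell lines exactly, values near 0 are what random assignment would give, and negative values indicate worse-than-chance agreement. The score does not depend on how the clusters are numbered, so cluster ids from different methods can be compared directly.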