The following datasets are used to test normalization methods. Only the lung data from each dataset is used.
- LungMAP - Human data from a healthy donor group spanning a broad age range.
- Tabula Sapiens - All Cells
- COVID-19 immune features revealed by a large-scale single-cell transcriptome atlas
bash scripts/download/download_data.sh
bash ./scripts/processing/subsample_lung.sh
bash ./scripts/processing/drop_low_coverage_cells.sh
This subsamples each dataset to a maximum of 100K cells. Sampling is done per cell type, so all cell types are preserved in their original proportions.
bash ./scripts/processing/subsample_h5ad.sh
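The per-cell-type subsampling described above can be sketched in plain Python. This is a minimal illustration, not the contents of `subsample_h5ad.sh`, which are not shown here and may differ. The function name `stratified_subsample` and its parameters (`cell_types`, `max_cells`, `seed`) are hypothetical, and the rule of keeping at least one cell per type is an assumption.

```python
import random
from collections import defaultdict


def stratified_subsample(cell_types, max_cells=100_000, seed=0):
    """Return sorted indices of cells to keep.

    cell_types: one cell type label per cell (hypothetical input).
    Each cell type is sampled with the same fraction, so the original
    proportions are approximately preserved.
    """
    n_total = len(cell_types)
    if n_total <= max_cells:
        # Dataset is already under the cap: keep every cell.
        return list(range(n_total))

    # Group cell indices by their cell type label.
    by_type = defaultdict(list)
    for idx, label in enumerate(cell_types):
        by_type[label].append(idx)

    rng = random.Random(seed)
    keep = []
    for indices in by_type.values():
        # Integer arithmetic avoids floating-point rounding surprises.
        # Assumption: keep at least one cell so rare types survive,
        # which can push the total slightly above max_cells.
        n_keep = max(1, len(indices) * max_cells // n_total)
        keep.extend(rng.sample(indices, n_keep))
    return sorted(keep)


# Toy example: 1,000 cells across three cell types, capped at 100 cells.
labels = ["AT2"] * 600 + ["AT1"] * 300 + ["Basal"] * 100
kept = stratified_subsample(labels, max_cells=100, seed=1)
```

With an AnnData object, the returned indices could be used to select rows, for example `adata[kept]`, assuming the labels were taken from the matching `obs` column.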
bash ./scripts/normalizing/rankit.sh
bash ./scripts/normalizing/inmf.sh
bash ./scripts/normalizing/z_scores.sh
mkdir -p ./data/normalized/scvi
python3 ./scripts/normalizing/scvi.py
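For readers unfamiliar with the rankit method named above, the following is a rough sketch of one common definition of a rank-based inverse normal transform. It is not taken from `rankit.sh`, whose implementation is not shown here. The function name `rankit`, the `(rank - 0.5) / n` quantile formula, and the average-rank handling of ties are all assumptions chosen for illustration.

```python
from statistics import NormalDist


def rankit(values):
    """Map values to standard normal quantiles based on their ranks.

    Assumption: ties share their average rank, and the quantile for
    rank r out of n values is (r - 0.5) / n.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n

    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    normal = NormalDist()  # standard normal: mean 0, sigma 1
    return [normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Toy example: expression values of one cell across five genes.
scores = rankit([0.0, 3.0, 1.0, 1.0, 7.0])
```

Because only ranks matter, the output is unchanged by any strictly increasing rescaling of the input, which is the property that makes this family of transforms attractive for comparing datasets with different sequencing depths.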
mkdir -p ./data/marker_genes/
python3 ./scripts/processing/find_marker_genes.py rankit ./data/normalized/scvi/all.h5ad ./data/marker_genes/marker_genes_rankit.tsv
python3 ./scripts/processing/find_marker_genes.py scvi ./data/normalized/scvi/all.h5ad ./data/marker_genes/marker_genes_scvi.tsv
Save the scVI normalization of the LungMAP data only
mkdir -p ./data/for_notebook/
python3 ./scripts/processing/save_mtx_obs_var.py all ./data/normalized/scvi/lung_map.h5ad ./data/for_notebook/lung_map
Save all normalization methods for all datasets
mkdir -p ./data/for_notebook/
python3 ./scripts/processing/save_mtx_obs_var.py all ./data/normalized/scvi/all.h5ad ./data/for_notebook/all_datasets