Analysis of the healthy human airways at single-cell level.

The code is organised in 6 steps, each with its own inputs and outputs (available in the data section).

1. Primary data analysis

Example code for the individual exploratory analysis performed on each of the 35 samples composing the atlas, using the Seurat v3 R package. From this initial analysis, repeated on all 35 samples, we obtained a rough estimate of the cell-type composition of the atlas (refined later in the analysis of the complete dataset) and a list of robust marker genes expressed in each cell type identified across all 35 samples.

Output from this step: Robust_marker_genes.tsv

Example: D344_Biop_Int1
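The per-sample marker lists come from Seurat in R; the idea of keeping only the markers that recur across samples can be sketched in Python as below. This is a minimal illustration, not the repository's code: it assumes the per-sample markers are already available as a mapping from sample to cell type to gene set, and the function name `robust_markers` and the `min_samples` threshold are hypothetical.

```python
from collections import Counter, defaultdict


def robust_markers(per_sample_markers, min_samples):
    """Keep, for each cell type, the marker genes found in >= min_samples samples.

    per_sample_markers: {sample: {cell_type: iterable of marker genes}}
    (hypothetical input format, chosen for illustration).
    """
    counts = defaultdict(Counter)
    for markers_by_type in per_sample_markers.values():
        for cell_type, genes in markers_by_type.items():
            # set() so that a gene is counted at most once per sample
            counts[cell_type].update(set(genes))
    return {
        cell_type: sorted(g for g, n in gene_counts.items() if n >= min_samples)
        for cell_type, gene_counts in counts.items()
    }


# Toy input: three samples, two cell types (made-up marker lists)
example = {
    "sample_A": {"Basal": {"KRT5", "TP63", "MT1X"}, "Multiciliated": {"FOXJ1"}},
    "sample_B": {"Basal": {"KRT5", "TP63"}, "Multiciliated": {"FOXJ1", "CAPS"}},
    "sample_C": {"Basal": {"KRT5"}, "Multiciliated": {"FOXJ1"}},
}
print(robust_markers(example, min_samples=2))
# {'Basal': ['KRT5', 'TP63'], 'Multiciliated': ['FOXJ1']}
```

Setting `min_samples` to the total number of samples keeps only the genes found as markers in every sample.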

2. Pre-processing of the data

The complete dataset is pre-processed in parallel in 3 different scripts:

- Consensus cell and gene filtering across the 35 samples, normalisation of each cell's counts to 10,000 UMIs, and merging of all samples into a single dataset, producing a large, preliminary-processed count table. [Pre-processing_preliminary_dataset.ipynb]
- Iterative preliminary analysis of the dataset, including progressive filtering of small clusters composed of 'low-quality'/'peculiar' cells (e.g. clusters with high mitochondrial content, ...). [Preliminary analysis_v1 ...v4]
- Consensus cell and gene filtering across the 35 samples (without normalisation) and merging of all samples into a single raw count table. [PreProcessing_raw_dataset.ipynb]
- Inference of doublet cells in each of the 35 samples independently, followed by analysis of the merged dataset to estimate the proportion of inferred doublets in each resulting cluster, and the corresponding cell filtering. [PreProcessing_doublets.ipynb; Pre-analysis_doublets.ipynb; Preliminary_analysis_doublet_metadata.Rmd]
- Estimation of the background (ambient) gene expression across all samples to produce a 'background-free' raw count table. [Pre-processing_gene background_dataset.Rmd; Preliminary_background_analysis]
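Normalising each cell to 10,000 UMIs amounts to a per-cell rescaling: every count is multiplied by 10,000 divided by the cell's total UMI count. A minimal stdlib sketch follows; the function name `normalise_to_target` is illustrative. In scanpy the equivalent operation is `sc.pp.normalize_total(adata, target_sum=1e4)`, although this README does not say which call the notebook uses.

```python
def normalise_to_target(cell_counts, target_sum=10_000):
    """Rescale each cell's counts so that they sum to target_sum.

    cell_counts: list of per-cell count vectors (one list per cell).
    Cells with zero total counts are returned as all-zero vectors.
    """
    normalised = []
    for counts in cell_counts:
        total = sum(counts)
        if total == 0:
            normalised.append([0.0] * len(counts))
            continue
        factor = target_sum / total
        normalised.append([c * factor for c in counts])
    return normalised


cells = [[1, 3, 0, 6], [5, 5, 0, 0]]  # two toy cells, 10 UMIs each
print(normalise_to_target(cells))
# [[1000.0, 3000.0, 0.0, 6000.0], [5000.0, 5000.0, 0.0, 0.0]]
```

After rescaling, every cell sums to the same total, so expression values become comparable between cells sequenced at different depths.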

Input files for this step:

- All the 10x output files from the 35 samples (available for download on GEO)
- RB_genes (list of the filtered-out ribosomal genes)

Output files from this step:

Preliminary analysis datasets:

- PreProcessed_preliminary_dataset
- Preliminary_analysis_v1...v4

Pre-processed raw count table:

- PreProcessed_raw_dataset

Doublet analysis:

- metadata_doublet

Background-free datasets:

- background_features
- background_metadata
- SoupX_raw_dataset
- SoupX_strained_dataset (advanced SoupX correction, not used in the subsequent analysis)
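The cluster-level doublet filtering of this step can be sketched as follows, assuming each cell already carries a cluster label and a boolean doublet call from whichever detector was used (the README does not name it). The function `flag_doublet_clusters` and the `max_fraction` cut-off are hypothetical; the threshold actually used in Preliminary_analysis_doublet_metadata.Rmd is not stated here.

```python
from collections import Counter


def flag_doublet_clusters(clusters, is_doublet, max_fraction):
    """Return the fraction of inferred doublets per cluster and the set of
    clusters whose fraction exceeds max_fraction (hypothetical cut-off)."""
    if len(clusters) != len(is_doublet):
        raise ValueError("clusters and is_doublet must have the same length")
    totals = Counter(clusters)
    doublets = Counter(c for c, d in zip(clusters, is_doublet) if d)
    fractions = {c: doublets[c] / n for c, n in totals.items()}
    flagged = {c for c, f in fractions.items() if f > max_fraction}
    return fractions, flagged


clusters = ["0", "0", "0", "0", "1", "1", "1", "1"]
is_doublet = [False, False, False, True, True, True, True, False]
fractions, flagged = flag_doublet_clusters(clusters, is_doublet, max_fraction=0.5)
print(fractions)  # {'0': 0.25, '1': 0.75}
print(flagged)    # {'1'}
```

Cells belonging to the flagged clusters would then be removed from the dataset.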

3. Data Normalization and integration (batch correction)

Before normalisation, the complete dataset is again filtered to remove 'low-quality' cells. Both the raw counts and the SoupX-corrected counts are then normalised with the scran method (Lun & Haghverdi). Finally, the normalised count data are integrated to produce a batch-corrected PCA matrix, which is used in the subsequent analysis.
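scran's pooling-based size factors (`scran::computeSumFactors`) are not reimplemented here. The sketch below only shows how such size factors are typically applied, mirroring the default behaviour of `scuttle::logNormCounts`: size factors are centred to a mean of 1 and each count becomes log2(count / size factor + 1). The size factors in the example are made-up values and the function name is hypothetical.

```python
import math


def log_norm_counts(cell_counts, size_factors, pseudo_count=1.0):
    """Return log2(count / centred_size_factor + pseudo_count) for each cell.

    Size factors are first centred to a mean of 1, as scuttle::logNormCounts
    does by default. Computing the size factors themselves (scran's pooling
    and deconvolution) is NOT reproduced here: they are taken as input.
    """
    if len(cell_counts) != len(size_factors):
        raise ValueError("one size factor per cell is required")
    if any(sf <= 0 for sf in size_factors):
        raise ValueError("size factors must be positive")
    mean_sf = sum(size_factors) / len(size_factors)
    centred = [sf / mean_sf for sf in size_factors]
    return [
        [math.log2(c / sf + pseudo_count) for c in counts]
        for counts, sf in zip(cell_counts, centred)
    ]


counts = [[0, 1, 3], [0, 2, 6]]  # cell 2 = cell 1 sequenced twice as deeply
size_factors = [0.5, 1.0]        # made-up values standing in for scran output
for row in log_norm_counts(counts, size_factors):
    print([round(x, 3) for x in row])  # [0.0, 1.322, 2.459] for both cells
```

Because the second toy cell has exactly twice the counts and twice the size factor of the first, both cells end up with identical normalised values.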

Output files:

- scranNorm_dataset
- fastMNN_PCA
- SoupX_norm_dataset

Integration process: the datasets are progressively mapped onto one another in the following order: Intermediate samples, Tracheal (Proximal) samples, Distal samples and Nasal samples. This order was defined from the results of the preliminary analysis, which showed that samples are relatively homogeneous within each sampling location. Within a given sampling site (level), samples are merged from the largest to the smallest dataset (those with the most cells first).
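The merge order described above (sampling location first, then decreasing cell number) can be computed with a simple two-key sort. The location order is taken from this README; the sample names, cell counts and the helper `integration_order` are hypothetical.

```python
# Order of sampling locations taken from the README; the sample names,
# cell counts and the helper below are hypothetical.
LOCATION_ORDER = ["Intermediate", "Tracheal", "Distal", "Nasal"]


def integration_order(samples):
    """Sort (name, location, n_cells) tuples by location rank, then by
    decreasing number of cells, and return the sample names in that order."""
    rank = {loc: i for i, loc in enumerate(LOCATION_ORDER)}
    ordered = sorted(samples, key=lambda s: (rank[s[1]], -s[2]))
    return [name for name, _, _ in ordered]


samples = [
    ("N1", "Nasal", 3000),
    ("T1", "Tracheal", 5000),
    ("I1", "Intermediate", 2000),
    ("I2", "Intermediate", 4000),
    ("D1", "Distal", 1500),
    ("T2", "Tracheal", 7000),
]
print(integration_order(samples))
# ['I2', 'I1', 'T2', 'T1', 'D1', 'N1']
```

Such a list could then be handed to the integration tool as its merge order (batchelor's `fastMNN`, for instance, accepts a `merge.order` argument), assuming that is how the scripts implement it.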

4. Analysis

The dataset can now be fully analysed: UMAP embedding (computed on the integrated PCA), cell clustering, marker gene identification, and dedicated sub-clustering of each 'key' cell cluster. The sub-clustering steps were performed to improve the precision of the final cell labelling [Annotated_dataset_metadata].
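Marker identification in tools such as Seurat or scanpy relies on statistical tests (for example the Wilcoxon rank-sum test); which test the repository uses is not stated here. The sketch below only illustrates the core idea of contrasting each cluster against all other cells, ranking genes by a simple log2 fold change of mean expression. The function `top_markers`, the pseudocount and the toy matrix are hypothetical.

```python
import math


def top_markers(expr, clusters, genes, n_top=1, pseudo=1.0):
    """Rank genes in each cluster by log2 fold change of mean expression
    (cluster vs. all other cells) and return the n_top genes per cluster."""
    result = {}
    for cl in sorted(set(clusters)):
        inside = [row for row, c in zip(expr, clusters) if c == cl]
        outside = [row for row, c in zip(expr, clusters) if c != cl]
        scored = []
        for j, gene in enumerate(genes):
            mean_in = sum(r[j] for r in inside) / len(inside)
            mean_out = sum(r[j] for r in outside) / len(outside) if outside else 0.0
            lfc = math.log2((mean_in + pseudo) / (mean_out + pseudo))
            scored.append((lfc, gene))
        scored.sort(reverse=True)  # highest fold change first
        result[cl] = [gene for _, gene in scored[:n_top]]
    return result


genes = ["KRT5", "FOXJ1", "SCGB1A1"]
expr = [[10, 0, 1], [8, 1, 0], [0, 9, 1], [1, 12, 0]]  # toy counts, 4 cells
clusters = ["basal", "basal", "ciliated", "ciliated"]
print(top_markers(expr, clusters, genes))
# {'basal': ['KRT5'], 'ciliated': ['FOXJ1']}
```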

Output files:

- Annotated_dataset_v1 (v1 because the cell-type names will be progressively updated as our understanding improves)
- Focus_XXX_cells
- markers_XXX_cells

5. Detailed_Analysis

This directory contains the scripts used for the detailed analysis of selected cell types. It includes:

- Differential analysis of equivalent cell types identified in both Nasal and Bronchial samples (Secretory, Multiciliated, Suprabasal), followed by Gene Set Enrichment Analysis;
- Trajectory inference of the epithelial cell types from the Nasal or Bronchial area using PAGA;
- Inference of transcription factor activity using SCENIC, to identify regulons.
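The enrichment score at the heart of Gene Set Enrichment Analysis is a weighted running sum over a ranked gene list: it steps up at each gene of the set (in proportion to that gene's score) and down at every other gene, and the score is the largest deviation from zero. fgsea computes this score (with its `gseaParam` exponent, 1 by default) and then estimates p-values by comparison with random gene sets, a step this sketch omits. The function name, toy genes and statistics below are hypothetical; in practice the ranking statistic would come from the differential analysis.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """GSEA-style weighted running-sum enrichment score.

    ranked_genes: gene names sorted by decreasing score.
    scores: the ranking statistic for each gene, in the same order.
    weight: exponent applied to |score| for genes in the set.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranking without covering all of it")
    hit_norm = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    if hit_norm == 0:
        raise ValueError("all genes of the set have a zero score")
    running, best = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        if hit:
            running += abs(s) ** weight / hit_norm  # step up at a set member
        else:
            running -= 1.0 / n_miss  # constant step down otherwise
        if abs(running) > abs(best):
            best = running
    return best


ranked = ["A", "B", "C", "D", "E"]  # toy genes, already sorted by score
stats = [3.0, 2.0, 1.0, -1.0, -2.0]
print(enrichment_score(ranked, stats, {"A", "B"}))  # positive: set at the top
print(enrichment_score(ranked, stats, {"D", "E"}))  # negative: set at the bottom
```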

Output files:

- DA_bulk_XXX
- XXX_fgseaRes_GO_bp
- anndata_v6_Paga
- score_TF

6. Figures

This folder contains all the scripts used to produce the figures of the paper Deprez et al. Scripts are labelled by figure number, and all the necessary files are in the data repository.

**Final AnnData object and metadata:**

- Annotated_dataset.h5ad
- Annotated_dataset_metadata.tsv

**Count tables (raw or normalised):**

- raw_exprMatrix.Rda
- SoupX_raw_dataset.Rda
- scranNorm_dataset.Rda
- SoupX_norm_dataset.Rda
