Comparing cell groupings between experiments across species

In Single-cell Expression Atlas we're interested in relating cell groupings (clusters, cell types) between experiments and across species boundaries. There is a naive way of doing this using marker genes, and there are more advanced methods such as SAMap. This workflow runs both and compares the number of cross-species intersecting cell groups that each one predicts. Note that the workflow was written before the most recent SAMap developments, so it is a little out of date in that regard. See this presentation for some initial results.

Note that SAMap is executed with a workflow that parallelises a BLAST step.

Naive marker-driven method

It is possible to compare cell groups (clusters, cell types) across species via the 'marker' genes of each group. Since each set of marker genes is a product of the context in which it was derived (sorted cell population, sub-tissue, tissue, whole organism), that context must be matched for comparison of marker gene sets (via ortholog relationships) to be valid. With that in mind this workflow will:

Take anndata objects from SCXA analysis for two experiments containing comparable 'organism' parts, even if they're not labelled at the same granularity.
Match the organism parts between experiments, using the Uberon ontology to re-label where the granularity of organism part annotation is not consistent.
Subset the experiments to only the common organism parts.
Re-filter, re-normalise, and re-derive marker genes using Scanpy.
Derive mapped pairs of cell groupings between the two experiments.
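
As a rough illustration of the last step, the sketch below translates one experiment's marker genes through an ortholog table and scores every pair of cell groups by Jaccard overlap. This is a minimal sketch under stated assumptions, not the workflow's actual implementation: the function names, the Jaccard score, the `min_score` threshold and the cluster names are all hypothetical, and only the gene IDs are taken from the example mapping file.

```python
from itertools import product


def map_markers(markers, orthologs):
    """Translate species-A marker gene IDs to species-B IDs.

    Genes without an ortholog are dropped; one-to-many orthologs are all kept.
    """
    mapped = set()
    for gene in markers:
        mapped.update(orthologs.get(gene, ()))
    return mapped


def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_groups(markers_a, markers_b, orthologs, min_score=0.2):
    """Score every cross-species pair of cell groups by marker overlap.

    min_score is an illustrative cut-off (assumption), not a workflow parameter.
    """
    pairs = []
    for (group_a, genes_a), (group_b, genes_b) in product(
        markers_a.items(), markers_b.items()
    ):
        score = jaccard(map_markers(genes_a, orthologs), set(genes_b))
        if score >= min_score:
            pairs.append((group_a, group_b, score))
    # Best-scoring pairs first
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Toy inputs: cluster names are made up for illustration
orthologs = {
    "ENSG00000198888": {"ENSMUSG00000064341"},
    "ENSG00000198763": {"ENSMUSG00000064345"},
    "ENSG00000198804": {"ENSMUSG00000064351"},
}
markers_a = {
    "human_cluster_1": {"ENSG00000198888", "ENSG00000198763"},
    "human_cluster_2": {"ENSG00000198804"},
}
markers_b = {
    "mouse_cluster_1": {"ENSMUSG00000064341", "ENSMUSG00000064345"},
    "mouse_cluster_2": {"ENSMUSG00000064351", "ENSMUSG00000064354"},
}

pairs = match_groups(markers_a, markers_b, orthologs)
for group_a, group_b, score in pairs:
    print(f"{group_a}\t{group_b}\t{score:.2f}")
```

Because the overlap is only meaningful when both marker sets were derived in the same context, a comparison like this belongs after the subsetting and re-derivation steps, not on the original marker sets.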

Snakemake workflow

The Snakemake workflow in this repository performs the above steps given the following inputs:

Two AnnData files with .project.h5ad extensions, stored in an 'inputs' directory.
The two species names
A .obo ontology file in the inputs directory.
An ortholog mapping file in the inputs directory.

Ortholog mappings can be derived from biomaRt and should look like:

homo_sapiens_gene_id	homo_sapiens_gene_name	mus_musculus_gene_id	mus_musculus_gene_name
ENSG00000198888	MT-ND1	ENSMUSG00000064341	mt-Nd1
ENSG00000198763	MT-ND2	ENSMUSG00000064345	mt-Nd2
ENSG00000198804	MT-CO1	ENSMUSG00000064351	mt-Co1
ENSG00000198712	MT-CO2	ENSMUSG00000064354	mt-Co2
ENSG00000198899	MT-ATP6	ENSMUSG00000064357	mt-Atp6
ENSG00000198938	MT-CO3	ENSMUSG00000064358	mt-Co3
ENSG00000198840	MT-ND3	ENSMUSG00000064360	mt-Nd3
ENSG00000212907	MT-ND4L	ENSMUSG00000065947	mt-Nd4l
ENSG00000198886	MT-ND4	ENSMUSG00000064363	mt-Nd4

So the file should be tab-delimited and have 'gene_id' fields prefixed by the two input species names.
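
A quick way to check a mapping file against that description before running the pipeline is sketched below. It is a minimal sketch, not part of the workflow: `load_orthologs` is a hypothetical helper, and it assumes the species names are given in the same form as the column prefixes (for example `homo_sapiens` and `mus_musculus`).

```python
import csv
import io


def load_orthologs(handle, species_a, species_b):
    """Read a tab-delimited ortholog table into {species_a gene ID: {species_b gene IDs}}.

    Raises ValueError if either '<species>_gene_id' column is missing.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    col_a = f"{species_a}_gene_id"
    col_b = f"{species_b}_gene_id"
    missing = [c for c in (col_a, col_b) if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"ortholog file is missing columns: {missing}")

    mapping = {}
    for row in reader:
        # Skip rows where either side has no gene ID
        if row[col_a] and row[col_b]:
            mapping.setdefault(row[col_a], set()).add(row[col_b])
    return mapping


# In-memory stand-in for a file in the inputs directory, using rows from the example above
example = io.StringIO(
    "homo_sapiens_gene_id\thomo_sapiens_gene_name\tmus_musculus_gene_id\tmus_musculus_gene_name\n"
    "ENSG00000198888\tMT-ND1\tENSMUSG00000064341\tmt-Nd1\n"
    "ENSG00000198763\tMT-ND2\tENSMUSG00000064345\tmt-Nd2\n"
)
mapping = load_orthologs(example, "homo_sapiens", "mus_musculus")
print(mapping)
```

Storing the targets as a set keeps one-to-many ortholog relationships, which a plain one-to-one dictionary would silently overwrite.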

The Snakemake config file can then be constructed like the example, and the pipeline run like:

snakemake --use-conda --cores 2
