SpatialSSL: Whole-Brain Spatial Transcriptomics in the Mouse Brain with Self-Supervised Learning (NeurIPS 2023 AI for Science Workshop)
Link to full paper: https://openreview.net/forum?id=bfD3Fzy7Mb
- Till Richter
- Anna Schaar
- Francesca Drummer
- Cheng-Wei Liao
- Leopold Endres
- Fabian Theis
SpatialSSL is a research project that explores different methods of self-supervised learning applied to spatial transcriptomics data in the mouse brain. Self-supervised learning is a type of machine learning in which models are trained to predict properties of the data itself, without explicit human annotations. The project models spatial data as graphs, applies graph-based machine learning and pretraining methods, and analyses how the resulting pretrained models perform on downstream tasks.
SpatialSSL builds on the BICCN 2.0 dataset [1], which contains spatial data for approximately 4 million brain cells, measuring the expression of 550 genes. Spatial transcriptomics data combines positional information (usually in the form of x-y coordinates) with gene expression for each cell. The dataset is organized into 59 "images," each containing spatially independent cells. Depending on the annotation level, there are 15, 30, or 35 different cell types present.
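For orientation, spatial transcriptomics data of this kind is commonly stored as an AnnData object with a cells x genes expression matrix, per-cell x-y coordinates, and cell type annotations. The sketch below is purely illustrative; the field names (`spatial`, `cell_type`, `image_id`) and cell type labels are assumptions, not necessarily those used in the released dataset.

```python
import numpy as np
import pandas as pd
import anndata as ad

# Illustrative layout of one "image": a cells x genes expression matrix,
# per-cell x-y coordinates, and cell type annotations.
# Field names ("spatial", "cell_type", "image_id") are assumptions.
n_cells, n_genes = 40_000, 550
obs = pd.DataFrame({
    "cell_type": np.random.choice(["L2/3 IT", "Astro", "Oligo"], n_cells),
    "image_id": "image_00",
})
adata = ad.AnnData(
    X=np.random.poisson(1.0, size=(n_cells, n_genes)).astype(np.float32),
    obs=obs,
)
adata.obsm["spatial"] = np.random.uniform(0, 10_000, size=(n_cells, 2))  # x-y coordinates
```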
Two graph datasets were constructed from the images (a code sketch of both constructions follows the list below):
- Create a Single Graph for Each Image in the Dataset:
  - Parameters: radius, the threshold distance between two cells below which an edge is drawn between them.
  - This method results in relatively large graphs of roughly 30k to 60k nodes.
- Splitting Graphs into Subgraphs Using the Egonet of Each Node:
  - Parameters: radius, the threshold distance between two cells below which an edge is drawn; k_hop, the number of "hops" used for subgraph creation.
- This method results in approximately 4 million small graphs, with the size depending on the radius and k_hop parameters.
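The following is a rough sketch of both constructions, assuming PyTorch Geometric (with torch-cluster installed for `radius_graph`). The radius value, neighbor cap, and placeholder inputs are assumptions for illustration, not the settings used in the paper.

```python
import torch
from torch_geometric.data import Data
from torch_geometric.nn import radius_graph          # requires torch-cluster
from torch_geometric.utils import k_hop_subgraph

# Placeholder inputs: per-cell x-y coordinates and 550-gene expression values.
num_cells = 1_000
pos = torch.rand(num_cells, 2) * 1_000
expr = torch.rand(num_cells, 550)

# Method 1: a single graph per image, with an edge whenever two cells
# lie within the chosen radius (the radius value here is an assumption).
edge_index = radius_graph(pos, r=30.0, max_num_neighbors=64)
graph = Data(x=expr, edge_index=edge_index, pos=pos)

# Method 2: one k-hop ego-net subgraph centred on every cell.
k_hop = 2
subgraphs = []
for node in range(graph.num_nodes):
    subset, sub_edge_index, _, _ = k_hop_subgraph(
        node, k_hop, graph.edge_index, relabel_nodes=True, num_nodes=graph.num_nodes
    )
    subgraphs.append(Data(x=graph.x[subset], edge_index=sub_edge_index, pos=graph.pos[subset]))
```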
For both datasets, several masking methods were explored during the pretraining phase:
- Masking all gene expression values of 20% of cells.
- Masking specific cell types.
In the final experiments, we masked all gene expression values of 20% of cells.
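A minimal sketch of this cell-level masking is shown below; the 20% fraction comes from the description above, while using zero as the mask value is an assumption rather than necessarily the paper's choice.

```python
import torch

def mask_cells(x, mask_fraction=0.2):
    """Mask all gene expression values of a random subset of cells.

    Returns the corrupted features, the boolean node mask, and the original
    values to reconstruct. Zero is used as the mask value here (an assumption).
    """
    num_cells = x.size(0)
    mask = torch.rand(num_cells) < mask_fraction   # True for masked cells
    x_masked = x.clone()
    x_masked[mask] = 0.0                           # zero out every gene of masked cells
    return x_masked, mask, x[mask]
```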
The models were pretrained using graph neural networks with graph attention and graph convolution layers; the pretraining task was to reconstruct the masked gene expression values.
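A hedged sketch of such a pretraining setup, continuing from the graph and masking sketches above, is given below. The layer sizes, the GATConv/GCNConv combination, and the MSE reconstruction loss are illustrative assumptions, not the exact architecture from the paper.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv, GCNConv

class MaskedExpressionModel(torch.nn.Module):
    """Encode a cell graph and reconstruct gene expression for each node."""
    def __init__(self, num_genes=550, hidden=128):
        super().__init__()
        self.conv1 = GATConv(num_genes, hidden, heads=4, concat=False)  # attention layer
        self.conv2 = GCNConv(hidden, hidden)                            # convolution layer
        self.decoder = torch.nn.Linear(hidden, num_genes)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        h = F.relu(self.conv2(h, edge_index))
        return self.decoder(h)

model = MaskedExpressionModel()
x_masked, mask, target = mask_cells(graph.x)       # graph and mask_cells from the sketches above
pred = model(x_masked, graph.edge_index)
loss = F.mse_loss(pred[mask], target)              # reconstruct only the masked cells
loss.backward()
```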
SpatialSSL trains both pretrained and non-pretrained models. The pretrained weights are fine-tuned for cell type prediction, and all models are then evaluated on predicting cell types from gene expression values.
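One possible fine-tuning setup is sketched below: the pretrained encoder layers are reused and a cell type classification head is added on top. The head, the number of cell types (35, the finest annotation level mentioned above), and the placeholder labels are assumptions about the general recipe, not the exact code in this repository.

```python
import torch
import torch.nn.functional as F

class CellTypeClassifier(torch.nn.Module):
    """Reuse the pretrained encoder and add a cell type classification head."""
    def __init__(self, pretrained: MaskedExpressionModel, num_cell_types=35, hidden=128):
        super().__init__()
        self.conv1 = pretrained.conv1      # layers initialized from pretrained weights
        self.conv2 = pretrained.conv2
        self.head = torch.nn.Linear(hidden, num_cell_types)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        h = F.relu(self.conv2(h, edge_index))
        return self.head(h)

clf = CellTypeClassifier(model)                      # model from the pretraining sketch
labels = torch.randint(0, 35, (graph.num_nodes,))    # placeholder cell type labels
logits = clf(graph.x, graph.edge_index)
loss = F.cross_entropy(logits, labels)
```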
If you use this code or our paper in your research, please cite the following paper:
@inproceedings{
richter2023spatialssl,
title={Spatial{SSL}: Whole-Brain Spatial Transcriptomics in the Mouse Brain with Self-Supervised Learning},
author={Till Richter and Anna Schaar and Francesca Drummer and Cheng-Wei Liao and Leopold Endres and Fabian Theis},
booktitle={NeurIPS 2023 AI for Science Workshop},
year={2023},
url={https://openreview.net/forum?id=bfD3Fzy7Mb}
}
[1] Zhang M, Pan X, Jung W, Halpern A, Eichhorn SW, Lei Z, Cohen L, Smith KA, Tasic B, Yao Z, Zeng H, Zhuang X. A molecularly defined and spatially resolved cell atlas of the whole mouse brain. bioRxiv [Preprint]. 2023 Mar 7:2023.03.06.531348. doi: 10.1101/2023.03.06.531348. PMID: 36945367; PMCID: PMC10028822.