FMs-for-spatialomics

Tool to apply RNA-seq and pathology foundation models to spatial transcriptomics. Currently, this tool only extracts expression- and histology-based features from 10x Visium samples, and it assumes the data have been processed with 10x's Space Ranger pipeline.

Setup

Set the REPO_ROOT environment variable to the location of your local clone of this repo:

```
export REPO_ROOT=/path/to/FMs-for-spatialomics
```

Create the conda environment:
```
conda env create -f $REPO_ROOT/env.yml
```

Clone the foundation model repos:

UNI:
- Request access to UNI on Hugging Face.
- Clone the UNI Hugging Face repo locally. The command below uses an SSH key registered with Hugging Face; cloning with an access token or the Hugging Face CLI also works:
```
# make sure git-lfs is already installed
git clone git@hf.co:MahmoodLab/UNI $REPO_ROOT/models/UNI
```

UCE:

Clone the UCE GitHub repo locally:
```
git clone git@github.com:snap-stanford/UCE.git $REPO_ROOT/models/UCE
```

Download the model files:
```
wget https://figshare.com/ndownloader/articles/24320806/versions/5 -O $REPO_ROOT/models/UCE/model_files/temp.zip
```

Unzip the model files:
```
unzip $REPO_ROOT/models/UCE/model_files/temp.zip -d $REPO_ROOT/models/UCE/model_files
```

Untar the additional model files:
```
tar -xvf $REPO_ROOT/models/UCE/model_files/protein_embeddings.tar.gz -C $REPO_ROOT/models/UCE/model_files
```

Extract features

Features are extracted from the expression and histology data using modality-specific foundation models. The following instructions extract features per capture area (which we refer to as a slide) and should be repeated for each slide of interest.

Activate the conda environment:
```
conda activate spatialFM
```

Prepare per-slide data:

```
python $REPO_ROOT/1-convert-to-anndata.py \
--spatial_path /path/to/spaceranger/output/outs \
--slide_path /path/to/full/resolution/slide.tif \
--tile_width N \
--output_h5ad /path/to/converted.h5ad
```

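where --tile_width should typically be an integer within 1 pixel of the spot diameter in full-resolution pixels (reported as spot_diameter_fullres in Space Ranger's spatial/scalefactors_json.json).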
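If you would rather not look up the spot diameter by hand, it can be read from the Space Ranger output. The sketch below assumes the standard Space Ranger layout, in which the outs directory contains spatial/scalefactors_json.json with a spot_diameter_fullres entry. The helper names are hypothetical and not part of this repo, and 1-convert-to-anndata.py itself may handle the value differently.

```python
import json
from pathlib import Path


def tile_width_from_scalefactors(scalefactors: dict) -> int:
    """Round the full-resolution spot diameter to an integer tile width.

    Rounding to the nearest integer keeps the result within 1 pixel
    (in fact within 0.5 pixel) of the spot diameter.
    """
    return int(round(scalefactors["spot_diameter_fullres"]))


def suggest_tile_width(spatial_path: str) -> int:
    """Suggest a --tile_width value for a Space Ranger outs directory.

    Hypothetical helper; assumes <spatial_path>/spatial/scalefactors_json.json
    exists, as in a standard Space Ranger run.
    """
    path = Path(spatial_path) / "spatial" / "scalefactors_json.json"
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; is {spatial_path} a Space Ranger outs directory?"
        )
    with path.open() as fh:
        return tile_width_from_scalefactors(json.load(fh))
```

For example, suggest_tile_width("/path/to/spaceranger/output/outs") returns the integer to pass as --tile_width N for that slide.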

Extract expression features:
```
python $REPO_ROOT/2-extract-expression-features.py \
--input_h5ad /path/to/converted.h5ad \
--output_h5ad /path/to/expr.h5ad \
--model uce_4 \
--species human
```
where --model is one of uce_4 or uce_33, selecting the 4- or 33-layer UCE model respectively, and a --species value that UCE does not support out of the box may also need to be configured with the UCE model.

Extract histology features:

```
python $REPO_ROOT/3-extract-histology-features.py \
--input_h5ad /path/to/converted.h5ad \
--output_h5ad /path/to/hist.h5ad
```

Unify features:
Differing inclusion criteria between the foundation models result in minor differences in which barcoded spots actually get processed. This final step takes the intersection of those spots for further analysis.
```
python $REPO_ROOT/4-combine-data.py \
--source_h5ad /path/to/converted.h5ad \
--expr_h5ad /path/to/expr.h5ad \
--hist_h5ad /path/to/hist.h5ad \
--output_h5ad /path/to/extracted.h5ad
```
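Conceptually, the combining step keeps only the spots that both models processed. The sketch below illustrates that intersection on plain lists of barcodes. It assumes spots are keyed by their Visium barcodes (for AnnData objects, the obs_names of each file) and that the source file's spot order should be preserved; both are assumptions, and 4-combine-data.py may do additional work, such as merging the feature matrices, that is not shown here.

```python
from typing import Iterable, List


def shared_barcodes(
    source_barcodes: Iterable[str],
    expr_barcodes: Iterable[str],
    hist_barcodes: Iterable[str],
) -> List[str]:
    """Return the barcodes present in both feature sets, in source order."""
    # Set intersection gives the spots processed by both foundation models.
    kept = set(expr_barcodes) & set(hist_barcodes)
    # Walk the source barcodes so the output order follows converted.h5ad.
    return [bc for bc in source_barcodes if bc in kept]
```

With AnnData objects loaded from converted.h5ad, expr.h5ad, and hist.h5ad, this could be called as shared_barcodes(source.obs_names, expr.obs_names, hist.obs_names), after which each object can be subset to the returned barcodes.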

Clean up (optional):
```
rm /path/to/converted.h5ad
rm /path/to/expr.h5ad
rm /path/to/hist.h5ad
```

Automating feature extraction

An example script that runs all steps of the feature extraction pipeline is provided as run-extract.sh in the repo root. It should also be run with the conda environment activated.
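As an alternative when processing many slides from Python, the sketch below assembles the four commands for one slide, using only the flags documented above, and runs them in order. It is not a port of run-extract.sh. The one-work-directory-per-slide layout and the helper names are hypothetical, and the commands use the interpreter running the sketch, so run it inside the activated spatialFM environment.

```python
import shlex
import subprocess
import sys
from pathlib import Path
from typing import List


def build_slide_commands(
    repo_root: str,
    spatial_path: str,
    slide_path: str,
    tile_width: int,
    work_dir: str,
    model: str = "uce_4",
    species: str = "human",
) -> List[List[str]]:
    """Assemble the four pipeline commands for a single slide (capture area)."""
    root, work = Path(repo_root), Path(work_dir)
    converted = str(work / "converted.h5ad")
    expr = str(work / "expr.h5ad")
    hist = str(work / "hist.h5ad")
    py = sys.executable  # interpreter of the active (spatialFM) environment
    return [
        [py, str(root / "1-convert-to-anndata.py"),
         "--spatial_path", spatial_path, "--slide_path", slide_path,
         "--tile_width", str(tile_width), "--output_h5ad", converted],
        [py, str(root / "2-extract-expression-features.py"),
         "--input_h5ad", converted, "--output_h5ad", expr,
         "--model", model, "--species", species],
        [py, str(root / "3-extract-histology-features.py"),
         "--input_h5ad", converted, "--output_h5ad", hist],
        [py, str(root / "4-combine-data.py"),
         "--source_h5ad", converted, "--expr_h5ad", expr,
         "--hist_h5ad", hist, "--output_h5ad", str(work / "extracted.h5ad")],
    ]


def run_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # Every step writes via --output_h5ad; make sure its folder exists.
            out = Path(cmd[cmd.index("--output_h5ad") + 1])
            out.parent.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

A driver could loop over a mapping of slide names to (outs directory, TIFF path, tile width), call build_slide_commands with a separate work directory per slide, and pass the result to run_commands, first with dry_run=True to review the commands.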

Hardware acceleration

By default, the inference scripts above use the first available GPU, if one is detected. To disable or change this behavior, the simplest method is to set the CUDA_VISIBLE_DEVICES environment variable. If you run out of GPU memory, try lowering the batch size with the --batch_size flag of the feature extraction scripts. The default batch sizes were tuned for a single 16 GB V100 GPU.
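When launching the scripts from a Python driver, CUDA_VISIBLE_DEVICES can be set per child process instead of in the shell. This minimal sketch builds such an environment; the helper name is hypothetical, and the batch size in the usage note is an arbitrary example value, not a recommended setting.

```python
import os
from typing import Dict, Optional


def gpu_env(gpu: Optional[str]) -> Dict[str, str]:
    """Return a copy of the current environment with CUDA_VISIBLE_DEVICES set.

    gpu=None hides every GPU (empty string), so no CUDA device is detected
    and the scripts should fall back to CPU; gpu="1" exposes only physical
    GPU 1, which the child process then sees as its first GPU (device 0).
    """
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env["CUDA_VISIBLE_DEVICES"] = "" if gpu is None else gpu
    return env
```

For instance, subprocess.run(cmd + ["--batch_size", "8"], env=gpu_env("1"), check=True) runs a feature extraction script, given as the argument list cmd, on GPU 1 with a batch size of 8.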

Analysis

An example notebook for downstream analysis using extracted features can be found in the evaluations subdirectory of this repo.

Source repository: StevenSong/FMs-for-spatialomics on GitHub

Folders and files

Latest commit

History

Repository files navigation

FMs-for-spatialomics

Setup

Extract features

Automating feature extraction

Hardware acceleration

Analysis

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages