Streamline datasets for documentation #700

Open

LucaMarconato opened this issue Sep 3, 2024 · 1 comment

Labels
docs 📜 Documentation-related issues

Comments

LucaMarconato (Member)

We should make the usage of datasets more homogeneous across the notebooks in the docs.

Practically:

  • select 1 (max 3) small datasets (<1 GB each, ideally ~100 MB) and use these datasets in all the notebooks across the repos:
    • spatialdata
    • spatialdata-plot
    • napari-spatialdata
  • in particular, remove the non-bio datasets from the docs (e.g. remove the raccoon dataset from the transformation notebook, and the blobs dataset from the aggregation and rasterize notebooks)
  • implement a dataset class, like in squidpy, to automatically download the datasets (a rough sketch follows this list)
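
A rough sketch of what such a download helper could look like. The registry, the URLs, the function name, and the assumption that the subsampled data is published as zipped Zarr stores on S3 are all placeholders, not an existing API:

```python
# Sketch of a squidpy-style download helper for the tutorial datasets.
# Assumptions: datasets are published as zipped Zarr stores (placeholder URLs below)
# and spatialdata.read_zarr is used to open the unpacked store.
from pathlib import Path
import urllib.request
import zipfile

import spatialdata as sd

# Hypothetical registry: dataset name -> URL of the zipped, subsampled Zarr store.
_REGISTRY = {
    "xenium": "https://s3.example.com/spatialdata/xenium_subset.zarr.zip",
    "visium": "https://s3.example.com/spatialdata/visium_subset.zarr.zip",
}


def load_dataset(name: str, cache_dir: str = "~/.cache/spatialdata-datasets") -> sd.SpatialData:
    """Download (if needed) and open one of the tutorial datasets."""
    cache = Path(cache_dir).expanduser()
    cache.mkdir(parents=True, exist_ok=True)
    zarr_path = cache / f"{name}.zarr"
    if not zarr_path.exists():
        zip_path = cache / f"{name}.zarr.zip"
        urllib.request.urlretrieve(_REGISTRY[name], zip_path)  # download the zipped store
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(zarr_path)  # unpack into <cache>/<name>.zarr
    return sd.read_zarr(zarr_path)


# In a notebook: sdata = load_dataset("xenium")
```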

CC @timtreis @melonora

LucaMarconato added the docs 📜 Documentation-related issues label on Sep 3, 2024

LucaMarconato (Member Author) commented Sep 3, 2024

We will use the following datasets:

For citations (used in the README of spatialdata-notebooks/datasets and when we use the data in the tutorial notebooks), here are the references:

spatialdata notebooks

  • I. Use SpatialData with your data: the SpatialData object: no dataset used -> no change
  • II. Use SpatialData with your data: SpatialElements and tables: custom dataset -> no change
  • (needed for the workshops) Transformations and coordinate systems: raccoon -> Xenium
  • Spatial query: visium mouse brain -> Visium
  • Annotating regions of interest with napari: visium breast cancer -> Visium
  • Use landmark annotations to align multiple -omics layers: visium + xenium breast cancer -> Visium + Visium HD
  • Working with annotations in SpatialData: blobs -> Xenium
  • Integrate/aggregate signals across spatial layers: blobs and custom dataset -> Xenium
  • Interchangeability between raster and vector representations: blobs + custom binned dataset -> Xenium + Visium HD (for the bins)
  • Squidpy integration: xenium breast cancer -> Xenium
  • (needed for the workshops) Deep learning example on image tiles: xenium breast cancer (annotated with the xenium_visium_00 paper notebook) + visium breast cancer
    • move the notebook into the paper reproducibility notebooks
    • link the old notebook in the new one
    • the new notebook will be very lightweight and will not train a model (just a dataloader): using Visium

spatialdata-plot notebooks

  • Technology notebooks Visium: visium mouse brain -> Visium
  • Technology notebooks MIBI-TOF: no change
  • Technology notebooks MERFISH: no change, but:
    • rename the title to "MERFISH prototype pipeline"
    • explain in the notebook that this is not MERFISH from Vizgen, but indeed the prototype pipeline
  • Technology notebooks CosMx: no change
  • Technology notebooks Visium HD: visium hd mouse intestine -> Visium HD
  • Technology notebooks Xenium: XOA 2.0.x -> Xenium
  • Implicit performance improvements when plotting raster data: visium breast cancer -> Visium

napari-spatialdata notebooks

  • Analyse MibiTOF in Napari-SpatialData: no change
  • Analyse Nanostring data in Napari-SpatialData: nanostring cosmx -> CosMx (it's the same dataset, but here I mean to use the subsampled dataset and the download API)
  • Using Napari-SpatialData: nanostring cosmx -> CosMx
  • Use the Scatterwidget with AnnData from Notebook scatterwidget.ipynb: visium_hne_adata (AnnData format) -> choose a bigger object and a SpatialData object: use Visium
  • Use the Scatterwidget with AnnData from Notebook scatterwidget_annotation.ipynb: same as above
  • annotation widget notebook: Visium

tasks

  • cosmx, mibitof and merfish are already available in spatialdata-sandbox; add the 3 missing datasets.
  • we add the missing datasets to the Readme in spatialdata-notebooks/datasets.
    • add a disclaimer that the data is a subset
  • for the selected 6 datasets above, state that we use them in the docs
  • add a job in the data CI (see the sketch after this list)
    • that converts the raw data to SpatialData
    • subsets the data
    • writes it to Zarr
    • uploads it to S3.
  • write a small frontend (like the squidpy one), so that users can download the data via code (see the download helper sketch earlier in this issue)
  • also add a disclaimer in each notebook that the data is a subset
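
A rough sketch of the data-CI step, assuming a reader from spatialdata-io (xenium here as an example), a bounding-box query for the subsetting, and the AWS CLI for the upload; paths, coordinates, and the bucket name are placeholders:

```python
# Sketch of the data-CI job: convert raw data to SpatialData, subset it,
# write it to Zarr, upload it to S3. Paths, bounding box, and bucket are placeholders.
import subprocess

from spatialdata import bounding_box_query
from spatialdata_io import xenium  # example reader; one call per technology

RAW_DIR = "raw/xenium_breast_cancer"                            # raw vendor download
ZARR_PATH = "xenium_subset.zarr"                                # output Zarr store
S3_TARGET = "s3://example-bucket/datasets/xenium_subset.zarr"   # placeholder bucket

# 1. convert the raw data to SpatialData
sdata = xenium(RAW_DIR)

# 2. subset the data so that the store stays small (~100 MB); the box is a placeholder
sdata_small = bounding_box_query(
    sdata,
    axes=("x", "y"),
    min_coordinate=[0, 0],
    max_coordinate=[2000, 2000],
    target_coordinate_system="global",
)

# 3. write to Zarr
sdata_small.write(ZARR_PATH)

# 4. upload to S3 (assumes AWS credentials are configured in the CI runner)
subprocess.run(["aws", "s3", "sync", ZARR_PATH, S3_TARGET], check=True)
```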
