Streamline datasets for documentation #700

Open

LucaMarconato opened this issue Sep 3, 2024 · 1 comment

Labels
docs 📜 Documentation-related issues

Comments

LucaMarconato (Member)

We should make the usage of datasets more homogeneous across the notebooks in the docs.

Practically:

  • select 1 (max 3) small datasets (<1 GB each, ideally ~100 MB) and use these datasets in all the notebooks across the repos:
    • spatialdata
    • spatialdata-plot
    • napari-spatialdata
  • in particular, remove the non-bio datasets from the docs (e.g. remove the raccoon dataset from the transformation notebook, and the blobs dataset from the aggregation and rasterize notebooks)
  • implement a dataset class, like in squidpy, to automatically download the datasets (a rough sketch follows this list)
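
A rough sketch of what such a download helper could look like. The registry, the URLs, the function name, and the assumption that the subsampled data is published as zipped Zarr stores on S3 are all placeholders, not an existing API:

```python
# Sketch of a squidpy-style download helper for the tutorial datasets.
# Assumptions: datasets are published as zipped Zarr stores (placeholder URLs below)
# and spatialdata.read_zarr is used to open the unpacked store.
from pathlib import Path
import urllib.request
import zipfile

import spatialdata as sd

# Hypothetical registry: dataset name -> URL of the zipped, subsampled Zarr store.
_REGISTRY = {
    "xenium": "https://s3.example.com/spatialdata/xenium_subset.zarr.zip",
    "visium": "https://s3.example.com/spatialdata/visium_subset.zarr.zip",
}


def load_dataset(name: str, cache_dir: str = "~/.cache/spatialdata-datasets") -> sd.SpatialData:
    """Download (if needed) and open one of the tutorial datasets."""
    cache = Path(cache_dir).expanduser()
    cache.mkdir(parents=True, exist_ok=True)
    zarr_path = cache / f"{name}.zarr"
    if not zarr_path.exists():
        zip_path = cache / f"{name}.zarr.zip"
        urllib.request.urlretrieve(_REGISTRY[name], zip_path)  # download the zipped store
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(zarr_path)  # unpack into <cache>/<name>.zarr
    return sd.read_zarr(zarr_path)


# In a notebook: sdata = load_dataset("xenium")
```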

CC @timtreis @melonora

LucaMarconato added the docs 📜 Documentation-related issues label on Sep 3, 2024

LucaMarconato (Member Author) commented Sep 3, 2024

We will use the following datasets:

For citations (used in the README of spatialdata-notebooks/datasets and when we use the data in the tutorial notebooks), here are the references:

spatialdata notebooks

  • I. Use SpatialData with your data: the SpatialData object: no dataset used -> no change
  • II. Use SpatialData with your data: SpatialElements and tables: custom dataset -> no change
  • (needed for the workshops) Transformations and coordinate systems: raccoon -> Xenium
  • Spatial query: visium mouse brain -> Visium
  • Annotating regions of interest with napari: visium breast cancer -> Visium
  • Use landmark annotations to align multiple -omics layers: visium + xenium breast cancer -> Visium + Visium HD
  • Working with annotations in SpatialData: blobs -> Xenium
  • Integrate/aggregate signals across spatial layers: blobs and custom dataset -> Xenium
  • Interchangeability between raster and vector representations: blobs + custom binned dataset -> Xenium + Visium HD (for the bins)
  • Squidpy integration: xenium breast cancer -> Xenium
  • (needed for the workshops) Deep learning example on image tiles: xenium breast cancer (annotated with the xenium_visium_00 paper notebook) + visium breast cancer
    • move the notebook into the paper reproducibility notebooks
    • link the old notebook in the new one
    • the new notebook will be very lightweight and will not train a model (just a dataloader): using Visium

spatialdata-plot notebooks

  • Technology notebooks Visium: visium mouse brain -> Visium
  • Technology notebooks MIBI-TOF: no change
  • Technology notebooks MERFISH: no change, but:
    • rename the title to "MERFISH prototype pipeline"
    • explain in the notebook that this is not MERFISH from Vizgen, but indeed the prototype pipeline
  • Technology notebooks CosMx: no change
  • Technology notebooks Visium HD: visium hd mouse intestine -> Visium HD
  • Technology notebooks Xenium: XOA 2.0.x -> Xenium
  • Implicit performance improvements when plotting raster data: visium breast cancer -> Visium

napari-spatialdata notebooks

  • Analyse MibiTOF in Napari-SpatialData: no change
  • Analyse Nanostring data in Napari-SpatialData: nanostring cosmx -> CosMx (it's the same dataset, but here I mean to use the subsampled dataset and the download API)
  • Using Napari-SpatialData: nanostring cosmx -> CosMx
  • Use the Scatterwidget with AnnData from Notebook scatterwidget.ipynb: visium_hne_adata (AnnData format) -> choose a bigger object and a SpatialData object: use Visium
  • Use the Scatterwidget with AnnData from Notebook scatterwidget_annotation.ipynb: same as above
  • annotation widget notebook: Visium

tasks

  • cosmx, mibitof and merfish are already available in spatialdata-sandbox; add the 3 missing datasets.
  • we add the missing datasets to the Readme in spatialdata-notebooks/datasets.
    • add a disclaimer that the data is a subset
  • for the selected 6 datasets above, state that we use them in the docs
  • add a job in the data CI (see the sketch after this list)
    • that converts the raw data to SpatialData
    • subsets the data
    • writes it to Zarr
    • uploads it to S3.
  • write a small frontend (like the squidpy one), so that users can download the data via code (see the download helper sketch earlier in this issue)
  • also add a disclaimer in each notebook that the data is a subset
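
A rough sketch of the data-CI step, assuming a reader from spatialdata-io (xenium here as an example), a bounding-box query for the subsetting, and the AWS CLI for the upload; paths, coordinates, and the bucket name are placeholders:

```python
# Sketch of the data-CI job: convert raw data to SpatialData, subset it,
# write it to Zarr, upload it to S3. Paths, bounding box, and bucket are placeholders.
import subprocess

from spatialdata import bounding_box_query
from spatialdata_io import xenium  # example reader; one call per technology

RAW_DIR = "raw/xenium_breast_cancer"                            # raw vendor download
ZARR_PATH = "xenium_subset.zarr"                                # output Zarr store
S3_TARGET = "s3://example-bucket/datasets/xenium_subset.zarr"   # placeholder bucket

# 1. convert the raw data to SpatialData
sdata = xenium(RAW_DIR)

# 2. subset the data so that the store stays small (~100 MB); the box is a placeholder
sdata_small = bounding_box_query(
    sdata,
    axes=("x", "y"),
    min_coordinate=[0, 0],
    max_coordinate=[2000, 2000],
    target_coordinate_system="global",
)

# 3. write to Zarr
sdata_small.write(ZARR_PATH)

# 4. upload to S3 (assumes AWS credentials are configured in the CI runner)
subprocess.run(["aws", "s3", "sync", ZARR_PATH, S3_TARGET], check=True)
```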
