Merfish #33

quentinblampey · 2023-04-20T09:21:29Z

Adding a MERFISH reader (latest vizgen folder structure, with parquet boundary files).

Comments:

Vigzen has a post-processing tool called VPT (https://vizgen.github.io/vizgen-postprocessing/) that helps the user improve the default segmentation. Since it's very flexible, the user can define in which folder the new files will be stored, and can also define a new name for each file. For now, I thought that the easiest way to deal with that is: (i) if the user is using the default segmentation, then vpt_outputs remains None as default ; (ii) if the user is using VPT with standard file names, it only has to provide a directory path as vpt_outputs ; (iii) if the user has chosen new filenames for VPT, then vpt_outputs is a dict with all the file paths.
Sometimes, vizgen data contains some "blank" gene names/fields (but still has some counts), so I moved such fields into adata.obsm to not disturb the analysis performed on adata.X
The images are pseudo-3D because its composed of seven 2D-images separated by few microns on the z-axis. For now, I made multiple 2D-images, whose name are z<i> where <i> is the z-layer index.

Important note:
For now, it seems that the points are not aligned on the images/shapes when using napari-spatialdata. The transformation seems correct and work perfectly fine for shapes, but not for points. I tried switching the axis, inversing the transformation, but didn't fixed the issue. I don't understand why it works for shapes but not for points although they are defined on the same space and have the same x/y coordinates range. (Currently discussing about this with @LucaMarconato)

for more information, see https://pre-commit.ci

codecov-commenter · 2023-04-20T09:26:28Z

Codecov Report

Merging #33 (71f075a) into main (664fa44) will decrease coverage by 1.02%.
The diff coverage is 43.01%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #33      +/-   ##
==========================================
- Coverage   42.98%   41.96%   -1.02%     
==========================================
  Files          14       16       +2     
  Lines         663      853     +190     
==========================================
+ Hits          285      358      +73     
- Misses        378      495     +117

Impacted Files	Coverage Δ
src/spatialdata_io/readers/merscope.py	`27.39% <27.39%> (ø)`
src/spatialdata_io/__init__.py	`100.00% <100.00%> (ø)`
src/spatialdata_io/_constants/_constants.py	`100.00% <100.00%> (ø)`

... and 3 files with indirect coverage changes

LucaMarconato · 2023-04-20T12:25:37Z

Thanks, look amazing, and I think that the use case around vpt_outputs is a very handy one.

…nsformations

for more information, see https://pre-commit.ci

giovp

hi @quentinblampey thank you so much for this, this is super useful and looks really good! apologies that it took us so long to get back to you.

A major question I have regarding the scope is the name of the module/function. As in, if it's only specific to merscope type of platform, that I would rename everything to merscope as it is done for the other platforms (we call it curio, not slide-seq for example).

if it's more general then just merscope, then I would split it in 2 modules, one for merscope, and one for general merfish (some pointers would be useful to what kind of tech that would be). They could ofc share majority of the code (if not all) but the user would see 2 separate functions.

wdyt ? also @LucaMarconato @kevinyamauchi

also, looks like most the pre-commits missing are type errors, let me know if you need any help in tackling those as they can be quite painful.

thanks again and looking forward to get this in!

src/spatialdata_io/readers/merfish.py

quentinblampey · 2023-06-12T08:47:50Z

Hey @giovp, thanks for your comments on the PR! I will make a new commit soon

Yep, the function can be renamed "merscope" (the reader is indeed specific to the merscope). As far as I know, only the merscope is merfish-based, so this was why I used this name, but I agree that this is ambiguous (and new merfish technologies may be developed) :)

…to merfish

giovp · 2023-06-12T13:33:25Z

Hey @giovp, thanks for your comments on the PR! I will make a new commit soon

Yep, the function can be renamed "merscope" (the reader is indeed specific to the merscope). As far as I know, only the merscope is merfish-based, so this was why I used this name, but I agree that this is ambiguous (and new merfish technologies may be developed) :)

sounds great! I would then rename also the module and follow same structure as the one for other readers. Thanks a lot again for this and looking forward to getting this in!

quentinblampey · 2023-06-13T13:35:53Z

Hi again @giovp, I renamed the function and fixed the mypy/flake8 pre-commit errors
We now still have to decide on the three remaining comments you did above

Another thing: for now I supposed that all cell boundary polygons were the same on all z-levels, and I save only one ShapesModel instance for all z-layers. Indeed, all the seven z-level are very similar, so I personally always perform a segmentation that is common to all z-layers. But I don't know if some people run a segmentation on each z-layer separately (to aggregate the polygons in a 3D shape), and I don't know if spatialdata can handle 3D boundaries?

giovp · 2023-06-13T14:22:34Z

Hi @quentinblampey , spatialdata cannot support 3d boundaries on shapes. What is a good resolution to save the polygons at a fixed z layer? I wonder if all of them should be loaded/saved?

quentinblampey · 2023-06-13T14:50:19Z

Hi @quentinblampey , spatialdata cannot support 3d boundaries on shapes. What is a good resolution to save the polygons at a fixed z layer? I wonder if all of them should be loaded/saved?

Ok, I was just anticipating if some users want to use 3D boundaries. Maybe it's okay to only save the polygons of one z-layer (i.e., what I currently implemented)
Indeed, the segmentation is 2D by default, and I don't know if people are really making 3D boundaries on MERSCOPE data.

Also, I'm not sure I understood your first question, what do you mean exactly?

giovp · 2023-06-13T16:24:58Z

ah sorry, I meant indeed what is the best z stack to choose for the 2D polygons, which I understand it is what you are doing now. I replied in line to the comment, I think it makes sense to alow user to selectively load parts, all or None of the image, what do you think>?

quentinblampey · 2023-06-14T09:07:54Z

As you suggested @giovp, I added an argument to choose between no image, or one z-layer, or a list of z-layers
By default, I load only images of z_layer=3. I think this is a good default value, as this is also the one by default in the official Vizgen visualizer

For the transcripts, I also now use only one 2D PointsModel object

And for the polygons, I supposed that users are not performing 3D boundaries. In this case, the vigzen preprocessing tool simply duplicates the same polygons on all z-layers, so I just select the polygons from z_layer=0 (actually, I'm not even sure that the vizgen segmentation can handle/produce 3D boundaries, so it should be fine for now)

NB: why did mypy complained about the python 3.10 union typing (with | instead of Union) while other readers use the 3.10 syntax?

giovp

it looks really good, some very minor things on documentation.

Genral question, is there any way we can get a unique id for the dataset (similar to 10x genomics library_id?). this way it would be easier to name elements in a unique way that then is useful when concatenated. E.g. right now shapes are defined like this

shapes = {"polygons": polygons}

it would be better to have something like this

shapes = {f"{dataset_id}_polygons": polygons}

if it's not possible to infer it from the dataset, maybe we can ask the user to input it as an argument? which then will be used to populate the name of all elements (e.g. f"{dataset_id}_images" etc). What do you think?

Regarding mypy, I think you were missing a from future import annotations, can you try with that one added?

src/spatialdata_io/readers/merscope.py

quentinblampey · 2023-06-19T15:46:09Z

Indeed it's great to add a dataset_id!
We can easily infer it given the provided directory path. This path typically looks like 202305021210_XXX_VMSC09302/region_0, where region_0 is the name of the region of interest, and 202305021210_XXX_VMSC09302 is the name of the slide (i.e. one MERSCOPE run).
Most of the time, there is only one region per slide, but it's possible to have multiple (for instance, it's possible to have multiple patients for one MERSCOPE run). In case there are multiple regions, it raises another question: should we load all the regions at once? For now, I ask for the region directory, but I can ask for the slide directory and iterate over all "region_X"

I added arguments for the region and slide name, but by default, I try to infer it using the provided directory path
Then, I simply define dataset_id = f"{slide_name}_{region_name}"

Thanks for helping me solve the mypy issue :)

giovp · 2023-06-19T16:23:45Z

great! happy to get it merged, one very last thing, can you add merscope to the API docs, see in docs/api.md you can also see how docs are rendered in this PR if you click to details int he rtd CI

it might be that they fail cause a newline is missing for the bulletpoint in the params. I find that bulletpoint usually works if they are indetened and with 1 line space before and after the bulletpoint block

quentinblampey · 2023-06-20T08:46:04Z

Yep, this is added in the docs! I also updated the README so that the "MERSCOPE reader" is not in the "Coming soon" section anymore

I checked the docs, and it looks all good to me 👍

giovp

fantastic work @quentinblampey ! LGTM!

LucaMarconato · 2023-07-14T14:47:53Z

Thanks again @quentinblampey for the PR!

Important note:
For now, it seems that the points are not aligned on the images/shapes when using napari-spatialdata. The transformation seems correct and work perfectly fine for shapes, but not for points. I tried switching the axis, inversing the transformation, but didn't fixed the issue. I don't understand why it works for shapes but not for points although they are defined on the same space and have the same x/y coordinates range. (Currently discussing about this with @LucaMarconato)

I just found out the cause, it was a bug in PointsModels, not in the transformations code. This is the code of PointsModel.parse(), the bug is in one of the last lines, I explain it in a comment.

    @parse.register(pd.DataFrame)
    @parse.register(DaskDataFrame)
    @classmethod
    def _(
        cls,
        data: pd.DataFrame,
        coordinates: Mapping[str, str],
        feature_key: str | None = None,
        instance_key: str | None = None,
        transformations: MappingToCoordinateSystem_t | None = None,
        **kwargs: Any,
    ) -> DaskDataFrame:
        if "npartitions" not in kwargs and "chunksize" not in kwargs:
            kwargs["npartitions"] = cls.NPARTITIONS
        ndim = len(coordinates)
        axes = [X, Y, Z][:ndim]
        if isinstance(data, pd.DataFrame):
            table: DaskDataFrame = dd.from_pandas(  # type: ignore[attr-defined]
                pd.DataFrame(data[[coordinates[ax] for ax in axes]].to_numpy(), columns=axes), **kwargs
            )
            if feature_key is not None:
                feature_categ = dd.from_pandas(
                    data[feature_key].astype(str).astype("category"),
                    **kwargs,
                )  # type: ignore[attr-defined]
                table[feature_key] = feature_categ
        elif isinstance(data, dd.DataFrame):  # type: ignore[attr-defined]
            table = data[[coordinates[ax] for ax in axes]]
            table.columns = axes
            if feature_key is not None and data[feature_key].dtype.name != "category":
                table[feature_key] = data[feature_key].astype(str).astype("category")
        if instance_key is not None:
            table[instance_key] = data[instance_key]
		# BUG HERE: before the code was without ", *coordinates.keys()"
		# since coordinates was set to {'x': 'global_x', 'y': 'global_y'}, we had that table['x'] and table['y'] were 
		# correctly set to data['global_x'] and data['global_y'], but then the for loop below, was replacing table['x'] and table['y']
		# with data['x'] and data['y'], causing the misalignment of the data. I will make a PR for correcting this (also adding tests)
        for c in set(data.columns) - {feature_key, instance_key, *coordinates.values(), *coordinates.keys()}:
            table[c] = data[c]
        return cls._add_metadata_and_validate(
            table, feature_key=feature_key, instance_key=instance_key, transformations=transformations
        )

I will make a PR in spatialdata to fix the bug and I will merge the PR in spatialdata-sandbox to download and transform to Zarr the merfish data. Now everything works.

quentinblampey · 2023-07-17T11:45:03Z

Thanks @LucaMarconato for fixing this, I confirm that now everything is aligned as intended!

quentinblampey and others added 5 commits April 19, 2023 14:48

merfish reader v0 (handle vpt outputs)

8df3521

Merge branch 'main' into merfish

1ae190c

merfish reader: minor fixes (assert and KEY.value)

713b37c

add staining channel names to merfish reader

511412d

[pre-commit.ci] auto fixes from pre-commit.com hooks

71a958e

for more information, see https://pre-commit.ci

quentinblampey mentioned this pull request Apr 21, 2023

Download vizgen_merscope sample giovp/spatialdata-sandbox#24

Merged

LucaMarconato and others added 5 commits April 21, 2023 20:49

points to categorical, splitting points by z value, debugging the tra…

a0dc6ec

…nsformations

[pre-commit.ci] auto fixes from pre-commit.com hooks

ca89463

for more information, see https://pre-commit.ci

merfish reader support dict paths inputs

663a406

Merge branch 'main' into merfish

6370193

fix some (but not all) pre-commit checks

8453fa7

giovp requested changes Jun 9, 2023

View reviewed changes

Merge branch 'merfish' of github.com:quentinblampey/spatialdata-io in…

262e9f5

…to merfish

quentinblampey added 2 commits June 13, 2023 14:40

merscope: renaming + solving mypy type errors

7bc0efd

fix flake8 errors

9257b55

quentinblampey added 2 commits June 14, 2023 10:39

z-layers argument + consider only 2D stacks

ca7ae96

back to py3.9 Union typing syntax

2452a79

giovp self-requested a review June 19, 2023 13:25

giovp requested changes Jun 19, 2023

View reviewed changes

src/spatialdata_io/readers/merscope.py Show resolved Hide resolved

src/spatialdata_io/readers/merscope.py Outdated Show resolved Hide resolved

src/spatialdata_io/readers/merscope.py Outdated Show resolved Hide resolved

src/spatialdata_io/readers/merscope.py Outdated Show resolved Hide resolved

quentinblampey added 2 commits June 19, 2023 17:29

infer the dataset id using the directory path

0226e53

fix flake8 unindent warning

71f075a

adding the merscope reader to the docs API

c488ccb

giovp approved these changes Jun 20, 2023

View reviewed changes

giovp merged commit 07053ae into scverse:main Jun 20, 2023

LucaMarconato mentioned this pull request Jul 14, 2023

fixed bugs in points models scverse/spatialdata#322

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merfish #33

Merfish #33

quentinblampey commented Apr 20, 2023

codecov-commenter commented Apr 20, 2023 •

edited

Loading

LucaMarconato commented Apr 20, 2023

giovp left a comment

quentinblampey commented Jun 12, 2023

giovp commented Jun 12, 2023

quentinblampey commented Jun 13, 2023

giovp commented Jun 13, 2023

quentinblampey commented Jun 13, 2023

giovp commented Jun 13, 2023

quentinblampey commented Jun 14, 2023

giovp left a comment

quentinblampey commented Jun 19, 2023

giovp commented Jun 19, 2023

quentinblampey commented Jun 20, 2023

giovp left a comment

LucaMarconato commented Jul 14, 2023 •

edited

Loading

quentinblampey commented Jul 17, 2023

Merfish #33

Merfish #33

Conversation

quentinblampey commented Apr 20, 2023

codecov-commenter commented Apr 20, 2023 • edited Loading

Codecov Report

LucaMarconato commented Apr 20, 2023

giovp left a comment

Choose a reason for hiding this comment

quentinblampey commented Jun 12, 2023

giovp commented Jun 12, 2023

quentinblampey commented Jun 13, 2023

giovp commented Jun 13, 2023

quentinblampey commented Jun 13, 2023

giovp commented Jun 13, 2023

quentinblampey commented Jun 14, 2023

giovp left a comment

Choose a reason for hiding this comment

quentinblampey commented Jun 19, 2023

giovp commented Jun 19, 2023

quentinblampey commented Jun 20, 2023

giovp left a comment

Choose a reason for hiding this comment

LucaMarconato commented Jul 14, 2023 • edited Loading

quentinblampey commented Jul 17, 2023

codecov-commenter commented Apr 20, 2023 •

edited

Loading

LucaMarconato commented Jul 14, 2023 •

edited

Loading