Merfish #33
Conversation
for more information, see https://pre-commit.ci
Codecov Report

```
@@            Coverage Diff            @@
##             main      #33     +/-  ##
=========================================
- Coverage   42.98%   41.96%   -1.02%
=========================================
  Files          14       16       +2
  Lines         663      853     +190
=========================================
+ Hits          285      358      +73
- Misses        378      495     +117
```
Thanks, looks amazing, and I think that the use case around …
hi @quentinblampey thank you so much for this, this is super useful and looks really good! apologies that it took us so long to get back to you.
A major question I have regarding the scope is the name of the module/function. That is, if it's only specific to the MERSCOPE type of platform, then I would rename everything to `merscope`, as is done for the other platforms (we call it `curio`, not slide-seq, for example).
If it's more general than just MERSCOPE, then I would split it into two modules, one for MERSCOPE and one for general MERFISH (some pointers would be useful as to what kind of tech that would cover). They could ofc share the majority of the code (if not all), but the user would see two separate functions.
wdyt? also @LucaMarconato @kevinyamauchi
Also, it looks like most of the failing pre-commit checks are type errors; let me know if you need any help in tackling those, as they can be quite painful.
Thanks again and looking forward to getting this in!
Hey @giovp, thanks for your comments on the PR! I will make a new commit soon. Yep, the function can be renamed `merscope` (the reader is indeed specific to the MERSCOPE). As far as I know, only the MERSCOPE is MERFISH-based, which is why I used this name, but I agree that it is ambiguous (and new MERFISH technologies may be developed) :)
Sounds great! I would then also rename the module and follow the same structure as the other readers. Thanks a lot again for this and looking forward to getting this in!
Hi again @giovp, I renamed the function and fixed the mypy/flake8 pre-commit errors. Another thing: for now I assumed that all cell boundary polygons are the same on all z-levels, and I save only one z-layer of polygons.
Hi @quentinblampey, spatialdata cannot support 3D boundaries on shapes. What is a good resolution for saving the polygons at a fixed z-layer? I wonder if all of them should be loaded/saved?
Ok, I was just anticipating the case where some users want to use 3D boundaries. Maybe it's okay to only save the polygons of one z-layer (i.e., what I currently implemented). Also, I'm not sure I understood your first question; what do you mean exactly?
Ah sorry, I meant indeed: what is the best z-stack to choose for the 2D polygons, which I understand is what you are doing now. I replied inline to the comment. I think it makes sense to allow the user to selectively load parts, all, or none of the image, what do you think?
As you suggested @giovp, I added an argument to choose between no image, one z-layer, or a list of z-layers. For the transcripts, I also now use only one 2D … . And for the polygons, I assumed that users are not computing 3D boundaries; in this case, the Vizgen preprocessing tool simply duplicates the same polygons on all z-layers, so I just select the polygons from … .
NB: why did mypy complain about the Python 3.10 union typing (with `|`)?
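A minimal sketch of how such a z-layer argument could be normalized (the function name and exact semantics are hypothetical, not the actual PR code): `None` skips the images, an `int` selects one layer, and a list selects several.

```python
# Hypothetical sketch (not the actual PR code): normalize a z_layers argument
# so that None means "no image", an int means one layer, and a list means
# several layers.
def normalize_z_layers(z_layers):
    if z_layers is None:
        return []  # skip loading images entirely
    if isinstance(z_layers, int):
        return [z_layers]  # a single z-layer
    return sorted(set(z_layers))  # several z-layers, deduplicated
```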
It looks really good; just some very minor things on the documentation.
General question: is there any way we can get a unique id for the dataset (similar to the 10x Genomics `library_id`)? This way it would be easier to name elements in a unique way that is then useful when concatenated. E.g. right now shapes are defined like this:

```python
shapes = {"polygons": polygons}
```

it would be better to have something like this:

```python
shapes = {f"{dataset_id}_polygons": polygons}
```

If it's not possible to infer it from the dataset, maybe we can ask the user to input it as an argument, which would then be used to populate the names of all elements (e.g. `f"{dataset_id}_images"`, etc.). What do you think?
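A sketch of the naming scheme under discussion; the fallback to the directory name is an assumption (mentioned later in the thread as "infer it from the provided directory path"), and the function name is hypothetical:

```python
# Hypothetical sketch: prefix every element name with a dataset_id, falling
# back to the directory name when the user does not provide one.
from pathlib import Path

def element_names(path, dataset_id=None):
    ds = dataset_id or Path(path).name  # infer from the folder if not given
    return {
        "shapes": f"{ds}_polygons",
        "images": f"{ds}_images",
        "points": f"{ds}_points",
    }
```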
Regarding mypy, I think you were missing a `from __future__ import annotations`; can you try with that one added?
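For context: PEP 604 unions (`str | None`) are only valid at runtime on Python >= 3.10, but `from __future__ import annotations` (PEP 563) keeps annotations as unevaluated strings, so the same syntax parses on 3.8/3.9 as well. A minimal illustration (the function name and signature are hypothetical):

```python
from __future__ import annotations  # annotations become lazy strings (PEP 563)

# With the future import, this definition also works on Python 3.8/3.9,
# because `str | dict | None` is never evaluated at runtime.
def merscope(path: str, vpt_outputs: str | dict | None = None) -> dict:
    return {"path": path, "vpt_outputs": vpt_outputs}
```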
Indeed, it's great to add a … . I added arguments for the region and slide name, but by default I try to infer them from the provided directory path. Thanks for helping me solve the mypy issue :)
Great! Happy to get it merged. One very last thing: can you add `merscope` to the API docs (see in …)? It might be that they fail because a newline is missing for the bullet point in the params. I find that bullet points usually work if they are indented and have one blank line before and after the bullet-point block.
Yep, this is added in the docs! I also updated the README so that the "MERSCOPE reader" is no longer in the "Coming soon" section. I checked the docs, and it all looks good to me 👍
fantastic work @quentinblampey ! LGTM!
Thanks again @quentinblampey for the PR!
I just found out the cause; it was a bug in … :

```python
@parse.register(pd.DataFrame)
@parse.register(DaskDataFrame)
@classmethod
def _(
    cls,
    data: pd.DataFrame,
    coordinates: Mapping[str, str],
    feature_key: str | None = None,
    instance_key: str | None = None,
    transformations: MappingToCoordinateSystem_t | None = None,
    **kwargs: Any,
) -> DaskDataFrame:
    if "npartitions" not in kwargs and "chunksize" not in kwargs:
        kwargs["npartitions"] = cls.NPARTITIONS
    ndim = len(coordinates)
    axes = [X, Y, Z][:ndim]
    if isinstance(data, pd.DataFrame):
        table: DaskDataFrame = dd.from_pandas(  # type: ignore[attr-defined]
            pd.DataFrame(data[[coordinates[ax] for ax in axes]].to_numpy(), columns=axes), **kwargs
        )
        if feature_key is not None:
            feature_categ = dd.from_pandas(  # type: ignore[attr-defined]
                data[feature_key].astype(str).astype("category"),
                **kwargs,
            )
            table[feature_key] = feature_categ
    elif isinstance(data, dd.DataFrame):  # type: ignore[attr-defined]
        table = data[[coordinates[ax] for ax in axes]]
        table.columns = axes
        if feature_key is not None and data[feature_key].dtype.name != "category":
            table[feature_key] = data[feature_key].astype(str).astype("category")
    if instance_key is not None:
        table[instance_key] = data[instance_key]
    # BUG WAS HERE: the set below used to be built without ", *coordinates.keys()".
    # Since coordinates was {'x': 'global_x', 'y': 'global_y'}, table['x'] and table['y']
    # were correctly set to data['global_x'] and data['global_y'], but the loop below then
    # replaced table['x'] and table['y'] with data['x'] and data['y'], causing the
    # misalignment of the data. I will make a PR correcting this (also adding tests).
    for c in set(data.columns) - {feature_key, instance_key, *coordinates.values(), *coordinates.keys()}:
        table[c] = data[c]
    return cls._add_metadata_and_validate(
        table, feature_key=feature_key, instance_key=instance_key, transformations=transformations
    )
```

I will make a PR in … .
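A standalone pandas sketch (toy data, not the spatialdata code) of the failure mode described above: if the loop that copies the remaining columns excludes only `coordinates.values()`, pre-existing `x`/`y` columns in the input clobber the renamed global coordinates; excluding `coordinates.keys()` as well prevents this.

```python
import pandas as pd

coordinates = {"x": "global_x", "y": "global_y"}
data = pd.DataFrame({
    "global_x": [10.0, 20.0],
    "global_y": [1.0, 2.0],
    "x": [0.0, 0.0],  # unrelated columns that happen to share the axis names
    "y": [0.0, 0.0],
    "gene": ["a", "b"],
})

# Rename the selected coordinate columns to the canonical axis names.
table = data[[coordinates[ax] for ax in ("x", "y")]].copy()
table.columns = ["x", "y"]

# The fix: exclude BOTH the source names (values) and the axis names (keys),
# so the loop only copies genuinely extra columns such as "gene".
for c in set(data.columns) - {*coordinates.values(), *coordinates.keys()}:
    table[c] = data[c]
```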
Thanks @LucaMarconato for fixing this, I confirm that now everything is aligned as intended!
Adding a MERFISH reader (latest Vizgen folder structure, with parquet boundary files).
Comments:

- (i) `vpt_outputs` remains `None` as default; (ii) if the user is using VPT with standard file names, they only have to provide a directory path as `vpt_outputs`; (iii) if the user has chosen new filenames for VPT, then `vpt_outputs` is a `dict` with all the file paths.
- … `adata.obsm`, to not disturb the analysis performed on `adata.X`.
- … `z<i>`, where `<i>` is the z-layer index.

Important note:
For now, it seems that the points are not aligned with the images/shapes when using napari-spatialdata. The transformation seems correct and works perfectly fine for shapes, but not for points. I tried switching the axes and inverting the transformation, but it didn't fix the issue. I don't understand why it works for shapes but not for points, although they are defined in the same space and have the same x/y coordinate range. (Currently discussing this with @LucaMarconato.)