Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bt.io.prep(sdata) error: operands could not be broadcast together with shapes (2,) (3,) #161

Closed
wgmao opened this issue Oct 14, 2024 · 2 comments · Fixed by #158
Closed

Comments

@wgmao
Copy link

wgmao commented Oct 14, 2024

Thank you for developing this wonderful tool! I am working with a Xenium dataset but came across an error when running bt.io.prep following the tutorial. The zarr object was prepared using spatialdata_io(). I have attached the log and the object I am working on has the following summary. I think the problem is caused by the 3D coordinates of the transcripts. Is there an easy way to modify the object and only keep the x&y coordinates? Thank you!
bento.log

PS: I have tried sdata_crop.points['transcripts'] = sdata_crop.points['transcripts'].drop('z', axis = 1). transcripts now becomes (2D points) but the error persists.

SpatialData object, with associated Zarr store: /path_to_data.zarr/data.zarr
├── Images
│     └── 'morphology_focus': DataTree[cyx] (1, 54533, 14990), (1, 27266, 7495), (1, 13633, 3747), (1, 6816, 1873), (1, 3408, 936)
├── Labels
│     ├── 'cell_labels': DataTree[yx] (54533, 14990), (27266, 7495), (13633, 3747), (6816, 1873), (3408, 936)
│     └── 'nucleus_labels': DataTree[yx] (54533, 14990), (27266, 7495), (13633, 3747), (6816, 1873), (3408, 936)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 11) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (90808, 1) (2D shapes)
│     └── 'nucleus_boundaries': GeoDataFrame shape: (90808, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (90808, 377)
with coordinate systems:
    ▸ 'global', with elements:
        morphology_focus (Images), cell_labels (Labels), nucleus_labels (Labels), transcripts (Points), cell_boundaries (Shapes), nucleus_boundaries (Shapes)
@ckmah
Copy link
Collaborator

ckmah commented Oct 14, 2024

Hi @wgmao, thanks for reporting this it's a little tedious but here is the current solution. You are on the right track, we drop the 3rd dimension on the points like so (you may need to set feature_key/instance_key values to your dataset's):

# Get the scale vector for x and y coordinates from the transcripts points
xyz_scale_vector = sd.transformations.get_transformation(
    sdata.points["transcripts"]
).to_scale_vector(["x", "y"])

# Create a Scale transformation for x and y axes
xy_scale = sd.transformations.Scale(scale=xyz_scale_vector, axes=["x", "y"])

# Get the current transformation attributes
transform = sdata.points["transcripts"].attrs

# Update the global transformation with the new xy_scale
transform["transform"]["global"] = xy_scale

# Parse the transcripts data into a PointsModel
sdata.points["transcripts"] = sd.models.PointsModel.parse(
    sdata.points["transcripts"].compute().reset_index(drop=True),
    coordinates={"x": "x", "y": "y"},
    feature_key="feature_name",
    instance_key="cell_id",
)

# Set the updated transformation attributes back to the transcripts points
sdata.points["transcripts"].attrs = transform

@ckmah ckmah linked a pull request Oct 15, 2024 that will close this issue
1 task
@wgmao
Copy link
Author

wgmao commented Oct 22, 2024

Thank you for your prompt response! I will try it out and see if the problem gets resolved. Meanwhile, I tried to bypass this error following the Data Prep Guide. The goal here is to construct a SpatialData manually.

from spatialdata.models import PointsModel, ShapesModel

shapes = dict(
    cell_boundaries = ShapesModel.parse(sdata_crop.shapes['cell_boundaries']),
    nucleus_boundaries = ShapesModel.parse(sdata_crop.shapes['nucleus_boundaries'])
)

points = dict(
    transcripts = PointsModel.parse(
        sdata_crop.points['transcripts'].compute(), coordinates = {"x": "x", "y":"y"}, feature_key = "feature_name"
    )
)

sdata_2 = sd.SpatialData(
    shapes = shapes,
    points = points,
    tables = sdata_crop.tables
)

Then I followed the main guide with sdata_2 as input. sdata_2 worked well with bt.io.prep(), but there were panels missing when using bt.pl.shape_stats() (see attached). Further when running bt.tl.flux(), there was an error stating that

File "python3.10/site-packages/bento/_utils.py", line 93, in get_points
    raise ValueError(f"Points key {points_key} not found in sdata.points")
ValueError: Points key cell_boundaries_raster not found in sdata.points

I looked into the _flux.py a little and noticed that cells, gene_names and cell_composition were all empty objects around line 148 (inside the flux function). My takeaway is that there are some formatting issues embedded somewhere.

output

@ckmah ckmah mentioned this issue Dec 18, 2024
@ckmah ckmah closed this as completed in ef986c2 Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants