Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bounding box query vs polygon query for circles #669

Open
LucaMarconato opened this issue Aug 9, 2024 · 2 comments
Open

Bounding box query vs polygon query for circles #669

LucaMarconato opened this issue Aug 9, 2024 · 2 comments

Comments

@LucaMarconato
Copy link
Member

LucaMarconato commented Aug 9, 2024

Bounding box query selects circles whose center is contained inside the box, polygon query selects circles that intersect the box in any point.

You can reproduce with this code:

##
from spatialdata.datasets import blobs
from napari_spatialdata import Interactive
import matplotlib.pyplot as plt
import spatialdata_plot
import spatialdata as sd
from array import array
from shapely.geometry import Polygon
from geopandas import GeoDataFrame

plt.style.use('dark_background')

# define square to use as query geometry
x_coords = array("d", [243.331349706879, 348.864955663264, 348.8649556632623, 243.3313497068773, 243.331349706879])
y_coords = array("d", [392.5609882574645, 392.56098825746835, 214.4730282060675, 214.47302820606367, 392.5609882574645])

minx, maxx = min(x_coords), max(x_coords)
miny, maxy = min(y_coords), max(y_coords)

# rearrange to the vars needs to a bounding_box_query
min_coordinates = [minx, miny]
max_coordinates = [maxx, maxy]

# turn into shapely polygon
polygon = Polygon(zip(x_coords, y_coords))

# Print the results
print("Min coordinates:", min_coordinates)
print("Max coordinates:", max_coordinates)
print("Polygon:", polygon)

##
sdata = blobs().subset(["blobs_polygons", "blobs_circles", "blobs_image"])
sdata["box"] = sd.models.ShapesModel.parse(GeoDataFrame(geometry=[polygon]))

##
sdata.pl.render_images().pl.render_shapes("blobs_circles", color="red").pl.render_shapes(
    "blobs_polygons", color="white"
).pl.render_shapes("box", color="#FFFFFF66").pl.show()
plt.show()

## bounding box query
queried_bb = sdata.query.bounding_box(
    axes=("x", "y"), min_coordinate=min_coordinates, max_coordinate=max_coordinates, target_coordinate_system="global"
)

queried_bb.pl.render_images().pl.render_shapes("blobs_circles", color="red").pl.render_shapes(
    "blobs_polygons", color="white"
).pl.render_shapes("box", color="#FFFFFF66").pl.show()
plt.show()

## polygon query
queried_pol = sdata.query.polygon(polygon, target_coordinate_system='global')
queried_pol.pl.render_images().pl.render_shapes("blobs_circles", color="red").pl.render_shapes(
    "blobs_polygons", color="white"
).pl.render_shapes("box", color="#FFFFFF66").pl.show()
plt.show()

Which gives these three plots.

Full data

image

Bounding box query

image

Polygon query

image

@LucaMarconato
Copy link
Member Author

LucaMarconato commented Aug 9, 2024

A solution would be to benchmark the polygon query and bounding box implementation for points and shapes and see if one is significantly faster than the other. Ideally the polygon query is fast enough and we can drop the bounding box query implementation for points and shapes.

Note that polygon query implementation for raster types is simply calling the implementation of bounding box query for raster types; this will not change.

@giovp
Copy link
Member

giovp commented Sep 8, 2024

Ideally the polygon query is fast enough and we can drop the bounding box query implementation for points and shapes.

good point, I can see that it's faster with geopandas already vectorized methods, but would be good to benchmark indeed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants