[EPIC] Enable mixed geometry type visualization #491
Comments
After looking into the stacktrace, it looks like
|
Using this code results in another error:

from geoarrow.rust.core import read_parquet

table = read_parquet(osm_data_path)
viz(table)

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[52], line 1
----> 1 viz(table)
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_viz.py:150, in viz(data, scatterplot_kwargs, path_kwargs, polygon_kwargs, map_kwargs)
138 layers = [
139 create_layer_from_data_input(
140 item,
(...)
146 for i, item in enumerate(data)
147 ]
148 else:
149 layers = [
--> 150 create_layer_from_data_input(
151 data,
152 _viz_color=color_ordering[0],
153 scatterplot_kwargs=scatterplot_kwargs,
154 path_kwargs=path_kwargs,
155 polygon_kwargs=polygon_kwargs,
156 )
157 ]
159 map_kwargs = {} if not map_kwargs else map_kwargs
161 if "basemap_style" not in map_kwargs.keys():
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_viz.py:204, in create_layer_from_data_input(data, **kwargs)
202 if hasattr(data, "__arrow_c_stream__"):
203 data = cast("ArrowStreamExportable", data)
--> 204 return _viz_geoarrow_table(pa.table(data), **kwargs)
206 # Anything with __geo_interface__
207 if hasattr(data, "__geo_interface__"):
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/table.pxi:5221, in pyarrow.lib.table()
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/ipc.pxi:880, in pyarrow.lib.RecordBatchReader._import_from_c_capsule()
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/error.pxi:154, in pyarrow.lib.pyarrow_internal_check_status()
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/error.pxi:88, in pyarrow.lib.check_status()
File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/geoarrow/pyarrow/_type.py:58, in GeometryExtensionType.__arrow_ext_deserialize__(cls, storage_type, serialized)
55 schema = lib.SchemaHolder()
56 storage_type._export_to_c(schema._addr())
---> 58 c_vector_type = lib.CVectorType.FromStorage(
59 schema, cls._extension_name.encode("UTF-8"), serialized
60 )
62 return cls(c_vector_type)
File src/geoarrow/c/_lib.pyx:496, in geoarrow.c._lib.CVectorType.FromStorage()
File src/geoarrow/c/_lib.pyx:375, in geoarrow.c._lib.CVectorType._move_from_ctype()
ValueError: Failed to initialize GeoArrowSchemaView: Expected valid list type for coord parent 1 for extension 'geoarrow.multipoint'
File for tests: |
It's correct that we don't currently support mixed geometry types as input. It would be good to have a helper to do this. It looks like

from typing import List

import geopandas as gpd
import numpy as np
import shapely
from shapely import GeometryType


def split_gdf(gdf: gpd.GeoDataFrame) -> List[gpd.GeoDataFrame]:
    # Integer Shapely type id of every geometry in the frame
    type_ids = np.array(shapely.get_type_id(gdf.geometry))
    unique_type_ids = set(np.unique(type_ids))

    if GeometryType.GEOMETRYCOLLECTION in unique_type_ids:
        raise ValueError("GeometryCollections not currently supported")

    if GeometryType.LINEARRING in unique_type_ids:
        raise ValueError("LinearRings not currently supported")

    # A single geometry type needs no splitting
    if len(unique_type_ids) == 1:
        return [gdf]

    # A single/multi pair of the same family can share one layer
    if len(unique_type_ids) == 2:
        if unique_type_ids == {GeometryType.POINT, GeometryType.MULTIPOINT}:
            return [gdf]

        if unique_type_ids == {GeometryType.LINESTRING, GeometryType.MULTILINESTRING}:
            return [gdf]

        if unique_type_ids == {GeometryType.POLYGON, GeometryType.MULTIPOLYGON}:
            return [gdf]

    # Otherwise, return one GeoDataFrame per non-empty family
    gdfs = []
    point_indices = np.where(
        (type_ids == GeometryType.POINT) | (type_ids == GeometryType.MULTIPOINT)
    )[0]
    if len(point_indices) > 0:
        gdfs.append(gdf.iloc[point_indices])

    linestring_indices = np.where(
        (type_ids == GeometryType.LINESTRING)
        | (type_ids == GeometryType.MULTILINESTRING)
    )[0]
    if len(linestring_indices) > 0:
        gdfs.append(gdf.iloc[linestring_indices])

    polygon_indices = np.where(
        (type_ids == GeometryType.POLYGON) | (type_ids == GeometryType.MULTIPOLYGON)
    )[0]
    if len(polygon_indices) > 0:
        gdfs.append(gdf.iloc[polygon_indices])

    return gdfs

In the long run, we'll support mixed geometry types on the JS side as well, but I think unblocking Python users is good enough for now.

Can you make a separate issue for this? This could be a bug in geoarrow-rs or geoarrow-pyarrow.
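The family grouping used by the helper above can also be sketched without geopandas, which may help when the type ids come from somewhere else. This is only an illustration: `FAMILY_BY_TYPE_ID` and `group_indices_by_family` are hypothetical names, and the integer ids are assumed to mirror `shapely.GeometryType` in Shapely 2.x (POINT=0, LINESTRING=1, POLYGON=3, MULTIPOINT=4, MULTILINESTRING=5, MULTIPOLYGON=6).

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumption: these ids mirror shapely.GeometryType in Shapely 2.x.
FAMILY_BY_TYPE_ID = {
    0: "point",       # POINT
    4: "point",       # MULTIPOINT
    1: "linestring",  # LINESTRING
    5: "linestring",  # MULTILINESTRING
    3: "polygon",     # POLYGON
    6: "polygon",     # MULTIPOLYGON
}


def group_indices_by_family(type_ids: Iterable[int]) -> Dict[str, List[int]]:
    """Bucket row positions by geometry family (hypothetical helper)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for position, type_id in enumerate(type_ids):
        family = FAMILY_BY_TYPE_ID.get(int(type_id))
        if family is None:
            # Covers LinearRing (2), GeometryCollection (7) and missing (-1)
            raise ValueError(f"Unsupported geometry type id: {type_id}")
        groups[family].append(position)
    return dict(groups)
```

Each list of positions could then be passed to `gdf.iloc[...]` to build one GeoDataFrame per family, as `split_gdf` does.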
Thanks @kylebarron for some guidance. For now, I've used duckdb to get the geometry type (

import duckdb
import pyarrow as pa
import pyarrow.compute as pc
from geoarrow.pyarrow import io
from lonboard import viz

duckdb.install_extension("spatial")
duckdb.load_extension("spatial")

try:
    from osmnx import geocode_to_gdf
    import quackosm as qosm

    region = geocode_to_gdf("Chicago")  # will download a 279 MB file, works on a 16 GB machine
    # region = geocode_to_gdf("Greater London")  # doesn't work on a 16 GB machine, can display points and linestrings as separate maps, but not polygons
    data_path = qosm.convert_geometry_to_gpq(region.unary_union)
    geoparquet_path = data_path
except Exception:
    geoparquet_path = "monaco_nofilter_noclip_compact.geoparquet"  # use file from the zip attached, small example

# One geometry type name per row, computed by DuckDB's spatial extension
geometry_types = (
    duckdb.read_parquet(str(geoparquet_path))
    .project("ST_GeometryType(ST_GeomFromWKB(geometry)) t")
    .to_arrow_table()["t"]
)

polygon_values = pa.array(["POLYGON", "MULTIPOLYGON"])
linestring_values = pa.array(["LINESTRING", "MULTILINESTRING"])
points_values = pa.array(["POINT", "MULTIPOINT"])

array_filters = (points_values, linestring_values, polygon_values)

geoarrow_tbl = io.read_geoparquet_table(geoparquet_path)

# One filtered table per geometry family, passed to viz as separate layers
arrays = [
    geoarrow_tbl.filter(pc.is_in(geometry_types, value_set=search_values))
    for search_values in array_filters
]

viz(arrays)

Is lonboard currently optimized to display big polygon files? Points using the scatterplot layer are blazingly fast, but the polygon layer takes a toll on machine resources.
Polygons are necessarily much more resource-intensive to draw than points. As is, Lonboard draws every polygon without any simplification. Your options to improve performance are:
|
Longer term, we may additionally implement an optional tiled approach, e.g. by porting https://github.com/mapbox/geojson-vt to Rust + Arrow and then dynamically fetching tiles from Python to JS. That would support the general use case of having more data in Python than you want to send to the browser. See #414
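To make the tiled idea a little more concrete, here is a minimal sketch of the standard Web Mercator (XYZ) tile addressing that tiling schemes of this kind typically build on. It is not taken from Lonboard or geojson-vt; `lonlat_to_tile` is a hypothetical helper name, and the latitude clamp is the usual Web Mercator limit.

```python
import math
from typing import Tuple

# Latitude limit of the Web Mercator projection, in degrees
MAX_MERCATOR_LAT = 85.05112878


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) XYZ tile index containing a lon/lat point (hypothetical helper)."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat = max(min(lat, MAX_MERCATOR_LAT), -MAX_MERCATOR_LAT)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that points on the far edge still map to a valid tile
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

With an index like this, only the geometries falling in the tiles currently in view would need to be sent to the browser.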
By the way I think we've been thinking that GeoParquet files should still be saved with a |
### Change list

- Handle mixed geometry types in `viz` for GeoDataFrames, Shapely arrays, and `__geo_interface__` inputs
- Handle mixed geometry types in wkb-encoded geoarrow input.

Closes #491

Monaco OSM data from #491. cc @RaczeQ

<img width="577" alt="image" src="https://github.com/developmentseed/lonboard/assets/15164633/601efa4c-a8f6-424c-9082-974b50d8b1b2">
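For WKB-encoded input, the geometry type of each row can be read from the WKB header alone, without parsing coordinates. The sketch below shows one way to do that with the standard library; it is not the code from the pull request, and `wkb_geometry_type` is a hypothetical name. It assumes standard WKB layout (one byte-order byte followed by a 32-bit type code), strips EWKB flag bits, and folds ISO Z/M/ZM codes back to the base type.

```python
import struct

# Base geometry type codes from the OGC Simple Features WKB encoding
WKB_TYPE_NAMES = {
    1: "POINT",
    2: "LINESTRING",
    3: "POLYGON",
    4: "MULTIPOINT",
    5: "MULTILINESTRING",
    6: "MULTIPOLYGON",
    7: "GEOMETRYCOLLECTION",
}


def wkb_geometry_type(wkb: bytes) -> str:
    """Return the base geometry type name from a WKB header (hypothetical helper)."""
    if len(wkb) < 5:
        raise ValueError("WKB buffer is too short to contain a header")
    byte_order = wkb[0]
    if byte_order not in (0, 1):
        raise ValueError(f"Invalid WKB byte order flag: {byte_order}")
    # 1 means little-endian, 0 means big-endian
    fmt = "<I" if byte_order == 1 else ">I"
    (code,) = struct.unpack_from(fmt, wkb, 1)
    # Drop the EWKB Z/M/SRID flag bits, then the ISO +1000/+2000/+3000 offsets
    base_code = (code & 0x1FFFFFFF) % 1000
    try:
        return WKB_TYPE_NAMES[base_code]
    except KeyError:
        raise ValueError(f"Unsupported WKB geometry type code: {code}") from None
```

The resulting names match the strings used in the error message in this issue, so they could be grouped into point, linestring, and polygon families in the same way.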
Thank you Kyle for the help and incorporating this use-case into the |
Context
I have a GeoParquet file with geometries of mixed types (points, linestrings, polygons, multipolygons). I'd like to visualize it all at once, just like calling `explore` on a GeoDataFrame object with folium. Would it be possible to stack multiple layers on top of each other? I tried to filter out geometries based on the type in GeoArrow beforehand, but I'm also struggling with that (geoarrow/geoarrow-python#46).
Issue
Calling `viz(pa_table)` results in
ValueError: Geometry type combination is not supported (['POINT', 'LINESTRING', 'POLYGON', 'MULTIPOLYGON'])