Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EPIC] Enable mixed geometry type visualization #491

Closed
RaczeQ opened this issue Apr 29, 2024 · 9 comments · Fixed by #495
Closed

[EPIC] Enable mixed geometry type visualization #491

RaczeQ opened this issue Apr 29, 2024 · 9 comments · Fixed by #495
Milestone

Comments

@RaczeQ
Copy link
Contributor

RaczeQ commented Apr 29, 2024

Context

I have a GeoParquet file with geometries of mixed type (points, linestrings, polygons, multipolygons). I'd like to have it visualized at once, just like calling explore on a GeoDataFrame object with folium. Would it be possible to stack multiple layers on top of each other? I tried to filter out geometries based on the type in GeoArrow beforehand. but I'm also struggling with that (geoarrow/geoarrow-python#46).

Issue

Calling viz(pa_table) results in ValueError: Geometry type combination is not supported (['POINT', 'LINESTRING', 'POLYGON', 'MULTIPOLYGON'])

@RaczeQ
Copy link
Contributor Author

RaczeQ commented Apr 29, 2024

After looking into the stacktrace, it looks like shapely issue with to_ragged_array function.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 2
      1 # table = pq.read_table(osm_data_path)
----> 2 viz(table)
      3 # table

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_viz.py:150, in viz(data, scatterplot_kwargs, path_kwargs, polygon_kwargs, map_kwargs)
    138     layers = [
    139         create_layer_from_data_input(
    140             item,
   (...)
    146         for i, item in enumerate(data)
    147     ]
    148 else:
    149     layers = [
--> 150         create_layer_from_data_input(
    151             data,
    152             _viz_color=color_ordering[0],
    153             scatterplot_kwargs=scatterplot_kwargs,
    154             path_kwargs=path_kwargs,
    155             polygon_kwargs=polygon_kwargs,
    156         )
    157     ]
    159 map_kwargs = {} if not map_kwargs else map_kwargs
    161 if "basemap_style" not in map_kwargs.keys():

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_viz.py:186, in create_layer_from_data_input(data, **kwargs)
    184 # pyarrow table
    185 if isinstance(data, pa.Table):
--> 186     return _viz_geoarrow_table(data, **kwargs)
    188 # Shapely array
    189 if isinstance(data, np.ndarray) and np.issubdtype(data.dtype, np.object_):

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_viz.py:365, in _viz_geoarrow_table(table, _viz_color, scatterplot_kwargs, path_kwargs, polygon_kwargs)
    356 def _viz_geoarrow_table(
    357     table: pa.Table,
    358     *,
   (...)
    362     polygon_kwargs: Optional[PolygonLayerKwargs] = None,
    363 ) -> Union[ScatterplotLayer, PathLayer, PolygonLayer]:
    364     table = remove_extension_classes(table)
--> 365     table = parse_wkb_table(table)
    367     geometry_column_index = get_geometry_column_index(table.schema)
    368     geometry_field = table.schema.field(geometry_column_index)

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_geoarrow/parse_wkb.py:33, in parse_wkb_table(table)
     31     extension_type_name = field.metadata.get(b"ARROW:extension:name")
     32     if extension_type_name in wkb_names:
---> 33         new_field, new_column = parse_wkb_column(field, column)
     34         table = table.set_column(field_idx, new_field, new_column)
     36 return table

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_geoarrow/parse_wkb.py:84, in parse_wkb_column(field, column)
     81 # We call shapely.from_wkb on the _entire column_ so that we don't get mixed type
     82 # arrays in each column.
     83 shapely_arr = shapely.from_wkb(column)
---> 84 new_field, geom_arr = construct_geometry_array(
     85     shapely_arr,
     86     crs_str=crs_str,
     87 )
     89 # Slice full array to maintain chunking
     90 chunk_offsets = [0]

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_geoarrow/extension_types.py:292, in construct_geometry_array(shapely_arr, include_z, field_name, crs_str)
    282 def construct_geometry_array(
    283     shapely_arr: NDArray[np.object_],
    284     include_z: Optional[bool] = None,
   (...)
    290     # extension metadata on the field without instantiating extension types into the
    291     # global pyarrow registry
--> 292     geom_type, coords, offsets = shapely.to_ragged_array(
    293         shapely_arr, include_z=include_z
    294     )
    296     if coords.shape[-1] == 2:
    297         dims = CoordinateDimension.XY

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/shapely/_ragged_array.py:291, in to_ragged_array(geometries, include_z)
    286         raise ValueError(
    287             "Geometry type combination is not supported "
    288             f"({[GeometryType(t).name for t in geom_types]})"
    289         )
    290 else:
--> 291     raise ValueError(
    292         "Geometry type combination is not supported "
    293         f"({[GeometryType(t).name for t in geom_types]})"
    294     )
    296 return typ, coords, offsets

ValueError: Geometry type combination is not supported (['POINT', 'LINESTRING', 'POLYGON', 'MULTIPOLYGON'])

@RaczeQ
Copy link
Contributor Author

RaczeQ commented Apr 29, 2024

Using this code results in another error:

from geoarrow.rust.core import read_parquet

table = read_parquet(osm_data_path)
viz(table)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[52], line 1
----> 1 viz(table)

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_viz.py:150, in viz(data, scatterplot_kwargs, path_kwargs, polygon_kwargs, map_kwargs)
    138     layers = [
    139         create_layer_from_data_input(
    140             item,
   (...)
    146         for i, item in enumerate(data)
    147     ]
    148 else:
    149     layers = [
--> 150         create_layer_from_data_input(
    151             data,
    152             _viz_color=color_ordering[0],
    153             scatterplot_kwargs=scatterplot_kwargs,
    154             path_kwargs=path_kwargs,
    155             polygon_kwargs=polygon_kwargs,
    156         )
    157     ]
    159 map_kwargs = {} if not map_kwargs else map_kwargs
    161 if "basemap_style" not in map_kwargs.keys():

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/lonboard/_viz.py:204, in create_layer_from_data_input(data, **kwargs)
    202 if hasattr(data, "__arrow_c_stream__"):
    203     data = cast("ArrowStreamExportable", data)
--> 204     return _viz_geoarrow_table(pa.table(data), **kwargs)
    206 # Anything with __geo_interface__
    207 if hasattr(data, "__geo_interface__"):

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/table.pxi:5221, in pyarrow.lib.table()

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/ipc.pxi:880, in pyarrow.lib.RecordBatchReader._import_from_c_capsule()

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/error.pxi:154, in pyarrow.lib.pyarrow_internal_check_status()

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/pyarrow/error.pxi:88, in pyarrow.lib.check_status()

File /mnt/ssd/dev/quackosm/.venv/lib/python3.10/site-packages/geoarrow/pyarrow/_type.py:58, in GeometryExtensionType.__arrow_ext_deserialize__(cls, storage_type, serialized)
     55 schema = lib.SchemaHolder()
     56 storage_type._export_to_c(schema._addr())
---> 58 c_vector_type = lib.CVectorType.FromStorage(
     59     schema, cls._extension_name.encode("UTF-8"), serialized
     60 )
     62 return cls(c_vector_type)

File src/geoarrow/c/_lib.pyx:496, in geoarrow.c._lib.CVectorType.FromStorage()

File src/geoarrow/c/_lib.pyx:375, in geoarrow.c._lib.CVectorType._move_from_ctype()

ValueError: Failed to initialize GeoArrowSchemaView: Expected valid list type for coord parent 1 for extension 'geoarrow.multipoint'

@RaczeQ
Copy link
Contributor Author

RaczeQ commented Apr 29, 2024

File for tests:
monaco_nofilter_noclip_compact.zip

@kylebarron
Copy link
Member

It's correct that we don't currently support mixed geometry types as input.

It would be good to have a helper to do this. It looks like shapely.get_type_ids is quite fast (10ms for 3M points), so we'll add something like this to the internals of viz. You should be able to use this until the next release with just viz(split_gdf(gdf)).

from typing import List
from shapely import GeometryType


def split_gdf(gdf: gpd.GeoDataFrame) -> List[gpd.GeoDataFrame]:
    type_ids = np.array(shapely.get_type_id(gdf.geometry))
    unique_type_ids = set(np.unique(type_ids))

    if GeometryType.GEOMETRYCOLLECTION in unique_type_ids:
        raise ValueError("GeometryCollections not currently supported")

    if GeometryType.LINEARRING in unique_type_ids:
        raise ValueError("LinearRings not currently supported")

    if len(unique_type_ids) == 1:
        return [gdf]

    if len(unique_type_ids) == 2:
        if unique_type_ids == {GeometryType.POINT, GeometryType.MULTIPOINT}:
            return [gdf]

        if unique_type_ids == {GeometryType.LINESTRING, GeometryType.MULTILINESTRING}:
            return [gdf]

        if unique_type_ids == {GeometryType.POLYGON, GeometryType.MULTIPOLYGON}:
            return [gdf]

    gdfs = []
    point_indices = np.where(
        (type_ids == GeometryType.POINT) | (type_ids == GeometryType.MULTIPOINT)
    )[0]
    if len(point_indices) > 0:
        gdfs.append(gdf.iloc[point_indices])

    linestring_indices = np.where(
        (type_ids == GeometryType.LINESTRING)
        | (type_ids == GeometryType.MULTILINESTRING)
    )[0]
    if len(linestring_indices) > 0:
        gdfs.append(gdf.iloc[linestring_indices])

    polygon_indices = np.where(
        (type_ids == GeometryType.POLYGON) | (type_ids == GeometryType.MULTIPOLYGON)
    )[0]
    if len(polygon_indices) > 0:
        gdfs.append(gdf.iloc[polygon_indices])

    return gdfs

In the long run, we'll support mixed geometry types on the JS side as well, but I think unblocking Python users is good enough for now.

Using this code results in another error:

Can you make a separate issue for this? This could be a bug in geoarrow-rs or geoarrow-pyarrow.

@RaczeQ
Copy link
Contributor Author

RaczeQ commented Apr 30, 2024

Thanks @kylebarron for some guidance. For now, I've used duckdb to get the geometry type (geoarrow-c has some internal functions to get that when calculating unique geometry types, but it's not exposed to the python api) and pyarrow compute filter to separate a single table into three with this code:

import duckdb
import pyarrow as pa
import pyarrow.compute as pc
from geoarrow.pyarrow import io
from lonboard import viz

duckdb.install_extension("spatial")
duckdb.load_extension("spatial")

try:
    from osmnx import geocode_to_gdf

    import quackosm as qosm

    region = geocode_to_gdf("Chicago") # will donwload 279 mb file, works on a 16 gb machine
    # region = geocode_to_gdf("Greater London") # doesn't work on a 16 gb machine, can display points and linestrings as separate maps, but not polygons
    data_path = qosm.convert_geometry_to_gpq(region.unary_union)

    geoparquet_path = data_path
except Exception:
    geoparquet_path = "monaco_nofilter_noclip_compact.geoparquet" # use file from the zip attached, small example

geometry_types = (
    duckdb.read_parquet(str(geoparquet_path))
    .project("ST_GeometryType(ST_GeomFromWKB(geometry)) t")
    .to_arrow_table()["t"]
)

polygon_values = pa.array(["POLYGON", "MULTIPOLYGON"])
linestring_values = pa.array(["LINESTRING", "MULTILINESTRING"])
points_values = pa.array(["POINT", "MULTIPOINT"])
array_filters = (points_values, linestring_values, polygon_values)

geoarrow_tbl = io.read_geoparquet_table(geoparquet_path)

arrays = [
    geoarrow_tbl.filter(pc.is_in(geometry_types, value_set=search_values))
    for search_values in array_filters
]

viz(arrays)

Is lonboard currently optimized to display big polygon files? Points using the scatterplot layer are blazingly fast, but the polygon layer takes a toll on machine resources.

image
image

@kylebarron
Copy link
Member

Is lonboard currently optimized to display big polygon files? Points using the scatterplot layer are blazingly fast, but the polygon layer takes a toll on machine resources.

Polygons are necessarily much more consuming to draw than points. As is, Lonboard draws every polygon without any simplification. Your options to improve performance are:

  • Drawing with only an outline filled=False or with only a fill stroked=False.;
  • Simplifying polygons before rendering
  • Reducing the amount of data rendered.

@kylebarron
Copy link
Member

Longer term we may additionally implement an optional tiled approach, where e.g. I port https://github.com/mapbox/geojson-vt to Rust + Arrow and then we dynamically fetch tiles from Python to JS. That would allow a general use case of having more data in Python than you want to send to the browser. See #414

@kylebarron
Copy link
Member

File for tests: monaco_nofilter_noclip_compact.zip

By the way I think we've been thinking that GeoParquet files should still be saved with a .parquet file extension, not .geoparquet, but I opened an issue on the spec repo to discuss. opengeospatial/geoparquet#211

kylebarron added a commit that referenced this issue May 2, 2024
### Change list

- Handle mixed geometry types in `viz` for GeoDataFrames, Shapely
arrays, and `__geo_interface__` inputs
- Handle mixed geometry types in wkb-encoded geoarrow input.

Closes #491

Monaco OSM data from #491. cc @RaczeQ

<img width="577" alt="image"
src="https://github.com/developmentseed/lonboard/assets/15164633/601efa4c-a8f6-424c-9082-974b50d8b1b2">
@RaczeQ
Copy link
Contributor Author

RaczeQ commented May 5, 2024

Thank you Kyle for the help and incorporating this use-case into the lonboard 🙂
I have to admit, I wasn't aware that *.geoparquet extension might be not compliant with official GeoParquet specification, so I'll probably update my libraries for that as well 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants