-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
interactive plotting with lonboard
#67
Conversation
|
||
Returns | ||
------- | ||
map : lonboard.Map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like you have docs yet, but if you use sphinx or mkdocs (with mkdocstrings), lonboard implements intersphinx, so you should be able to set it up so the return value is a hyperlink to the lonboard docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the hint. I'll probably use sphinx
to set up the docs, so I'll definitely make use of that!
xdggs/plotting.py
Outdated
cmap="viridis", | ||
center=None, | ||
): | ||
import geopandas as gpd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case you have interest, it would be possible to implement this conversion in a way that doesn't require GeoPandas (or pyarrow), by using raw numpy buffers and arro3, a more minimal implementation of Arrow in Python that Lonboard adds as a dependency in the next version.
This would also be faster to render as it doesn't have to go through GeoPandas (and shapely geometries).
But it would be a little more complex code here. I can give you some pointers if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd definitely be interested.
Right now, what we're doing is ask the user to pre-compute cell vertices from the cell ids (which can have various forms for different DGGS, but most often are int64
or uint64
arrays), then convert them into shapely
polygons. This is one of the things that take the longest other than converting to geopandas
and constructing the SolidPolygonLayer
, so if we could instead extend deck.gl
to directly support them (like it supports H3 ids), that would be best. However, I suppose that will either take a long time and require someone to actually do the work, especially since there's so many different kinds of DGGS (and I'm not confident in my TypeScript / JavaScript skills so I won't be able to work on this myself anytime soon).
So as the next best step, should we construct geoarrow
polygons from the cell vertex coordinates, possibly using arro3
? If you have any examples I'd be happy to try figuring out how to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the end of the day, we need to create vertex information in Lonboard/deck.gl for rendering. I believe that even deck.gl's H3 layer does not render the h3 cells on the GPU from an ID, but rather uses those IDs on the CPU in JS to create the vertex geometry.
For now, I think it's simplest to convert cell ids to Polygon geometries in Python, and then we can use the SolidPolygonLayer
/PolygonLayer
directly.
A GeoArrow Polygon array is just:
- a contiguous buffer of all the coordinates in all the polygons
- plus an offset array that points into the coordinates array and which says which coordinates are in each ring
- plus an offset array that points into the previous ring offsets array, and which says how many rings are in each geometry.
Here's a quick code example that might help illustrate:
import arro3.core
import h3.api.numpy_int as h3
import lonboard
import numpy as np
from arro3.core import Array, Field, fixed_size_list_array, list_array
cell = h3.geo_to_h3(-10, 15, 4)
disk = h3.k_ring(cell, 2)
# Overallocates in case there are any pentagons
coords_per_hexagon = 7
total_coords = len(disk) * coords_per_hexagon
# Allocate coordinate buffer
coords = np.zeros((total_coords, 2), dtype=np.float64)
# Allocate ring offsets. The length should be one greater than the number of rings
ring_offsets = np.zeros(len(disk) + 1, dtype=np.int32)
# In the case of h3 cells, we know that every cell will be a simple polygon with a
# single ring, so we can just use `arange`
geom_offsets = np.arange(len(disk) + 1, dtype=np.int32)
coord_offset = 0
for i, h in enumerate(disk):
boundary = h3.h3_to_geo_boundary(h, geo_json=True)
num_coords = len(boundary)
coords[coord_offset : coord_offset + num_coords, :] = boundary
coord_offset += num_coords
ring_offsets[i + 1] = coord_offset
# shrink the coordinates array, in case there are any pentagons
last_coord = ring_offsets[-1]
coords = coords[:last_coord, :]
polygon_array = list_array(geom_offsets, list_array(ring_offsets, coords))
# We need to tag the array with extension metadata (`geoarrow.polygon`) so that Lonboard knows that this is a geospatial column.
polygon_array_with_geo_meta = polygon_array.cast(
polygon_array.field.with_metadata({"ARROW:extension:name": "geoarrow.polygon"})
)
lonboard.viz(polygon_array_with_geo_meta)
(Note that you need to use latest main of arro3
, because I just implemented support for using numpy-backed memory for Rust Arrow arrays kylebarron/arro3#204, kylebarron/arro3#208)
(Note that this simple implementation probably breaks over the antimeridian.)
h3-py is not itself vectorized, so converting from h3 cell ids to polygon vertices is quite slow. In the longer term, I'd like to update https://github.com/nmandery/h3ronpy to have a fast way to convert from h3 cell ids to GeoArrow polygon array (or MultiPolygonArray in the case of cells that cross the antimeridian)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks great @kylebarron !
I believe that even deck.gl's H3 layer does not render the h3 cells on the GPU from an ID, but rather uses those IDs on the CPU in JS to create the vertex geometry.
Interesting. Do you know if deck.gl has some clever optimizations for plotting H3/S2 like, e.g., compute on the fly the cell polygon geometries and/or aggregating data (parent cells), depending on the current zoom level / display extent? This might still have some benefits for plotting very large datasets (common use case) interactively. Or maybe Lonboard allows doing such optimizations on the server side (widget callback)?
I just implemented support for using numpy-backed memory for Rust Arrow arrays
I guess this could really help better integration between arrow and xarray in general? 👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the longer term, I'd like to update https://github.com/nmandery/h3ronpy to have a fast way to convert from h3 cell ids to GeoArrow polygon array
h3ronpy
has h3ronpy.arrow.vector.cells_to_wkb_polygons
which would require you to decode WKB, so not sure if that's better.
In any case, I've translated my example from #11 to use your example (while borrowing code from lonboard.viz
to construct the arrow table and construct a SolidPolygonLayer
):
import xarray as xr
import arro3.core
import healpy as hp
import lonboard
import numpy as np
from matplotlib import colormaps
from arro3.core import Array, Field, fixed_size_list_array, list_array, Schema, Table, ChunkedArray
resolution = 10
ds = (
xr.Dataset(
coords={
"cell_ids": (
"cells",
np.arange(12 * 4**resolution),
{
"grid_name": "healpix",
"resolution": resolution,
"indexing_scheme": "nested",
},
)
}
)
.pipe(xdggs.decode)
.pipe(lambda ds: ds.merge(ds.dggs.cell_centers()))
.assign(data=lambda ds: np.sin(np.radians(ds["latitude"])))
)
level = 10
cell_ids = np.arange(12 * 4**level)
def cell_ids2vertices(cell_ids, level, indexing_scheme):
nest = indexing_scheme == "nested"
nside = 2**level
boundary_vectors = hp.boundaries(nside, cell_ids, step=1, nest=nest)
lon, lat = hp.vec2ang(np.moveaxis(boundary_vectors, 1, -1), lonlat=True)
lon_ = (lon + 180) % 360 - 180
boundaries = np.reshape(np.stack((lon_, lat), axis=-1), (-1, 4, 2))
# fix the dateline / prime meridian issue
lon_ = boundaries[..., 0]
to_fix = abs(np.max(lon_, axis=-1) - np.min(lon_, axis=-1)) > 300
fixed_lon = lon_[to_fix, :] + 360
boundaries[to_fix, :, 0] = fixed_lon
return boundaries
def cell_ids2cell_boundaries_geoarrow(cell_ids, level, indexing_scheme):
vertices = cell_ids2vertices(cell_ids, level, indexing_scheme)
boundaries = np.concatenate([vertices, vertices[:, :1, :]], axis=1)
coords = np.reshape(boundaries, (-1, 2))
coords_per_pixel = boundaries.shape[1]
geom_offsets = np.arange(cell_ids.size + 1, dtype="int32")
ring_offsets = geom_offsets * coords_per_pixel
polygon_array = list_array(geom_offsets, list_array(ring_offsets, coords))
# We need to tag the array with extension metadata (`geoarrow.polygon`) so that Lonboard knows that this is a geospatial column.
polygon_array_with_geo_meta = polygon_array.cast(
polygon_array.field.with_metadata({"ARROW:extension:name": "geoarrow.polygon"})
)
return polygon_array_with_geo_meta
polygons = cell_ids2cell_boundaries_geoarrow(cell_ids, level=level, indexing_scheme="nested")
values = np.linspace(0, 1, cell_ids.size)
cmap = colormaps["viridis"]
colors = apply_continuous_cmap(values, cmap)
array = Array.from_arrow(polygons)
field = array.field.with_name("geometry")
schema = Schema([field])
table = Table.from_arrays([array], schema=schema)
num_rows = len(array)
if num_rows <= np.iinfo(np.uint8).max:
arange_col = Array.from_numpy(np.arange(num_rows, dtype=np.uint8))
elif num_rows <= np.iinfo(np.uint16).max:
arange_col = Array.from_numpy(np.arange(num_rows, dtype=np.uint16))
elif num_rows <= np.iinfo(np.uint32).max:
arange_col = Array.from_numpy(np.arange(num_rows, dtype=np.uint32))
else:
arange_col = Array.from_numpy(np.arange(num_rows, dtype=np.uint64))
table = table.append_column("row_index", ChunkedArray([arange_col])).append_column("values", ChunkedArray([Array.from_numpy(values)]))
layer = lonboard.SolidPolygonLayer(table=table, get_fill_color=colors)
map = lonboard.Map(layer)
(this also doesn't fully resolve the dateline issue)
which takes ~19 s to complete, plus about 10s to render. The version that converts to geopandas takes ~54s plus another ~10s to render, so this is a huge improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's cool! I should figure out how to use pyinstrument
. Was it a one-line thing to create that?
├─ 11.166 SolidPolygonLayer.__init__ lonboard/_layer.py:1498
│ [26 frames hidden] lonboard, ipywidgets, pyarrow, <__arr...
│ 8.384 ParquetWriter.write_table pyarrow/parquet/core.py:1084
- That's a lot of time just to write Parquet. I wouldn't have expected it to be that slow. For a long time I've thought about having a toggle for the user to switch off parquet encoding (e.g. if the Python environment is on the same machine as the browser). We go through Parquet currently because it creates a much smaller download in the case of remote Python environments.
- I'm curious what those other frames are. It still shouldn't be 3 seconds of overhead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was it a one-line thing to create that?
it has different modes for how you can run it (see the user guide). I put the code above in a python script, then ran it with python -m pyinstrument lonboard_example.py
, but you could also use the context manager:
with pyinstrument.profile():
...
or the jupyter / ipython cell magic %%pyinstrument
(load first with %load_ext pyinstrument
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious what those other frames are. It still shouldn't be 3 seconds of overhead
the ones beneath SolidPolygonLayer
? Here you go:
├─ 11.018 SolidPolygonLayer.__init__ lonboard/_layer.py:1498
│ └─ 11.018 SolidPolygonLayer.__init__ lonboard/_layer.py:266
│ ├─ 9.058 SolidPolygonLayer.__init__ lonboard/_layer.py:80
│ │ └─ 9.058 SolidPolygonLayer.__init__ lonboard/_base.py:16
│ │ └─ 9.058 SolidPolygonLayer.__init__ ipywidgets/widgets/widget.py:500
│ │ [1 frames hidden] ipywidgets
│ │ 8.991 SolidPolygonLayer.get_state ipywidgets/widgets/widget.py:589
│ │ ├─ 8.330 serialize_table lonboard/_serialization.py:72
│ │ │ └─ 8.330 serialize_table_to_parquet lonboard/_serialization.py:25
│ │ │ └─ 8.321 ParquetWriter.write_batch pyarrow/parquet/core.py:1068
│ │ │ [0 frames hidden]
│ │ │ 8.321 ParquetWriter.write_table pyarrow/parquet/core.py:1084
│ │ └─ 0.661 serialize_accessor lonboard/_serialization.py:58
│ │ └─ 0.661 serialize_pyarrow_column lonboard/_serialization.py:52
│ │ └─ 0.661 serialize_table_to_parquet lonboard/_serialization.py:25
│ │ └─ 0.641 ParquetWriter.write_batch pyarrow/parquet/core.py:1068
│ │ [1 frames hidden] pyarrow
│ └─ 1.958 default_geoarrow_viewport lonboard/_layer.py:215
│ └─ 1.865 total_bounds lonboard/_geoarrow/ops/bbox.py:36
│ └─ 1.865 _total_bounds_nest_2 lonboard/_geoarrow/ops/bbox.py:80
│ └─ 1.865 _coords_bbox lonboard/_geoarrow/ops/bbox.py:55
│ ├─ 0.939 amax <__array_function__ internals>:177
│ │ [3 frames hidden] numpy, <built-in>
│ └─ 0.926 amin <__array_function__ internals>:177
│ [3 frames hidden] numpy, <built-in>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the most recent commits integrate the code we talked about into xdggs
. If you have time to look at I'd appreciate another review, but either way thanks for all the help, this is already so much faster than the code I had before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW the next version of geoarrow.rust.core
will have constructor functions: geoarrow/geoarrow-rs#810. And so you can pass the arrays directly into geoarrow.rust.core.polygons
. Using those functions would ensure that the arrays generated are fully spec-compatible.
(As it is, you're generating data whose physical buffers are the same as geoarrow arrays, but you're missing some metadata like the correct field names for nested arrays and the correct nullability flags for nested arrays. Lonboard allows that input but it's not 100% to the geoarrow spec, especially if we enforce it in issues like geoarrow/geoarrow#64)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a lot for the review comments, @kylebarron!
|
||
Returns | ||
------- | ||
map : lonboard.Map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the hint. I'll probably use sphinx
to set up the docs, so I'll definitely make use of that!
xdggs/plotting.py
Outdated
cmap="viridis", | ||
center=None, | ||
): | ||
import geopandas as gpd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd definitely be interested.
Right now, what we're doing is ask the user to pre-compute cell vertices from the cell ids (which can have various forms for different DGGS, but most often are int64
or uint64
arrays), then convert them into shapely
polygons. This is one of the things that take the longest other than converting to geopandas
and constructing the SolidPolygonLayer
, so if we could instead extend deck.gl
to directly support them (like it supports H3 ids), that would be best. However, I suppose that will either take a long time and require someone to actually do the work, especially since there's so many different kinds of DGGS (and I'm not confident in my TypeScript / JavaScript skills so I won't be able to work on this myself anytime soon).
So as the next best step, should we construct geoarrow
polygons from the cell vertex coordinates, possibly using arro3
? If you have any examples I'd be happy to try figuring out how to do that.
This has speed implications: before, we've been using the shapely → geopandas → geoarrow route, which adds significant overhead. Co-authored-by: Kyle Barron <kylebarron@users.noreply.github.com>
In the future, we might want to move the conversion code to a separate module.
Hi, thanks @keewis @kylebarron , |
This reverts commit 5a71559.
looks like the most recent commits got the tests to pass. I'll merge later this evening. For future PRs:
|
I made conda-forge/arro3-core-feedstock#7 There must be something in the recipe that's preventing the recipe from building for osx-arm, but I don't know what it is. If you had interest in exploring why or how to fix it that would be great |
good catch, @tinaok, I appear to have accidentally dropped the cell ids from the hover table. I've extended the list of tasks left for future PRs, but for now I'll go ahead and merge. |
As described in #11, this allows us to plot a single map using
lonboard
. This works by computing the cell boundaries in advance, then uselonboard.SolidPolygonLayer
to visualize them.In the future, we might be able to extend this by attaching sliders for any additional dimensions, as
lonboard
is really fast when it comes to changing the color of the polygons.Also, as a future task, we should try to allow making use of the native support for H3 cell ids in
deck.gl
(and extend that to supporthealpix
and maybe other DGGS), which should get rid of a lot of the data transfer between the python kernel and the browser, and computing cell boundaries is also very expensive.Still missing: