✨ DatashaderRasterizer for burning vector shapes to xarray grids #35
Merged
An iterable-style DataPipe for turning vector geometries into raster grids! Uses datashader to do the rasterization. Included a doctest for rasterizing geopandas.GeoDataFrame to xarray.DataArray. Added a new section in the API docs too. Also made a small change to XarrayCanvasIterDataPipe so that the datashader.Canvas being yielded has a crs attribute containing the original xarray object's coordinate reference system!
Improved the traceback error and added a unit test for when GeometryCollection vector types (i.e. those with an assortment of point, line or polygon types) are passed in to DatashaderRasterizer. The limitation is really in spatialpandas, and hence datashader. Also fall back to setting the datashader.Canvas's CRS to None to prevent an AttributeError, though that might cause some issues when the vector has a CRS but the canvas doesn't.
Decided that coordinate reference systems are a must now, for both the datashader.Canvas and geopandas.GeoDataFrame inputs, because geospatial context matters. Added a unit test to ensure these checks work.
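For readers skimming the thread, the CRS requirement above could look roughly like the sketch below. It is only an illustration: `require_crs` is a hypothetical helper name, the error text is made up, and `SimpleNamespace` objects stand in for the real datashader.Canvas and geopandas.GeoDataFrame inputs.

```python
from types import SimpleNamespace


def require_crs(canvas, vector):
    """Hypothetical helper: insist that both inputs carry a CRS.

    Raises AttributeError when the `.crs` attribute is absent or None.
    """
    for name, obj in (("canvas", canvas), ("vector", vector)):
        if getattr(obj, "crs", None) is None:
            # Illustrative message only, not the wording used in zen3geo
            raise AttributeError(f"Missing crs information for the {name} input")
    return canvas.crs, vector.crs


# Stand-in objects; real inputs would be a datashader.Canvas and a GeoDataFrame
canvas = SimpleNamespace(crs="EPSG:4326")
vector = SimpleNamespace(crs="EPSG:4326")
print(require_crs(canvas, vector))
```

Treating a `.crs` of None the same as a missing attribute is a choice made for this sketch, so that neither input can slip through without geospatial context.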
Enable rasterization of line and polygon inputs too! Pretty much just two more elif statements. However, because rasterizing lines and polygons using datashader results in boolean type xarray.DataArray outputs that can't be reprojected by rioxarray, had to cast them to uint8. Added parametrized unit tests that ensure the three vector input types work.
Improving the DatashaderRasterizer docstring so that people know what is happening. Mention that the default aggregation is 'count' for points, and 'any' for lines and polygons. Document AttributeError that is raised when either the canvas or vector input is missing a `.crs` attribute, and ValueError raised when vector geometry type is not supported. Also added an intersphinx link for shapely.
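The defaults described in the docstring can be summarised as a small lookup, sketched below. This is an assumption-laden illustration rather than the code in this PR: `RASTERIZE_RULES` and `pick_rule` are hypothetical names, the keys are shapely-style geometry type names, and the first tuple element names the datashader.Canvas method (`points`, `line` or `polygons`) that such a dispatch would call.

```python
# Hypothetical lookup table: geometry type name ->
# (datashader.Canvas method name, default aggregation from the docstring)
RASTERIZE_RULES = {
    "Point": ("points", "count"),
    "MultiPoint": ("points", "count"),
    "LineString": ("line", "any"),
    "MultiLineString": ("line", "any"),
    "Polygon": ("polygons", "any"),
    "MultiPolygon": ("polygons", "any"),
}


def pick_rule(geom_type):
    """Return (canvas_method, default_agg), or raise ValueError if unsupported."""
    try:
        return RASTERIZE_RULES[geom_type]
    except KeyError:
        # Illustrative message only, not the wording used in zen3geo
        raise ValueError(f"Unsupported geometry type: {geom_type}") from None


print(pick_rule("Point"))
print(pick_rule("Polygon"))
```

A mixed type such as GeometryCollection has no entry, so it ends up in the ValueError branch.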
Tidy up the three test_datashader_rasterizer_* unit tests that were using unique datashader.Canvas objects with different widths/heights (because they were written somewhat independently), using pytest fixtures to do so. Split the single missing_crs test into two to make it more unit-like. For the vector geometries, there has also been some swapping of GeoDataFrame vs GeoSeries for different tests. Might still be a bit hard to follow but will suffice for now.
Ensure that the output dataarray's coordinate reference system and affine transform are correct in the doctest (which is like a mini-integration test).
This was referenced Aug 14, 2022
weiji14 added a commit that referenced this pull request Sep 7, 2022
Just a random collection of mostly documentation-related patches. Patches type-hints in #52, isort imports in #35, mention functional name of IterDataPipe in walkthroughs #8 and #20, and remove mention of returned tuple to patch #33.
* 🏷️ Add specific type hints for mask_datapipe in geopandas.py
  Should be either an xarray.DataArray or xarray.Dataset.
* 🚨 Sort spatialpandas imports in datashader.py
  Ran isort to sort spatialpandas.geometry imports alphabetically. Also intersphinx linked the `.crs` attribute to geopandas.GeoDataFrame.crs.
* 💬 Mention functional name of IterDataPipe in walkthroughs
  So people don't get confused on why the class-form like `Collator` is mentioned but `.collate` was used instead.
* 📝 Remove mention of tuple being returned in test_pyogrio_reader
  Forgot to edit the unit test's docstring. Patches #33.
* 🍻 It's GeoPackage and GeoDataFrame, not GeoTIFF and DataArray
  Need to be more careful when copying and pasting stuff.
weiji14 added a commit that referenced this pull request May 30, 2023
Probably wanted to preserve all the columns when converting from geopandas.GeoDataFrame to spatialpandas.GeoDataFrame, but it doesn't work sometimes when the vector is wrapped by StreamWrapper. Decided to pass the vector.geometry GeoSeries as input instead (alternative was to do a view like vector.loc[:]). Partially reverts 6805418 in #35. Wanted to add a unit test, but it was hard to get a minimal reproducible example. Only know that this helps with a complicated data pipeline reading vector GeoJSON data from an HTTP request.
weiji14 added a commit that referenced this pull request May 30, 2023
…104)
* 🥅 Catch specific ValueError on conversion to spatialpandas
  On converting a vector geometry in a geopandas.GeoDataFrame (which could be wrapped in StreamWrapper) to a spatialpandas.GeoDataFrame, there could be several different types of `ValueError`s raised. This modifies the exception raising to target only the one specific ValueError caused by invalid geometry type. See logic at https://github.com/holoviz/spatialpandas/blame/v0.4.8/spatialpandas/geometry/base.py#L805-L849 for how the original ValueError is raised. Also clarified that MultiPoint, MultiLineString and MultiPolygon geometry types are supported.
* 🐛 Convert just the geometry column to spatialpandas.GeoDataFrame
  Probably wanted to preserve all the columns when converting from geopandas.GeoDataFrame to spatialpandas.GeoDataFrame, but it doesn't work sometimes when the vector is wrapped by StreamWrapper. Decided to pass the vector.geometry GeoSeries as input instead (alternative was to do a view like vector.loc[:]). Partially reverts 6805418 in #35. Wanted to add a unit test, but it was hard to get a minimal reproducible example. Only know that this helps with a complicated data pipeline reading vector GeoJSON data from an HTTP request.
* ✅ Add test for empty vector raising proper ValueError
  Test to ensure that the ValueError raised when an invalid geopandas.GeoDataFrame is passed into DatashaderRasterizer is not about unsupported geometry type, but something else instead. Not exactly a perfect regression test for #104, but it does help with code coverage.
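The idea of targeting only one specific ValueError can be sketched as below. Everything named here is hypothetical: `convert_with_clear_error` is an illustrative wrapper, `convert` stands in for the geopandas to spatialpandas conversion, and `GEOMETRY_TYPE_ERROR` is placeholder text, since the real spatialpandas message is not quoted in this thread.

```python
# Placeholder text; the actual message raised by spatialpandas is not quoted
# in this thread, so this constant is an assumption for illustration only.
GEOMETRY_TYPE_ERROR = "unsupported geometry type"


def convert_with_clear_error(convert, vector):
    """Call convert(vector), rewording only the geometry-type ValueError."""
    try:
        return convert(vector)
    except ValueError as err:
        if GEOMETRY_TYPE_ERROR in str(err):
            # Only this one failure mode gets the friendlier message
            raise ValueError(
                "Unsupported geometry type(s); expected points, lines or polygons"
            ) from err
        raise  # any other ValueError propagates unchanged
```

With this shape, an empty or otherwise invalid input still surfaces its original ValueError instead of being mislabelled as a geometry type problem, which matches the behaviour the new test checks for.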
An iterable-style DataPipe for turning vector geometries into raster grids! Uses datashader to do the rasterization.
Preview at https://zen3geo--35.org.readthedocs.build/en/35/api.html#zen3geo.datapipes.DatashaderRasterizer
Part 2 out of 2 of superseding #32. Recall that 1st step (#34) was to define a canvas, and 2nd step (this PR) is to burn the vector (points/lines/polygons) onto that canvas via some aggregation function.
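To make the canvas-then-burn idea concrete, here is a toy, dependency-free sketch of a 'count' aggregation for points. It is not datashader and not the code in this PR: `ToyCanvas` and `burn_points_count` are hypothetical names, and the grid is a plain list of lists rather than an xarray.DataArray.

```python
from dataclasses import dataclass


@dataclass
class ToyCanvas:
    """Hypothetical stand-in for a raster canvas (not datashader.Canvas)."""

    width: int  # number of columns
    height: int  # number of rows
    x_range: tuple  # (xmin, xmax)
    y_range: tuple  # (ymin, ymax)


def burn_points_count(canvas, points):
    """Count how many (x, y) points fall into each cell of the canvas grid."""
    xmin, xmax = canvas.x_range
    ymin, ymax = canvas.y_range
    grid = [[0] * canvas.width for _ in range(canvas.height)]
    for x, y in points:
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue  # skip points outside the canvas extent
        # Scale the coordinate to a cell index; clamp points on the upper edge
        col = min(int((x - xmin) / (xmax - xmin) * canvas.width), canvas.width - 1)
        row = min(int((y - ymin) / (ymax - ymin) * canvas.height), canvas.height - 1)
        grid[row][col] += 1  # row 0 holds the smallest y values
    return grid


canvas = ToyCanvas(width=2, height=2, x_range=(0, 2), y_range=(0, 2))
print(burn_points_count(canvas, [(0.5, 0.5), (0.6, 0.4), (1.5, 1.5), (5, 5)]))
```

Swapping the `+= 1` for an assignment of True would give the boolean 'any' style of aggregation mentioned for lines and polygons.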
TODO:
DatashaderRasterizerIterDataPipe
.crs
Won't do: