Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write documentation and streamline general behaviour #351

Merged
merged 188 commits into from
Apr 15, 2023

Conversation

rhugonnet
Copy link
Member

@rhugonnet rhugonnet commented Feb 10, 2023

Summary

This PR writes the documentation for GeoUtils! 🥳

It also:

  • Streamlines several features and behaviours to better fit the objective of the package,
  • Re-organizes the structure of several modules,
  • Homogenizes the syntax of the docstrings,
  • Interfaces GeoPandas/Shapely functionalities in Vector,
  • Adds many tests, solving several bugs and bringing test coverage to more than 90% (was 75% previously).

This PR is branched from #350 on the Mask addition (now closed and everything is in this PR, as some Mask fixes started to depend on new behaviours only available here).
This PR also casts single-band raster to 2D following #360 (now closed also).

Documentation

The hidden branch with the rendered documentation is here: https://geoutils-rhugonnet.readthedocs.io/en/docs/index.html

Regarding theme and rendering, this PR:

  • Implements the sphinx-book-theme (same as OGGM and many other packages, it's a sub-theme of the PyData theme, used by NumPy, Pandas, SciPy, etc). Good news also: the dark theme PR has been merged in their repo, we'll have it available at their next version release (EDIT: It's released! I just need to update the logo files to support the Dark theme :D; SECOND EDIT: Done!),
  • Switches to MyST-parser which is an extension of Markdown: partially to avoid re-learning another language with rST, but mostly because it is getting popular thanks to its many functionalities. See next point.
  • Switches to MyST-NB that allows to add Jupyter code-cells to Markdown to render code in the documentation, and easily switch to interactive mode with a Binder. It allows to write Python code in Markdown, or to write Markdown in Jupyter notebook. Both will be rendered the same! Older packages (and even more recent ones like Xarray) had to rely on the IPython sphinx extension, which was really a hassle (multi-line is hard, few rendering options, etc, for example here: https://docs.xarray.dev/en/stable/user-guide/indexing.html#positional-indexing).
  • Removes the /doc/code folder and the literalinclude:: code/coregistration.py :lines: 5-21 we were using in xDEM, which was a hassle to maintain, thanks to the previous point. Proper errors should be raised by Sphinx or can be tested in the CI directly (we can run the .md files as a Jupyter notebook).
  • Switches from pip to mambaforge for the Readthedocs in .readthedocs.yaml, which was failing and slow. Also adds pip -e . to dev-environment.yml to simplify dev installation of the package.
  • Adds sphinx-design for rendering nice tables in the documentation.
  • Soon, the sphinx inheritance diagrams will support intersphinx, so the diagram I created that shows inheritance from GeoUtils to xDEM will finally work (right now it only renders locally). See Fixes missing and broken links in inheritance diagrams sphinx-doc/sphinx#10614.
  • Adds the sphinx_gallery_conf option remove_config_comments = True to remove config comments in gallery examples, such as for choosing thumbnails: # sphinx_gallery_thumbnail_number = 2,
  • Adds subsection_order and within_subsection_order parameters to properly order gallery example subsections and individual examples,
  • Adds a post-build sphinx routine to remove myraster.tif and myvector.gpkg created during the "Open/Save" examples by Sphinx-Gallery, as advised here: ignore code-blocks sphinx-gallery/sphinx-gallery#361,
  • Adds a binder/ folder with environment and post-build setup for Binder, using myst-nb and jupytext, we can now launch any documentation page directly to our Binder (badge added to the README)!

Regarding the contents of the docs, I let you have a look there directly!
In short:

  • Structure recommended for writing docs (Getting started/Features/Examples/References),
  • An inviting landing page, with a logo 😛 and grid-cards shortcuts,
  • The classic "About/Install/Quick start" section, the first one important to make clear our place in the landscape of geospatial packages,
  • An explanation of the core concepts in "Fundamentals", key to understand the behaviour of the package (and its added value), with small embedded examples,
  • A descriptive documentation in "Rasters", "Vectors", with small embedded examples as well,
  • A full gallery of examples separated in "I/O", "Handling" and "Analysis", with 20 short gallery examples.
  • An API declined by section to make it more user-friendly, so semi-automatic (how it's done in NumPy, SciPy, Xarray, etc) where we need to specify any new function we add.

Architecture changes

This PR:

  • Moves satimg to georaster/,
  • Renames georaster.py and geovector.py into raster.py and vector.py (the "geo" already being in geoutils.raster, as previously discussed),
  • Re-organizes spatial_tools into raster/array.py, raster/multiraster.py and raster/sampling.py,
  • Moves geoviewer.py to a new bin/ folder.

Syntax changes

This PR:

  • Homogenize argument names across functions that use similar arguments,
  • Removes camel-case (e.g., cropGeom renamed into crop_geom),
  • Streamlines docstring names: a "raster" or "vector" is used to designate a Raster or Vector object (like an "array" is used for a np.array), all docstrings start without article (Raster to do that),
  • Adds many missing docstrings for class properties.

Changes linked to documentation

This PR:

  • Adds crs and transform as properties to Raster, with a data.setter, to ensure those are consistent and listed as properties in the API,
  • Adds ds, crs and bounds as properties to Vector, also for consistency and API,
  • Adds datetime, etc... as properties to SatelliteImage, also for consistency and API,
  • Adds "Is loaded?" into Raster.info(),
  • Adds __repr__ and modifies __str__ for Raster and Vector to be consistent with what is done in NumPy and GeoPandas, those now only prints "not_loaded" when the data is unloaded,
  • Adds _repr_html_ in Raster and Vector for representing rasters and vectors more stylishly in HTML,
  • Adds a copy_doc decorator in misc.py to automatically generate the docstring of wrapped GeoPandas functions (their __doc__) by replacing geopandas.GeoSeries or geopandas.GeoDataFrame by Vector, and adding an intersphinx link to the GeoPandas API in our doc,
  • Adds Vector.save as a wrapper around GeoDataFrame.to_file for consistency with Raster,
  • Adds Vector.show as a wrapper around GeoDataFrame.plot for consistency with Raster.
  • Makes Raster.show and Vector.show recognize and respect the first axis, to be able to plot everything on top by simply doing raster.show(); vector.show(). Axes can still be passed explicitly through ax=. To plot on a new axis, one can add a plt.figure() or simply do rst.show(ax="new").

Mask class

See #350, and the documentation. This PR adds tests, fixes behaviour of logical functions and adds overridden reproject, crop, polygonize and proximity to Mask.
Notable changes for Raster include:

  • Renames __eq__ to raster_equal for consistency with NumPy, __eq__ is now used by logical bitwise operations (see documentation and Add Mask subclass #350) and casts a Mask,
  • Renames equal_georeferenced_grid to georeferenced_grid_equal for naming consistency.

Casting single-band Raster to 2D

See #360.

GeoPandas functionalities in Vector

This PR:

  • Wraps all geometric functionalities of the geopandas.base.GeoPandasBase non-public class (inherited by both GeoSeries and GeoDataFrame for geometric functionalities) and the geopandas.GeoDataFrame class to Vector. Everything can be called using Vector, for example Vector.unary_union. For the output, three cases: 1/ returns a Vector when it is a geometric output, 2/ proposes to append a non-geometric per-feature output (pandas.Series) to Vector.ds, and 3/ when it's another type of output, it returns that output directly. Why is this useful? Practicality and consistency: it can be a hassle to work with GeoPandas sometimes as a "geometric" operation on a GeoDataFrame can return a GeoDataFrame, a GeoSeries or a shapely.Geometry. And "non-geometric per-feature" functions like .area or .length return a pandas.Series of the same length. While GeoPandas' team is still improving a couple things upstream, many behaviours are actually intentional and fixed to maintain a lower-level API for all types of user. End-users focusing on analysis like us don't need this, and it slows us down, so it's quite practical to have everything logically and automatically re-cast to a GeoDataFrame in the Vector, or appended to it! 😄
  • Adds many tests to check all functionalities behave as intended, that all GeoPandas functions are covered (with a couple exceptions), that all arguments are still up-to-date, as well as their default values. Inheritance is out of the question as it would prevent us from re-encapsulating the output (so we'd have to override everything the same, anyway). This is how GeoPandas wrap Shapely on their side, also. Right now, we don't wrap the Pandas functionalities, those have to be called through Vector.ds directly.
  • Automatically fetches the documentation from GeoPandas with a renaming for Vector for the basic description, and points to GeoPandas's API for details, to ensure a minimal maintenance effort.

Feature changes

This PR:

  • Changes the default load_data of Raster.__init__ to False, and adds a condition to run .load() when .data is called (if data is not already loaded). This is, in short, implicit loading! 😄 Adds tests to check this functionality, and modifies a couple things that were breaking this behaviour in .load() and __init__. Now implicit loading also works properly with the downsample and indexes arguments! Also fixes the behaviour for multi-band rasters, and adds tests,
  • Changes default in_value of polygonize to "all", as Mask.polygonize() now automatically overrides Raster.polygonize for a boolean input,
  • Makes indexing __getitem__ (i.e., []) possible with boolean np.ndarray and not only Mask,
  • Adds index assignment __setitem (i.e., [] = x) to Raster,
  • Adds a clip argument to Vector.crop, to optionally clip the geometries to the extent,
  • Adds a Raster.subsample function to perform subsampling directly from the raster,
  • Adds support for universal function methods through the array interface of Raster (e.g., reduce as in np.logical_or.reduce()),
  • Adds Vector.get_bounds_projected() and Raster.get_bounds_projected() to automatically derive bounds in any CRS from a Raster or Vector accouting for densification of footprint edges during reprojection,
  • Adds Vector.get_footprint_projected() and Raster.get_bounds_projected() to automatically derive a polygon vector in any CRS from a Raster or Vector accouting for densification of footprint edges during reprojection,
  • Adds Vector.from_bounds_projected to automatically create a Vector from projected bounds,
  • Adds Vector.get_metric_crs() and Raster.get_metric_crs() to automatically derive a metric CRS from a Raster or Vector which is UTM or UPS,
  • Adds as_array argument to Vector.create_mask to return a boolean np.ndarray (returns a Mask by default).

Bug fixes and tests

This PR:

  • Fixes value_at_coords on several bands and adds tests,
  • Fixes coords that returned a wrong output with its default grid=True argument and adds tests,
  • Fixes a bug in get_nanarray to run filled(np.nan) when original data was not of float type and adds tests,
  • Fixes a maximum recursion depth error when trying to run np.ma functions on Raster due to __array__ (not supported by NumPy, they don't have an array interface for masked-array functions),
  • Fixes bugs to geoviewer.py and adds tests,
  • Fixes mutability of Raster.nodata by defining it as tuple instead of list,
  • Fixes many untested functions and adds tests: Raster.to_xarray, Raster.shift, misc.diff_yml, etc...

Issues resolved

This PR:

@rhugonnet rhugonnet changed the title Write documentation in sphinx-book-theme Write documentation Feb 10, 2023
@rhugonnet
Copy link
Member Author

Alright, I have accounted for everything that was do-able without spending days on it!
The rest is listed in #361

Ready to merge!

@rhugonnet
Copy link
Member Author

After losing a couple hours on this, the logo problem actually didn't come from our file (SVG is the standard vector format for Sphinx, and ours was fine).
It was an issue with the Pydata Sphinx Theme when light/dark mode was defined as "auto": pydata/pydata-sphinx-theme#1180

@adehecq
Copy link
Member

adehecq commented Apr 14, 2023

Fully agree on all your points! Some info on additional features:

* For interactive plotting: yes! @friedrichknuth has good expertise in this, proposed to give a hand on adding `folium` or `bokeh` 😊.

Fantastic !

* For zonal statistics: a lot of people seem happy with `rasterstats`, we could look at what is done there, and see if it makes sense to add it as a dependency, or if it's better to integrate it directly ourselves? (maybe contact the maintainer?)

Yes, that makes sense. Also in old geoutils there were some zonal statistics tools that we could re-use. But they are based on GDAL/OGR...

Some more info on the doc :

* For the redundancy: it's voluntary, after reading guidelines, I realized it was good to have a bit of it in a doc (unlike in a paper), especially between gallery examples and feature doc (to avoid people have to read it all, and just look at their favorite way of learning about the package).

👍

* For the lack of figures in the feature doc: I agree, maybe we can add a few mini-galleries links across the feature docs? "Feature" sections are generally poor in terms of figures (e.g., Rasterio, NumPy), I feel adding mini-gallery links there would be sufficient, but the Quick start really needs more of its own figures. Maybe interactive plotting with facilitate some of these aspects?

Ok to add the mini gallery to the Feature section, and yes to more illustrations in the quickstart !

* For the binder/quick start: definitely need something better, available as either a page or a jupyter Notebook! New data (NDVI?) could come in there too, @adehecq you're the pro at setting up binders now 😉

Haha, I don't know if I'm a pro, but I can have a look later for sure.

@adehecq
Copy link
Member

adehecq commented Apr 14, 2023

All good on my end, we can always refine later ! Thanks again for all this work !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment