Vector Data and FATs
See also More-Glaciers-CCI-Use-Cases.
We need to ingest ESRI Shapefiles into Cate Desktop so that geometries can be displayed and users can interact with them on the 3D globe.
While the Cate WebAPI can easily read and process Shapefiles through geopandas and/or fiona, the challenge is to efficiently stream even very large Shapefiles into the display components of Cate Desktop. Note that, for example, the Glaciers CCI products contain ~80 MB of binary geometry coordinates.
The following facts have a major impact on the streaming and display performance and need to be addressed either in the back-end or front-end:
- geometry data must be converted from the binary Shapefile format into the in-memory representation used by some Python library (e.g. geopandas, shapely), from there into some textual representation that can be interpreted by JavaScript libraries (e.g. GeoJSON, GML, KML, CZML), and finally into the in-memory representation used by some JavaScript library (e.g. Cesium, OpenLayers, D3).
- geometry data must be transformed from its source CRS to some target CRS used in the display. For example, source coordinates may be in UTM, but GeoJSON only supports EPSG-4326, and the display may be set to Polar Stereographic.
- geometry data should be loaded only for the visible portion of the Earth and only for an adequate level of detail, i.e. hide all details that are not perceivable at the display's current zoom level.
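The third point, loading only what is visible, can be illustrated with a minimal sketch that filters GeoJSON-like Feature records by bounding box. The helper names (`iter_coords`, `geometry_bbox`, `visible_features`) are hypothetical and not part of Cate; the sketch assumes coordinates are already in the same CRS as the view rectangle and ignores views that cross the antimeridian.

```python
from typing import Iterable, Iterator, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def iter_coords(coords) -> Iterator[Tuple[float, float]]:
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        # A single position such as [x, y] or [x, y, z]
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_coords(part)


def geometry_bbox(geometry: dict) -> BBox:
    """Compute the bounding box of a non-empty GeoJSON geometry."""
    xs, ys = zip(*iter_coords(geometry["coordinates"]))
    return min(xs), min(ys), max(xs), max(ys)


def intersects(a: BBox, b: BBox) -> bool:
    """True if the two rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def visible_features(features: Iterable[dict], view: BBox) -> Iterator[dict]:
    """Yield only features whose bounding box intersects the visible rectangle."""
    for feature in features:
        if intersects(geometry_bbox(feature["geometry"]), view):
            yield feature
```

Because `visible_features` is a generator, it can be placed in front of a streaming writer without materializing the whole collection. A spatial index would be needed to avoid scanning every record for each view change.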
Note: This is implemented only in branch 477-nf-support_glaciers_cci in cate and cate-desktop.
In the Python back-end, a geo-data resource is represented either
- by a gpd.GeoDataFrame, in which case the variable is represented by a gpd.GeoSeries, or
- by a fiona.Collection, in which case the variable is represented by a property within each collection record. The records are GeoJSON Feature objects.
We currently provide the read_geo_data_collection operation, which returns a fiona.Collection given a file path to a supported vector data format such as GeoJSON (.geojson), Shapefile (.shp), or a zipped Shapefile directory.
Important note: we prefer using Fiona over GeoPandas because reading Shapefiles by means of a fiona.Collection is much faster (by a factor of several tens) than using a gpd.GeoDataFrame. This means Cate users should read Shapefiles using the read_geo_data_collection() operation rather than read_geo_data_frame() for maximum display performance.
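As a minimal sketch of how a variable can be read from a Fiona collection, the following iterates over records and extracts one property from each. The functions `iter_variable_values` and `read_variable` are hypothetical helpers, not Cate operations; the only Fiona call used is `fiona.open()`, whose collection yields GeoJSON-like Feature records when iterated.

```python
from typing import Any, Iterable, Iterator, Mapping


def iter_variable_values(records: Iterable[Mapping[str, Any]], var_name: str) -> Iterator[Any]:
    """Yield the value of property `var_name` for each GeoJSON-like Feature record.

    Records without that property yield None.
    """
    for record in records:
        yield record["properties"].get(var_name)


def read_variable(path: str, var_name: str) -> list:
    """Read all values of one variable from a vector file (requires Fiona)."""
    import fiona  # imported lazily so the helper above also works without Fiona

    # Records are read one at a time while iterating the open collection.
    with fiona.open(path) as collection:
        return list(iter_variable_values(collection, var_name))
```

Since records are consumed lazily, the same loop could feed a streaming response without loading all geometries at once.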
In addition, we've implemented a RESTful method in Cate's WebAPI which allows streaming GeoJSON from resources of type fiona.Collection at a given level of detail (/res/geojson/<resource_name>?level=<level>&...):
- Simplification is performed on source CRS coordinates using our own implementation of Visvalingam's algorithm, powered by numba JIT compilation. It still uses the pure-Python heapq implementation of a min-heap, which makes it slow. We retain a ratio p of the polygon points, where p = 2 ** -(num_levels - (level + 1)) and where max_level=8 (hard-coded constant). If level=0, we turn any geometry into a single point computed from the averages of its longitudes and latitudes, respectively;
- Transformation from the CRS used by the fiona.Collection object to EPSG-4326, as required by GeoJSON, is done by the proj4 package;
- Streaming is done by a Tornado request handler implemented as an asynchronous "co-routine" (we still have issues here with concurrent invocations!).
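The simplification step can be sketched in plain Python as follows. This is not the numba-based code from the branch: the function names are hypothetical, the ratio is clamped to 1.0 as an assumption, and the simplifier treats its input as an open line whose two end points are always kept (a closed polygon ring would need extra care to stay valid).

```python
import heapq
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def level_to_ratio(level: int, num_levels: int = 8) -> float:
    """p = 2 ** -(num_levels - (level + 1)), clamped to 1.0 (clamping is an assumption)."""
    return min(1.0, 2.0 ** -(num_levels - (level + 1)))


def mean_point(points: Sequence[Point]) -> Point:
    """Collapse a geometry to the average of its x and y coordinates (level 0)."""
    n = len(points)
    return sum(p[0] for p in points) / n, sum(p[1] for p in points) / n


def triangle_area(a: Point, b: Point, c: Point) -> float:
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def simplify(points: Sequence[Point], ratio: float) -> List[Point]:
    """Visvalingam-style simplification keeping roughly `ratio` of the points."""
    n = len(points)
    target = max(2, round(n * ratio))
    prev = list(range(-1, n - 1))  # index of the previous surviving point
    nxt = list(range(1, n + 1))    # index of the next surviving point
    alive = [True] * n
    area = [float("inf")] * n      # end points keep an infinite area
    heap = []
    for i in range(1, n - 1):
        area[i] = triangle_area(points[i - 1], points[i], points[i + 1])
        heapq.heappush(heap, (area[i], i))
    remaining = n
    while remaining > target and heap:
        a, i = heapq.heappop(heap)
        if not alive[i] or a != area[i]:
            continue  # stale heap entry, superseded by a later push
        alive[i] = False
        remaining -= 1
        p, q = prev[i], nxt[i]
        nxt[p], prev[q] = q, p
        # Re-evaluate the two neighbours, whose triangles have changed.
        for j in (p, q):
            if 0 < j < n - 1:
                area[j] = triangle_area(points[prev[j]], points[j], points[nxt[j]])
                heapq.heappush(heap, (area[j], j))
    return [pt for pt, keep in zip(points, alive) if keep]
```

The heap uses lazy deletion: outdated entries are left in place and skipped when popped, which avoids a decrease-key operation that `heapq` does not provide.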
An urgent problem is how to decide when to convert geometries to points and when it is OK to keep the original geometries. Ideally, we would cluster the geometries and symbolize them at low levels of detail until we reach the highest level of detail, where we display the full geometries. An additional option would be to offer an extra clickable symbol (button) at the highest level of detail that allows expanding a symbol into its original geometry and collapsing it back into a symbol by clicking it.
Plan: Invoke that REST method for a given resource-variable pair if the resource's type is fiona.Collection or gpd.GeoDataFrame. Within a display, any polygons are then filled according to the selected variable's value, mapped within a display min/max range onto a given color bar (similar to raster data).
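The planned value-to-color mapping could look roughly like the sketch below. It assumes a color bar given as a sequence of at least two RGB stops and linear interpolation between them; the names `normalize` and `color_for_value` are hypothetical, and real color bars may be defined differently.

```python
from typing import Sequence, Tuple

RGB = Tuple[int, int, int]


def normalize(value: float, display_min: float, display_max: float) -> float:
    """Map a value into [0, 1] relative to the display range, clamping outliers."""
    if display_max == display_min:
        return 0.0
    t = (value - display_min) / (display_max - display_min)
    return min(1.0, max(0.0, t))


def color_for_value(value: float, display_min: float, display_max: float,
                    color_bar: Sequence[RGB]) -> RGB:
    """Linearly interpolate a fill color from a color bar with two or more stops."""
    t = normalize(value, display_min, display_max)
    pos = t * (len(color_bar) - 1)
    i = min(int(pos), len(color_bar) - 2)  # index of the lower stop
    f = pos - i                            # fraction between lower and upper stop
    lo, hi = color_bar[i], color_bar[i + 1]
    return tuple(round(a + (b - a) * f) for a, b in zip(lo, hi))
```

For example, with a blue-white-red bar and a display range of 0 to 10, a value of 5 maps to white and values outside the range take the color of the nearest end.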
State: We stream GeoJSON into the 3D globe and the 2D map using custom data sources for the respective Cesium and OpenLayers APIs, and display polygons at a constant simplification level using the default style settings (no styling by color mapping implemented yet). The GeoJSON stream is loaded in a separate Web Worker.
The current (not really award-winning) solution for displaying large Shapefiles (or any large geo-data sources) is to shrink them beforehand using some external tool, e.g. the GDAL command-line tool ogr2ogr:
$ ogr2ogr output.shp input.shp -simplify 0.0001
There is still a lot of work to be done to let users efficiently work with vector layers in cate-desktop:
- Implement loading of GeoJSON data only for visible area and the required level of detail (this may be the hardest part!)
- Cancel a streaming process if no longer required (e.g. view closed, selected variable changed).
- Implement a mapping from the selected variable used for the current vector layer to some geometry style. Provide GUI for the mapping and the style settings (e.g. details section of LAYERS panel if vector layer is selected)
  - For polygon data there must be a mapping from the values of the selected variable to polygon fill colors
  - For point data there must be a mapping from the values of the selected variable to symbols of varying shape, icon, size, color.
- Implement the default style settings for geometry of a selected vector layer if no variables exist or no variable is selected. Provide GUI for the default style settings (e.g. details section of LAYERS panel if vector layer is selected and/or user preferences dialog)
  - For polygon data set the default stroke and fill
  - For line data set the default stroke
  - For point data set the default symbol
  - How a selected geometry shall appear
- We must allow users to interact with geometry, e.g. select a point or polygon, and then to use a selected geometry as an input for operations that accept geometry objects (subset_spatial, tseries_point, etc).
- How to best implement a filter, e.g. filter_geo_data_collection, that will return a new Fiona collection. Fiona does not seem to support creating new in-memory collections.
- Each collection must have its own layer and style. How to style the geometries for the display?
- Features of a collection must be selectable, so that we can use their geometry as input to other operations --> Features may be harmonized with our current implementation of Placemarks and the current Countries layer.
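One possible direction for the filter question above is a small list-backed container that mimics the parts of a collection the display code needs (iteration, length, CRS, schema). The sketch below is only an illustration under that assumption: `InMemoryCollection` and `filter_collection` are hypothetical names, not Fiona or Cate APIs, and the result holds all matching records in memory.

```python
from typing import Any, Callable, Iterable, Iterator, List, Mapping, Optional

Record = Mapping[str, Any]


class InMemoryCollection:
    """Hypothetical list-backed stand-in for a filtered collection (not a Fiona class)."""

    def __init__(self, records: Iterable[Record],
                 crs: Optional[Any] = None, schema: Optional[Any] = None):
        self.records: List[Record] = list(records)
        self.crs = crs
        self.schema = schema

    def __iter__(self) -> Iterator[Record]:
        return iter(self.records)

    def __len__(self) -> int:
        return len(self.records)


def filter_collection(collection: Iterable[Record],
                      predicate: Callable[[Record], bool]) -> InMemoryCollection:
    """Return a new in-memory collection with the records accepted by `predicate`."""
    return InMemoryCollection(
        (record for record in collection if predicate(record)),
        # Carry over CRS and schema if the source object exposes them.
        crs=getattr(collection, "crs", None),
        schema=getattr(collection, "schema", None),
    )
```

A call such as `filter_collection(collection, lambda r: r["properties"]["area"] > 2.0)` would then yield an object that can be iterated like the source collection. Whether this is acceptable depends on how large the filtered result may get.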
The GeoJSON DataSource is currently downloaded for a selected variable:
export function getGeoJSONUrl(baseUrl: string, baseDir: string, layer: VariableVectorLayerState): string {
    return baseUrl + `ws/res/geojson/${encodeURIComponent(baseDir)}/${encodeURIComponent(layer.resName)}?`
        + `level=8`
        + `&var=${encodeURIComponent(layer.varName)}`
        + `&index=${encodeURIComponent((layer.varIndex || []).join())}`
        + `&cmap=${encodeURIComponent(layer.colorMapName)}`
        + `&min=${encodeURIComponent(layer.displayMin + '')}`
        + `&max=${encodeURIComponent(layer.displayMax + '')}`;
}
This means that if we change the selected variable, Cate Desktop will request a new GeoJSON DataSource! This is very inefficient, and we should get rid of the variable dependency. Color coding with respect to a selected variable shall be done in Cate Desktop. We also need to share a DataSource across multiple instances of the Cesium 3D globe; otherwise a second globe will create, download, and keep in memory a second DataSource.
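The two ideas, a variable-independent URL and a shared data source, are sketched below. The logic is written in Python only to keep it compact and testable; in Cate Desktop it would live in the TypeScript front-end. `geojson_url` and `SharedSourceCache` are hypothetical names, and `quote(..., safe='')` only roughly mirrors `encodeURIComponent`.

```python
from typing import Callable, Dict, Generic, Tuple, TypeVar
from urllib.parse import quote

T = TypeVar("T")


def geojson_url(base_url: str, base_dir: str, res_name: str, level: int) -> str:
    """Build a URL that depends only on the resource and the level of detail."""
    return (f"{base_url}ws/res/geojson/{quote(base_dir, safe='')}/"
            f"{quote(res_name, safe='')}?level={level}")


class SharedSourceCache(Generic[T]):
    """Reference-counted cache so that several views share one loaded data source."""

    def __init__(self, load: Callable[[str], T]):
        self._load = load
        self._entries: Dict[str, Tuple[T, int]] = {}

    def acquire(self, url: str) -> T:
        """Return the cached source for `url`, loading it on first use."""
        if url in self._entries:
            source, count = self._entries[url]
        else:
            source, count = self._load(url), 0
        self._entries[url] = (source, count + 1)
        return source

    def release(self, url: str) -> None:
        """Drop one reference; the source is evicted when no view uses it."""
        source, count = self._entries[url]
        if count <= 1:
            del self._entries[url]
        else:
            self._entries[url] = (source, count - 1)
```

With such a cache, switching the selected variable would only restyle the already loaded geometries, and a second globe would call `acquire()` with the same URL instead of triggering another download.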
- http://openlayers.org/en/master/examples/geojson.html
- http://openlayersbook.github.io/ch11-creating-web-map-apps/example-03.html
- Oboe.js + D3.js: http://bl.ocks.org/lightviz/8938240
- Using server-sent events
- Web Workers API and WorkerGlobalScope
- Web Workers from Code URLs: https://gist.github.com/SunboX/5849664