Compute area-weighted averages #226

j08lue · 2023-09-19T19:01:45Z

Global datasets on a lat/lon (or similar) grid can have large variations in cell size, since cells cover less ground area towards the poles.

When calculating averages (zonal statistics) over large areas (spanning a couple of degrees of latitude or more), an accurate result requires that:

Each cell is weighted by its pixel / cell area
[much less important!] Cells only partially covered by the query geometry are weighted by the fraction of the cell that intersects it

As clarified previously, rio-tiler, which we are using, calculates only unweighted averages:

https://github.com/cogeotiff/rio-tiler/blob/066878704f841a332a53027b74f7e0a97f10f4b2/rio_tiler/io/rasterio.py#L573-L584

While intersection-weighted averages are complex to compute, simple pixel-area-weighted averages should not be, as pixel areas can be computed from the transform.
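As a rough illustration of computing pixel areas from the transform, here is a minimal sketch. It is not rio-tiler's API; the function names `row_cell_areas` and `area_weighted_mean` are hypothetical. It assumes a north-up, unrotated grid in degrees, a spherical Earth, and a transform given as the six coefficients `(a, b, c, d, e, f)` in the order used by the `affine` package.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation (assumption)


def row_cell_areas(transform, height):
    """Area in m^2 of a single cell in each row of a lat/lon grid.

    `transform` is (a, b, c, d, e, f) with
    lon = a * col + b * row + c and lat = d * col + e * row + f, in degrees.
    On an unrotated grid the cell area depends only on the row (latitude).
    """
    a, b, _c, d, e, f = transform
    if b != 0 or d != 0:
        raise ValueError("rotated grids are not handled in this sketch")
    dlon = math.radians(abs(a))
    areas = []
    for row in range(height):
        lat_top = math.radians(f + e * row)
        lat_bottom = math.radians(f + e * (row + 1))
        # Exact area of a lat/lon cell on a sphere
        band = abs(math.sin(lat_top) - math.sin(lat_bottom))
        areas.append(EARTH_RADIUS_M**2 * dlon * band)
    return areas


def area_weighted_mean(values, mask, transform):
    """Mean of `values` (list of rows) weighted by cell area.

    `mask` has the same shape; True marks cells inside the query geometry.
    NaN values are skipped.
    """
    areas = row_cell_areas(transform, len(values))
    total = 0.0
    weight_sum = 0.0
    for row_values, row_mask, area in zip(values, mask, areas):
        for value, inside in zip(row_values, row_mask):
            if inside and not math.isnan(value):
                total += value * area
                weight_sum += area
    return total / weight_sum if weight_sum else float("nan")
```

For a two-row grid spanning 60°N to 0° with the value 1 in the northern row and 0 in the southern row, the unweighted mean is 0.5, while this sketch gives a lower value because the northern cells are smaller.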

Before implementing this functionality, we should run a benchmark (documented in a notebook or similar) that shows how much difference pixel-area weights make. Possible cases:

Average over the US (CONUS) for a 1 km resolution grid
Average over the North American continent for a 100 km resolution grid
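A first, purely synthetic version of such a benchmark could look like the sketch below. Everything in it is an assumption made for illustration: the latitude bands only roughly stand in for CONUS and North America, 0.01° and 1° approximate 1 km and 100 km, the field is a made-up linear gradient in latitude, and the weights use the common cos(latitude) approximation at cell centres rather than exact cell areas. Real data would replace `synthetic_field`.

```python
import math


def band_means(lat_south, lat_north, resolution_deg, field):
    """Unweighted and cos(latitude)-weighted means of field(lat) over a band.

    Each row of the grid is represented by one value at its centre latitude,
    which is enough when the region is a simple lat/lon box.
    """
    n_rows = round((lat_north - lat_south) / resolution_deg)
    lats = [lat_south + (i + 0.5) * resolution_deg for i in range(n_rows)]
    values = [field(lat) for lat in lats]
    weights = [math.cos(math.radians(lat)) for lat in lats]
    unweighted = sum(values) / n_rows
    weighted = sum(v * w for v, w in zip(values, weights)) / sum(weights)
    return unweighted, weighted


def synthetic_field(lat):
    """Hypothetical field that increases linearly with latitude."""
    return lat


# Rough, assumed extents and resolutions for the two cases
cases = [
    ("CONUS-like band, ~1 km", 24.0, 50.0, 0.01),
    ("North-America-like band, ~100 km", 10.0, 80.0, 1.0),
]
for name, south, north, res in cases:
    u, w = band_means(south, north, res, synthetic_field)
    print(f"{name}: unweighted={u:.3f} weighted={w:.3f} rel. diff={(u - w) / w:.2%}")
```

The size of the difference depends strongly on how the field varies with latitude, so the numbers from a synthetic gradient only indicate the order of magnitude to expect.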

I can also provide sample data for benchmarking, if that helps. There are a bunch of global datasets in https://www.earthdata.nasa.gov/dashboard/data-catalog. Maybe one of the NO2 ones?

To access them, you may need to use the VEDA JupyterHub, though, since the buckets are private.

A relevant GHG dataset to benchmark this against would be CASA-GFED3 Land Carbon Flux, which should be accessible on the VEDA or GHG Center JupyterHub services.

User stories

As a user of the zonal statistics function, I would like the results to be accurate no matter what the original projection of the data is, so I can trust the results.
As a scientist distributing my large-scale data through TiTiler with the zonal statistics endpoint, I need the results of that calculation to be accurate, such that consumers of that data get correct extracts from my data.
As a provider of a zonal statistics service for large-scale data, I need the calculations to be accurate, so I can provide this service with confidence.

Acceptance criteria

Benchmark the difference that area weighting makes for a few use cases
Implement averages weighted by pixel area

j08lue · 2023-10-06T09:12:47Z

What is needed to surface this functionality in the GHG Center backend and use it in the frontend?

When we have a working endpoint, we also need to validate the values we compute.

j08lue · 2023-11-09T20:24:37Z

Follow-up ticket: https://github.com/NASA-IMPACT/veda-architecture/issues/334

j08lue transferred this issue from NASA-IMPACT/veda-ui Sep 19, 2023

vincentsarago mentioned this issue Sep 20, 2023

Compute area-weighted statistics cogeotiff/rio-tiler#640

Merged

4 tasks

j08lue mentioned this issue Oct 9, 2023

Upgrade titiler-pgstac #234

Closed

3 tasks

j08lue closed this as not planned Nov 9, 2023
