Compute area-weighted averages #226

Closed
2 tasks
j08lue opened this issue Sep 19, 2023 · 2 comments

Comments


j08lue commented Sep 19, 2023

Global datasets on a lat/lon grid can have large variations in cell size, because cell area shrinks toward the poles.

When calculating averages (zonal statistics) over large areas (spanning a couple of degrees of latitude or more), an accurate result requires that

  1. Each cell is weighted by its pixel / cell area
  2. [much less important!] Cells only partially covered by the query geometry are weighted by the fraction of the cell that intersects the geometry
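
Written out, the two requirements above amount to a standard weighted mean. The symbols below are notation introduced here for illustration, not taken from rio-tiler: x_i is the value of cell i, A_i its area, and f_i the fraction of the cell covered by the query geometry (set f_i = 1 for every selected cell if partial coverage is ignored).

```latex
\bar{x}_w = \frac{\sum_i w_i \, x_i}{\sum_i w_i},
\qquad w_i = A_i \, f_i
```

With all A_i equal and f_i = 1, this reduces to the plain unweighted mean.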

As clarified previously, rio-tiler, which we are using, calculates unweighted averages only:

https://github.com/cogeotiff/rio-tiler/blob/066878704f841a332a53027b74f7e0a97f10f4b2/rio_tiler/io/rasterio.py#L573-L584

While intersection-weighted averages are complex to compute, simple pixel-area-weighted averages should not be, as pixel areas can be computed from the transform.
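
A minimal sketch of how pixel areas could be derived from the transform, assuming a north-up (unrotated) lat/lon grid in degrees and a spherical Earth. The function names, the plain-tuple transform in rasterio/affine coefficient order, and the list-of-rows inputs are all assumptions made for this illustration; this is not rio-tiler's API, and a real implementation would likely vectorize it with NumPy.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation (assumption)


def row_cell_areas(transform, height):
    """Area (m^2) of one cell in each row of a north-up lat/lon grid.

    `transform` holds the six affine coefficients (a, b, c, d, e, f) in
    rasterio/affine order: x = a*col + b*row + c, y = d*col + e*row + f.
    """
    a, b, c, d, e, f = transform
    if b != 0 or d != 0:
        raise ValueError("rotated grids are not handled in this sketch")
    dlon = math.radians(abs(a))  # cell width in radians
    areas = []
    for row in range(height):
        lat_top = math.radians(f + e * row)
        lat_bottom = math.radians(f + e * (row + 1))
        # Exact area of a lat/lon cell on a sphere: R^2 * dlon * |sin(top) - sin(bottom)|
        band = abs(math.sin(lat_top) - math.sin(lat_bottom))
        areas.append(EARTH_RADIUS_M**2 * dlon * band)
    return areas


def area_weighted_mean(values, mask, transform):
    """Mean of `values` (a list of rows) where `mask` is True, weighted by cell area."""
    areas = row_cell_areas(transform, len(values))
    total = 0.0
    weight_sum = 0.0
    for row_values, row_mask, area in zip(values, mask, areas):
        for value, inside in zip(row_values, row_mask):
            if inside:
                total += area * value
                weight_sum += area
    if weight_sum == 0:
        raise ValueError("no cells selected by the mask")
    return total / weight_sum
```

On such a grid the cell area depends only on the row, so one weight per row is enough. Partial-coverage weighting is deliberately left out here.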

Before implementing this functionality, we should run a benchmark (documented in a notebook or similar) that shows how much difference pixel-area weights make. Possible cases could be

  1. Average over the US (CONUS) for a 1 km resolution grid
  2. Average over the North American continent for a 100 km resolution grid
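
As a rough starting point for such a benchmark, the sketch below compares an unweighted and an area-weighted mean over a latitude band. Everything in it is an assumption for illustration: the latitude bounds are only an approximate CONUS extent, the field is synthetic (it simply increases linearly toward the north), and the weight is the cosine of each row's centre latitude, which is proportional to cell area on a sphere for small cells. A real benchmark would use actual data and the query geometry.

```python
import math


def compare_means(lat_south, lat_north, res_deg, field):
    """Return (unweighted, area_weighted) means of `field(lat)` over a latitude band.

    Rows are `res_deg` degrees tall; each row is weighted by cos(latitude of the
    row centre), which is proportional to cell area for small cells on a sphere.
    """
    n_rows = round((lat_north - lat_south) / res_deg)
    lats = [lat_south + (i + 0.5) * res_deg for i in range(n_rows)]
    values = [field(lat) for lat in lats]
    weights = [math.cos(math.radians(lat)) for lat in lats]
    unweighted = sum(values) / n_rows
    weighted = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return unweighted, weighted


# Approximate CONUS latitude range at ~0.01 degree (~1 km) resolution (assumed values).
# Synthetic field, for illustration only: the value equals the latitude.
unweighted, weighted = compare_means(24.5, 49.5, 0.01, lambda lat: lat)
relative_difference = (unweighted - weighted) / weighted
print(f"unweighted={unweighted:.3f} weighted={weighted:.3f} "
      f"relative difference={relative_difference:.2%}")
```

For this synthetic field the unweighted mean overestimates the weighted one by roughly 2 percent. The size of the effect on real data depends on how the field varies with latitude, which is what the benchmark should establish.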

I can also provide sample data for benchmarking, if that helps. There are a bunch of global datasets in https://www.earthdata.nasa.gov/dashboard/data-catalog. Maybe one of the NO2 ones?

To access them, you may need to use the VEDA JupyterHub, though, since the buckets are private.

A relevant GHG dataset to benchmark this against would be CASA-GFED3 Land Carbon Flux, which should be accessible on the VEDA or GHG Center JupyterHub services.

User stories

  1. As a user of the zonal statistics function, I would like the results to be accurate no matter what the original projection of the data is, so I can trust the results.
  2. As a scientist distributing my large-scale data through TiTiler with the zonal statistics endpoint, I need the results of that calculation to be accurate, such that consumers of that data get correct extracts from my data.
  3. As a provider of a zonal statistics service for large-scale data, I need the calculations to be accurate, so I can provide this service with confidence.

Acceptance criteria

  • Benchmark the difference area weighting makes for a few use cases
  • Implement weighted averages by pixel area
j08lue transferred this issue from NASA-IMPACT/veda-ui Sep 19, 2023

j08lue commented Oct 6, 2023

What is needed to surface this functionality in the GHG Center backend and use it in the frontend?

When we have a working endpoint, we also need to validate the values we compute.

j08lue mentioned this issue Oct 9, 2023
3 tasks

j08lue commented Nov 9, 2023

j08lue closed this as not planned Nov 9, 2023