Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute area-weighted statistics #640

Merged
merged 4 commits into from
Sep 27, 2023
Merged

Compute area-weighted statistics #640

merged 4 commits into from
Sep 27, 2023

Conversation

vincentsarago
Copy link
Member

@vincentsarago vincentsarago commented Sep 20, 2023

ref NASA-IMPACT/veda-backend#226

This PR adds a coverage options to the get_array_statistics function to enable area weighted statistics. This PR do not take care of the coverage array creation which should be done by the client application (maybe with some helper in rio-tiler)

cc @kylebarron @j08lue

To Do

  • more tests
  • docs
  • validate the logic (why some values are defined using weight or not)
  • add coverage utility functions

"minority": float(keys[counts.tolist().index(counts.min())].tolist())
if valid_pixels
else numpy.nan,
"mean": float(array.mean()),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't use the weighted array for the min and max (like https://github.com/isciences/exactextract#supported-statistics)

"std": float(array.std()),
"median": float(numpy.ma.median(array)),
"majority": majority,
"minority": minority,
"unique": float(counts.size),
**dict(zip(percentiles_names, percentiles_values)),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 maybe the percentiles should use the weighted array?

rio_tiler/utils.py Outdated Show resolved Hide resolved
# 3, 4
data = np.ma.array((1, 2, 3, 4)).reshape((1, 2, 2))

# Coverage Array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should call this the weight array instead of coverage? coverage doesn't seem to make the most sense to me here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in https://github.com/isciences/exactextract, weight are really weight, while the coverage is called cell coverage fractions. I didn't want to confuse people 🤷

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, coverage came from exactextract because their weights are usually derived from the coverage of each polygon in the cell. I think weight is more general than coverage though. There could be use cases for weighted zonal stats that aren't partial-coverage

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coverage_weights ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we ever have both coverage and weights?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like exactextract does support both coverage and weight in effect, because it allows weight as a parameter and it computes coverage itself

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we ever have both coverage and weights?

Not planned right now but I don't want to be blocked in the future. I agree that coverage is not a perfect name but I don't want to use weights because it's not it's not weights but spatial fraction. I'm open to change to better name if you have ideas :D

(self.height, cover_scale, self.width, cover_scale)
).astype("float32")

return cover_array.sum(-1).sum(1) / (cover_scale**2)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slightly modified version of perrygeo/python-rasterstats#136

@sgoodm I know ☝️ is a 7 years old PR (😅) but I hope you don't mind that I reused some of the code here 🙏

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at all, @jacobwhall and I have been pushing use of COGs in our work at @aiddata and are fans of these projects, so happy to share some code to support 👍 👍

@vincentsarago
Copy link
Member Author

Note: we don't have a BaseReader method that returns statistics for a GeoJSON Feature but this is how it could look like in the client (e.g TiTiler):

with Reader(path) as src:
    data = src_dst.feature(
        shape,
        shape_crs=WGS84_CRS,
    )

    coverage_array = data.get_coverage_array(
        shape, shape_crs=WGS84_CRS
    )

    stats = data.statistics(coverage=coverage_array)

@vincentsarago vincentsarago marked this pull request as ready for review September 25, 2023 19:16
@vincentsarago vincentsarago merged commit 7bed9de into main Sep 27, 2023
@vincentsarago vincentsarago deleted the CoverageStatistics branch September 27, 2023 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants