This repository has been archived by the owner on Aug 29, 2023. It is now read-only.

Add operation data_frame_aggregate() #707

Closed
forman opened this issue Jul 12, 2018 · 1 comment

Comments

Member

forman commented Jul 12, 2018

Expected behavior

Users of Glaciers CCI data require an operation that provides summary statistics by aggregating the numerical series in a Shapefile. For all numeric series, or only the given ones, the operation would compute the following statistics and create one new column per statistic for each variable {var}:

  • {var}_count
  • {var}_mean
  • {var}_median
  • {var}_sum
  • {var}_std
  • {var}_min
  • {var}_max

Note that {var}_count is computed per variable because NaN values are excluded, so counts may differ between variables.
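
To make the intended column naming concrete, here is a minimal pandas sketch of the aggregation step only. It is not the Cate implementation: the helper name aggregate_sketch, the one-row result layout, and the use of pandas' default sample standard deviation (ddof=1) are assumptions for illustration, and geometry handling and the monitor argument are left out.

```python
import numpy as np
import pandas as pd


def aggregate_sketch(df, var_names=None):
    """Hypothetical helper: return a one-row frame with {var}_{stat} columns."""
    if var_names is None:
        # Assumption: "all numeric series" means all numeric-dtype columns.
        var_names = list(df.select_dtypes(include="number").columns)
    stats = {}
    for var in var_names:
        series = df[var]
        # pandas reductions skip NaN by default, so count is per variable.
        stats[f"{var}_count"] = series.count()
        stats[f"{var}_mean"] = series.mean()
        stats[f"{var}_median"] = series.median()
        stats[f"{var}_sum"] = series.sum()
        stats[f"{var}_std"] = series.std()  # sample std (ddof=1), an assumption
        stats[f"{var}_min"] = series.min()
        stats[f"{var}_max"] = series.max()
    return pd.DataFrame([stats])


# Made-up example data: one numeric column with a NaN, one text column.
df = pd.DataFrame({"area": [1.0, 2.0, np.nan, 3.0],
                   "name": ["a", "b", "c", "d"]})
result = aggregate_sketch(df)
```

In this example the NaN in "area" is excluded, so area_count is 3 rather than 4, and the non-numeric "name" column is ignored when no variable names are given.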

Also note, another way to achieve similar results is the new compute_data_frame operation, see #703.

Proposed signature is:

data_frame_aggregate(df: pd.DataFrame, 
                     var_names: VarNamesLike.TYPE = None,
                     keep_geometry: bool = False,
                     monitor: Monitor = Monitor.NONE) -> pd.DataFrame

If keep_geometry is True, an intersection of all geometries in every geometry column is performed.

Actual behavior

Such an operation does not exist yet.

Specifications

Cate 1.0 ... 2.0.dev15
