first try at a function to break 1D-daily into 2D for seasons #3

remicousin · 2021-08-17T13:58:04Z

Also includes seasonal_sum to illustrate how could be used to make "seasonal" functions.

The original plan was to make a generic function that will split 1D daily data into 2D daily data by days and years and select a subset of days defining a season. Thus allowing to perform analysis on the days dimension to obtain yearly series of a seasonal quantity and thus be able to reproduce a lot of popular Ingrid functions (e.g. and including onset date).

It turns out that xarray offers a grouping function and more specifically by bins that then allow operations by group. This seems nicer to me as there is then no need to create a new dimension, worry about indexing and bins can have different sizes (making leap year management easier). The idea being to give the edges of the season of interest, group, then drop every other group and then that can be fed to any other operator.

So I went with that. Because bins are defined by their edges, I made the inputs for the season be start and end rather than start and length. Now that fits better the groupby_bins function and it's typically easier for a user to think of seasons in these terms. That may mean changing onset date function inputs accordingly, or maybe cover the 2 cases (ie also offer inputs to be start and length).

Now it also turns out that it's not easy to drop groups from a grouped object. Or it was but someone complained about automatically dropped empty bins so this PR brings them back automatically and recommends using dropna to actually drop. But one can do dropna only after applying an operator to the grouped data, so that is something to be aware of when creating something like seasonal_sum.

That makes me doubt whether this is a good route or not. The grouping thing is seducing but the dropping business not so much. Other options could be:

use the pandas versions of those things (groupby, cut) and use filter for dropping;
go a totally different route, TBD

There is a run_test_season_stuff() to help with testing and one can run calc.py with
CONFIG=config-sample.yaml python calc.py

Includes seasonal_sum to illustrate how could be used to make "seasonal" functions

remicousin · 2021-08-17T14:00:55Z

And of course there could be other problems that the ones I laid down in introduction

ikhomyakov · 2021-08-20T18:13:10Z

That makes me doubt whether this is a good route or not. The grouping thing is seducing but the dropping business not so much. Other options could be:

@remicousin, I also doubt that the grouping is a good approach. Let me look into this more and I will try to update by Monday.

remicousin · 2021-08-20T18:26:09Z

That makes me doubt whether this is a good route or not. The grouping thing is seducing but the dropping business not so much. Other options could be:

@remicousin, I also doubt that the grouping is a good approach. Let me look into this more and I will try to update by Monday.

ok!

That label-grouping concept seems to be all over the place though... Maybe I am not using it right... or maybe we need a more sophisticated version... or maybe it's not the way to go at all...

aaron-kaplan · 2021-08-31T13:56:39Z

My first thought for extracting a single season would be to use where with drop=True instead of groupby_bins. But if you're planning to generalize beyond the current use case to support multiple seasons, groupby_bins might be the way to go. Did you consider and reject the option of using where?

onset_maproom/calc.py

remicousin · 2021-08-31T14:26:04Z

My first thought for extracting a single season would be to use where with drop=True instead of groupby_bins. But if you're planning to generalize beyond the current use case to support multiple seasons, groupby_bins might be the way to go. Did you consider and reject the option of using where?

I didn't think of using where. I am definitely thinking of using this as a generalization to do seasonal calculations from daily data. It turns out that this is how Ingrid does it too (~groups according to edges and drops every other group so I figured that was probably a good way to do things). Regardless, how would you do it with where?

find the edges like done here
use where to keep days within the right edges
then how do you make it "2D." Using some selection and then concatenating on a new dimension? One advantage of groupby_bins is that the bins need not be the same size, which is bound to happen with leap years, since the grouping doesn't create another dimension, but rather labels the grouped elements of interest.

aaron-kaplan · 2021-08-31T15:24:03Z

Rémi and I had a convo and decided to try using where to drop the dates that are out of season, followed by groupby_bins to do the aggregation over each season. There will then be one group per year rather than two, eliminating the complicated and not entirely correct concatenation and dropna steps.

First drops data not within seasons of interest Then uses groupby

remicousin · 2021-09-01T15:57:11Z

Latest commits improves on this. However, can't use groupby_bins, even after dropping days outside seasons, because the empty interval for the wrong seasons is still considered. Thus used groupby according to a group description. If that is the right route, then need to figure out how to give "group" better metadata (which is a bit sad because groupby_bins was creating nice labels -- not that I am sure they could be reused in a meaningful manner, though...).

Or could use groupby_bins withe edges as the first day of each season, but then the "nice" groupby_bins labels would have lost the end point of the seasons. I figured it might be easier for us to define "group" in a better fashion than the crude one I have now.

Maybe diff or differentiate applied to the edges of the seasons and keeping every other one could help build a time dimension to replace group. I haven't tried. And I don't know if it works on time. But that would be easy to figure out. But let's validate the last commit first.

…aningless integers Turns out I can't put a GroupBy object in a Dataset (or can I?) so instead, the function returns what groupby needs to be run then the functions that rely on it like seasonal_sum actually run the groupby

remicousin · 2021-09-02T18:39:50Z

In this last commit I also attempted to clean up: code formatting; naming conventions... but there might be more to do.

aaron-kaplan

You're doing some weird shit with array broadcasting that I find difficult to follow. I made it most of the way through but gave up near the end. If you think the result is right, let's move ahead for now. I would like to get the onset date function working, write a comprehensive set of automated tests for it (I will help with that), and then we can refine the implementation with confidence because the tests will tell us if we break anything.

onset_maproom/calc.py

…ast edges

…, seasons_starts/ends Return to cumsum trick and proper use of data selection

onset_maproom/calc.py

Previously, the case where the input data ended in 28 or 29 Feb was dropping the last complete year (because was lookign for March 1st) The change also makes end edges always right and not some times March 1st

aaron-kaplan

Looks good.

At some point I would like to see if we can express the season starts and ends using the CF conventions for cell boundaries. Xarray doesn't understand cell bounds, so this isn't relevant to our immediate goals, but other existing tools (e.g. ingrid) do understand cell bounds, and xarray might in the future, so I think we should generate CF-compliant metadata if we can. If you want to look into that now, feel free; if you want to put it off for another time I'm ok with that too.

remicousin · 2021-09-09T16:06:50Z

See that I made a last commit as I realized my dealing of picking 29 Feb as end of season was not quite right. I explain in the commit comment. Can explain further if need be. I think now it is square.

For the CF conventions business, shall we address the other PR first to wrap up onset_date function? Once this is merged, I'll rebase the other branch on master and I'll try use what we created here to apply the onset_index function with it to make an onset_date function (for multiple years basically). Then if I have time, I am happy to check back out at CF stuff.

onset_maproom/calc.py

and no need for end_day2/_month2

I hadnot understood where that extra dim was coming from...

first try at a function to break 1D-daily into 2D for seasons

cf3d209

Includes seasonal_sum to illustrate how could be used to make "seasonal" functions

remicousin requested review from drewmresnick and ikhomyakov August 17, 2021 13:58

remicousin added enhancement labels Aug 17, 2021

ikhomyakov assigned remicousin Aug 17, 2021

remicousin requested a review from aaron-kaplan August 25, 2021 22:00

aaron-kaplan reviewed Aug 31, 2021

View reviewed changes

onset_maproom/calc.py Outdated Show resolved Hide resolved

aaron-kaplan reviewed Aug 31, 2021

View reviewed changes

onset_maproom/calc.py Outdated Show resolved Hide resolved

remicousin added 2 commits August 31, 2021 11:58

indent comments like the rest

099f533

revisit daily grouping by season

aea13a2

First drops data not within seasons of interest Then uses groupby

aaron-kaplan requested changes Sep 2, 2021

View reviewed changes

remicousin added 8 commits September 7, 2021 11:04

4 spaces identation

c54cd44

commenting cosmetics

f1717c3

call grouped function only once in the seasonal sum example

d9f1a00

removing out-of-date comment

873c5f9

blacked

f1455c0

need not concatenate start/end_edges and can just reassign coord

4f2c609

more synthetic way to cut what's outside of the very first and very l…

6ee2a0a

…ast edges

no more need to broadcast to additional dimentions to generate groups…

8e1c35f

…, seasons_starts/ends Return to cumsum trick and proper use of data selection

aaron-kaplan reviewed Sep 9, 2021

View reviewed changes

onset_maproom/calc.py Outdated Show resolved Hide resolved

nothing beyond 88

12f7ae4

aaron-kaplan reviewed Sep 9, 2021

View reviewed changes

onset_maproom/calc.py Outdated Show resolved Hide resolved

aaron-kaplan reviewed Sep 9, 2021

View reviewed changes

onset_maproom/calc.py Outdated Show resolved Hide resolved

cosmetics

9179baa

aaron-kaplan reviewed Sep 9, 2021

View reviewed changes

onset_maproom/calc.py Show resolved Hide resolved

correct dealing of 29 Feb as end season input

719a3a3

Previously, the case where the input data ended in 28 or 29 Feb was dropping the last complete year (because was lookign for March 1st) The change also makes end edges always right and not some times March 1st

aaron-kaplan previously approved these changes Sep 9, 2021

View reviewed changes

remicousin dismissed aaron-kaplan’s stale review via 719a3a3 September 9, 2021 16:03

aaron-kaplan reviewed Sep 9, 2021

View reviewed changes

onset_maproom/calc.py Outdated Show resolved Hide resolved

remicousin added 3 commits September 9, 2021 14:45

use np syntaxt to add 1 da

6972d71

and no need for end_day2/_month2

no need to squeeze anymore

4a1a2eb

I hadnot understood where that extra dim was coming from...

recompleting the URL to Ingrid function doc

b215796

aaron-kaplan approved these changes Sep 9, 2021

View reviewed changes

remicousin merged commit 36f42c0 into master Sep 13, 2021

remicousin deleted the remic/onset3 branch September 13, 2021 15:36

Agro-Climate mentioned this pull request Mar 15, 2022

Remic/watbal #63

Closed

remicousin mentioned this pull request Mar 16, 2022

case rho is not constant #64

Closed

first try at a function to break 1D-daily into 2D for seasons #3

first try at a function to break 1D-daily into 2D for seasons #3

Uh oh!

Conversation

remicousin commented Aug 17, 2021

Uh oh!

remicousin commented Aug 17, 2021

Uh oh!

ikhomyakov commented Aug 20, 2021

Uh oh!

remicousin commented Aug 20, 2021

Uh oh!

aaron-kaplan commented Aug 31, 2021

Uh oh!

Uh oh!

Uh oh!

remicousin commented Aug 31, 2021

Uh oh!

aaron-kaplan commented Aug 31, 2021

Uh oh!

remicousin commented Sep 1, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

remicousin commented Sep 2, 2021

Uh oh!

aaron-kaplan left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

aaron-kaplan left a comment

Choose a reason for hiding this comment

Uh oh!

remicousin commented Sep 9, 2021

Uh oh!

Uh oh!

Uh oh!

remicousin commented Sep 1, 2021 •

edited

Loading