Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

major refactoring of local.py #27

Merged
merged 45 commits into from
Oct 10, 2023
Merged

major refactoring of local.py #27

merged 45 commits into from
Oct 10, 2023

Conversation

RondeauG
Copy link
Collaborator

@RondeauG RondeauG commented Sep 8, 2023

Pull Request Checklist:

  • This PR addresses an already opened issue (for bug fixes / features)
    • No.
  • (If applicable) Documentation has been added / updated (for bug fixes / features).
  • (If applicable) Tests have been added.
  • HISTORY.rst has been updated (with summary of main changes).
    • Link to issue (:issue:number) and pull request (:pull:number) has been added.

What kind of change does this PR introduce?

  • Class Data mostly replaced by get_yearly_op, both in indicators.py
  • Everything that Data did should be supported, but using xclim's stats indicator, with indexers & xclim.core.missing.
    • This will also allow for getting min instead of only max/volume.
    • This also means that you can use date_bounds (i.e. [03-01, 06-30]) as well as dayofyear.
  • Added a rolling window to get_yearly_op (e.g. Q14max).
  • The streamflow to volume transformation was moved to xhydro.indicators.compute_volume() and uses xclim.core.units.rate2amount().
  • Mostly re-written fit, parametric_quantiles & criteria functions in local.py (and split them in 3 for more control on what is computed).
    • The functions now stay fully lazy (no more .to_dataframe() or .load())
    • get_criterions (now _get_criteria_1d()) should be better optimized and should never crash for AICC.
  • Added xs.health_checks to xhydro.utils.

Does this PR introduce a breaking change?

  • Multiple functions were removed and not replaced, since alternatives already exist. Let me know if some of them need to be reincluded.
    • xhydro.utils.get_julian_day --> ds.time.dt.dayofyear should pretty much cover this, no? Also, date_bounds should cover what was needed from this function.
    • xhydro.utils.get_timestep --> xarray.infer_freq(ds.time) is a more robust version of this.
    • Data.select_catchments --> ds.where(ds.id.str.contains(), drop=True) should cover this. .contains() works with regex.
  • No more DataFrames anywhere, since they break the lazy computing and are horrible for memory management. If you want something more than a simple .to_dataframe() for your notebooks, we could code that in .utils.

Other information:

  • For now, the season dimension has been fully removed and get_yearly_op instead outputs multiple variables. One reason for this is to have much better metadata. We can discuss this.
  • Local has for now been replaced with 3 functions. If we still want a class, we'll need to see its usefulness.

TC-FF and others added 27 commits May 10, 2023 16:09
@RondeauG
Copy link
Collaborator Author

RondeauG commented Sep 8, 2023

@sebastienlanglois @TC-FF I'm not quite done, but I wanted to put the files here before our meeting so you can see what I'm working on.

@sebastienlanglois
Copy link
Contributor

sebastienlanglois commented Sep 8, 2023

Thanks @RondeauG, your work is fantastic! The code is super easy to understand and well-organized :) I know there's more to be done, but I couldn't resist giving your branch a try with the notebook example to see how it's coming along. The fit function worked like a charm! I will modify the notebook directly in you branch as the code is being developed so we can have a fully working example!

image
I'm having a small issue with parametric_quantiles. It seems like thenp.nan parameters are passed to each distribution and they except less. Maybe I'm not using it correctly
image.

Also, I would like to understand a little better how to use the code you provided below. My dataset, ds, contains a streamflow variable with dimensions (id, time), but I'm a bit uncertain about how to properly pass it in the example you've given. Currently, I'm encountering the following issue:
image

# Get the maximum streamflow between DOY 1 and 120, tolerance of 15% missing values
qmax_winter = xc.core.indicator.Indicator.from_dict(
                                 data={"base": "stats",
                                            "input": {"da": "streamflow"},
                                            "parameters":
                                                            {"op": "max",
                                                             "indexer": {"doy_bounds": [1, 120]}},
                                            "missing": "pct",
                                            "missing_options": {"tolerance": 0.15}
                                            },
                                  identifier="qmax_winter",
                                  module="fa",
                              )

Thanks again for all the work and looking forward to the next meeting!

@RondeauG RondeauG changed the base branch from Class-Data-in-frequency_analysis.local to main October 2, 2023 20:58
setup.py Outdated Show resolved Hide resolved
@RondeauG RondeauG marked this pull request as ready for review October 4, 2023 20:45
@RondeauG
Copy link
Collaborator Author

RondeauG commented Oct 4, 2023

@sebastienlanglois @TC-FF Ça devrait être prêt pour révision ! Quelques points spécifiques pour vous:

  1. Comme c'est un refactoring quasi-complet de Local frequency analysis #20, je crois que le plus simple pour cette PR est de merger directement dans le main. Quand on est contents avec cette PR, je crois qu'on pourra seulement fermer l'autre PR. Les comparaisons devenaient vraiment bordéliques, sinon...
  2. @TC-FF en particulier. Il y a certaines fonctions de l'ancien xhydro comme get_julian_day ou select_catchments que je n'ai pas transférées parce que des alternatives existent directement dans xarray, mais on pourrait toujours les remettre dans xhydro.utils si tu les veux. Juste me le faire savoir.
  3. J'ai mis à jour le notebook avec les nouvelles fonctions. Je sens que la partie sur l'analyse fréquentielle pourrait être bonifiée, mais au moins tout devrait être là.
  4. La classe Local est actuellement dans un entre deux un peu étrange où elle n'est plus trop utile (à mes yeux, en tout cas). Pour le moment, je l'ai commentée et j'ai aussi gardé en commentaires certaines options de visualisation to_dataframe qui existaient dans l'ancien xhydro. Est-ce que vous voulez encore ça ?
  5. @sebastienlanglois J'ai ajouté tes infos à la liste d'auteurs, mais il me manque notamment l'adresse courriel que tu voudrais utiliser. À toi aussi de voir si tu veux une affiliation ou pas dans le Zenodo (idem @TC-FF ).

@TC-FF TC-FF merged commit 84747f6 into main Oct 10, 2023
@TC-FF TC-FF deleted the local_changes branch October 10, 2023 14:27
@RondeauG RondeauG mentioned this pull request Oct 12, 2023
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants