Enh/threshold metadata #189

Merged
merged 83 commits into from
Sep 2, 2022

bzah
Member

@bzah bzah commented Jun 17, 2022

Pull Request to resolve #183

  • Unit tests cover the changes.
  • These changes were tested on real data.
  • The relevant documentation has been added or updated.
  • A short description of the changes has been added to doc/source/references/release_notes.rst.

Describe the changes you made

  • The threshold parameter of icclim.index now properly updates the index name and title in the output.
  • Add generic indices with rich metadata.
  • Add configurable Threshold
  • [breaking change] Make all the ECAD indices non-configurable

The generic indices include:

  • CountOccurrences
    for SU, TR, TG90p, TN90p, TX90p, FD, ID, TG10p, TN10p, TX10p, RR1, R10mm, R20mm, R75p, R95p, R99p, SD1, SD5cm, SD50cm, CD, CW, WD, WW

  • MaxConsecutiveOccurrence
    for CSU, CFD, CDD, CWD

  • SumOfSpellLengths
    WSDI, CSDI

  • Excess
    GD4

  • Deficit
    HD17

  • FractionOfTotal
    R75pTOT, R95pTOT, R99pTOT

  • MeanOfDifference
    DTR

  • DifferenceOfExtremes
    ETR

  • MeanOfAbsoluteOneTimeStepDifference
    vDTR

  • Maximum
    RX1day, TXx, TNx, user_index - maximum

  • Minimum
    TXn, TNn, user_index - minimum

  • Average
    TG, TN, TX, SDII, SD, user_index - mean

  • Sum
    PRCPTOT, user_index - sum

  • StandardDeviation
    new

  • MaxOfRollingSum
    RX5day, user_index - rolling_sum on extreme_mode=max

  • MinOfRollingSum
    user_index - rolling_sum on extreme_mode=min

  • MaxOfRollingAverage
    user_index - rolling_average on extreme_mode=max

  • MinOfRollingAverage
    user_index - rolling_average on extreme_mode=min

  • DifferenceOfMeans
    user_index - anomaly
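
To make the first three entries of the list concrete, here is a minimal pure-Python sketch of what CountOccurrences, MaxConsecutiveOccurrence and SumOfSpellLengths compute on a single daily series. It is not the icclim implementation (which works on gridded data); the function names, the fixed `>=` operator and the `min_spell_length` parameter are assumptions made for illustration.

```python
from itertools import groupby


def _run_lengths(hits):
    """Lengths of each run of consecutive True values in `hits`."""
    return [sum(1 for _ in group) for is_hit, group in groupby(hits) if is_hit]


def count_occurrences(values, threshold):
    """Number of time steps where the value reaches the threshold (>=)."""
    return sum(1 for v in values if v >= threshold)


def max_consecutive_occurrence(values, threshold):
    """Length of the longest run of consecutive time steps >= threshold."""
    return max(_run_lengths(v >= threshold for v in values), default=0)


def sum_of_spell_lengths(values, threshold, min_spell_length):
    """Total number of time steps belonging to spells of at least min_spell_length."""
    runs = _run_lengths(v >= threshold for v in values)
    return sum(length for length in runs if length >= min_spell_length)


# Hypothetical daily maximum temperatures in degC.
tmax = [24, 26, 27, 22, 25, 25, 25, 30, 19]
print(count_occurrences(tmax, 25))           # 6
print(max_consecutive_occurrence(tmax, 25))  # 4
print(sum_of_spell_lengths(tmax, 25, 3))     # 4
```

The same "hits" mask drives all three functions; only the reduction changes, which is presumably why these indices can share one configurable threshold.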

Every generic function is configurable.
Those that depend on a threshold rely on the Threshold class, which accepts day-of-year (doy) percentiles, period percentiles, per-grid-cell values, scalars with a unit, and unitless scalars as valid thresholds.
A few examples (syntax may change):

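import icclim
# Assumption: Threshold is importable from icclim/models/threshold.py; the exact path may differ.
from icclim.models.threshold import Threshold

# count_occurrences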
su_33 = icclim.index(in_files="data.nc", index_name="count_occurrences", threshold=">= 33 degC", slice_mode="jja")
wsdi_99 = icclim.index(in_files="data.nc", index_name="count_occurrences", threshold=">= 99 doy_per")
period_wsdi = icclim.index(in_files="data.nc", index_name="count_occurrences", threshold=">= 99 period_per")

# Excess
gd4_in_kelvin = icclim.index(in_files="data.nc", index_name="excess", threshold="277.15 K")
excess_of_pr = icclim.index(in_files="data.nc", index_name="excess", threshold="10 mmday")

# FractionOfTotal
r99pTOT = icclim.index(in_files="data.nc", index_name="fraction_of_total", threshold=Threshold(">= 99 period_per", min_value="1 mmday"))
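
For readers wondering how strings such as ">= 33 degC" or "99 doy_per" could be interpreted, here is a rough sketch of a parser that splits a threshold string into an optional operator, a value and a unit. It is not the code of the Threshold class; `ParsedThreshold`, `parse_threshold` and the `kind` labels are hypothetical names, and only the `doy_per` and `period_per` keywords come from the examples above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Optional operator, then a number, then whatever is left as the unit.
_THRESHOLD_PATTERN = re.compile(
    r"^\s*(?P<operator>[<>]=?|==|!=)?\s*(?P<value>[-+]?\d+(?:\.\d+)?)\s*(?P<unit>.*?)\s*$"
)


@dataclass(frozen=True)
class ParsedThreshold:
    operator: Optional[str]  # None when the string has no operator, e.g. "277.15 K"
    value: float
    unit: Optional[str]      # None for a unitless scalar
    kind: str                # "doy_percentile", "period_percentile" or "scalar"


def parse_threshold(text: str) -> ParsedThreshold:
    match = _THRESHOLD_PATTERN.match(text)
    if match is None:
        raise ValueError(f"Cannot parse threshold: {text!r}")
    unit = match.group("unit") or None
    if unit == "doy_per":
        kind = "doy_percentile"
    elif unit == "period_per":
        kind = "period_percentile"
    else:
        kind = "scalar"
    return ParsedThreshold(match.group("operator"), float(match.group("value")), unit, kind)


print(parse_threshold(">= 33 degC"))
print(parse_threshold("99 doy_per"))
```

Treating the percentile keywords as special "units" keeps a single syntax for scalars and percentiles, which matches how the examples above are written.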

The results of icclim.index are Datasets (or netCDF files) with rich metadata, including identifier, standard_name, long_name, description, cell_methods and short_name.
Example for su_33 computed above:

short_name:       number_of_tx_days
standard_name:    number_of_days_when_maximum_air_temperature_greater_or_equal_to_33.0_degC
description:      Number of days of summer when maximum temperature is greater or equal to 33.0 degC.
long_name:        Number of days when tx is greater or equal to 33.0 degC.
units:            d
cell_methods:     time: sum over days

(note that standard_name should be changed to something more CF friendly).
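
As an illustration of how such metadata can follow the threshold, here is a small sketch that rebuilds the standard_name and long_name shown above from templates. This is only a guess at the mechanism, not icclim's templating code: `count_occurrences_metadata` and `OPERATOR_TEXT` are hypothetical, and every wording other than "greater or equal to" is an assumption.

```python
# Assumed wordings; only "greater or equal to" appears in the example above.
OPERATOR_TEXT = {
    ">": "greater than",
    ">=": "greater or equal to",
    "<": "lower than",
    "<=": "lower or equal to",
}


def count_occurrences_metadata(var_short_name, var_standard_name, operator, value, unit):
    """Fill metadata templates for a count_occurrences-like index."""
    op_text = OPERATOR_TEXT[operator]
    threshold_text = f"{value} {unit}"
    standard_name = f"number_of_days_when_{var_standard_name}_{op_text}_{threshold_text}"
    return {
        "standard_name": standard_name.replace(" ", "_"),
        "long_name": f"Number of days when {var_short_name} is {op_text} {threshold_text}.",
        "units": "d",
        "cell_methods": "time: sum over days",
    }


meta = count_occurrences_metadata("tx", "maximum_air_temperature", ">=", 33.0, "degC")
print(meta["standard_name"])
print(meta["long_name"])
```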

@bzah bzah added this to the 5.3 milestone Jun 17, 2022
@bzah bzah linked an issue Jun 17, 2022 that may be closed by this pull request
@bzah bzah force-pushed the enh/percentile_as_input branch from 2d8a841 to c59d370 Compare June 21, 2022 08:20
@bzah bzah force-pushed the enh/threshold_metadata branch from 09f2e7c to 3baf929 Compare June 21, 2022 08:46
Base automatically changed from enh/percentile_as_input to master June 21, 2022 08:48
@bzah bzah marked this pull request as ready for review June 21, 2022 08:48
@bzah bzah requested a review from pagecp June 21, 2022 08:54
@bzah bzah marked this pull request as draft June 21, 2022 15:23
@bzah bzah force-pushed the enh/threshold_metadata branch from 5b6c7a8 to 9b4498f Compare June 29, 2022 12:15
@bzah bzah force-pushed the enh/threshold_metadata branch from 9b4498f to 052881d Compare June 30, 2022 21:07
bzah and others added 11 commits July 6, 2022 19:00
- We can now compute a CDD or a heatwave
We still can't compute CWD (`tn<{t1}` and `pr>{t2}`) though
- when in_files is a dictionary,
threshold can now be used the same way as `threshold` of `icclim.index`.
- Add registry class
- remove ::compute from Indicator class (replaced by plain _call_ method)
- turn IndexConfig into a dataclass
- Make Threshold constructor handle percentile computation
- Divide ClimateVar creation in two separate processes
  1. When in_files is a dictionary
  2. When in_files is a readable content directly (nc, zarr, da, ds)
- BREAKING CHANGE: when in_files is a dict, renamed per_var_name to threshold_var_name
- `threshold` now works with string thresholds
It's either a threshold with unit (e.g. "22 degC")
or doy percentiles (e.g. "75 doy_per")
or period percentiles (e.g. "75 period_per")
- `threshold` works with a combination such as ["1mm", "75 doy_per"]
- Refactored read_climate_vars to always create a dictionary from the `var_names`
bzah added 6 commits August 29, 2022 11:00
LogicalLink is not (yet?) part of icclim::index API.
coef is not part of icclim::index API, it can only be computed with a user_index.
date_event is now part of icclim::index API.
It can be used on `maximum`, `minimum`, `max_of_rolling_average`,
`max_of_rolling_sum`,`min_of_rolling_average`,`min_of_rolling_sum`.
I forgot to add it to count_occurrences (the first and last date of the occurrences, per sampling)
and to max_consecutive_occurrence (start and end date of the spell).

Also clean up dead code from user index.
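
A minimal sketch of the idea behind date_event for `maximum`: for each sampling period, keep the date on which the maximum was reached alongside the value. This is a plain-Python illustration, not the icclim implementation; the monthly grouping and the name `maximum_with_date_event` are assumptions.

```python
from collections import defaultdict
from datetime import date


def maximum_with_date_event(daily_values):
    """Map each (year, month) to (maximum value, date on which it occurred).

    `daily_values` is a dict of date -> value. On ties, the first date
    encountered in the dict wins.
    """
    by_month = defaultdict(list)
    for day, value in daily_values.items():
        by_month[(day.year, day.month)].append((value, day))
    return {
        month: max(pairs, key=lambda pair: pair[0])
        for month, pairs in by_month.items()
    }


# Hypothetical daily maximum temperatures.
daily = {
    date(2022, 6, 1): 21.0,
    date(2022, 6, 2): 25.5,
    date(2022, 7, 1): 30.1,
    date(2022, 7, 2): 28.0,
}
for month, (value, day) in maximum_with_date_event(daily).items():
    print(month, value, day.isoformat())
```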
@bzah bzah force-pushed the enh/threshold_metadata branch from cfd0d4c to cb76220 Compare August 29, 2022 17:04
bzah added 9 commits August 30, 2022 14:50
This unit can be used with either `count_occurrences` or `fraction_of_total`.
For ECAD indices, '%' is the default unit of rxxpTOT index family (as specified in ATBD v11).
Also fix some minor issues with date_event (explicit use of lat and lon dims).
`::_guess_dataset_var_names` is capable of handling
None standard_index.
No need to raise an error.
They were unit tests and are now integration tests.
It was computing **min** instead of max.
Big bad commit of the day that refactors quite a few things following unit test update.
- enh: Split `window_width` into 3 distinct parameters.
- maint: renamed is_single_var into is_compared_to_reference for readability.
- enh: Add `to_percent` to difference_of_means (aka anomaly).
- enh: Improve metadata for unknown variables
- fix: When adding a variable with `must_add_reference_var`, it now works if time_range exists.
- maint: Simplified `must_add_reference_var` as `climate_vars_dict` is always a dictionary
- enh: Improve input parsing for when the input is a list of mixed type (e.g. path to nc + a DataArray)
- maint: update most tests to use generic indicator
- maint: some more boring refactoring
- maint: Add release notes
- Added `sampling_method` parameter to control the behavior of `slice_mode`.
For now this is implemented only for `difference_of_means`.
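
To illustrate the `to_percent` option mentioned for difference_of_means, here is a hedged sketch on plain lists. It assumes the percentage is expressed relative to the reference mean, which this PR does not state explicitly; the function signature is hypothetical and ignores gridded data and sampling.

```python
from statistics import fmean


def difference_of_means(study, reference, to_percent=False):
    """Anomaly of the study period against the reference period.

    Assumption: with to_percent=True the anomaly is divided by the
    reference mean and multiplied by 100.
    """
    ref_mean = fmean(reference)
    diff = fmean(study) - ref_mean
    if to_percent:
        return diff / ref_mean * 100
    return diff


# Hypothetical values for a study period and a reference period.
study = [9.0, 11.0]
reference = [8.0, 8.0]
print(difference_of_means(study, reference))                   # 2.0
print(difference_of_means(study, reference, to_percent=True))  # 25.0
```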

github-actions bot commented Sep 1, 2022

Coverage

Report
| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| icclim | | | | |
|    icclim_exceptions.py | 7 | 1 | 86% | 20 |
|    icclim_logger.py | 85 | 29 | 66% | 24, 43, 65, 70–95, 105, 111–142, 150–155 |
|    main.py | 244 | 21 | 91% | 101, 122, 131, 366, 451, 477–478, 480–481, 483, 485–486, 488–489, 508–515, 652, 659, 663 |
|    utils.py | 32 | 3 | 91% | 18, 29, 42 |
| icclim/generic_indices | | | | |
|    cf_var_metadata.py | 50 | 1 | 98% | 26 |
|    generic_indicators.py | 461 | 79 | 83% | 53, 56, 72, 76, 80, 97, 103, 202, 248, 255, 380, 707, 738, 744, 752, 771, 854, 874–882, 886, 916, 977–978, 1009–1036, 1063–1086, 1102, 1117–1147, 1153–1157 |
| icclim/models | | | | |
|    cf_calendar.py | 35 | 1 | 97% | 20 |
|    climate_variable.py | 97 | 9 | 91% | 92, 136, 147, 193, 199, 232, 260–261, 298 |
|    frequency.py | 175 | 13 | 93% | 167–170, 314–316, 323, 401, 427, 432, 483, 510 |
|    operator.py | 29 | 1 | 97% | 14 |
|    registry.py | 30 | 1 | 97% | 23 |
|    standard_index.py | 30 | 2 | 93% | 65, 69 |
|    threshold.py | 134 | 30 | 78% | 98–99, 104–117, 120, 143–147, 203–205, 219–230, 239–256, 272–273 |
| icclim/pre_processing | | | | |
|    input_parsing.py | 186 | 46 | 75% | 42, 103–118, 124–129, 139, 171, 187, 191, 201, 221, 230, 232, 239, 248, 276, 281–289, 309–325, 348 |
|    rechunk.py | 87 | 2 | 98% | 39, 129 |
| icclim/tests | | | | |
|    test_rechunk.py | 27 | 2 | 93% | 65, 93 |
|    testing_utils.py | 22 | 2 | 91% | 31, 44 |
| TOTAL | 2941 | 243 | 92% | |

Test results

Tests Skipped Failures Errors Time
143 0 💤 0 ❌ 0 🔥 1m 42s ⏱️

@bzah bzah marked this pull request as ready for review September 1, 2022 15:28
@bzah
Member Author

bzah commented Sep 1, 2022

Yipiti youpi, this pull request is now ready!

@bzah
Member Author

bzah commented Sep 1, 2022

Before merge:

  • cleanup todos
  • update release note
  • pump coverage of generic_indicators.py to min 80%

bzah and others added 4 commits September 2, 2022 18:36
Some indices were not using the expected operator.
">=" was used instead of ">"
Plus the registry ::list was broken
It was broken when using the new groupby+resample sampling_method.
- Removed `identifier` from generic indicator in favor of `name`, which is **not** templated.
- Fixed bootstrapped computation that could not run if "percentiles" was not in coords (because it was renamed)
- Fixed some warnings
@bzah bzah merged commit a34d39f into master Sep 2, 2022
@bzah bzah deleted the enh/threshold_metadata branch September 2, 2022 17:33
Development

Successfully merging this pull request may close these issues.

ENH: [metadata] Make tx90p configurable threshold appears in metadata
1 participant