Make indices and indicators easier to maintain and parse #364

aulemahal · 2020-02-12T19:32:34Z

Description

I think the structure of xclim needs a major refactoring to better organise the indices and indicators and make it easier for automated tools to parse the available methods.

Right now we define indices that take care of the computation and of the units. Then we create an indicator that adds checks and metadata. A few of those indicators called wrapped versions of the indices, but mostly the relationships are 1-to-1.

The problem:

There is a lot of duplicated code for very similar algorithms.
Both xclim.indices and xclim.atmos are crowded and hard to maintain.

I suggest the following structure:
Indices: Base algorithms, with some options and general names and doc
Indicator: Wrapped partial of indices, setting some options and defaults (+ metadata)
Aliases : For creating indicator that have different metadata and defaults for the same computation.

Aliases would simply update an existing Indicator with new metadata and new default values for parameters.

Ex:
Indice : degree_days(temp, ref_temp, max_temp, min_temp, sign)
Indicator : heating_degree_days -> degree_days(tas, 17 degC, 17 degC, None, -1)
Aliases : freezing_index -> heating_degree_days (ref_temp = 0 degC)

Caveats:
The parsed docstring would be less relevent to the actual indicator than right now, but there could be an easy way to modify this as we modify the metadata.
This will take time.

The text was updated successfully, but these errors were encountered:

huard · 2020-02-12T20:01:02Z

One thing we probably want to keep are the custom docstrings. I agree that computationally, heating and cooling degree days are similar, but ideally the docstrings would be different to explain what each function is doing, provide examples and references.

So totally agree we could replace similar code with generic functions, but I would preserve the individual indices to keep those docstrings.

aulemahal · 2020-02-12T22:34:15Z

Makes sense!

Decorator idea

We could implement this idea by having a Indicator decorator?

Example:

@create_indicator(
    identifier="blabla",
   **other metadata)
def heating_degree_days(tas, thresh='17 degC'):
    """Specific doc-string"""
    return degree_days(tas, ref_temp=thresh, max_temp=thresh, min_temp=None, sign=-1)

Pros:

Easy to implement
Signature is correctly seen by interactive editors (ipython, pycharm)
Doc-string easily overrided

Cons:

Contact with the compute function is lost
Unit handling has to be done again

Third level idea

Instead, there could be a xclim/indices/core.py file with the base algos. Then, in xclim/indices/threshold.py, we would have the multiple *_degree_days indices. This adds a 3rd level so we get:

Algorithms : only computation
Indices : docstring, unit handling
Indicator and Aliases : CF metadata, NaN- and incomplete period handling

Pros:

More structure

Cons:

Too much structure?

huard · 2020-02-13T14:43:47Z

I think we want to maintain access to the "pure computation" layer for users, so -1 for me for the decorator option.

We're already in some ways in the 3rd level state, where generic.py has a few functions (e.g. select_resample_op) that are re-used by indices.

I'm +1 on renaming generic.py to core.py and adding to it.

In terms of refactoring, I think we could also split utils.py and create a units.py for all the units handing stuff, and a base.py for the Indicator class related stuff.

tlogan2000 · 2020-02-13T14:57:52Z

I think this is a good idea as well (not sure I can give an opinion on exactly how to implement) but the idea is sound especially for a v1.x release

One thing we need to consider is that accessing xclim.indices directly is currently the only option for bypassing the very strict missing data checks

I think we would need to minimally add a 'skipna'=True option to the indicator classes.
Moving forward we may want to implement options for the various flavours WMO Data completeness checks : see section 4 https://library.wmo.int/doc_num.php?explnum_id=4166

Note - These are mainly wrt to calculating valid months. I would assume a valid year requires 12 valid months ?

We could then set a default check method depending on the indicator calculation (sum, mean, extreme, count,) but users could still mix and match depending on their need.

huard · 2020-02-13T15:09:12Z

Agree that we need more flexibility for missing data checks. For example, a typical use case is to say I'll accept results as long as there is less than 5% of missing values.

We need to think of an API for indicators for this. My suggestion would be to modify missing_any to support a % threshold, and then add an argument to the __call__ function to specify the threshold of missing data. That can be part of another issue though.

tlogan2000 · 2020-02-13T15:52:52Z

Agreed percent is a good / felxible compromise . However, is a little dangerous for annual in my opinion ... i.e. even 5% mssing could be ~19 days in a single month. We probably could apply the % check monthly then infer yearly based on having 12 acceptable months?

huard · 2020-02-13T16:14:19Z

This probably needs more thinking (do we interpolate missing values before averaging for example), but it's out of scope for this issue.

tlogan2000 · 2020-02-13T16:42:31Z

This probably needs more thinking (do we interpolate missing values before averaging for example), but it's out of scope for this issue.

Will create a separate issue for this

Zeitsperre · 2020-02-13T17:57:50Z

Just wanted to chime in here with a suggestion for dealing with docstrings and potentially call signatures (this also flows into the capabilities we want for a command-line tool).

Within Miranda, I've had a few instances where multi-level calls to functions necessitated wrapping many functions and in order to deal with the call signatures and docstrings, I've used a few libraries, I have a few examples within the GIS tools of the library that work effectively (https://github.com/Ouranosinc/miranda/blob/master/miranda/gis/vector.py), that depend on inspect and another library called custom_inherit (https://github.com/rsokl/custom_inherit).

huard · 2020-04-22T15:03:22Z

Another issue to consider is that some indicator arguments cannot take negative values, and this restriction should ideally be propagated to Finch and the climatedata.ca web frontend. Thoughts on how to solve this ?

aulemahal · 2020-04-22T15:22:45Z

I'm not sure what you mean here.
Is it: 1) We should add some input argument checks so that indicators fail on invalid input (e.g: negative Kelvin temps)
or 2) Right now some indicator fail on value that should be valid. If 2, any examples?

edit:
Thoughts on 1 : It would be a bit unpythonic to do that in the indices/indicator themselves. AFAI understand it, duck-typing implies that validity of argument is the caller's responsability. However, we could implement our own typing objects so that this information is made explicitly available and the WPS processes could use it. Or may be the indicators, but I would argue for keeping the ability to switch this off.

huard · 2020-04-22T15:40:31Z

And not only do a check, but encode the information that this argument requires positive values down to Finch, so it can enforce checking itself.

Yes, using Typing is something I've considered.

aulemahal · 2020-04-22T16:11:11Z

What would people think of this:
We implement our own type hints, like "PositiveInt", "NumberInRange[0, 10]", "Unit['temperature']", etc. Those would be new classes with at least a check(arg) function, returning True or False.
As we already check the units on indices, we would need to change declare_units to read the annotations instead of arguments. (And maybe rename it check_units).
Further argument checks could be done by another function in xclim.core.typing, or by the process/indicator themselves as the annotation itself already provides the checking function. This would limit us to testing only arguments annotated with our own types, which I think is ok.

huard · 2020-04-22T16:58:10Z

Is this an abuse of type hints ?
Otherwise I think this would work.

aulemahal · 2020-04-22T17:16:15Z

Python does not enforce any dynamic use of type hints, but we would need to look into common packages (ex: mypy) to see if this approach is compatible!

Zeitsperre · 2020-04-22T17:37:14Z

I don't see mypy being useful to us as it requires that all dependencies be using it (or have defined types in call signatures throughout) as well. Perhaps once the GIS-based libraries are moved from subset and the number of dependencies is reduced (significantly), mypy might be a good option.

Something similar would be worth looking into.

huard · 2020-04-22T19:07:32Z

Just to be clear, here is what I understand from this proposal:

@declare_units
def my_computation(tas: X[T], thresh: NonNeg[T]) -> X[T]:
...

And then check does something like:

inspect the signature,
get the type of each argument
map the type to an Input class
run the parameter through the class, which checks an input is valid

We'd probably want to remove the part that handles output unit conversion, or at least think about it.

aulemahal · 2020-04-22T20:09:40Z

That's what I am proposing!
Minor clarifications:
0. Would be the argument binding, like what is already done in the indicator class.
3. -> The annotation attribute of the signature object is the input class, so the check method would do a isinstance(annotation, BaseXclimType) before checking. Also, important point : I do not want to check the type but the value of the input argument. Let's keep xclim duck-typed.
5. We could have an option to either warn or raise on invalid inputs.

aulemahal added the enhancement New feature or request label Feb 12, 2020

aulemahal added this to the v1.0 milestone Feb 12, 2020

aulemahal self-assigned this Feb 12, 2020

tlogan2000 mentioned this issue Feb 13, 2020

Improve missing data check #365

Closed

aulemahal mentioned this issue Feb 13, 2020

Add command line interface #342

Merged

6 tasks

huard modified the milestones: v1.0, v0.18 Apr 16, 2020

huard mentioned this issue Apr 23, 2020

Use AllowedValue to constrain indicator arguments to the positive domain bird-house/finch#124

Open

huard mentioned this issue May 5, 2020

Cleaning up of Indicator class #435

Merged

6 tasks

huard modified the milestones: v0.18, v0.17 May 12, 2020

huard added this to the v0.18 milestone May 13, 2020

huard linked a pull request Jun 25, 2020 that will close this issue

Cleaning up of Indicator class #435

Merged

6 tasks

huard modified the milestones: v0.18, v0.19 Jun 26, 2020

huard closed this as completed in #435 Jul 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make indices and indicators easier to maintain and parse #364

Make indices and indicators easier to maintain and parse #364

aulemahal commented Feb 12, 2020

huard commented Feb 12, 2020

aulemahal commented Feb 12, 2020

huard commented Feb 13, 2020

tlogan2000 commented Feb 13, 2020

huard commented Feb 13, 2020

tlogan2000 commented Feb 13, 2020

huard commented Feb 13, 2020

tlogan2000 commented Feb 13, 2020

Zeitsperre commented Feb 13, 2020

huard commented Apr 22, 2020

aulemahal commented Apr 22, 2020 •

edited

Loading

huard commented Apr 22, 2020

aulemahal commented Apr 22, 2020

huard commented Apr 22, 2020

aulemahal commented Apr 22, 2020

Zeitsperre commented Apr 22, 2020

huard commented Apr 22, 2020

aulemahal commented Apr 22, 2020 •

edited

Loading

Make indices and indicators easier to maintain and parse #364

Make indices and indicators easier to maintain and parse #364

Comments

aulemahal commented Feb 12, 2020

Description

huard commented Feb 12, 2020

aulemahal commented Feb 12, 2020

Decorator idea

Third level idea

huard commented Feb 13, 2020

tlogan2000 commented Feb 13, 2020

huard commented Feb 13, 2020

tlogan2000 commented Feb 13, 2020

huard commented Feb 13, 2020

tlogan2000 commented Feb 13, 2020

Zeitsperre commented Feb 13, 2020

huard commented Apr 22, 2020

aulemahal commented Apr 22, 2020 • edited Loading

huard commented Apr 22, 2020

aulemahal commented Apr 22, 2020

huard commented Apr 22, 2020

aulemahal commented Apr 22, 2020

Zeitsperre commented Apr 22, 2020

huard commented Apr 22, 2020

aulemahal commented Apr 22, 2020 • edited Loading

aulemahal commented Apr 22, 2020 •

edited

Loading

aulemahal commented Apr 22, 2020 •

edited

Loading