Make indices and indicators easier to maintain and parse #364
Comments
One thing we probably want to keep are the custom docstrings. I agree that computationally, heating and cooling degree days are similar, but ideally the docstrings would be different to explain what each function is doing, provide examples and references. So totally agree we could replace similar code with generic functions, but I would preserve the individual indices to keep those docstrings. |
Makes sense!
Decorator idea
We could implement this idea by having an Indicator decorator? Example:
    @create_indicator(
        identifier="blabla",
        **other_metadata)
    def heating_degree_days(tas, thresh='17 degC'):
        """Specific doc-string"""
        return degree_days(tas, ref_temp=thresh, max_temp=thresh, min_temp=None, sign=-1)
Pros:
Cons:
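A minimal sketch of how such a decorator could behave, assuming a hypothetical `create_indicator` that only attaches metadata and keeps the function's own docstring. The `degree_days` body is a simplified stand-in working on a plain list of temperatures (no units), not the xclim function.

```python
import functools


def degree_days(tas, ref_temp, sign=1):
    """Generic computation (stand-in): sum of positive departures from ref_temp."""
    return sum(max(sign * (t - ref_temp), 0.0) for t in tas)


def create_indicator(identifier, **metadata):
    """Hypothetical decorator: attach metadata, preserve the specific docstring."""
    def decorator(func):
        @functools.wraps(func)  # copies __doc__ and __name__, sets __wrapped__
        def wrapper(*args, **kwargs):
            # Input checks (units, missing values, ...) would go here.
            return func(*args, **kwargs)
        wrapper.identifier = identifier
        wrapper.metadata = metadata
        return wrapper
    return decorator


@create_indicator(identifier="hdd", units="K days")
def heating_degree_days(tas, thresh=17.0):
    """Heating degree days: accumulated degrees below ``thresh``."""
    return degree_days(tas, ref_temp=thresh, sign=-1)


print(heating_degree_days([10.0, 20.0, 15.0]))  # 7 + 0 + 2
print(heating_degree_days.__doc__)
```

Because `functools.wraps` sets `__wrapped__`, the undecorated function stays reachable as `heating_degree_days.__wrapped__`.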
Third level idea
Instead, there could be a
Pros:
Cons:
|
I think we want to maintain access to the "pure computation" layer for users, so -1 for me for the decorator option. We're already in some ways in the 3rd level state, where generic.py has a few functions (e.g. select_resample_op) that are re-used by indices. I'm +1 on renaming generic.py to core.py and adding to it. In terms of refactoring, I think we could also split utils.py and create a units.py for all the units handling stuff, and a base.py for the Indicator class related stuff. |
I think this is a good idea as well (not sure I can give an opinion on exactly how to implement it), and the idea is sound, especially for a v1.x release. One thing we need to consider is that accessing I think we would need to minimally add a skipna=True option to the indicator classes. Note: these are mainly with respect to calculating valid months. I would assume a valid year requires 12 valid months? We could then set a default check method depending on the indicator calculation (sum, mean, extreme, count), but users could still mix and match depending on their needs. |
Agree that we need more flexibility for missing data checks. For example, a typical use case is to say I'll accept results as long as there is less than 5% of missing values. We need to think of an API for indicators for this. My suggestion would be to modify missing_any to support a % threshold, and then add an argument to the |
Agreed, percent is a good / flexible compromise. However, it is a little dangerous for annual checks in my opinion, i.e. even 5% missing could be ~19 days in a single month. We could probably apply the % check monthly, then infer yearly validity based on having 12 acceptable months? |
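The monthly-then-yearly check suggested here could look roughly like the sketch below. It uses plain Python on a date-to-value mapping; the function names and the 5% default are illustrative assumptions and do not reflect the `missing_any` API.

```python
import calendar
import math
from datetime import date, timedelta


def valid_months(daily, year, max_missing=0.05):
    """Return the months of `year` whose fraction of missing days is < max_missing.

    `daily` maps datetime.date -> float; absent dates and NaN count as missing.
    """
    valid = []
    for month in range(1, 13):
        n_days = calendar.monthrange(year, month)[1]
        n_missing = sum(
            1
            for day in range(1, n_days + 1)
            if math.isnan(daily.get(date(year, month, day), math.nan))
        )
        if n_missing / n_days < max_missing:
            valid.append(month)
    return valid


def valid_year(daily, year, max_missing=0.05):
    """A year is valid only if all 12 months pass the monthly check."""
    return len(valid_months(daily, year, max_missing)) == 12


# Full year of data for 2001, then drop 18 days (just under 5% of the year), all in March.
start = date(2001, 1, 1)
daily = {start + timedelta(days=i): 1.0 for i in range(365)}
for day in range(1, 19):
    del daily[date(2001, 3, day)]

print(18 / 365 < 0.05)          # an annual 5% check alone would accept this year
print(valid_year(daily, 2001))  # the monthly check rejects it because of March
```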
This probably needs more thinking (do we interpolate missing values before averaging for example), but it's out of scope for this issue. |
Will create a separate issue for this |
Just wanted to chime in here with a suggestion for dealing with docstrings and potentially call signatures (this also flows into the capabilities we want for a command-line tool). Within Miranda, I've had a few instances where multi-level calls to functions necessitated wrapping many functions, and to deal with the call signatures and docstrings I've used a few libraries. I have a few examples within the GIS tools of the library that work effectively (https://github.com/Ouranosinc/miranda/blob/master/miranda/gis/vector.py), that depend on |
Another issue to consider is that some indicator arguments cannot take negative values, and this restriction should ideally be propagated to Finch and the climatedata.ca web frontend. Thoughts on how to solve this? |
I'm not sure what you mean here. edit: |
And not only do a check, but encode the information that this argument requires positive values down to Finch, so it can enforce checking itself. Yes, using Typing is something I've considered. |
What would people think of this: |
Is this an abuse of type hints? |
Python does not enforce type hints at runtime, but we would need to look into common packages (e.g. mypy) to see if this approach is compatible! |
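As an illustration of how a "must be positive" constraint could be carried by a type hint and read back by a downstream tool, here is a sketch using `typing.Annotated` (Python 3.9+). The `Positive` marker, the `constraints` and `validate` helpers, and `example_indicator` are all hypothetical names; none of them is part of xclim or Finch, and this is not necessarily the approach proposed in the comment above.

```python
from typing import Annotated, get_type_hints


class Positive:
    """Hypothetical marker: the annotated argument must be > 0."""

    @staticmethod
    def check(value):
        return value > 0


def example_indicator(window: Annotated[int, Positive] = 6, thresh: float = 5.0):
    """Stand-in function; only the annotations matter here."""
    return window, thresh


def constraints(func):
    """Collect the markers attached to each argument via Annotated."""
    hints = get_type_hints(func, include_extras=True)
    return {
        name: list(getattr(hint, "__metadata__", ()))
        for name, hint in hints.items()
    }


def validate(func, **kwargs):
    """Raise ValueError if a keyword argument violates one of its markers."""
    for name, markers in constraints(func).items():
        for marker in markers:
            if name in kwargs and not marker.check(kwargs[name]):
                raise ValueError(f"{name} must satisfy {marker.__name__}")


print(constraints(example_indicator))
validate(example_indicator, window=6)  # passes silently
```

A service could call something like `constraints` to learn which arguments need positive values and enforce that itself. Static checkers treat `Annotated[int, Positive]` as plain `int`, so the extra metadata does not interfere with them.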
I don't see mypy being useful to us as it requires that all dependencies be using it (or have defined types in call signatures throughout) as well. Perhaps once the GIS-based libraries are moved from subset and the number of dependencies is reduced (significantly), mypy might be a good option. Something similar would be worth looking into. |
Just to be clear, here is what I understand from this proposal:
And then
We'd probably want to remove the part that handles output unit conversion, or at least think about it. |
That's what I am proposing! |
Description
I think the structure of xclim needs a major refactoring to better organise the indices and indicators and make it easier for automated tools to parse the available methods.
Right now we define indices that take care of the computation and of the units. Then we create an indicator that adds checks and metadata. A few of those indicators call wrapped versions of the indices, but mostly the relationships are 1-to-1.
The problem: xclim.indices and xclim.atmos are crowded and hard to maintain.
I suggest the following structure:
Indices: Base algorithms, with some options and general names and doc
Indicator: Wrapped partial of indices, setting some options and defaults (+ metadata)
Aliases: For creating indicators that have different metadata and defaults for the same computation.
Aliases would simply update an existing Indicator with new metadata and new default values for parameters.
Ex:
Index: degree_days(temp, ref_temp, max_temp, min_temp, sign)
Indicator: heating_degree_days -> degree_days(tas, 17 degC, 17 degC, None, -1)
Alias: freezing_index -> heating_degree_days (ref_temp = 0 degC)
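The three levels in this example can be sketched with `functools.partial`. The `degree_days` body is a simplified stand-in on plain floats (no units handling), and `indicator` and `alias` are hypothetical helpers. Note one assumption: the alias below overrides both `ref_temp` and `max_temp` so that only days below 0 count, whereas the example above only mentions `ref_temp`.

```python
from functools import partial


def degree_days(temp, ref_temp, max_temp=None, min_temp=None, sign=1):
    """Index (stand-in): sum of sign * (t - ref_temp) over days within the bounds."""
    total = 0.0
    for t in temp:
        if max_temp is not None and t >= max_temp:
            continue  # days at or above the upper bound do not count
        if min_temp is not None and t <= min_temp:
            continue  # days at or below the lower bound do not count
        total += sign * (t - ref_temp)
    return total


def indicator(func, metadata, **defaults):
    """Indicator: a partial of an index with defaults and metadata attached."""
    ind = partial(func, **defaults)
    ind.metadata = dict(metadata)
    return ind


def alias(ind, metadata, **defaults):
    """Alias: same computation, with updated metadata and default values."""
    new = partial(ind.func, **{**ind.keywords, **defaults})
    new.metadata = {**ind.metadata, **metadata}
    return new


heating_degree_days = indicator(
    degree_days, {"identifier": "hdd"}, ref_temp=17.0, max_temp=17.0, sign=-1
)
freezing_index = alias(
    heating_degree_days, {"identifier": "fi"}, ref_temp=0.0, max_temp=0.0
)

tas = [-5.0, 2.0, 10.0, 20.0]
print(heating_degree_days(tas))  # 22 + 15 + 7
print(freezing_index(tas))       # only the -5 day counts
```

The index stays importable on its own, so the "pure computation" layer remains available while indicators and aliases only add defaults and metadata.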
Caveats:
The parsed docstring would be less relevant to the actual indicator than it is right now, but there could be an easy way to modify it as we modify the metadata.
This will take time.