
Correctly interpolate seasons in Grouper #2019

Open · wants to merge 11 commits into base: main
Conversation

@saschahofmann (Contributor)

Pull Request Checklist:

What kind of change does this PR introduce?

This PR adds a line to correctly interpolate seasonal values. It also changes the test_timeseries function, which now accepts a calendar argument instead of cftime. Not providing it (or passing None) is equivalent to the previous cftime=False, and calendar='standard' to the previous cftime=True. This allows testing different calendar implementations, e.g. 360_day calendars.

@github-actions github-actions bot added the sdba Issues concerning the sdba submodule. label Dec 11, 2024
@saschahofmann (Contributor Author)

I just realised that the factor of 1/6 assumes all seasons have the same length, which in Gregorian calendars is not necessarily true. I am not sure it matters too much; at least the function should be smooth.
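For illustration, here is a hedged sketch (not the actual xclim code) of a fractional season index under the equal-length assumption; the 1/6 shows up as the half-month offset within a 3-month season (0.5 month / 3 months = 1/6). days_in_month=30 is a simplification:

```python
def fractional_season_index(month, day, days_in_month=30):
    # 0=DJF, 1=MAM, 2=JJA, 3=SON; seasons start in Dec, Mar, Jun, Sep
    season = (month % 12) // 3
    month_in_season = (month % 12) % 3          # 0, 1 or 2 within the season
    frac = (month_in_season + (day - 0.5) / days_in_month) / 3
    # shift by 0.5 so the centre of a season maps to its integer index
    return season + frac - 0.5

# mid-January (second month of DJF) sits at DJF's centre:
print(round(fractional_season_index(1, 15), 3))   # → -0.006
# late February is roughly halfway between DJF and MAM:
print(round(fractional_season_index(2, 28), 3))   # → 0.472
```

With unequal Gregorian season lengths these fractions would shift slightly, which is the caveat above.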

@saschahofmann (Contributor Author)

Just to prove that this leads to a smooth result, same input as in the issue:
image

@Zeitsperre Zeitsperre requested a review from aulemahal December 11, 2024 16:53
@Zeitsperre Zeitsperre added bug Something isn't working standards / conventions Suggestions on ways forward labels Dec 11, 2024

Warning

This Pull Request is coming from a fork and must be manually tagged approved in order to perform additional testing.

@saschahofmann (Contributor Author)

Weirdly, and contrary to what I showed yesterday, today I am still getting clear transitions, as if there were no linear interpolation at all.

@Zeitsperre (Collaborator)

@saschahofmann We recently changed the layout of xclim to use a src structure. It might be worthwhile to try reinstalling the library.

@Zeitsperre Zeitsperre mentioned this pull request Dec 12, 2024
@github-actions github-actions bot added the docs Improvements to documentation label Dec 12, 2024
@saschahofmann (Contributor Author)

I reinstalled xclim but I am still getting very similar results to before the "fix". Do you have any advice on where else I could look?

@Zeitsperre (Collaborator)

I reinstalled xclim but I am still getting very similar results to before the "fix". Do you have any advice on where else I could look?

Could it be that you have obsolete __pycache__ folders still among your cloned folders? @coxipi is looking into recreating your example based on your branch for validation, but if the tests are working as intended on CI, then it's likely a caching/installation issue.
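If stale caches are the culprit, something like the following usually clears them (a hedged sketch; it assumes you run it from the root of your clone, and that an editable reinstall follows):

```shell
# Remove stale bytecode caches left over from the pre-src layout,
# then reinstall in editable mode, e.g.: pip install -e . --force-reinstall --no-deps
find . -type d -name "__pycache__" -prune -exec rm -rf {} +
```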

@coxipi (Contributor) commented Dec 13, 2024

I managed to install the environment; for some reason I only had the branch "main" when I cloned the fork yesterday.

  • I confirmed that the function has the appropriate modifications inside the notebook I'm using:
import inspect
from xclim import sdba  # assuming the usual import
print(inspect.getsource(sdba.base.Grouper.get_index))
  • I also find that the interpolation is wrong.

I'll try to have a look later. Maybe the interp boolean condition is not triggered properly?

@saschahofmann (Contributor Author)

I am pretty sure that the get_index function is updated in my notebook. Either I am wrong to expect a smoother result (it seems to have changed slightly from what I got earlier), or something else is going on. I will keep investigating.

@coxipi (Contributor) commented Dec 16, 2024

It's simply that interp can't be "nearest", otherwise no interpolation takes place ... I think our only other option is "linear".

import matplotlib.pyplot as plt  # needed for the plots below

from xclim import sdba

QM = sdba.EmpiricalQuantileMapping.train(
    ref, hist, nquantiles=15, group="time.season", kind="+"
)

scen = QM.adjust(sim, extrapolation="constant", interp="nearest")
scen_interp = QM.adjust(sim, extrapolation="constant", interp="linear")
outd = {
    "Reference": ref,
    "Model - biased": hist,
    "Model - adjusted - no interp": scen,
    "Model - adjusted - linear interp": scen_interp,
}
for k, da in outd.items():
    da.groupby("time.dayofyear").mean().plot(label=k)
plt.legend()

image

This doesn't reproduce your figure, however. Your figure above seemed to match the reference very well, better than what I get even with linear interpolation. But it does get rid of the obvious discontinuities.

@coxipi (Contributor) commented Dec 16, 2024

There is clearly something wrong going on. Comparing
hist - scen_month
scen_time - scen_month
scen_season - scen_month
scen_month - scen_month

scen_season is way off

image

@saschahofmann (Contributor Author)

@coxipi I think I only mentioned this in the original issue: my analysis is done with QuantileDeltaMapping instead of EmpiricalQuantileMapping. Here is the equivalent chart to yours for that:
image
Seasons still seem kind of weird.

@saschahofmann (Contributor Author) commented Dec 18, 2024

A similar trend becomes apparent when looking at adjusted - historical (now for EmpiricalQuantileMapping):
image

@coxipi (Contributor) commented Dec 18, 2024

Yes, I have seen similar things by playing with how get_index is chosen. I feel this should not be this sensitive. Let me try to get this back.

@coxipi (Contributor) commented Dec 18, 2024

Working with the season adjustment factors interpolated to the months, then using the interpolated adjustment factors as if I were doing a monthly adjustment:

image

@saschahofmann (Contributor Author)

I guess that makes sense. If I am right above and the problem is that we use a single quantile determined for a season when interpolating the af, then it would make sense that interpolating to months makes the difference between the two smoother.

@saschahofmann (Contributor Author)

My suggestion, for a single future value:

  1. Create a timeseries (read: season series) by grabbing the quantiles respective to each season. For example, Feb 28 in our example timeseries is around the 99th percentile for season 0 but in the 1st percentile for season 1.
  2. For this series, interpolate to the fractional season index, e.g. to 0.5 in the case of Feb 29.
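The two steps above could look roughly like this (a hedged sketch with toy data; af_q, quantiles and the per-season reference samples are illustrative stand-ins, not xclim's arrays):

```python
import numpy as np

rng = np.random.default_rng(0)
quantiles = np.linspace(0.01, 0.99, 15)
af_q = rng.normal(size=(4, 15))           # adjustment factors: 4 seasons x 15 quantiles
ref_seasons = rng.normal(size=(4, 500))   # toy training samples for each season

def interp_af(value, season_float):
    # Step 1: the value's rank *within each season*, and that season's af at this rank.
    af_series = np.array([
        np.interp((ref_seasons[s] < value).mean(), quantiles, af_q[s])
        for s in range(4)
    ])
    # Step 2: interpolate the 4-point season series at the fractional season
    # index, wrapping cyclically so SON connects back to DJF.
    xp = np.arange(5)
    fp = np.append(af_series, af_series[0])
    return np.interp(season_float % 4, xp, fp)

# Feb 28 sits at season index ~0.5, halfway between DJF and MAM:
print(interp_af(1.2, 0.5))
```

At index 0.5 this is just the average of the DJF and MAM adjustment factors for that value, which matches the Feb 28 example discussed later in the thread.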

@coxipi (Contributor) commented Dec 18, 2024

Same idea with dayofyear:

image

image

I have been very sloppy implementing this, but I just wanted to confirm that we can get decent interpolation results from the season grouping. So it's most likely the way we do the interpolation that is not appropriate. Although what you discuss about quantiles vs. dayofyear is interesting, I think the problem is more basic.

Adjustment with dayofyear is just a sample of 15 points, so it's quite noisy, not surprising:
image

I didn't add the cyclic bounds correctly in this case either; I should have added ~45 days before and after for padding. Anyway, the middle of the year still shows that the interpolation can be made to work better.

@coxipi (Contributor) commented Dec 18, 2024

I guess that makes sense. If I am right above and the problem is that we use a single quantile determined for a season when interpolating the af, then it would make sense that interpolating to months makes the difference between the two smoother.

I have also interpolated to dayofyear and get something better (see my last post), so I think it's something about the interpolation done in sdba.utils, do you agree?

To your point, the problem with the current interpolation might be related to weird interactions between quantiles and seasons; we're doing a 2D interpolation after all.

@coxipi (Contributor) commented Dec 18, 2024

What I don't fully understand: I assume that the new quantile is with respect to the current season? Which in this case would mean that during the transitions the values are extremely different?

I am not sure what you mean by "the new quantile is with respect to the current season".

@saschahofmann (Contributor Author)

Ok, I think my comment is only true for QuantileDeltaMapping. I don't understand why it happens for EmpiricalQuantileMapping.

In the former case, the first thing we do is compute the quantile for a value with respect to a season:

sim_q = group.apply(u.rank, ds.sim, main_only=True, pct=True)

I could imagine that linear interpolation then doesn't make much sense. Instead, the quantile for a value should be computed for each season separately, and then we should interpolate these values at the decimal season point.

@saschahofmann (Contributor Author) commented Dec 19, 2024

I do agree that there is something in the interpolation. I am looking at the plot of the interpolated af and I am very confused.
image
For two periods it seems to interpolate something correctly, but the rest of the time it throws all sense overboard 😓
Some of the changing points seem to correlate with the middles of the seasons. I guess at these points the interpolation switches from one pair of closest season values to the other, so we get inconsistencies.

Do you think it's in _interp_on_quantiles_2D? I tried adding rescale=True to scipy's griddata, but that didn't help.

@coxipi (Contributor) commented Dec 19, 2024

I could imagine that linear interpolation then doesn't make much sense. Instead, the quantile for a value should be computed for each season separately, and then we should interpolate these values at the decimal season point.

I see your point.

  • Basic interpolation of EmpiricalQuantileMapping (EQM): discrete data {sim_q, af_q} -> interpolate to a continuous function af = f(sim).
  • Basic interpolation of QuantileDeltaMapping (QDM): discrete data {q, af_q} -> interpolate to a continuous function af = f(rank).
  • Once we add a time grouping index to the mix, e.g. for EQM the discrete data is {season, sim_q, af_q}, the idea is to interpolate this to a bivariate function af = f(dayofyear, sim). In this case, it makes sense to think of the sim values as being distributed along dayofyear; we have the original timeseries for these values.
  • If we try this in QDM, the final function will look like af = f(dayofyear, rank). But the notion of rank is dependent on the time grouping. Saying that we want a mix of the adjustments of season-1 and season-2 when we are at the boundary of the two seasons is a bit weird, since the ranks were computed for season-1 or season-2, but not for the period in between.

To give a concrete example, consider tasmax on Feb 28. Its rank is computed in DJF. But if we included it in MAM, it would be a lower rank. We are saying that we have an interpolated continuous function that we want to apply on those ranks, but these values are still segmented; there are four groups of ranks in a year. Say Feb 28 of year 20XX is rank=0.9, but in MAM it would be rank=0.3. Should we apply the transformation for a rank in between?

Anyways, I agree the situation in QDM is more complicated

@saschahofmann (Contributor Author)

For the QDM case, I imagined it like this: for every sim value, you get the af per season (or month), and then interpolate this "timeseries" to the time of the sim value. In your example of Feb 28, you would grab the af value for DJF at rank 0.9 and for MAM at rank 0.3, add the values, and divide by 2 (since Feb 28 is at season 0.5).

Do I understand correctly that you would do the interpolation for EQM by first creating a yearly distribution per dayofyear?

@coxipi (Contributor) commented Dec 19, 2024

Do I understand correctly that you would do the interpolation for EQM by first creating a yearly distribution per dayofyear?

Yes, that is what I was doing. There is a first interpolation: convert the series of af_q obtained seasonally to a set of af_q interpolated to each dayofyear. Then simply proceed as if the adjustment had been trained on dayofyears. No temporal interpolation is left, only a second one-dimensional interpolation on the quantiles. For QDM, this means that the ranks of sim would be obtained individually for each dayofyear.

I was only trying to see if there is an interpolation that can work reasonably well in this context; I'm not sure it's the correct way to go. It's not a 2D interp, so it's quite different from the current implementation.
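That first interpolation step could be sketched like this (hedged, with toy data; the season-centre dayofyears and array names are illustrative, not xclim's):

```python
import numpy as np

nq = 15
af_q_season = np.random.default_rng(1).normal(size=(4, nq))  # 4 seasons x nq quantiles
centers = np.array([15.5, 106.5, 197.5, 289.0])              # rough mid-season dayofyears (DJF, MAM, JJA, SON)

# pad one season on each side so the DJF <-> SON boundary wraps smoothly
xp = np.concatenate([[centers[-1] - 365.25], centers, [centers[0] + 365.25]])
fp = np.vstack([af_q_season[-1], af_q_season, af_q_season[0]])

# interpolate each quantile's af to every dayofyear; after this, only the
# usual one-dimensional interpolation over quantiles remains per day
doys = np.arange(1, 366)
af_q_doy = np.stack([np.interp(doys, xp, fp[:, iq]) for iq in range(nq)], axis=1)
print(af_q_doy.shape)  # → (365, 15)
```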

For the QDM case, I imagined it like this: for every sim value, you get the af per season (or month), and then interpolate this "timeseries" to the time of the sim value. In your example of Feb 28, you would grab the af value for DJF at rank 0.9 and for MAM at rank 0.3, add the values, and divide by 2 (since Feb 28 is at season 0.5).

I was thinking something like this, but what if you're more in season-1, e.g. Jan 31st 20XX? Do you just add the specific value to the MAM group of values to compute "a rank of Jan 31st 20XX within MAM"? It might make more sense to always compute the rank for a window of ~90 days around the day in question. In this case, this starts to resemble an adjustment with "time.dayofyear" and a window of 90. The difference is simply that in the training data you obtain adjustment factors not for every day, but only in the four ranges of ~90 days (AKA seasons).
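The ~90-day-window rank idea could be sketched as follows (hedged, illustrative helper, not xclim API):

```python
import numpy as np

def windowed_rank(train_doys, train_values, target_doy, target_value, window=90):
    # circular distance in days between each sample's dayofyear and the target day,
    # wrapping across the year boundary
    dist = np.abs((train_doys - target_doy + 182.5) % 365 - 182.5)
    pool = train_values[dist <= window / 2]
    # empirical rank of the target value within the windowed pool
    return (pool < target_value).mean()

rng = np.random.default_rng(2)
doys = rng.integers(1, 366, size=2000)
vals = rng.normal(size=2000)
print(windowed_rank(doys, vals, 31, 0.0))  # rank of 0.0 around Jan 31
```

Every day gets a rank computed against roughly a season's worth of surrounding data, rather than against a fixed DJF/MAM/JJA/SON block.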

@saschahofmann (Contributor Author)

I was thinking that the timeseries you build for every value would be linearly interpolated, like right now.

You build a timeseries with the respective quantiles for DJF at 0, MAM at 1, JJA at 2 and SON at 3, and then interpolate the value for the day of year (I think Jan 31 is something like 0.173).

@coxipi (Contributor) commented Dec 19, 2024

Yes, this could be done that way too. I converted the 0, 1, 2, 3 of seasons to given dayofyear values, then interpolated on the range [1, 365], but instead we could use dayofyears in the season range.

Right now, the interpolation is two-dimensional, so the seasons and quantiles are interpolated at the same time.
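For reference, a toy illustration of that 2D scheme (illustrative data, not xclim's arrays): quantile and season coordinates are handed to griddata together as scattered points.

```python
import numpy as np
from scipy.interpolate import griddata

quantiles = np.linspace(0.1, 0.9, 5)
seasons = np.arange(4.0)
qq, ss = np.meshgrid(quantiles, seasons)
af = ss + qq                           # toy adjustment factors, linear in both coords

result = griddata(
    (qq.ravel(), ss.ravel()),          # old points: (quantile, season)
    af.ravel(),                        # old values
    ([0.5], [1.5]),                    # query: quantile 0.5, halfway MAM -> JJA
    method="linear",
)
print(result)  # → [2.] (a linear field is recovered exactly)
```

On this smooth toy field the 2D interpolation behaves; the thread's problem is about what happens with the real, segmented seasonal arrays.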

@saschahofmann (Contributor Author)

I guess that will probably lead to similar results? The only tricky thing is different calendar year lengths, e.g. we work a lot with 360_day calendars.

@saschahofmann (Contributor Author)

What I still don't understand is why we see a similar problem for EmpiricalQuantileMapping. If I read the code correctly, the sim values are directly interpolated to the historical values? I'm not sure how the quantiles are considered in this.

@coxipi (Contributor) commented Dec 19, 2024

What I still don't understand is why we see a similar problem for EmpiricalQuantileMapping. If I read the code correctly, the sim values are directly interpolated to the historical values? I'm not sure how the quantiles are considered in this.

They aren't. I think the interpolation makes more sense as-is in this context, I agree. Maybe the interpolation is not working as expected.

@coxipi (Contributor) commented Dec 19, 2024

Results below are for EQM

Interpolating the results of training on dayofyear first, then performing a dayofyear adjustment (adj-season-interp_on_doy):

image

image

Interpolating the results of training on month first, then performing a month adjustment, with the 2D interpolation (adj-season-month):

image

image

@coxipi (Contributor) commented Dec 19, 2024

Even with this alternative implementation, the results from "time" are closer to "month" than to "season", and that seems wrong.

@coxipi (Contributor) commented Dec 19, 2024

I think the season adjustment just doesn't work; forget about interpolation:

import matplotlib.pyplot as plt

from xclim import sdba

# compute a "time" adjustment with a mask to keep months 3-4-5
ref_spring = ref.where(ref.time.dt.month.isin([3, 4, 5]), drop=True)
hist_spring = hist.where(hist.time.dt.month.isin([3, 4, 5]), drop=True)
sim_spring = sim.where(sim.time.dt.month.isin([3, 4, 5]), drop=True)
scen_spring = sdba.EmpiricalQuantileMapping.train(ref_spring, hist_spring).adjust(sim_spring)

# compute monthly, select months 3-4-5 at the end
scen_month = sdba.EmpiricalQuantileMapping.train(ref, hist, group="time.month").adjust(sim, interp="nearest")
scen_month_spring = scen_month.where(scen_month.time.dt.month.isin([3, 4, 5]), drop=True)

# compute seasonally, select months 3-4-5 at the end
scen_season = sdba.EmpiricalQuantileMapping.train(ref, hist, group="time.season").adjust(sim, interp="nearest")
scen_season_spring = scen_season.where(scen_season.time.dt.month.isin([3, 4, 5]), drop=True)

outd = {
    "Time adjustment on months 3,4,5": scen_spring,
    "Season adjustment, no interpolation, select months 3,4,5 at the end": scen_season_spring,
}
for k, da in outd.items():
    da.groupby("time.dayofyear").mean().plot(label=k)
plt.legend()

image

@saschahofmann (Contributor Author)

Uff, ok. I guess we would expect those to be the same, wouldn't we?

@saschahofmann (Contributor Author)

The af's are exactly the same, so it's not the training:
image

@saschahofmann (Contributor Author) commented Dec 19, 2024

I think it leads to the same problem: the interpolated af's are just wrong:

import matplotlib.pyplot as plt

from xclim.sdba import utils as u  # assuming this alias for sdba.utils

ds = eqm_time.ds
group = "time"
sim = hist_spring
af = u.interp_on_quantiles(
    sim,
    ds.hist_q,
    ds.af,
    method='nearest',
    group=group,
)

ds = eqm_season.ds
group = "time.season"
sim = hist
af_season = u.interp_on_quantiles(
    sim,
    ds.hist_q,
    ds.af,
    method='nearest',
    group=group,
)
af_season = af_season.where(af_season.time.dt.month.isin([3, 4, 5]), drop=True)

af.groupby('time.dayofyear').mean().plot(label='interpolated af, time')
af_season.groupby('time.dayofyear').mean().plot(linestyle='--', label='interpolated af, time.season')
plt.legend()

image

@coxipi (Contributor) commented Dec 19, 2024

Great point. Just to summarize:

af agree: [image] · scen disagree: [image]

The problem exists even without interpolation though, with interp="nearest". That's why I wanted to compare the "time" adjustment with only MAM in the timeseries vs. the "time.season" adjustment, WITHOUT interpolation, selecting MAM at the end. I believe that in this specific case, both adjustments should be equivalent.

@saschahofmann (Contributor Author)

Sorry, my bad: "interpolation" in this case means nearest interpolation for each time value, as opposed to the af value provided by the training (with dimensions quantile [and season]).

@coxipi (Contributor) commented Dec 19, 2024

In any case, the mean over the 15-year period should be closer to the target; time.season is way off (and in fact, I maintain, it should be equal to the time adjustment).

image

Sorry, my bad: "interpolation" in this case means nearest interpolation for each time value, as opposed to the af value provided by the training (with dimensions quantile [and season]).

Got it, sorry, I read too fast. Good observation. It seems _interp_on_quantiles_2D is messing up season even without interpolation; I will have to look into this.

@saschahofmann (Contributor Author)

I think I am getting closer. I tried to replicate what's happening inside _interp_on_quantiles_2D for a single value on 2000-04-15. Above, that value would be -1.23223145.

import numpy as np
from scipy.interpolate import griddata

from xclim.sdba import utils as u  # assuming this alias for sdba.utils

oldx = qdm_season.ds.hist_q.sel(season='MAM')
oldg = np.ones(20)  # all training points in group 1 (MAM)
oldy = qdm_season.ds.af.sel(season='MAM')
value = hist.sel(time='2000-04-15')
newx = value
newg = u.map_season_to_int(value.time.dt.season)
griddata(
    (oldx.values, oldg),
    oldy.values,
    (newx.values, newg),
    method='nearest'
)

If I didn't make a mistake, that should mimic the function, but it leads to 0.20145109.

@saschahofmann (Contributor Author)

Ok, I just need to find the difference between this code and the one run by _interp_on_quantiles_2D. The chart shows the af obtained from EQM with group='time' as above, and the one manually computed.

import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from scipy.interpolate import griddata

from xclim.sdba import utils as u  # assuming this alias for sdba.utils

oldx = qdm_season.ds.hist_q.sel(season="MAM")
oldg = np.ones(20)
oldy = qdm_season.ds.af.sel(season="MAM")
value = hist_spring
newx = value
newg = u.map_season_to_int(value.time.dt.season)
afs = griddata((oldx.values, oldg), oldy.values, (newx.values, newg), method="nearest")

af.groupby("time.dayofyear").mean().plot(label="interpolated af, time")
af_corrected = xr.DataArray(afs, coords=dict(time=hist_spring.time))
af_corrected.groupby('time.dayofyear').mean().plot(
    linestyle="--", label='seasonal af manually computed with griddata'
)
plt.legend()

image

@coxipi (Contributor) commented Dec 20, 2024

Great, I have reproduced your example.

from warnings import warn

import numpy as np
import xarray as xr
from scipy.interpolate import griddata

def _interp_on_quantiles_2d(newx, newg, oldx, oldy, oldg, method, extrap):
    mask_new = np.isnan(newx) | np.isnan(newg)
    mask_old = np.isnan(oldy) | np.isnan(oldx) | np.isnan(oldg)
    out = np.full_like(newx, np.nan, dtype=f"float{oldy.dtype.itemsize * 8}")
    if np.all(mask_new) or np.all(mask_old):
        warn(
            "All-nan slice encountered in interp_on_quantiles",
            category=RuntimeWarning,
        )
        return out
    out[~mask_new] = griddata(
        (oldx[~mask_old], oldg[~mask_old]),
        oldy[~mask_old],
        (newx[~mask_new], newg[~mask_new]),
        method=method,
    )
    # if method == "nearest" or extrap != "nan":
    #     # 'nan' extrapolation implicit for cubic and linear interpolation.
    #     out = _extrapolate_on_quantiles(out, oldx, oldg, oldy, newx, newg, extrap)
    return out

def bad_af_season(afq, hist):
    # oldx comes from the earlier cell: qdm_season.ds.hist_q.sel(season="MAM")
    oldg = np.ones(20)
    oldy = afq.sel(season="MAM")
    value = hist.where(hist.time.dt.month.isin([3, 4, 5]), drop=True)
    newx = value
    newg = u.map_season_to_int(value.time.dt.season)
    afs = _interp_on_quantiles_2d(newx, newg, oldx, oldy, oldg, "nearest", "nan")
    return xr.DataArray(afs, coords=dict(time=value.time))

# good_af_season is the plain-griddata version from the previous comment
af.groupby("time.dayofyear").mean().plot(label="interpolated af, time")
good_af_season(qdm_season.ds.af, hist).groupby('time.dayofyear').mean().plot(
    linestyle="--", label='seasonal af manually computed with griddata'
)
bad_af_season(qdm_season.ds.af, hist).groupby('time.dayofyear').mean().plot(
    linestyle="--", label='seasonal af manually computed with xclim internals'
)
plt.legend()

And I get the same (the functions are quite similar, so I'm not surprised, but I still wanted to confirm quickly). I was having some problems with numba, so I commented out the "extrapolate" step; but we have problems when method != "nearest" too, so that should not be the problem. So now I suspect it might be the arguments newx, newg, oldx, oldy, oldg. I've built them your way; I have to check what is done in xclim.

image

@saschahofmann (Contributor Author)

Yes, I'm also still trying to figure out the difference, but I guess I ran into the same problem as you with the extrapolation.

Successfully merging this pull request may close these issues.

Linear interpolation for seasonal bias adjustment leads to non-smooth distribution