Correctly interpolate seasons in Grouper #2019
base: main
Conversation
I just realised that the factor of 1/6 assumes that all seasons have the same length, which in Gregorian calendars is not necessarily true. I am not sure it matters too much; at least the function should be smooth.
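For reference, a quick standalone check of the actual Gregorian season lengths (non-leap year; leap years add a day to DJF), confirming that the seasons are only approximately equal:

```python
import calendar

# Sum the month lengths making up each meteorological season in a non-leap year.
seasons = {"DJF": [12, 1, 2], "MAM": [3, 4, 5], "JJA": [6, 7, 8], "SON": [9, 10, 11]}
lengths = {
    name: sum(calendar.monthrange(2001, m)[1] for m in months)
    for name, months in seasons.items()
}
print(lengths)  # {'DJF': 90, 'MAM': 92, 'JJA': 92, 'SON': 91}
```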
Weirdly, and contrary to what I showed yesterday, today I am still getting clear transitions, as if there still wasn't any linear interpolation.
@saschahofmann We recently changed the layout of xclim to use a …
I reinstalled xclim but I am still getting very similar results to before the "fix". Do you have any advice on where else I could look?
Could it be that you have obsolete …
I managed to install the environment; for some reason I only had the branch "main" when I cloned the fork yesterday.

```python
import inspect

print(inspect.getsource(sdba.base.Grouper.get_index))
```
I'll try to have a look later. Maybe the …
I am pretty sure that the …
It's simply

```python
from xclim import sdba

QM = sdba.EmpiricalQuantileMapping.train(
    ref, hist, nquantiles=15, group="time.season", kind="+"
)
scen = QM.adjust(sim, extrapolation="constant", interp="nearest")
scen_interp = QM.adjust(sim, extrapolation="constant", interp="linear")

outd = {
    "Reference": ref,
    "Model - biased": hist,
    "Model - adjusted - no interp": scen,
    "Model - adjusted - linear interp": scen_interp,
}
for k, da in outd.items():
    da.groupby("time.dayofyear").mean().plot(label=k)
plt.legend()
```

This doesn't reproduce your figure, however. It seems your figure above was matching the reference very well, better than what I have even with the linear interpolation. But it does get rid of obvious discontinuities.
@coxipi I think I only mentioned this in the original issue: my analysis is done with …
Yes, I have seen similar things by playing with the choice of how …
I guess that makes sense. If I am right above, and the problem is that we use a single quantile determined for a season when interpolating the af, then it'd make sense that when you interpolate to month the difference between the two is smoother.
My suggestion is, for a single future value: …
I have also interpolated to dayofyear and get something better (see my last post), so I think it's something about how the interpolation is done in … To your point, the problem with the current interpolation might be related to weird interactions between quantiles and seasons; we're doing a 2D interpolation after all.
I am not sure what you mean by the "new quantile is in respect to the current season".
Ok, I think my comment is only true for QuantileDeltaMapping; I don't understand why it happens for EmpiricalQuantileMapping. In the former case, the first thing we do is compute the quantile of a value with respect to a season:

```python
sim_q = group.apply(u.rank, ds.sim, main_only=True, pct=True)
```

I could imagine that linear interpolation doesn't make too much sense there. Instead, the quantile for a value should be computed for each season separately, and then we should interpolate these values at the decimal season point.
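A toy sketch of that per-season rank idea (my own helpers on made-up data, not xclim API): compute the rank of a value within each season's distribution separately, then linearly interpolate those ranks at the decimal season position of the timestamp.

```python
import numpy as np

rng = np.random.default_rng(0)
seasons = ["DJF", "MAM", "JJA", "SON"]
# Made-up seasonal distributions with different means.
season_samples = {
    s: rng.normal(loc, 1.0, 500) for s, loc in zip(seasons, [0.0, 5.0, 10.0, 5.0])
}

def rank_in(sample, value):
    """Empirical percentile rank of `value` within `sample`."""
    return float((sample < value).mean())

def interpolated_rank(value, decimal_season):
    """Linearly interpolate per-season ranks at a decimal season (0=DJF .. 3=SON, periodic)."""
    ranks = [rank_in(season_samples[s], value) for s in seasons]
    i = int(np.floor(decimal_season)) % 4
    frac = decimal_season - np.floor(decimal_season)
    return (1 - frac) * ranks[i] + frac * ranks[(i + 1) % 4]

print(interpolated_rank(5.0, 1.5))  # blends the MAM and JJA ranks of the value 5.0
```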
I see your point.
To give a concrete example, consider tasmax on Feb 28. Its rank is computed in DJF, but if we included it in MAM, it would get a lower rank. We are saying that we have an interpolated continuous function that we want to apply to those ranks, but these values are still segmented: there are four groups of ranks in a year. Say Feb 28 of year 20XX is rank=0.9, but in MAM it would be rank=0.3. Should we apply the transformation for a rank in-between? Anyway, I agree the situation in QDM is more complicated.
For the QDM case, I imagined it to be like this: for every sim value you get the … Do I understand correctly that you would do the interpolation for EQM by first creating a yearly distribution per dayofyear?
Yes, that is what I was doing. There is a first interpolation: convert the series of af_q obtained seasonally into a group of af_q interpolated to each dayofyear. Then, simply proceed as if the adjustment had been on dayofyears. No temporal interpolation is left, only a second one-dimensional interpolation on the quantiles. For QDM, this means that the ranks of sim would be obtained individually for each dayofyear. I was only trying to see if there is an interpolation that can work reasonably well in this context; I'm not sure if it's the correct way to go. It's not a 2D interp, so it's quite different from the current implementation.
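The two-step scheme described above could be sketched like this (toy arrays, not xclim code): first interpolate af_q from four seasonal curves to one curve per dayofyear, then do a plain 1D interpolation over the quantiles.

```python
import numpy as np

rng = np.random.default_rng(0)
nq = 15
quantiles = np.linspace(0.01, 0.99, nq)
af_q_season = rng.random((4, nq))               # (season, quantile), made up
season_centres = np.array([15, 106, 197, 288])  # approx. mid-DJF .. mid-SON (dayofyear)

# Step 1: periodic linear interpolation season -> dayofyear, one quantile at a time.
days = np.arange(1, 366)
af_q_doy = np.empty((days.size, nq))
for iq in range(nq):
    af_q_doy[:, iq] = np.interp(days, season_centres, af_q_season[:, iq], period=365)

# Step 2: for a sim value with rank r on day d, interpolate over quantiles only.
r, d = 0.42, 120
af = np.interp(r, quantiles, af_q_doy[d - 1])
```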
I was thinking something like this, but what if you're more in season-1, e.g. Jan 31st 20XX? Do you just add that specific value to the MAM group of values to compute "a rank of Jan 31st 20XX within MAM"? It might make more sense to always compute the rank for a window of ~90 days around the day in question. In that case, this starts to resemble an adjustment with "time.dayofyear" and a window of 90. The difference is simply that in the training data you obtain adjustment factors not for every day, but only in the four ranges of ~90 days (AKA seasons).
I was thinking that the timeseries you build for every value would be linearly interpolated like right now. You build a timeseries with the respective quantiles for DJF at 0, MAM at 1, JJA at 2 and SON at 3, and then proceed to interpolate the value for the day of year (I think Jan 31 is something like 0.173).
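A rough sketch of that decimal-season mapping (my own helper, assuming equal-length seasons, a 365-day year, and season centres at mid-DJF, mid-MAM, etc.; the exact value for Jan 31 depends on where the centres are placed, which is why this gives ~0.175 rather than exactly 0.173):

```python
def decimal_season(dayofyear, year_len=365):
    """Decimal season coordinate: 0 at mid-DJF (~Jan 15), 1 at mid-MAM, ... (periodic)."""
    return ((dayofyear - 15) % year_len) / year_len * 4

print(round(decimal_season(31), 3))   # Jan 31 lands a bit past mid-DJF
print(round(decimal_season(106), 3))  # mid-April lands near MAM's position 1
```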
Yes, this could be done in this way too. I converted the 0, 1, 2, 3 of the seasons to given dayofyear values, then interpolated on the range [1, 365], but instead we could use dayofyears in the season range. Right now the interpolation is two-dimensional, so the seasons and quantiles are interpolated at the same time.
I guess that will probably lead to similar results? The only tricky thing is different calendar year lengths, e.g. we work a lot with 360_day calendars.
What I still don't understand is why we see a similar problem for EmpiricalQuantileMapping. If I read the code correctly, the sim values are directly interpolated to the historical values? I am not sure how the quantiles are considered in this.
They aren't. I think the interpolation makes more sense as-is in this context, I agree. Maybe the interpolation is not working as expected.
Even with this alternative implementation, the results from "time" are closer to "month" than "season", and that seems wrong.
I think the season adjustment just doesn't work, forget about interpolation?

```python
# compute a time adjustment with a mask to keep months 3-4-5
ref_spring = ref.where(ref.time.dt.month.isin([3, 4, 5]), drop=True)
hist_spring = hist.where(hist.time.dt.month.isin([3, 4, 5]), drop=True)
sim_spring = sim.where(sim.time.dt.month.isin([3, 4, 5]), drop=True)
scen_spring = sdba.EmpiricalQuantileMapping.train(ref_spring, hist_spring).adjust(sim_spring)

# compute monthly, choose at the end
scen_month = sdba.EmpiricalQuantileMapping.train(ref, hist, group="time.month").adjust(sim, interp="nearest")
scen_month_spring = scen_month.where(scen_month.time.dt.month.isin([3, 4, 5]), drop=True)

# compute seasonally, choose at the end
scen_season = sdba.EmpiricalQuantileMapping.train(ref, hist, group="time.season").adjust(sim, interp="nearest")
scen_season_spring = scen_season.where(scen_season.time.dt.month.isin([3, 4, 5]), drop=True)

outd = {
    "Time adjustment on months 3,4,5": scen_spring,
    "Season adjustment no interpolation, select months 3,4,5 at the end": scen_season_spring,
}
for k, da in outd.items():
    da.groupby("time.dayofyear").mean().plot(label=k)
plt.legend()
```
Uff, ok. I guess we would expect these to be the same, wouldn't we?
I think it leads to the same problem: the interpolated af's are just wrong.

```python
ds = eqm_time.ds
group = "time"
sim = hist_spring
af = u.interp_on_quantiles(
    sim,
    ds.hist_q,
    ds.af,
    method="nearest",
    group=group,
)

ds = eqm_season.ds
group = "time.season"
sim = hist
af_season = u.interp_on_quantiles(
    sim,
    ds.hist_q,
    ds.af,
    method="nearest",
    group=group,
)
af_season = af_season.where(af_season.time.dt.month.isin([3, 4, 5]), drop=True)

af.groupby("time.dayofyear").mean().plot(label="interpolated af time")
af_season.groupby("time.dayofyear").mean().plot(linestyle="--", label="interpolated af time.season")
plt.legend()
```
Sorry, my bad, interpolation in this case means …
In any case, the mean over the 15-year period should be closer to the target; time.season is way off (and in fact, I maintain, it should be equal to the time adjustment).
Got it, sorry, I read too fast. Good observation. It seems …
I think I am getting closer. I tried to replicate what's happening inside …

```python
from scipy.interpolate import griddata

oldx = qdm_season.ds.hist_q.sel(season="MAM")
oldg = np.ones(20)
oldy = qdm_season.ds.af.sel(season="MAM")
value = hist.sel(time="2000-04-15")
newx = value
newg = u.map_season_to_int(value.time.dt.season)

griddata(
    (oldx.values, oldg),
    oldy.values,
    (newx.values, newg),
    method="nearest",
)
```

If I didn't make a mistake, that should mimic the function, but it leads to 0.20145109.
Ok, I just need to find the difference between this code and the one running …

```python
from scipy.interpolate import griddata

oldx = qdm_season.ds.hist_q.sel(season="MAM")
oldg = np.ones(20)
oldy = qdm_season.ds.af.sel(season="MAM")
value = hist_spring
newx = value
newg = u.map_season_to_int(value.time.dt.season)
afs = griddata((oldx.values, oldg), oldy.values, (newx.values, newg), method="nearest")

af.groupby("time.dayofyear").mean().plot(label="interpolated af time")
af_corrected = xr.DataArray(afs, coords=dict(time=hist_spring.time))
af_corrected.groupby("time.dayofyear").mean().plot(
    linestyle="--", label="seasonal af manually computed with griddata"
)
plt.legend()
```
Great, I have reproduced your example.

```python
from warnings import warn

import numpy as np
import xarray as xr
from scipy.interpolate import griddata


def _interp_on_quantiles_2d(newx, newg, oldx, oldy, oldg, method, extrap):
    mask_new = np.isnan(newx) | np.isnan(newg)
    mask_old = np.isnan(oldy) | np.isnan(oldx) | np.isnan(oldg)
    out = np.full_like(newx, np.nan, dtype=f"float{oldy.dtype.itemsize * 8}")
    if np.all(mask_new) or np.all(mask_old):
        warn(
            "All-nan slice encountered in interp_on_quantiles",
            category=RuntimeWarning,
        )
        return out
    out[~mask_new] = griddata(
        (oldx[~mask_old], oldg[~mask_old]),
        oldy[~mask_old],
        (newx[~mask_new], newg[~mask_new]),
        method=method,
    )
    # if method == "nearest" or extrap != "nan":
    #     # 'nan' extrapolation implicit for cubic and linear interpolation.
    #     out = _extrapolate_on_quantiles(out, oldx, oldg, oldy, newx, newg, extrap)
    return out


def bad_af_season(afq, hist):
    oldx = qdm_season.ds.hist_q.sel(season="MAM")  # as in the previous snippet
    oldg = np.ones(20)
    oldy = afq.sel(season="MAM")
    value = hist.where(hist.time.dt.month.isin([3, 4, 5]), drop=True)
    newx = value
    newg = u.map_season_to_int(value.time.dt.season)
    afs = _interp_on_quantiles_2d(newx, newg, oldx, oldy, oldg, "nearest", "nan")
    return xr.DataArray(afs, coords=dict(time=value.time))


# good_af_season: the griddata version from my previous comment
af.groupby("time.dayofyear").mean().plot(label="interpolated af time")
good_af_season(qdm_season.ds.af, hist).groupby("time.dayofyear").mean().plot(
    linestyle="--", label="seasonal af manually computed with griddata"
)
bad_af_season(qdm_season.ds.af, hist).groupby("time.dayofyear").mean().plot(
    linestyle="--", label="seasonal af manually computed with xclim internals"
)
plt.legend()
```

And I get the same (the functions are quite similar, so I'm not surprised, but I still wanted to confirm quickly). I was having some problems with numba, so I commented out the "extrapolate" part, but we have problems when method != "nearest" too, so that should not be the problem. So now I suspect that it might be because of the arguments …
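One possible mechanism worth checking (my speculation, with toy numbers): `griddata`'s `'nearest'` method measures Euclidean distance jointly over the value axis and the group axis, so when the data values span a much larger range than the group index (0..3), the nearest training point can come from the wrong season.

```python
import numpy as np
from scipy.interpolate import griddata

oldx = np.array([10.0, 30.0])  # hist quantile values (e.g. in degrees)
oldg = np.array([1.0, 2.0])    # season indices: MAM, JJA
oldy = np.array([0.0, 1.0])    # adjustment factors

# A MAM query (g=1) with value 25 is closer in 2D to the JJA point (30, 2)
# than to the MAM point (10, 1), so it picks up JJA's adjustment factor.
res = griddata((oldx, oldg), oldy, (np.array([25.0]), np.array([1.0])), method="nearest")
print(res)  # [1.]
```

If this is indeed the issue, normalising the axes, or interpolating within each season separately, would avoid such cross-season neighbours.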
Yes, I'm also still trying to figure out the difference, but I guess I ran into the same problem as you with the extrapolation.
Pull Request Checklist:

- … the relevant issue (:issue:`number`) and pull request (:pull:`number`) has been added

What kind of change does this PR introduce?

This PR adds a line to correctly interpolate seasonal values. It also changes the `test_timeseries` function, which now accepts a `calendar` argument instead of `cftime`. Not providing it, or providing `None`, is equivalent to the previous `cftime=False`, and `calendar='standard'` to the previous `cftime=True`. This allows for testing different calendar implementations, e.g. 360_day calendars.