Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add drift removal options consistent with best practice #168

Open
DamienIrving opened this issue Jul 22, 2021 · 4 comments
Open

Add drift removal options consistent with best practice #168

DamienIrving opened this issue Jul 22, 2021 · 4 comments

Comments

@DamienIrving
Copy link

DamienIrving commented Jul 22, 2021

Very exciting to see the recent developments regarding branch time visualization and drift removal.

As I understand it, the calculate_drift function subtracts the linear trend of the control run over the time period of the branched off forced runs. While restricting the linear fit to the time period of the control run that corresponds to the forced run is fine in many use cases, the literature on best practice for drift removal recommends a drift estimate based on the full control time series in order to minimize contamination of the drift estimate by internal variability (Sen Gupta et al., 2013).

It also often doesn't matter whether the drift estimate is based on a linear or higher order fit to the data, but in some cases it does matter and users will often want to compare and contrast a linear fit with other options like a cubic fit. Just to be safe, in the latest paper on drift in the CMIP6 ensemble the authors fit a cubic polynomial to the control experiment (Irving et al, 2021) and you can then subtract the relevant segment of that polynomial from the forced data (e.g. see Section 2.6 of Irving et al, 2019).

Long story short, it might be worth adding functionality to cmip6_preprocessing so that users can choose what polynomial they'd like to fit to the control data (linear or cubic is possibly enough options) and whether that fit is over the full control run or just the overlapping time period*. (And it would be cool if users could easily visualize the different fits to the control data).

Unfortunately all the drift removal code I wrote for our CMIP6 paper uses iris as opposed to xarray, but I could possibly help out with implementing these updates to cmip6_preprocessing if that would be useful.

*Occasionally the control runs will have a weird step change or something that you will want to avoid when fitting the polynomial, so it might actually be best to let the user specify full, overlapping or some custom period of time for the fit.

@jbusecke
Copy link
Owner

Thanks a lot for the valuable input @DamienIrving. I would love to collaborate with you on implementing these options (and also precalculate these in the cloud so users could easily try and decide which method fits their needs).

Few questions:

  1. When you say a drift estimate based on the full control time series, how do you deal with the different lengths of control runs provided? I have seen some with 250 years and others with 500+ years? I naively implemented the existing method thinking it would be more 'fair' to fit a linear trend over whatever period the branched run was covering.
  2. Implementing other methods should be fairly simple if we refactor the logic to use the new xarray polyfit (I had originally encountered some performance issues, but this here motivates me to dig into this more).

*Occasionally the control runs will have a weird step change or something that you will want to avoid when fitting the polynomial, so it might actually be best to let the user specify full, overlapping or some custom period of time for the fit.

There is already an option to customize the length of the period used for the trend fitting, but currently that is just in years since the branch point. This should be easy enough to adapt.

Unfortunately all the drift removal code I wrote for our CMIP6 paper uses iris as opposed to xarray, but I could possibly help out with implementing these updates to cmip6_preprocessing if that would be useful.

I would love some help on this, and I think it would be cool to have a published 'groundtruth' to compare results produced here?

@DamienIrving
Copy link
Author

With respect to different lengths of control runs, in my experience most people choose to be more accurate (i.e. use all the available control data) than "fair" (by using the forced run time period only) - and that's certainly the recommendation from Sen Gupta et al - but as long as you have the option of allowing the user to select what segment of the control data to use then they can make those decisions themselves.

Pre-calculating is an interesting idea. In my own work, I've tended to fit a cubic to the control data and then store the polynomial coefficients for re-use later. For instance, if the general expression is At^3 + Bt^2 + Ct + D, I'd store the A, B, C and D values for each grid point. You can then just feed in the t values corresponding to the forced experiment into the expression for your de-drifting.

Interesting aside: if you de-drift your data at every grid point and then calculate a spatially integrated quantity (e.g. de-drift ocean temperature data and then calculate global OHC) you'll get a different answer than if you spatially integrate first and then de-dedrift the spatially integrated quantity (e.g. calculate global OHC in the forced and control experiments and then de-drift the forced global OHC).

I'll try and put aside some time over the next few weeks to familiarize myself with cmip6_preprocessing so I can be useful when it comes to implementing these additions to the drift removal functionality 😄

@jbusecke
Copy link
Owner

Pre-calculating is an interesting idea. In my own work, I've tended to fit a cubic to the control data and then store the polynomial coefficients for re-use later. For instance, if the general expression is At^3 + Bt^2 + Ct + D, I'd store the A, B, C and D values for each grid point. You can then just feed in the t values corresponding to the forced experiment into the expression for your de-drifting.

I think either option is suited well for precalculation, because even if you store 4 values per grid point the data is very small compared to the full output, and the time to calculate is substantial. I like the idea of eventually having a simple option switch for people to calculate their end results with different methods and then comparing the results.

I'll try and put aside some time over the next few weeks to familiarize myself with cmip6_preprocessing so I can be useful when it comes to implementing these additions to the drift removal functionality 😄

Super awesome! Thanks. Any help i can provided, please feel free to reach out!

@jbusecke
Copy link
Owner

jbusecke commented Aug 2, 2021

Hi @DamienIrving,
I think once pydata/xarray#5629 has a solution we could start working on this issue. I will be on vacation in the two middle weeks of August but am very excited to tackle this after I am back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants