Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add test for resampling over long periods #329

Merged
merged 4 commits into from
Feb 16, 2023

Conversation

MarcoGorelli
Copy link

No description provided.

@CommonClimate
Copy link
Collaborator

Hi @MarcoGorelli, test_resample_long_periods looks like a very minimal test (2 values). Is that sufficient to judge that the code is working properly? If I understand things right, the test is taking dataframe_dt(), which generates a series of 5 values (0 to 4) at yearly intervals, reindexes them according to 3 rules (1 Ga, 1 Ma and 2 ka), and makes sure that the resampled average is correct. However, I don't understand:

  • why the expected values are [0 2.5], as the mean value of dataframe_dt() is 2.0.
  • why there is a test on metadata that uses SOI metadata. Is it to make the the rule is properly added to the series' label?

The docstring is a little terse. What do you mean by "The offset string or object representing target conversion"? Can you link to pandas pages that explain how to use that? I added a docstring example that should produce this output:

LR04_resample5k

other than that, it looks good.

@CommonClimate
Copy link
Collaborator

Additional question on this, which popped up in #330 . I notice that the pandas Resampler class has an interpolate method, but that does not appear to be available to pyleo.SeriesResampler(). Would it be hard to make that work? We just figured out it would save us a bunch of headaches to apply interpolation via pandas instead of the more limited scipy.interpolate.interp1d, which can't deal with NaNs.

@MarcoGorelli
Copy link
Author

hey @CommonClimate

Is that sufficient to judge that the code is working properly?

not really, but it's tested in pandas itself, here it's just to check that the convenience method works fine

thanks for updating the docstring!

I'll take a look at interpolate, I think it should work

@MarcoGorelli
Copy link
Author

why the expected values are [0 2.5], as the mean value of dataframe_dt() is 2.0.

it's because of how it's bucketing the data - the values are [0, 1, 2, 3, 4], and 0 ends up in its own bucket, and the others in the other one, and so the means are 0 and 2.5

@CommonClimate
Copy link
Collaborator

OK. I think we're ready to merge when this check passes. How do you want to handle the interpolate() thing? Using the docstring example, this is what I get:

ts5k = ts.resample('5ka').interpolate()
/Users/julieneg/opt/miniconda3/envs/pyleo310/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,

@MarcoGorelli
Copy link
Author

why there is a test on metadata that uses SOI metadata. Is it to make the the rule is properly added to the series' label?

yeah it's just to check that it roundtrips

I've added a link to the pandas docs

Regarding interpolate - it looks like you already remove null values during instantiation

v_mod, t_mod = tsbase.clean_ts(self.value, self.time, verbose=verbose)

, could you show an example of how you end up with missing values and would like to interpolate them please? Would also prefer to keep that to a separate issue if possible

@CommonClimate anything else need doing here?

@CommonClimate
Copy link
Collaborator

Hi @MarcoGorelli , sure I'll merge this now and we'll open a new issue in the paleopandas repo so I can tag you there.

@CommonClimate CommonClimate merged commit 51e2f37 into LinkedEarth:Development Feb 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants