Add initial property-based tests using Hypothesis #22280
Changes from all commits
80f126c
3b3889d
d51cac5
ae17d4d
5c6e2bd
779b49a
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,3 +28,4 @@ dependencies: | |
- pytest | ||
- pytest-xdist | ||
- moto | ||
- hypothesis>=3.58.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,3 +25,4 @@ dependencies: | |
- cython>=0.28.2 | ||
- pytest | ||
- pytest-xdist | ||
- hypothesis>=3.58.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,6 +9,7 @@ | |
'html5lib', | ||
'ipython', | ||
'jinja2', | ||
'hypothesis', | ||
'lxml', | ||
'numexpr', | ||
'openpyxl', | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,3 +26,4 @@ dependencies: | |
- html5lib==1.0b2 | ||
- beautifulsoup4==4.2.1 | ||
- pymysql==0.6.0 | ||
- hypothesis>=3.58.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,3 +11,5 @@ dependencies: | |
# universal | ||
- pytest | ||
- pytest-xdist | ||
- pip: | ||
- hypothesis>=3.58.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -31,3 +31,5 @@ dependencies: | |
- pytest | ||
- pytest-xdist | ||
- moto | ||
- pip: | ||
- hypothesis>=3.58.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -32,3 +32,5 @@ dependencies: | |
- pytest | ||
- pytest-xdist | ||
- moto | ||
- pip: | ||
- hypothesis>=3.58.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -45,6 +45,7 @@ dependencies: | |
- pytest | ||
- pytest-xdist | ||
- moto | ||
- hypothesis>=3.58.0 | ||
- pip: | ||
- backports.lzma | ||
- cpplint | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,3 +25,4 @@ dependencies: | |
- pytest-xdist | ||
- pip: | ||
- python-dateutil==2.5.3 | ||
- hypothesis>=3.58.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,3 +28,4 @@ dependencies: | |
- pytest | ||
- pytest-xdist | ||
- moto | ||
- hypothesis>=3.58.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -41,6 +41,7 @@ dependencies: | |
- pytest-xdist | ||
- pytest-cov | ||
- moto | ||
- hypothesis>=3.58.0 | ||
- pip: | ||
- brotlipy | ||
- coverage | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,3 +12,4 @@ dependencies: | |
- pytz | ||
- pytest | ||
- pytest-xdist | ||
- hypothesis>=3.58.0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -450,3 +450,37 @@ def mock(): | |
return importlib.import_module("unittest.mock") | ||
else: | ||
return pytest.importorskip("mock") | ||
|
||
|
||
# ---------------------------------------------------------------- | ||
# Global setup for tests using Hypothesis | ||
|
||
from hypothesis import strategies as st | ||
This will put a hard dependency on hypothesis for testing. Are we OK with that? After some thought, I think it's fine. It's a well-maintained project, and working around it in the test suite seems silly. If we're ok with that, then @Zac-HD could you update

My read of the reviews so far is that @jreback was in favor of a mandatory dependency (also my recommendation), and you're now in favor too. I've therefore made the relevant changes and it's all ready to go 🎉 (though one build on Travis has errored out, the tests passed until the timeout)

so, I still think we need to a) remove hypothesis from 1 build (the same one we have removed

@jreback there are two problems with making Hypothesis optional for
That doesn't make it completely unreasonable, I'd prefer to just have the dependency - and I've been using Pandas for much longer than Hypothesis! TLDR - what's wrong with putting Hypothesis in the same category as pytest?

I think moto is a bit different, since it's relatively unimportant to mainline pandas, and so is easy to work around. IMO, hypothesis should be treated the same as pytest.
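For context on the trade-off discussed in these comments, here is a minimal sketch of what an optional Hypothesis dependency could look like, using only the standard library. The names `HAS_HYPOTHESIS` and `require_hypothesis` are hypothetical and are not part of this PR, which makes Hypothesis a hard test dependency instead. In a pytest suite the same effect is usually obtained with `pytest.importorskip`, as the `mock` fixture in this file already does.

```python
import importlib.util

# Hypothetical flag: True when the `hypothesis` package can be found.
HAS_HYPOTHESIS = importlib.util.find_spec("hypothesis") is not None


def require_hypothesis():
    """Return the hypothesis module, or raise a clear error if it is missing."""
    if not HAS_HYPOTHESIS:
        raise ImportError("hypothesis is required for property-based tests")
    import hypothesis
    return hypothesis
```

Every module that registers or uses strategies would need a guard like this, which is the kind of workaround the reviewers argue against.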
||
|
||
# Registering these strategies makes them globally available via st.from_type, | ||
# which is used for offsets in tests/tseries/offsets/test_offsets_properties.py | ||
for name in 'MonthBegin MonthEnd BMonthBegin BMonthEnd'.split(): | ||
cls = getattr(pd.tseries.offsets, name) | ||
st.register_type_strategy(cls, st.builds( | ||
cls, | ||
n=st.integers(-99, 99), | ||
normalize=st.booleans(), | ||
)) | ||
|
||
for name in 'YearBegin YearEnd BYearBegin BYearEnd'.split(): | ||
cls = getattr(pd.tseries.offsets, name) | ||
st.register_type_strategy(cls, st.builds( | ||
cls, | ||
n=st.integers(-5, 5), | ||
normalize=st.booleans(), | ||
month=st.integers(min_value=1, max_value=12), | ||
)) | ||
|
||
for name in 'QuarterBegin QuarterEnd BQuarterBegin BQuarterEnd'.split(): | ||
cls = getattr(pd.tseries.offsets, name) | ||
st.register_type_strategy(cls, st.builds( | ||
cls, | ||
n=st.integers(-24, 24), | ||
normalize=st.booleans(), | ||
startingMonth=st.integers(min_value=1, max_value=12) | ||
)) |
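The registration pattern above can be illustrated on a small stand-in class. This is a sketch that assumes Hypothesis is installed. `Step` and `check_step_is_in_range` are hypothetical names, not pandas code. Like the offsets, `Step` has no type hints on its constructor, so Hypothesis has no information about which values `n` and `normalize` should take until a strategy is registered for the type.

```python
from hypothesis import given, strategies as st


class Step:
    """Hypothetical stand-in for an offset class (no type hints)."""

    def __init__(self, n=1, normalize=False):
        self.n = n
        self.normalize = normalize


# Tell Hypothesis how to build Step instances, mirroring the loops above.
st.register_type_strategy(Step, st.builds(
    Step,
    n=st.integers(-99, 99),
    normalize=st.booleans(),
))


@given(st.from_type(Step))
def check_step_is_in_range(step):
    # Every generated instance respects the registered bounds.
    assert -99 <= step.n <= 99
    assert isinstance(step.normalize, bool)


# Calling the decorated function runs it against many generated examples.
check_step_is_in_range()
```

Once a type is registered, any test module can request instances with `st.from_type` without importing the strategy definition itself.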
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
# -*- coding: utf-8 -*- | ||
""" | ||
Behavior-based tests for offsets and date_range. | ||
|
||
This file is adapted from https://github.com/pandas-dev/pandas/pull/18761 - | ||
which was more ambitious but less idiomatic in its use of Hypothesis. | ||
|
||
You may wish to consult the previous version for inspiration on further | ||
tests, or when trying to pin down the bugs exposed by the tests below. | ||
""" | ||
|
||
import pytest | ||
from hypothesis import given, assume, strategies as st | ||
from hypothesis.extra.pytz import timezones as pytz_timezones | ||
from hypothesis.extra.dateutil import timezones as dateutil_timezones | ||
|
||
import pandas as pd | ||
|
||
from pandas.tseries.offsets import ( | ||
MonthEnd, MonthBegin, BMonthEnd, BMonthBegin, | ||
QuarterEnd, QuarterBegin, BQuarterEnd, BQuarterBegin, | ||
YearEnd, YearBegin, BYearEnd, BYearBegin, | ||
) | ||
|
||
# ---------------------------------------------------------------- | ||
# Helpers for generating random data | ||
|
||
gen_date_range = st.builds( | ||
pd.date_range, | ||
start=st.datetimes( | ||
# TODO: Choose the min/max values more systematically | ||
min_value=pd.Timestamp(1900, 1, 1).to_pydatetime(), | ||
max_value=pd.Timestamp(2100, 1, 1).to_pydatetime() | ||
), | ||
periods=st.integers(min_value=2, max_value=100), | ||
freq=st.sampled_from('Y Q M D H T s ms us ns'.split()), | ||
tz=st.one_of(st.none(), dateutil_timezones(), pytz_timezones()), | ||
) | ||
|
||
gen_random_datetime = st.datetimes( | ||
min_value=pd.Timestamp.min.to_pydatetime(), | ||
max_value=pd.Timestamp.max.to_pydatetime(), | ||
timezones=st.one_of(st.none(), dateutil_timezones(), pytz_timezones()) | ||
) | ||
|
||
# The strategy for each type is registered in conftest.py, as they don't carry | ||
# enough runtime information (e.g. type hints) to infer how to build them. | ||
gen_yqm_offset = st.one_of(*map(st.from_type, [ | ||
MonthBegin, MonthEnd, BMonthBegin, BMonthEnd, | ||
QuarterBegin, QuarterEnd, BQuarterBegin, BQuarterEnd, | ||
YearBegin, YearEnd, BYearBegin, BYearEnd | ||
])) | ||
|
||
|
||
# ---------------------------------------------------------------- | ||
# Offset-specific behaviour tests | ||
|
||
|
||
# Based on CI runs: Always passes on OSX, fails on Linux, sometimes on Windows | ||
@pytest.mark.xfail(strict=False, reason='inconsistent between OSs, Pythons') | ||
@given(gen_random_datetime, gen_yqm_offset) | ||
def test_on_offset_implementations(dt, offset): | ||
assume(not offset.normalize) | ||
# check that the class-specific implementations of onOffset match | ||
# the general case definition: | ||
# (dt + offset) - offset == dt | ||
compare = (dt + offset) - offset | ||
assert offset.onOffset(dt) == (compare == dt) | ||
|
||
|
||
@pytest.mark.xfail(strict=True) | ||
@given(gen_yqm_offset, gen_date_range) | ||
def test_apply_index_implementations(offset, rng): | ||
# offset.apply_index(dti)[i] should match dti[i] + offset | ||
assume(offset.n != 0) # TODO: test for that case separately | ||
|
||
# rng = pd.date_range(start='1/1/2000', periods=100000, freq='T') | ||
ser = pd.Series(rng) | ||
|
||
res = rng + offset | ||
res_v2 = offset.apply_index(rng) | ||
assert (res == res_v2).all() | ||
|
||
assert res[0] == rng[0] + offset | ||
assert res[-1] == rng[-1] + offset | ||
res2 = ser + offset | ||
# apply_index is only for indexes, not series, so no res2_v2 | ||
assert res2.iloc[0] == ser.iloc[0] + offset | ||
assert res2.iloc[-1] == ser.iloc[-1] + offset | ||
# TODO: Check randomly assorted entries, not just first/last | ||
|
||
|
||
@pytest.mark.xfail(strict=True) | ||
@given(gen_yqm_offset) | ||
def test_shift_across_dst(offset): | ||
# GH#18319 check that 1) timezone is correctly normalized and | ||
# 2) that hour is not incorrectly changed by this normalization | ||
# Note that dti includes a transition across DST boundary | ||
dti = pd.date_range(start='2017-10-30 12:00:00', end='2017-11-06', | ||
freq='D', tz='US/Eastern') | ||
assert (dti.hour == 12).all() # we haven't screwed up yet | ||
|
||
res = dti + offset | ||
assert (res.hour == 12).all() |
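The round-trip property checked by `test_on_offset_implementations`, namely that a date is "on offset" exactly when `(dt + offset) - offset == dt`, can be sketched with the standard library alone. The helpers below are hypothetical stand-ins that assume month-end semantics with `n=1` on plain dates. They are not the pandas implementation, and because they ignore times and timezones they do not reproduce the OS-dependent failures noted above.

```python
import calendar
from datetime import date


def month_end(year, month):
    """Last calendar day of the given month."""
    return date(year, month, calendar.monthrange(year, month)[1])


def is_month_end(d):
    """Stand-in for onOffset: is d the last day of its month?"""
    return d == month_end(d.year, d.month)


def add_month_end(d):
    """Stand-in for d + offset: the first month end strictly after d."""
    if not is_month_end(d):
        return month_end(d.year, d.month)
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    return month_end(year, month)


def sub_month_end(d):
    """Stand-in for d - offset: the last month end strictly before d."""
    year, month = (d.year - 1, 12) if d.month == 1 else (d.year, d.month - 1)
    return month_end(year, month)


def roundtrip_matches_on_offset(d):
    # The property under test: on-offset iff the round trip returns d.
    return is_month_end(d) == (sub_month_end(add_month_end(d)) == d)


# Check the property for every day over four years, including a leap year.
start = date(2000, 1, 1).toordinal()
assert all(
    roundtrip_matches_on_offset(date.fromordinal(start + i))
    for i in range(4 * 366)
)
```

The Hypothesis test in the diff replaces the fixed loop with generated datetimes and offsets, so it also covers other values of `n`, timezones, and the quarterly and yearly offsets.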