BUG: replace of numeric by string / dtype conversion (GH15743) #15812
Codecov Report
@@ Coverage Diff @@
## master #15812 +/- ##
==========================================
- Coverage 90.99% 90.97% -0.02%
==========================================
Files 143 143
Lines 49403 49418 +15
==========================================
+ Hits 44956 44960 +4
- Misses 4447 4458 +11
@@ -985,3 +985,5 @@ Bug Fixes
- Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`)
- Bug in ``.eval()`` which caused multiline evals to fail with local variables not on the first line (:issue:`15342`)
- Bug in ``pd.read_msgpack`` which did not allow to load dataframe with an index of type ``CategoricalIndex`` (:issue:`15487`)
FYI for the future: if you put this somewhere in the Bug Fixes section rather than at the end, you won't have merge conflicts (we have blank lines for this purpose).
pandas/core/missing.py
if not isinstance(values_to_mask, (list, np.ndarray)):
if isinstance(values_to_mask, np.ndarray):
    mask_type = values_to_mask.dtype.type
elif isinstance(values_to_mask, list):
you can change this entire test to:
# import at top if it's not already there
from pandas._libs.lib import infer_dtype
# ...
inferred = infer_dtype(values_to_mask)
if inferred in ['string', 'unicode']:
    mask_type = object
else:
    mask_type = np.asarray(values_to_mask).dtype
I think this will work.
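To see which labels the `inferred in ['string', 'unicode']` check would match, here is a small sketch using the public `pandas.api.types.infer_dtype` (the suggestion imports it from the internal `pandas._libs.lib`, whose location may change between versions). The sample inputs are illustrative, and the sketch assumes a pandas version that accepts the `skipna` keyword. `'unicode'` was only reported for Python 2 text; on Python 3, strings are labelled `'string'`.

```python
from pandas.api.types import infer_dtype

# Illustrative inputs of the kind a caller might pass as ``to_replace``
samples = {
    "strings": ["foo", "bar"],
    "integers": [1, 2, 3],
    "floats": [1.0, 2.5],
    "booleans": [True, False],
}

for label, values in samples.items():
    # infer_dtype inspects the whole list-like, not just its first element
    print(label, "->", infer_dtype(values, skipna=True))
```

With a recent pandas this prints `string`, `integer`, `floating` and `boolean` for the four samples.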
may need to include 'mixed' here as well, and test this too;
mixed is e.g. [1, '1']
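A minimal sketch of the suggestion extended with the mixed case, assuming a recent pandas; `choose_mask_dtype` and `OBJECT_KINDS` are hypothetical names, not pandas internals. Note that `infer_dtype` reports `'mixed-integer'` (not `'mixed'`) for `[1, '1']`, so the sketch treats both labels as "keep as object".

```python
import numpy as np
from pandas.api.types import infer_dtype

# Hypothetical set of inferred kinds for which np.asarray would pick a
# misleading dtype; 'mixed-integer' is what infer_dtype reports for [1, '1']
OBJECT_KINDS = {"string", "unicode", "mixed", "mixed-integer"}


def choose_mask_dtype(values_to_mask):
    """Hypothetical helper: pick the dtype used to build the mask values."""
    # Normalize scalars first so infer_dtype always sees a list-like
    if not isinstance(values_to_mask, (list, np.ndarray)):
        values_to_mask = [values_to_mask]
    if infer_dtype(values_to_mask, skipna=False) in OBJECT_KINDS:
        # Keep the original Python objects instead of coercing them
        return np.dtype(object)
    return np.asarray(values_to_mask).dtype


print(choose_mask_dtype([1, "1"]))  # object
print(choose_mask_dtype(0.0))       # float64
```

Wrapping scalars before calling `infer_dtype` is a design choice of this sketch; it keeps the inference step working on the same shape of input regardless of what the caller passed.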
Is this change only to simplify, or is it a must-do? I ask because when I implemented it, it broke all tests. I tried to investigate why, but I don't understand it yet.
what did this break?
yes, testing the first value is wrong (as it could also be 0-len); further, it might have mixed values anyhow.
can you show me a test that broke?
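Both problems with testing only the first value can be reproduced in a few lines. This is a sketch for comparison only: `first_element_is_string` is a hypothetical stand-in for a first-element check, set against `pandas.api.types.infer_dtype`, which looks at every element and handles empty input.

```python
from pandas.api.types import infer_dtype


def first_element_is_string(values):
    """Hypothetical first-element check, for comparison only."""
    # Raises IndexError on an empty list and ignores every later element
    return isinstance(values[0], str)


mixed = [1, "1"]
print(first_element_is_string(mixed))   # False: the string is missed
print(infer_dtype(mixed, skipna=True))  # mixed-integer

try:
    first_element_is_string([])
except IndexError:
    print("empty input breaks the first-element check")
print(infer_dtype([], skipna=True))     # empty
```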
We could build on what I wrote and just add the mixed support. Anyway, following your approach, the beginning of the function is this:
def mask_missing(arr, values_to_mask):
    """
    Return a masking array of same size/shape as arr
    with entries equaling any member of values_to_mask set to True
    """
    inferred = infer_dtype(values_to_mask)
    if inferred in ['string', 'unicode']:
        mask_type = object
    else:
        mask_type = np.asarray(values_to_mask).dtype
    if not isinstance(values_to_mask, (list, np.ndarray)):
        values_to_mask = [values_to_mask]
    try:
        values_to_mask = np.array(values_to_mask, dtype=mask_type)
    except Exception:
        values_to_mask = np.array(values_to_mask, dtype=object)
    ...
This breaks the following tests. Here's the output:
/Users/carlos/anaconda/envs/pandas_dev/bin/python3.6 "/Users/carlos/Library/Application Support/IntelliJIdea2017.1/python/helpers/pycharm/_jb_pytest_runner.py" --path /Users/carlos/Dropbox/opensource/pandas-ucals/pandas/tests/series/test_replace.py
Testing started at 21:32 ...
Launching py.test with arguments /Users/carlos/Dropbox/opensource/pandas-ucals/pandas/tests/series/test_replace.py
============================= test session starts ==============================
platform darwin -- Python 3.6.0, pytest-3.0.7, py-1.4.32, pluggy-0.4.0
rootdir: /Users/carlos/Dropbox/opensource/pandas-ucals, inifile: setup.cfg
plugins: cov-2.3.1
collected 11 items
pandas/tests/series/test_replace.py F
pandas/tests/series/test_replace.py:12 (TestSeriesReplace.test_replace)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace>
    def test_replace(self):
        N = 100
        ser = pd.Series(np.random.randn(N))
        ser[0:4] = np.nan
        ser[6:10] = 0
        # replace list with a single value
        ser.replace([np.nan], -1, inplace=True)
        exp = ser.fillna(-1)
        tm.assert_series_equal(ser, exp)
        rs = ser.replace(0., np.nan)
        ser[ser == 0.] = np.nan
>       tm.assert_series_equal(rs, ser)
pandas/tests/series/test_replace.py:27:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/util/testing.py:1215: in assert_series_equal
obj='{0}'.format(obj))
ls/pandas/util/testing.pyx:59: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:4156)
???
ls/pandas/util/testing.pyx:173: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:3274)
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
obj = 'Series', message = 'Series values are different (4.0 %)'
left = '[-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, 0.0, 0.0, 0.0, 0.0, 0.967334209683, -1.12749699126, 1.006215...722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]'
right = '[-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, nan, nan, nan, nan, 0.967334209683, -1.12749699126, 1.006215...722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]'
diff = None
    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)
        msg = """{0} are different
    {1}
    [left]: {2}
    [right]: {3}""".format(obj, message, left, right)
        if diff is not None:
            msg = msg + "\n[diff]: {diff}".format(diff=diff)
>       raise AssertionError(msg)
E AssertionError: Series are different
E
E Series values are different (4.0 %)
E [left]: [-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, 0.0, 0.0, 0.0, 0.0, 0.967334209683, -1.12749699126, 1.00621520732, 0.467115769273, -0.665495302938, -1.9655758973, 0.314295658919, -1.5728548579, 1.60539543955, 1.20132044052, -0.267834389937, -1.3125275111, 0.827027080809, -0.750655389751, -0.646701964354, -0.564806568125, 1.04153633485, -0.175289544241, -0.771798272938, -0.353146592188, -0.895526823358, -0.229003615743, -1.24668695712, -0.396975143203, 1.28664372671, 1.43113842599, 0.954652683573, 1.21141700331, -1.15516473451, 2.14816148205, 1.0492538281, -0.36137923595, -0.750632548499, -0.24502818186, 0.651587577021, -1.33034613473, 0.446654064159, -0.216192740252, -0.988088651194, 0.341802605183, 0.7488135734, -0.596658039592, -0.759760465904, 0.650746773025, 1.47640000528, -0.963593630477, -0.264742407812, 0.91147138281, -0.116493770275, -0.840843917606, 0.713860639926, -0.999446407034, -0.261993101942, 0.660244548292, 0.283304496904, 0.417297181001, 1.13236254504, -1.04559448586, -0.302416962494, 1.06231513633, 0.0376809290172, -0.00528160487426, -0.753751886674, -1.76853768804, 1.05207654029, 0.646266446052, -0.817276175661, 0.347974618646, 2.49401568105, -1.59727151377, 0.637718637115, 0.445203010849, 1.6222785846, 0.397953946747, 0.810931905513, -0.244945263003, 1.09902523539, 1.5024980885, -0.189142680513, -1.0871214807, -0.216461016432, -0.395180231199, -0.466997134722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]
E [right]: [-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, nan, nan, nan, nan, 0.967334209683, -1.12749699126, 1.00621520732, 0.467115769273, -0.665495302938, -1.9655758973, 0.314295658919, -1.5728548579, 1.60539543955, 1.20132044052, -0.267834389937, -1.3125275111, 0.827027080809, -0.750655389751, -0.646701964354, -0.564806568125, 1.04153633485, -0.175289544241, -0.771798272938, -0.353146592188, -0.895526823358, -0.229003615743, -1.24668695712, -0.396975143203, 1.28664372671, 1.43113842599, 0.954652683573, 1.21141700331, -1.15516473451, 2.14816148205, 1.0492538281, -0.36137923595, -0.750632548499, -0.24502818186, 0.651587577021, -1.33034613473, 0.446654064159, -0.216192740252, -0.988088651194, 0.341802605183, 0.7488135734, -0.596658039592, -0.759760465904, 0.650746773025, 1.47640000528, -0.963593630477, -0.264742407812, 0.91147138281, -0.116493770275, -0.840843917606, 0.713860639926, -0.999446407034, -0.261993101942, 0.660244548292, 0.283304496904, 0.417297181001, 1.13236254504, -1.04559448586, -0.302416962494, 1.06231513633, 0.0376809290172, -0.00528160487426, -0.753751886674, -1.76853768804, 1.05207654029, 0.646266446052, -0.817276175661, 0.347974618646, 2.49401568105, -1.59727151377, 0.637718637115, 0.445203010849, 1.6222785846, 0.397953946747, 0.810931905513, -0.244945263003, 1.09902523539, 1.5024980885, -0.189142680513, -1.0871214807, -0.216461016432, -0.395180231199, -0.466997134722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]
pandas/util/testing.py:1053: AssertionError
F
pandas/tests/series/test_replace.py:189 (TestSeriesReplace.test_replace2)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace2>
    def test_replace2(self):
        N = 100
        ser = pd.Series(np.fabs(np.random.randn(N)), tm.makeDateIndex(N),
                        dtype=object)
        ser[:5] = np.nan
        ser[6:10] = 'foo'
        ser[20:30] = 'bar'
        # replace list with a single value
        rs = ser.replace([np.nan, 'foo', 'bar'], -1)
>       self.assertTrue((rs[:5] == -1).all())
E AssertionError: False is not true
pandas/tests/series/test_replace.py:201: AssertionError
F
pandas/tests/series/test_replace.py:178 (TestSeriesReplace.test_replace_bool_with_bool)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace_bool_with_bool>
    def test_replace_bool_with_bool(self):
        s = pd.Series([True, False, True])
        result = s.replace(True, False)
        expected = pd.Series([False] * len(s))
>       tm.assert_series_equal(expected, result)
pandas/tests/series/test_replace.py:183:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/util/testing.py:1215: in assert_series_equal
obj='{0}'.format(obj))
ls/pandas/util/testing.pyx:59: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:4156)
???
ls/pandas/util/testing.pyx:173: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:3274)
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
obj = 'Series', message = 'Series values are different (66.66667 %)'
left = '[False, False, False]', right = '[True, False, True]', diff = None
    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)
        msg = """{0} are different
    {1}
    [left]: {2}
    [right]: {3}""".format(obj, message, left, right)
        if diff is not None:
            msg = msg + "\n[diff]: {diff}".format(diff=diff)
>       raise AssertionError(msg)
E AssertionError: Series are different
E
E Series values are different (66.66667 %)
E [left]: [False, False, False]
E [right]: [True, False, True]
pandas/util/testing.py:1053: AssertionError
F
pandas/tests/series/test_replace.py:171 (TestSeriesReplace.test_replace_bool_with_string)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace_bool_with_string>
    def test_replace_bool_with_string(self):
        # nonexistent elements
        s = pd.Series([True, False, True])
        result = s.replace(True, '2u')
        expected = pd.Series(['2u', False, '2u'])
>       tm.assert_series_equal(expected, result)
pandas/tests/series/test_replace.py:177:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/util/testing.py:1188: in assert_series_equal
assert_attr_equal('dtype', left, right)
pandas/util/testing.py:918: in assert_attr_equal
left_attr, right_attr)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
obj = 'Attributes', message = 'Attribute "dtype" are different'
left = dtype('O'), right = dtype('bool'), diff = None
    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)
        msg = """{0} are different
    {1}
    [left]: {2}
    [right]: {3}""".format(obj, message, left, right)
        if diff is not None:
            msg = msg + "\n[diff]: {diff}".format(diff=diff)
>       raise AssertionError(msg)
E AssertionError: Attributes are different
E
E Attribute "dtype" are different
E [left]: object
E [right]: bool
pandas/util/testing.py:1053: AssertionError
. . F
pandas/tests/series/test_replace.py:123 (TestSeriesReplace.test_replace_mixed_types)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace_mixed_types>
    def test_replace_mixed_types(self):
        s = pd.Series(np.arange(5), dtype='int64')
        def check_replace(to_rep, val, expected):
            sc = s.copy()
            r = s.replace(to_rep, val)
            sc.replace(to_rep, val, inplace=True)
            tm.assert_series_equal(expected, r)
            tm.assert_series_equal(expected, sc)
        # MUST upcast to float
        e = pd.Series([0., 1., 2., 3., 4.])
        tr, v = [3], [3.0]
        check_replace(tr, v, e)
        # MUST upcast to float
        e = pd.Series([0, 1, 2, 3.5, 4])
        tr, v = [3], [3.5]
        check_replace(tr, v, e)
        # casts to object
        e = pd.Series([0, 1, 2, 3.5, 'a'])
        tr, v = [3, 4], [3.5, 'a']
        check_replace(tr, v, e)
        # again casts to object
        e = pd.Series([0, 1, 2, 3.5, pd.Timestamp('20130101')])
        tr, v = [3, 4], [3.5, pd.Timestamp('20130101')]
        check_replace(tr, v, e)
        # casts to object
        e = pd.Series([0, 1, 2, 3.5, True], dtype='object')
        tr, v = [3, 4], [3.5, True]
        check_replace(tr, v, e)
        # test an object with dates + floats + integers + strings
        dr = pd.date_range('1/1/2001', '1/10/2001',
                           freq='D').to_series().reset_index(drop=True)
        result = dr.astype(object).replace(
            [dr[0], dr[1], dr[2]], [1.0, 2, 'a'])
        expected = pd.Series([1.0, 2, 'a'] + dr[3:].tolist(), dtype=object)
>       tm.assert_series_equal(result, expected)
pandas/tests/series/test_replace.py:165:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/util/testing.py:1215: in assert_series_equal
obj='{0}'.format(obj))
ls/pandas/util/testing.pyx:59: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:4156)
???
ls/pandas/util/testing.pyx:173: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:3274)
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
obj = 'Series', message = 'Series values are different (30.0 %)'
left = '[2001-01-01 00:00:00, 2001-01-02 00:00:00, 2001-01-03 00:00:00, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]'
right = '[1.0, 2, a, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]'
diff = None
    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)
        msg = """{0} are different
    {1}
    [left]: {2}
    [right]: {3}""".format(obj, message, left, right)
        if diff is not None:
            msg = msg + "\n[diff]: {diff}".format(diff=diff)
>       raise AssertionError(msg)
E AssertionError: Series are different
E
E Series values are different (30.0 %)
E [left]: [2001-01-01 00:00:00, 2001-01-02 00:00:00, 2001-01-03 00:00:00, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]
E [right]: [1.0, 2, a, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]
pandas/util/testing.py:1053: AssertionError
. . . .
====================== 5 failed, 6 passed in 0.54 seconds ======================
Process finished with exit code 0
I could invest time in finding out why those 5 tests now fail and then tackle the mixed support, or I could just build on my approach and only add the mixed support. Anyway, I'm here to learn; let me know which approach is best and I'll follow it. Thanks.
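The `try`/`except` fallback in the `mask_missing` snippet earlier in this comment can be isolated to show what it does on its own. A minimal sketch, assuming NumPy only; `to_mask_array` is a hypothetical name, and unlike the snippet it catches `TypeError`/`ValueError` rather than a bare `Exception`, so unrelated errors are not silently turned into object arrays.

```python
import numpy as np


def to_mask_array(values_to_mask, mask_type):
    """Hypothetical helper mirroring the try/except in mask_missing."""
    # Wrap scalars so np.array always receives a list-like
    if not isinstance(values_to_mask, (list, np.ndarray)):
        values_to_mask = [values_to_mask]
    try:
        return np.array(values_to_mask, dtype=mask_type)
    except (TypeError, ValueError):
        # Values that cannot be cast (e.g. 'foo' to float64) stay as objects
        return np.array(values_to_mask, dtype=object)


print(to_mask_array(0, np.float64))      # [0.]
print(to_mask_array("foo", np.float64))  # ['foo'] (object dtype)
```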
@@ -447,7 +452,6 @@ def wrapper(arr, mask, limit=None):
def pad_1d(values, limit=None, mask=None, dtype=None):
normally I don't like edits to things not associated with the PR (e.g. you may have some editor setting which changes this)... no big deal
Ok... sorry about that. I'm using IntelliJ IDEA, and it reformatted the whole file to the PEP8 standard.
no problem. we don't quite follow PEP8 (as flake8 doesn't, actually)......
pandas/tests/series/test_replace.py
@@ -227,3 +226,10 @@ def test_replace_with_empty_dictlike(self):
        s = pd.Series(list('abcd'))
        tm.assert_series_equal(s, s.replace(dict()))
        tm.assert_series_equal(s, s.replace(pd.Series([])))

    def test_replace_string_with_nan(self):
can you test this with unicode as well?
done!
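A minimal sketch of what a str/unicode check along the lines of `test_replace_string_with_nan` could look like, assuming Python 3 and a recent pandas. `check_replace_string_with_nan` is a hypothetical helper, not the test that was merged (the diff above does not show the test body).

```python
import numpy as np
import pandas as pd
import pandas.testing as tm


def check_replace_string_with_nan(value):
    """Hypothetical test body: replace one string value with NaN."""
    s = pd.Series([value, "b", "c"])
    result = s.replace(value, np.nan)
    expected = pd.Series([np.nan, "b", "c"])
    tm.assert_series_equal(result, expected)


# On Python 3 a u"" literal is an ordinary str, so both calls exercise
# the same type; the distinction only mattered on Python 2
for v in ["a", u"a"]:
    check_replace_string_with_nan(v)
```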
thanks @ucals, I pushed a more generalized solution to your branch. This is actually a big area I have been meaning to fix. There are quite a lot of subtleties w.r.t. numpy (and pandas) coercions.
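One of those NumPy coercion subtleties is easy to see directly: when `np.array` (or `np.asarray`, as in the first suggestion in this thread) receives mixed Python values, it picks a single common dtype, which can silently change what the values are. A short illustration with plain NumPy; the inputs are chosen only for demonstration.

```python
import numpy as np

# NumPy picks one common dtype for a list, which can hide mixed inputs
int_and_str = np.array([1, "1"])
bool_and_int = np.array([True, 2])
int_and_float = np.array([1, 2.5])

print(int_and_str.dtype.kind, int_and_str.tolist())  # U ['1', '1']
print(bool_and_int.dtype.kind)                       # i (True becomes 1)
print(int_and_float.dtype)                           # float64

# Forcing object dtype keeps the original Python objects
print(np.array([1, "1"], dtype=object).tolist())     # [1, '1']
```

This is why a list such as `[1, '1']` cannot safely go through `np.asarray(...).dtype`: the integer would be compared as the string `'1'`.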
ok, ping on green. (note if you feel up to adding more test cases to
thanks @ucals, as I said, if you want to add some follow-up tests, pls do.
closes pandas-dev#15743

Author: Carlos Souza <carlos@udacity.com>
Author: Jeff Reback <jeff@reback.net>

Closes pandas-dev#15812 from ucals/bug-fix-15743 and squashes the following commits:

e6e4971 [Carlos Souza] Adding replace unicode with number and replace mixed types with string tests
bd31b2b [Carlos Souza] Resolving merge conflict by incorporating @jreback suggestions
73805ce [Jeff Reback] CLN: add infer_dtype_from_array
45e67e4 [Carlos Souza] Fixing PEP8 line indent
0a98557 [Carlos Souza] BUG: replace of numeric by string fixed
97e1f18 [Carlos Souza] Test
e62763c [Carlos Souza] Fixing PEP8 line indent
080c71e [Carlos Souza] BUG: replace of numeric by string fixed
8b463cb [Carlos Souza] Merge remote-tracking branch 'upstream/master'
9fc617b [Carlos Souza] Merge remote-tracking branch 'upstream/master'
e12bca7 [Carlos Souza] Sync fork
676a4e5 [Carlos Souza] Test
git diff upstream/master --name-only -- '*.py' | flake8 --diff