BUG: replace of numeric by string / dtype coversion (GH15743) #15812

ucals · 2017-03-27T02:34:46Z

closes BUG: replace of numeric by string / dtype coversion #15743
test added / passed
passes git diff upstream/master --name-only -- '*.py' | flake8 --diff
whatsnew entry

codecov · 2017-03-27T02:57:16Z

Codecov Report

Merging #15812 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15812      +/-   ##
==========================================
- Coverage   90.99%   90.97%   -0.02%     
==========================================
  Files         143      143              
  Lines       49403    49418      +15     
==========================================
+ Hits        44956    44960       +4     
- Misses       4447     4458      +11
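
As a rough cross-check of the table above, the percentages can be recomputed from the hit and line counts. This is only a sketch: the helper name `coverage_percent` is made up for illustration, and the assumption that the report truncates (rather than rounds) to two decimals is inferred from the numbers shown, not from Codecov documentation.

```python
def coverage_percent(hits, lines):
    """Coverage in percent, truncated (not rounded) to two decimals.

    Truncation is an assumption inferred from the figures in the report.
    """
    # Integer arithmetic avoids floating-point surprises before the final division.
    return (hits * 10000 // lines) / 100


master = coverage_percent(44956, 49403)
branch = coverage_percent(44960, 49418)
master_misses = 49403 - 44956
branch_misses = 49418 - 44960

print(master, branch)                 # 90.99 90.97
print(master_misses, branch_misses)   # 4447 4458
```

Under that assumption the recomputed values match the Coverage and Misses rows of the diff.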

Impacted Files	Coverage Δ
pandas/core/missing.py	`84.27% <100%> (-0.63%)`	⬇️
pandas/types/cast.py	`85.6% <100%> (+0.28%)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.56% <0%> (-0.1%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

jreback · 2017-03-27T12:08:45Z

doc/source/whatsnew/v0.20.0.txt

@@ -985,3 +985,5 @@ Bug Fixes
 - Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`)
 - Bug in ``.eval()`` which caused multiline evals to fail with local variables not on the first line (:issue:`15342`)
 - Bug in ``pd.read_msgpack`` which did not allow to load dataframe with an index of type ``CategoricalIndex`` (:issue:`15487`)
+


FYI for the future: if you put this somewhere within the Bug Fixes section rather than at the end, you won't have merge conflicts (we leave blank lines for this purpose).

jreback · 2017-03-27T12:15:27Z

pandas/core/missing.py

-    if not isinstance(values_to_mask, (list, np.ndarray)):
+    if isinstance(values_to_mask, np.ndarray):
+        mask_type = values_to_mask.dtype.type
+    elif isinstance(values_to_mask, list):


you can change this entire test to:

    # import at top if its not
    from pandas._libs.lib import infer_dtype
    ....
    inferred = infer_dtype(values_to_mask)
    if inferred in ['string', 'unicode']:
        mask_type = np.object
    else:
        mask_type = np.asarray(values_to_mask).dtype

I think this will work.

may need to include 'mixed' here as well, and tests this too:

mixed is [1, '1']
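
To illustrate why a mixed list such as `[1, '1']` needs its own handling, here is a minimal NumPy-only sketch. The helper `choose_mask_dtype` is hypothetical (it is not pandas API and not the code that was merged); it stands in for the `infer_dtype`-based check by inspecting every value instead of only the first one.

```python
import numpy as np


def choose_mask_dtype(values_to_mask):
    """Hypothetical helper (not pandas API): pick a dtype for the values to mask.

    Looks at every value rather than only the first, so empty and mixed
    inputs are handled.
    """
    if not isinstance(values_to_mask, (list, tuple, np.ndarray)):
        values_to_mask = [values_to_mask]
    # Any string forces object dtype so numbers are not silently stringified.
    if any(isinstance(v, (str, bytes)) for v in values_to_mask):
        return np.dtype(object)
    # Otherwise let NumPy infer the dtype from all values.
    return np.asarray(values_to_mask).dtype


# Left to itself, NumPy coerces the integer to a string here.
naive = np.array([1, "1"])
print(naive.dtype.kind)               # U (a unicode string array)
print(choose_mask_dtype([1, "1"]))    # object
print(choose_mask_dtype([]))          # float64, and no IndexError on 0-length input
```

The empty-list case is the one that a "test the first value" approach cannot handle.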

Is this change only to simplify, or is it a must-do? I ask because I implemented it and it broke all tests. I tried to investigate why, but I don't understand it yet.

what did this break?

yes, testing the first value is wrong (as it could also be 0-len), further it might have mixed values anyhow.

show me a test that broke?

We could build on what I wrote and just add the mixed support. Anyway, following your approach, the beginning of the function is this:

    def mask_missing(arr, values_to_mask):
        """
        Return a masking array of same size/shape as arr
        with entries equaling any member of values_to_mask set to True
        """
        inferred = infer_dtype(values_to_mask)
        if inferred in ['string', 'unicode']:
            mask_type = np.object
        else:
            mask_type = np.asarray(values_to_mask).dtype

        if not isinstance(values_to_mask, (list, np.ndarray)):
            values_to_mask = [values_to_mask]

        try:
            values_to_mask = np.array(values_to_mask, dtype=mask_type)
        except Exception:
            values_to_mask = np.array(values_to_mask, dtype=object)
        ...

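This breaks the following tests in pandas/tests/series/test_replace.py:

test_replace
test_replace2
test_replace_bool_with_bool
test_replace_bool_with_string
test_replace_mixed_types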

Here's the output:

/Users/carlos/anaconda/envs/pandas_dev/bin/python3.6 "/Users/carlos/Library/Application Support/IntelliJIdea2017.1/python/helpers/pycharm/_jb_pytest_runner.py" --path /Users/carlos/Dropbox/opensource/pandas-ucals/pandas/tests/series/test_replace.py
Testing started at 21:32 ...
Launching py.test with arguments /Users/carlos/Dropbox/opensource/pandas-ucals/pandas/tests/series/test_replace.py
============================= test session starts ==============================
platform darwin -- Python 3.6.0, pytest-3.0.7, py-1.4.32, pluggy-0.4.0
rootdir: /Users/carlos/Dropbox/opensource/pandas-ucals, inifile: setup.cfg
plugins: cov-2.3.1
collected 11 items

pandas/tests/series/test_replace.py F
pandas/tests/series/test_replace.py:12 (TestSeriesReplace.test_replace)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace>

    def test_replace(self):
        N = 100
        ser = pd.Series(np.random.randn(N))
        ser[0:4] = np.nan
        ser[6:10] = 0

        # replace list with a single value
        ser.replace([np.nan], -1, inplace=True)

        exp = ser.fillna(-1)
        tm.assert_series_equal(ser, exp)

        rs = ser.replace(0., np.nan)
        ser[ser == 0.] = np.nan
>       tm.assert_series_equal(rs, ser)

pandas/tests/series/test_replace.py:27:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/util/testing.py:1215: in assert_series_equal
    obj='{0}'.format(obj))
ls/pandas/util/testing.pyx:59: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:4156)
    ???
ls/pandas/util/testing.pyx:173: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:3274)
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

obj = 'Series', message = 'Series values are different (4.0 %)'
left = '[-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, 0.0, 0.0, 0.0, 0.0, 0.967334209683, -1.12749699126, 1.006215...722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]'
right = '[-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, nan, nan, nan, nan, 0.967334209683, -1.12749699126, 1.006215...722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]'
diff = None

    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)

        msg = """{0} are different

    {1}
    [left]:  {2}
    [right]: {3}""".format(obj, message, left, right)

        if diff is not None:
            msg = msg + "\n[diff]: {diff}".format(diff=diff)

>       raise AssertionError(msg)
E       AssertionError: Series are different
E
E       Series values are different (4.0 %)
E       [left]:  [-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, 0.0, 0.0, 0.0, 0.0, 0.967334209683, -1.12749699126, 1.00621520732, 0.467115769273, -0.665495302938, -1.9655758973, 0.314295658919, -1.5728548579, 1.60539543955, 1.20132044052, -0.267834389937, -1.3125275111, 0.827027080809, -0.750655389751, -0.646701964354, -0.564806568125, 1.04153633485, -0.175289544241, -0.771798272938, -0.353146592188, -0.895526823358, -0.229003615743, -1.24668695712, -0.396975143203, 1.28664372671, 1.43113842599, 0.954652683573, 1.21141700331, -1.15516473451, 2.14816148205, 1.0492538281, -0.36137923595, -0.750632548499, -0.24502818186, 0.651587577021, -1.33034613473, 0.446654064159, -0.216192740252, -0.988088651194, 0.341802605183, 0.7488135734, -0.596658039592, -0.759760465904, 0.650746773025, 1.47640000528, -0.963593630477, -0.264742407812, 0.91147138281, -0.116493770275, -0.840843917606, 0.713860639926, -0.999446407034, -0.261993101942, 0.660244548292, 0.283304496904, 0.417297181001, 1.13236254504, -1.04559448586, -0.302416962494, 1.06231513633, 0.0376809290172, -0.00528160487426, -0.753751886674, -1.76853768804, 1.05207654029, 0.646266446052, -0.817276175661, 0.347974618646, 2.49401568105, -1.59727151377, 0.637718637115, 0.445203010849, 1.6222785846, 0.397953946747, 0.810931905513, -0.244945263003, 1.09902523539, 1.5024980885, -0.189142680513, -1.0871214807, -0.216461016432, -0.395180231199, -0.466997134722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]
E       [right]: [-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, nan, nan, nan, nan, 0.967334209683, -1.12749699126, 1.00621520732, 0.467115769273, -0.665495302938, -1.9655758973, 0.314295658919, -1.5728548579, 1.60539543955, 1.20132044052, -0.267834389937, -1.3125275111, 0.827027080809, -0.750655389751, -0.646701964354, -0.564806568125, 1.04153633485, -0.175289544241, -0.771798272938, -0.353146592188, -0.895526823358, -0.229003615743, -1.24668695712, -0.396975143203, 1.28664372671, 1.43113842599, 0.954652683573, 1.21141700331, -1.15516473451, 2.14816148205, 1.0492538281, -0.36137923595, -0.750632548499, -0.24502818186, 0.651587577021, -1.33034613473, 0.446654064159, -0.216192740252, -0.988088651194, 0.341802605183, 0.7488135734, -0.596658039592, -0.759760465904, 0.650746773025, 1.47640000528, -0.963593630477, -0.264742407812, 0.91147138281, -0.116493770275, -0.840843917606, 0.713860639926, -0.999446407034, -0.261993101942, 0.660244548292, 0.283304496904, 0.417297181001, 1.13236254504, -1.04559448586, -0.302416962494, 1.06231513633, 0.0376809290172, -0.00528160487426, -0.753751886674, -1.76853768804, 1.05207654029, 0.646266446052, -0.817276175661, 0.347974618646, 2.49401568105, -1.59727151377, 0.637718637115, 0.445203010849, 1.6222785846, 0.397953946747, 0.810931905513, -0.244945263003, 1.09902523539, 1.5024980885, -0.189142680513, -1.0871214807, -0.216461016432, -0.395180231199, -0.466997134722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]

pandas/util/testing.py:1053: AssertionError
F
pandas/tests/series/test_replace.py:189 (TestSeriesReplace.test_replace2)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace2>

    def test_replace2(self):
        N = 100
        ser = pd.Series(np.fabs(np.random.randn(N)), tm.makeDateIndex(N),
                        dtype=object)
        ser[:5] = np.nan
        ser[6:10] = 'foo'
        ser[20:30] = 'bar'

        # replace list with a single value
        rs = ser.replace([np.nan, 'foo', 'bar'], -1)

>       self.assertTrue((rs[:5] == -1).all())
E       AssertionError: False is not true

pandas/tests/series/test_replace.py:201: AssertionError
F
pandas/tests/series/test_replace.py:178 (TestSeriesReplace.test_replace_bool_with_bool)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace_bool_with_bool>

    def test_replace_bool_with_bool(self):
        s = pd.Series([True, False, True])
        result = s.replace(True, False)
        expected = pd.Series([False] * len(s))
>       tm.assert_series_equal(expected, result)

pandas/tests/series/test_replace.py:183:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/util/testing.py:1215: in assert_series_equal
    obj='{0}'.format(obj))
ls/pandas/util/testing.pyx:59: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:4156)
    ???
ls/pandas/util/testing.pyx:173: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:3274)
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

obj = 'Series', message = 'Series values are different (66.66667 %)'
left = '[False, False, False]', right = '[True, False, True]', diff = None

    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)

        msg = """{0} are different

    {1}
    [left]:  {2}
    [right]: {3}""".format(obj, message, left, right)

        if diff is not None:
            msg = msg + "\n[diff]: {diff}".format(diff=diff)

>       raise AssertionError(msg)
E       AssertionError: Series are different
E
E       Series values are different (66.66667 %)
E       [left]:  [False, False, False]
E       [right]: [True, False, True]

pandas/util/testing.py:1053: AssertionError
F
pandas/tests/series/test_replace.py:171 (TestSeriesReplace.test_replace_bool_with_string)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace_bool_with_string>

    def test_replace_bool_with_string(self):
        # nonexistent elements
        s = pd.Series([True, False, True])
        result = s.replace(True, '2u')
        expected = pd.Series(['2u', False, '2u'])
>       tm.assert_series_equal(expected, result)

pandas/tests/series/test_replace.py:177:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/util/testing.py:1188: in assert_series_equal
    assert_attr_equal('dtype', left, right)
pandas/util/testing.py:918: in assert_attr_equal
    left_attr, right_attr)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

obj = 'Attributes', message = 'Attribute "dtype" are different'
left = dtype('O'), right = dtype('bool'), diff = None

    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)

        msg = """{0} are different

    {1}
    [left]:  {2}
    [right]: {3}""".format(obj, message, left, right)

        if diff is not None:
            msg = msg + "\n[diff]: {diff}".format(diff=diff)

>       raise AssertionError(msg)
E       AssertionError: Attributes are different
E
E       Attribute "dtype" are different
E       [left]:  object
E       [right]: bool

pandas/util/testing.py:1053: AssertionError
.
.
F
pandas/tests/series/test_replace.py:123 (TestSeriesReplace.test_replace_mixed_types)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace_mixed_types>

    def test_replace_mixed_types(self):
        s = pd.Series(np.arange(5), dtype='int64')

        def check_replace(to_rep, val, expected):
            sc = s.copy()
            r = s.replace(to_rep, val)
            sc.replace(to_rep, val, inplace=True)
            tm.assert_series_equal(expected, r)
            tm.assert_series_equal(expected, sc)

        # MUST upcast to float
        e = pd.Series([0., 1., 2., 3., 4.])
        tr, v = [3], [3.0]
        check_replace(tr, v, e)

        # MUST upcast to float
        e = pd.Series([0, 1, 2, 3.5, 4])
        tr, v = [3], [3.5]
        check_replace(tr, v, e)

        # casts to object
        e = pd.Series([0, 1, 2, 3.5, 'a'])
        tr, v = [3, 4], [3.5, 'a']
        check_replace(tr, v, e)

        # again casts to object
        e = pd.Series([0, 1, 2, 3.5, pd.Timestamp('20130101')])
        tr, v = [3, 4], [3.5, pd.Timestamp('20130101')]
        check_replace(tr, v, e)

        # casts to object
        e = pd.Series([0, 1, 2, 3.5, True], dtype='object')
        tr, v = [3, 4], [3.5, True]
        check_replace(tr, v, e)

        # test an object with dates + floats + integers + strings
        dr = pd.date_range('1/1/2001', '1/10/2001',
                           freq='D').to_series().reset_index(drop=True)
        result = dr.astype(object).replace(
            [dr[0], dr[1], dr[2]], [1.0, 2, 'a'])
        expected = pd.Series([1.0, 2, 'a'] + dr[3:].tolist(), dtype=object)
>       tm.assert_series_equal(result, expected)

pandas/tests/series/test_replace.py:165:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/util/testing.py:1215: in assert_series_equal
    obj='{0}'.format(obj))
ls/pandas/util/testing.pyx:59: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:4156)
    ???
ls/pandas/util/testing.pyx:173: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:3274)
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

obj = 'Series', message = 'Series values are different (30.0 %)'
left = '[2001-01-01 00:00:00, 2001-01-02 00:00:00, 2001-01-03 00:00:00, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]'
right = '[1.0, 2, a, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]'
diff = None

    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)

        msg = """{0} are different

    {1}
    [left]:  {2}
    [right]: {3}""".format(obj, message, left, right)

        if diff is not None:
            msg = msg + "\n[diff]: {diff}".format(diff=diff)

>       raise AssertionError(msg)
E       AssertionError: Series are different
E
E       Series values are different (30.0 %)
E       [left]:  [2001-01-01 00:00:00, 2001-01-02 00:00:00, 2001-01-03 00:00:00, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]
E       [right]: [1.0, 2, a, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]

pandas/util/testing.py:1053: AssertionError
.
.
.
.

====================== 5 failed, 6 passed in 0.54 seconds ======================
Process finished with exit code 0

I could invest time to find why those 5 tests now are failing, to then tackle the mixed support.... Or just build on my approach and only tackle the mixed support. Anyway, I'm here to learn, let me know what's the best approach and I'll follow. Thanks.
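
The thread does not pin down why those tests fail. One thing that may be worth checking (an assumption, not a confirmed diagnosis) is what NumPy does when it is asked to infer a dtype for a list that mixes `np.nan` with strings, as in `test_replace2`. A small NumPy-only sketch:

```python
import numpy as np

values_to_mask = [np.nan, "foo", "bar"]

# Letting NumPy pick the dtype turns every element, including nan, into a string.
as_inferred = np.array(values_to_mask)
# Forcing object dtype keeps the original Python objects untouched.
as_object = np.array(values_to_mask, dtype=object)

print(as_inferred.dtype.kind)             # U
print(as_inferred[0] == "nan")            # True: nan became the string "nan"
print(isinstance(as_object[0], float))    # True: still a float nan
```

If the mask values were built from the inferred string dtype, a float `nan` in the data would no longer match any of them, which would be consistent with the `False is not true` assertion above.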

jreback · 2017-03-27T12:16:04Z

pandas/core/missing.py

@@ -447,7 +452,6 @@ def wrapper(arr, mask, limit=None):


 def pad_1d(values, limit=None, mask=None, dtype=None):
-


I normally don't like to edit things not associated with the PR (e.g. you may have some editor setting which changes this)... no big deal

Ok... Sorry for that... I'm using IntelliJ IDEA, and it formatted the whole file to the PEP8 standard

no problem. we don't quite follow PEP8 (neither does flake8, actually)......

jreback · 2017-03-27T12:16:21Z

pandas/tests/series/test_replace.py

@@ -227,3 +226,10 @@ def test_replace_with_empty_dictlike(self):
        s = pd.Series(list('abcd'))
        tm.assert_series_equal(s, s.replace(dict()))
        tm.assert_series_equal(s, s.replace(pd.Series([])))
+
+    def test_replace_string_with_nan(self):


can you test this with unicode as well
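
A sketch of what such a check might look like, assuming a current pandas on Python 3, where every `str` is already unicode (at the time of this PR, Python 2 still distinguished `'a'` from `u'a'`). The helper `check_replace_string_with_nan` is hypothetical and is not the test that was added to the PR.

```python
import numpy as np
import pandas as pd


def check_replace_string_with_nan(values, to_replace):
    """Hypothetical check: replacing a string with NaN leaves other entries intact."""
    s = pd.Series(values)
    result = s.replace(to_replace, np.nan)
    # Only the replaced positions should be missing afterwards.
    assert result.isna().tolist() == [v == to_replace for v in values]
    # Everything else should survive unchanged and in order.
    assert result.dropna().tolist() == [v for v in values if v != to_replace]
    return result


ascii_result = check_replace_string_with_nan(["a", "b", "c"], "b")
# Non-ASCII characters exercise the unicode path.
unicode_result = check_replace_string_with_nan(["\u00e1", "\u00df", "\u00e7"], "\u00df")
```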

jreback · 2017-03-28T14:23:48Z

thanks @ucals I pushed a more generalized soln to your branch.

This is actually a big area I have been meaning to fix. There are quite a lot of subtleties w.r.t. numpy (and pandas) coercions.

jreback · 2017-03-28T16:52:02Z

ok, ping on green. (Note: if you feel up to adding more test cases to pandas/tests/types/test_cast.py, go for it; that could also be a follow-up.)

jreback · 2017-03-28T18:27:16Z

thanks @ucals

as I said if you want to add some followup tests, pls do.

@jreback

closes pandas-dev#15743

Author: Carlos Souza <carlos@udacity.com>
Author: Jeff Reback <jeff@reback.net>

Closes pandas-dev#15812 from ucals/bug-fix-15743 and squashes the following commits:

e6e4971 [Carlos Souza] Adding replace unicode with number and replace mixed types with string tests
bd31b2b [Carlos Souza] Resolving merge conflict by incorporating @jreback suggestions
73805ce [Jeff Reback] CLN: add infer_dtype_from_array
45e67e4 [Carlos Souza] Fixing PEP8 line indent
0a98557 [Carlos Souza] BUG: replace of numeric by string fixed
97e1f18 [Carlos Souza] Test
e62763c [Carlos Souza] Fixing PEP8 line indent
080c71e [Carlos Souza] BUG: replace of numeric by string fixed
8b463cb [Carlos Souza] Merge remote-tracking branch 'upstream/master'
9fc617b [Carlos Souza] Merge remote-tracking branch 'upstream/master'
e12bca7 [Carlos Souza] Sync fork
676a4e5 [Carlos Souza] Test

Carlos Souza added 6 commits March 20, 2017 19:32

Test

676a4e5

Sync fork

e12bca7

Merge remote-tracking branch 'upstream/master'

9fc617b

Merge remote-tracking branch 'upstream/master'

8b463cb

BUG: replace of numeric by string fixed

080c71e

Fixing PEP8 line indent

e62763c

jreback added the Bug and Dtype Conversions (unexpected or buggy dtype conversions) labels Mar 27, 2017

jreback requested changes Mar 27, 2017

Carlos Souza and others added 4 commits March 28, 2017 09:08

Test

97e1f18

BUG: replace of numeric by string fixed

0a98557

Fixing PEP8 line indent

45e67e4

CLN: add infer_dtype_from_array

73805ce

jreback force-pushed the bug-fix-15743 branch from e62763c to 73805ce Compare March 28, 2017 14:22

Carlos Souza added 2 commits March 28, 2017 13:30

Resolving merge conflict by incorporating @jreback suggestions

bd31b2b

Adding replace unicode with number and replace mixed types with strin…

e6e4971

…g tests

jreback approved these changes Mar 28, 2017

jreback added this to the 0.20.0 milestone Mar 28, 2017

jreback closed this in 6f789e1 Mar 28, 2017

ucals deleted the bug-fix-15743 branch March 28, 2017 18:49
