BUG: replace of numeric by string / dtype conversion (GH15743) #15812

Closed
wants to merge 12 commits into from

@ucals ucals commented Mar 27, 2017

codecov bot commented Mar 27, 2017

Codecov Report

Merging #15812 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15812      +/-   ##
==========================================
- Coverage   90.99%   90.97%   -0.02%     
==========================================
  Files         143      143              
  Lines       49403    49418      +15     
==========================================
+ Hits        44956    44960       +4     
- Misses       4447     4458      +11
Impacted Files            Coverage Δ
pandas/core/missing.py    84.27% <100%> (-0.63%) ⬇️
pandas/types/cast.py      85.6% <100%> (+0.28%) ⬆️
pandas/io/gbq.py          25% <0%> (-58.34%) ⬇️
pandas/core/frame.py      97.56% <0%> (-0.1%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1dab800...e6e4971.

@jreback added the Bug and Dtype Conversions (unexpected or buggy dtype conversions) labels Mar 27, 2017
@@ -985,3 +985,5 @@ Bug Fixes
- Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`)
- Bug in ``.eval()`` which caused multiline evals to fail with local variables not on the first line (:issue:`15342`)
- Bug in ``pd.read_msgpack`` which did not allow to load dataframe with an index of type ``CategoricalIndex`` (:issue:`15487`)

Contributor

FYI for the future: if you put this somewhere inside the Bug Fixes section, rather than at the end, you won't have merge conflicts (we leave blank lines for this purpose).

if not isinstance(values_to_mask, (list, np.ndarray)):
if isinstance(values_to_mask, np.ndarray):
mask_type = values_to_mask.dtype.type
elif isinstance(values_to_mask, list):
Contributor

you can change this entire test to:

# import at the top of the module if it isn't already there
from pandas._libs.lib import infer_dtype
....
inferred = infer_dtype(values_to_mask)
if inferred in ['string', 'unicode']:
    mask_type = np.object
else:
    mask_type = np.asarray(values_to_mask).dtype

I think this will work.
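
The suggestion above can be sketched as a small standalone helper. This is a minimal sketch, not the code that was merged: `pick_mask_dtype` is a hypothetical name, it imports `infer_dtype` from the public `pandas.api.types` namespace rather than `pandas._libs.lib`, it wraps scalars before inferring (an assumption of this sketch), and it uses the builtin `object` in place of the `np.object` alias, which newer NumPy releases no longer provide.

```python
import numpy as np
from pandas.api.types import infer_dtype


def pick_mask_dtype(values_to_mask):
    """Hypothetical helper: choose a dtype for the values to be masked."""
    # Wrap scalars so that inference always sees a list-like
    # (an assumption of this sketch, not part of the review comment).
    if not isinstance(values_to_mask, (list, np.ndarray)):
        values_to_mask = [values_to_mask]

    inferred = infer_dtype(values_to_mask)
    if inferred in ['string', 'unicode']:
        # Keep strings as Python objects rather than a fixed-width string dtype.
        return np.dtype(object)
    # Otherwise let NumPy pick the dtype from the values themselves.
    return np.asarray(values_to_mask).dtype


print(pick_mask_dtype(['a', 'b']))
print(pick_mask_dtype([1, 2]))
print(pick_mask_dtype(0.5))
```

Returning a dtype, rather than converting in place, keeps the decision separate from the later `np.array(values_to_mask, dtype=...)` call.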

Contributor

You may need to include 'mixed' here as well, and test this too:

a mixed input is e.g. [1, '1']
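
To see which labels the inference returns for such inputs, a quick check along these lines may help. This is only a sketch using the public `pandas.api.types.infer_dtype`; the exact label for a given list can depend on its contents and on the pandas version, so the mixed case is checked by prefix rather than by exact name.

```python
from pandas.api.types import infer_dtype

samples = {
    'strings': ['a', 'b'],
    'integers': [1, 2, 3],
    'int and str': [1, '1'],
}

# Map each sample name to the label that infer_dtype reports for it.
labels = {name: infer_dtype(values) for name, values in samples.items()}
for name, label in labels.items():
    print(name, '->', label)

# A list holding both an int and a str is reported under a "mixed" label.
is_mixed = labels['int and str'].startswith('mixed')
print('mixed detected:', is_mixed)
```

Matching on the prefix is a design choice of this sketch: it avoids depending on which specific "mixed" variant is reported.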

Author

Is this change only a simplification, or is it a must-do? I ask because I implemented it and it broke the tests. I tried to investigate why, but I don't understand it yet.

Contributor

What did this break?

Yes, testing only the first value is wrong (the input could also be zero-length); further, it might hold mixed values anyhow.

Can you show me a test that broke?
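
The point about inspecting only the first element can be illustrated with plain Python. This is a sketch; `first_element_type` is a hypothetical stand-in for that kind of check, not code from the PR.

```python
def first_element_type(values):
    """Hypothetical check that looks only at values[0]."""
    return type(values[0])


# Mixed values: the first element says "int", but a string is also present.
mixed = [1, '1']
seen_types = {type(v) for v in mixed}
print(first_element_type(mixed), seen_types)

# Zero-length input: there is no first element to inspect.
try:
    first_element_type([])
    empty_ok = True
except IndexError:
    empty_ok = False
print('empty list handled:', empty_ok)
```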

Author

We could build on what I wrote and just add the mixed support. Anyway, following your approach, the beginning of the function is this:

def mask_missing(arr, values_to_mask):
    """
    Return a masking array of same size/shape as arr
    with entries equaling any member of values_to_mask set to True
    """
    inferred = infer_dtype(values_to_mask)
    if inferred in ['string', 'unicode']:
        mask_type = np.object
    else:
        mask_type = np.asarray(values_to_mask).dtype

    if not isinstance(values_to_mask, (list, np.ndarray)):
        values_to_mask = [values_to_mask]

    try:
        values_to_mask = np.array(values_to_mask, dtype=mask_type)
    except Exception:
        values_to_mask = np.array(values_to_mask, dtype=object)
...

This breaks the following tests:
[screenshot of the failing tests]

Here's the output:

/Users/carlos/anaconda/envs/pandas_dev/bin/python3.6 "/Users/carlos/Library/Application Support/IntelliJIdea2017.1/python/helpers/pycharm/_jb_pytest_runner.py" --path /Users/carlos/Dropbox/opensource/pandas-ucals/pandas/tests/series/test_replace.py
Testing started at 21:32 ...
 Launching py.test with arguments /Users/carlos/Dropbox/opensource/pandas-ucals/pandas/tests/series/test_replace.py
============================= test session starts ==============================
platform darwin -- Python 3.6.0, pytest-3.0.7, py-1.4.32, pluggy-0.4.0
rootdir: /Users/carlos/Dropbox/opensource/pandas-ucals, inifile: setup.cfg
plugins: cov-2.3.1
collected 11 items
 
pandas/tests/series/test_replace.py       F 
pandas/tests/series/test_replace.py:12 (TestSeriesReplace.test_replace)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace>

    def test_replace(self):
        N = 100
        ser = pd.Series(np.random.randn(N))
        ser[0:4] = np.nan
        ser[6:10] = 0
    
        # replace list with a single value
        ser.replace([np.nan], -1, inplace=True)
    
        exp = ser.fillna(-1)
        tm.assert_series_equal(ser, exp)
    
        rs = ser.replace(0., np.nan)
        ser[ser == 0.] = np.nan
>       tm.assert_series_equal(rs, ser)

pandas/tests/series/test_replace.py:27: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/util/testing.py:1215: in assert_series_equal
    obj='{0}'.format(obj))
ls/pandas/util/testing.pyx:59: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:4156)
    ???
ls/pandas/util/testing.pyx:173: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:3274)
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

obj = 'Series', message = 'Series values are different (4.0 %)'
left = '[-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, 0.0, 0.0, 0.0, 0.0, 0.967334209683, -1.12749699126, 1.006215...722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]'
right = '[-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, nan, nan, nan, nan, 0.967334209683, -1.12749699126, 1.006215...722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]'
diff = None

    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)
    
        msg = """{0} are different
    
    {1}
    [left]:  {2}
    [right]: {3}""".format(obj, message, left, right)
    
        if diff is not None:
            msg = msg + "\n[diff]: {diff}".format(diff=diff)
    
>       raise AssertionError(msg)
E       AssertionError: Series are different
E       
E       Series values are different (4.0 %)
E       [left]:  [-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, 0.0, 0.0, 0.0, 0.0, 0.967334209683, -1.12749699126, 1.00621520732, 0.467115769273, -0.665495302938, -1.9655758973, 0.314295658919, -1.5728548579, 1.60539543955, 1.20132044052, -0.267834389937, -1.3125275111, 0.827027080809, -0.750655389751, -0.646701964354, -0.564806568125, 1.04153633485, -0.175289544241, -0.771798272938, -0.353146592188, -0.895526823358, -0.229003615743, -1.24668695712, -0.396975143203, 1.28664372671, 1.43113842599, 0.954652683573, 1.21141700331, -1.15516473451, 2.14816148205, 1.0492538281, -0.36137923595, -0.750632548499, -0.24502818186, 0.651587577021, -1.33034613473, 0.446654064159, -0.216192740252, -0.988088651194, 0.341802605183, 0.7488135734, -0.596658039592, -0.759760465904, 0.650746773025, 1.47640000528, -0.963593630477, -0.264742407812, 0.91147138281, -0.116493770275, -0.840843917606, 0.713860639926, -0.999446407034, -0.261993101942, 0.660244548292, 0.283304496904, 0.417297181001, 1.13236254504, -1.04559448586, -0.302416962494, 1.06231513633, 0.0376809290172, -0.00528160487426, -0.753751886674, -1.76853768804, 1.05207654029, 0.646266446052, -0.817276175661, 0.347974618646, 2.49401568105, -1.59727151377, 0.637718637115, 0.445203010849, 1.6222785846, 0.397953946747, 0.810931905513, -0.244945263003, 1.09902523539, 1.5024980885, -0.189142680513, -1.0871214807, -0.216461016432, -0.395180231199, -0.466997134722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]
E       [right]: [-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, nan, nan, nan, nan, 0.967334209683, -1.12749699126, 1.00621520732, 0.467115769273, -0.665495302938, -1.9655758973, 0.314295658919, -1.5728548579, 1.60539543955, 1.20132044052, -0.267834389937, -1.3125275111, 0.827027080809, -0.750655389751, -0.646701964354, -0.564806568125, 1.04153633485, -0.175289544241, -0.771798272938, -0.353146592188, -0.895526823358, -0.229003615743, -1.24668695712, -0.396975143203, 1.28664372671, 1.43113842599, 0.954652683573, 1.21141700331, -1.15516473451, 2.14816148205, 1.0492538281, -0.36137923595, -0.750632548499, -0.24502818186, 0.651587577021, -1.33034613473, 0.446654064159, -0.216192740252, -0.988088651194, 0.341802605183, 0.7488135734, -0.596658039592, -0.759760465904, 0.650746773025, 1.47640000528, -0.963593630477, -0.264742407812, 0.91147138281, -0.116493770275, -0.840843917606, 0.713860639926, -0.999446407034, -0.261993101942, 0.660244548292, 0.283304496904, 0.417297181001, 1.13236254504, -1.04559448586, -0.302416962494, 1.06231513633, 0.0376809290172, -0.00528160487426, -0.753751886674, -1.76853768804, 1.05207654029, 0.646266446052, -0.817276175661, 0.347974618646, 2.49401568105, -1.59727151377, 0.637718637115, 0.445203010849, 1.6222785846, 0.397953946747, 0.810931905513, -0.244945263003, 1.09902523539, 1.5024980885, -0.189142680513, -1.0871214807, -0.216461016432, -0.395180231199, -0.466997134722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]

pandas/util/testing.py:1053: AssertionError
F 
pandas/tests/series/test_replace.py:189 (TestSeriesReplace.test_replace2)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace2>

    def test_replace2(self):
        N = 100
        ser = pd.Series(np.fabs(np.random.randn(N)), tm.makeDateIndex(N),
                        dtype=object)
        ser[:5] = np.nan
        ser[6:10] = 'foo'
        ser[20:30] = 'bar'
    
        # replace list with a single value
        rs = ser.replace([np.nan, 'foo', 'bar'], -1)
    
>       self.assertTrue((rs[:5] == -1).all())
E       AssertionError: False is not true

pandas/tests/series/test_replace.py:201: AssertionError
F 
pandas/tests/series/test_replace.py:178 (TestSeriesReplace.test_replace_bool_with_bool)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace_bool_with_bool>

    def test_replace_bool_with_bool(self):
        s = pd.Series([True, False, True])
        result = s.replace(True, False)
        expected = pd.Series([False] * len(s))
>       tm.assert_series_equal(expected, result)

pandas/tests/series/test_replace.py:183: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/util/testing.py:1215: in assert_series_equal
    obj='{0}'.format(obj))
ls/pandas/util/testing.pyx:59: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:4156)
    ???
ls/pandas/util/testing.pyx:173: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:3274)
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

obj = 'Series', message = 'Series values are different (66.66667 %)'
left = '[False, False, False]', right = '[True, False, True]', diff = None

    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)
    
        msg = """{0} are different
    
    {1}
    [left]:  {2}
    [right]: {3}""".format(obj, message, left, right)
    
        if diff is not None:
            msg = msg + "\n[diff]: {diff}".format(diff=diff)
    
>       raise AssertionError(msg)
E       AssertionError: Series are different
E       
E       Series values are different (66.66667 %)
E       [left]:  [False, False, False]
E       [right]: [True, False, True]

pandas/util/testing.py:1053: AssertionError
F 
pandas/tests/series/test_replace.py:171 (TestSeriesReplace.test_replace_bool_with_string)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace_bool_with_string>

    def test_replace_bool_with_string(self):
        # nonexistent elements
        s = pd.Series([True, False, True])
        result = s.replace(True, '2u')
        expected = pd.Series(['2u', False, '2u'])
>       tm.assert_series_equal(expected, result)

pandas/tests/series/test_replace.py:177: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/util/testing.py:1188: in assert_series_equal
    assert_attr_equal('dtype', left, right)
pandas/util/testing.py:918: in assert_attr_equal
    left_attr, right_attr)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

obj = 'Attributes', message = 'Attribute "dtype" are different'
left = dtype('O'), right = dtype('bool'), diff = None

    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)
    
        msg = """{0} are different
    
    {1}
    [left]:  {2}
    [right]: {3}""".format(obj, message, left, right)
    
        if diff is not None:
            msg = msg + "\n[diff]: {diff}".format(diff=diff)
    
>       raise AssertionError(msg)
E       AssertionError: Attributes are different
E       
E       Attribute "dtype" are different
E       [left]:  object
E       [right]: bool

pandas/util/testing.py:1053: AssertionError
. . F 
pandas/tests/series/test_replace.py:123 (TestSeriesReplace.test_replace_mixed_types)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace_mixed_types>

    def test_replace_mixed_types(self):
        s = pd.Series(np.arange(5), dtype='int64')
    
        def check_replace(to_rep, val, expected):
            sc = s.copy()
            r = s.replace(to_rep, val)
            sc.replace(to_rep, val, inplace=True)
            tm.assert_series_equal(expected, r)
            tm.assert_series_equal(expected, sc)
    
        # MUST upcast to float
        e = pd.Series([0., 1., 2., 3., 4.])
        tr, v = [3], [3.0]
        check_replace(tr, v, e)
    
        # MUST upcast to float
        e = pd.Series([0, 1, 2, 3.5, 4])
        tr, v = [3], [3.5]
        check_replace(tr, v, e)
    
        # casts to object
        e = pd.Series([0, 1, 2, 3.5, 'a'])
        tr, v = [3, 4], [3.5, 'a']
        check_replace(tr, v, e)
    
        # again casts to object
        e = pd.Series([0, 1, 2, 3.5, pd.Timestamp('20130101')])
        tr, v = [3, 4], [3.5, pd.Timestamp('20130101')]
        check_replace(tr, v, e)
    
        # casts to object
        e = pd.Series([0, 1, 2, 3.5, True], dtype='object')
        tr, v = [3, 4], [3.5, True]
        check_replace(tr, v, e)
    
        # test an object with dates + floats + integers + strings
        dr = pd.date_range('1/1/2001', '1/10/2001',
                           freq='D').to_series().reset_index(drop=True)
        result = dr.astype(object).replace(
            [dr[0], dr[1], dr[2]], [1.0, 2, 'a'])
        expected = pd.Series([1.0, 2, 'a'] + dr[3:].tolist(), dtype=object)
>       tm.assert_series_equal(result, expected)

pandas/tests/series/test_replace.py:165: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/util/testing.py:1215: in assert_series_equal
    obj='{0}'.format(obj))
ls/pandas/util/testing.pyx:59: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:4156)
    ???
ls/pandas/util/testing.pyx:173: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:3274)
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

obj = 'Series', message = 'Series values are different (30.0 %)'
left = '[2001-01-01 00:00:00, 2001-01-02 00:00:00, 2001-01-03 00:00:00, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]'
right = '[1.0, 2, a, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]'
diff = None

    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)
    
        msg = """{0} are different
    
    {1}
    [left]:  {2}
    [right]: {3}""".format(obj, message, left, right)
    
        if diff is not None:
            msg = msg + "\n[diff]: {diff}".format(diff=diff)
    
>       raise AssertionError(msg)
E       AssertionError: Series are different
E       
E       Series values are different (30.0 %)
E       [left]:  [2001-01-01 00:00:00, 2001-01-02 00:00:00, 2001-01-03 00:00:00, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]
E       [right]: [1.0, 2, a, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]

pandas/util/testing.py:1053: AssertionError
. . . .                

====================== 5 failed, 6 passed in 0.54 seconds ======================
     
Process finished with exit code 0

I could invest time to find out why those 5 tests are now failing and then tackle the mixed support, or just build on my approach and only tackle the mixed support. Anyway, I'm here to learn; let me know which approach is best and I'll follow it. Thanks.

@@ -447,7 +452,6 @@ def wrapper(arr, mask, limit=None):


def pad_1d(values, limit=None, mask=None, dtype=None):

Contributor

We normally don't like to edit things not associated with the PR (e.g. you may have some editor setting which changes this)... no big deal.

Author

OK, sorry about that. I'm using IntelliJ IDEA, and it reformatted the whole file to the PEP8 standard.

Contributor

No problem. We don't quite follow PEP8 (as flake8 doesn't, actually).

@@ -227,3 +226,10 @@ def test_replace_with_empty_dictlike(self):
        s = pd.Series(list('abcd'))
        tm.assert_series_equal(s, s.replace(dict()))
        tm.assert_series_equal(s, s.replace(pd.Series([])))

    def test_replace_string_with_nan(self):
Contributor

Can you test this with unicode as well?

Author

done!
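
For readers following along, a check in the spirit of these tests might look like the sketch below. The series values and replacements are illustrative assumptions rather than the test bodies from the PR; the idea is that replacing a string that does not occur in a numeric Series should leave the data untouched.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# '2' (a string) is not the integer 2, so nothing should match.
result = s.replace('2', np.nan)

# The same check with an explicit unicode literal
# (identical to a plain str on Python 3).
result_unicode = s.replace(u'2', np.nan)

# Replacing the integer 2 does match, and the NaN marks the replaced slot.
replaced = s.replace(2, np.nan)

print(result.tolist())
print(result_unicode.tolist())
print(replaced.isna().tolist())
```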

jreback commented Mar 28, 2017

Thanks @ucals, I pushed a more generalized solution to your branch.

This is actually a big area I have been meaning to fix. There are quite a lot of subtleties w.r.t. numpy (and pandas) coercions.
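
One of those subtleties can be reproduced with NumPy alone. The sketch below is illustrative and not taken from the PR: building an array from a list that mixes numbers and strings silently turns every element into a string unless `dtype=object` is requested.

```python
import numpy as np

values = [1, '1']

# Default inference picks a string dtype, so the integer 1 becomes '1'.
coerced = np.array(values)
print(coerced.dtype.kind, coerced.tolist())

# Asking for object dtype keeps each element as the original Python object.
preserved = np.array(values, dtype=object)
print(preserved.dtype, preserved.tolist())

# The difference matters when building a mask of entries equal to any value:
# with the preserved objects, only the integer 1 in the data matches.
data = np.array([1, 2, 3], dtype=object)
mask = np.array([any(x == v for v in preserved) for x in data])
print(mask.tolist())
```

The element-wise loop is deliberately naive; it is only there to show which entries compare equal once the original types are kept.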

jreback commented Mar 28, 2017

OK, ping on green. (Note: if you feel up to adding more test cases to pandas/tests/types/test_cast.py, go for it; this could also be a follow-up.)

@jreback jreback added this to the 0.20.0 milestone Mar 28, 2017
@jreback jreback closed this in 6f789e1 Mar 28, 2017

jreback commented Mar 28, 2017

Thanks @ucals.

As I said, if you want to add some follow-up tests, please do.

@ucals ucals deleted the bug-fix-15743 branch March 28, 2017 18:49
mattip pushed a commit to mattip/pandas that referenced this pull request Apr 3, 2017
closes pandas-dev#15743

Author: Carlos Souza <carlos@udacity.com>
Author: Jeff Reback <jeff@reback.net>

Closes pandas-dev#15812 from ucals/bug-fix-15743 and squashes the following commits:

e6e4971 [Carlos Souza] Adding replace unicode with number and replace mixed types with string tests
bd31b2b [Carlos Souza] Resolving merge conflict by incorporating @jreback suggestions
73805ce [Jeff Reback] CLN: add infer_dtype_from_array
45e67e4 [Carlos Souza] Fixing PEP8 line indent
0a98557 [Carlos Souza] BUG: replace of numeric by string fixed
97e1f18 [Carlos Souza] Test
e62763c [Carlos Souza] Fixing PEP8 line indent
080c71e [Carlos Souza] BUG: replace of numeric by string fixed
8b463cb [Carlos Souza] Merge remote-tracking branch 'upstream/master'
9fc617b [Carlos Souza] Merge remote-tracking branch 'upstream/master'
e12bca7 [Carlos Souza] Sync fork
676a4e5 [Carlos Souza] Test