Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: pd.Series.replace does not preserve the original dtype #33622

Closed

Conversation

kotamatsuoka
Copy link
Contributor

@kotamatsuoka kotamatsuoka commented Apr 18, 2020

@kotamatsuoka kotamatsuoka force-pushed the original-dtype-with-replace branch from 5f11fc1 to cc56667 Compare April 18, 2020 06:58
@kotamatsuoka kotamatsuoka force-pushed the original-dtype-with-replace branch from cc56667 to 9ef9f9e Compare April 18, 2020 08:25
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kotamatsuoka

Minor comments ahead of core-members' reviews

pandas/core/internals/blocks.py Outdated Show resolved Hide resolved
pandas/tests/series/methods/test_replace.py Show resolved Hide resolved
dtype, _ = infer_dtype_from(other, pandas_dtype=True)
if isinstance(other, str):
dtype = "string"
if is_dtype_equal(self.dtype, dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't hard-code things like this, instead you can try to teach infer_dtype_from to do this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry...I removed hard-code.

@kotamatsuoka kotamatsuoka requested a review from jreback April 21, 2020 16:42
@kotamatsuoka kotamatsuoka force-pushed the original-dtype-with-replace branch 2 times, most recently from 79ad94f to 7cbf0ea Compare April 22, 2020 12:58
@kotamatsuoka kotamatsuoka force-pushed the original-dtype-with-replace branch from 7cbf0ea to 078aca1 Compare April 22, 2020 13:40
dtype = np.object_
else:
# if we cannot then coerce to object
dtype, _ = infer_dtype_from(other, pandas_dtype=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggestion from @jreback here was to make infer_dtype_from return StringDtype when appropriate, so that L1090-L1013 here wouldnt be necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to move into infer_dtype_from(), but a lot of tests fail. So, I didn't move.

@kotamatsuoka kotamatsuoka force-pushed the original-dtype-with-replace branch from 41fb227 to c9f6d50 Compare April 23, 2020 12:37
@pep8speaks
Copy link

pep8speaks commented Apr 23, 2020

Hello @kotamatsuoka! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-06-06 10:18:53 UTC

@kotamatsuoka kotamatsuoka force-pushed the original-dtype-with-replace branch from c9f6d50 to 571ab8a Compare April 23, 2020 12:41
@kotamatsuoka kotamatsuoka force-pushed the original-dtype-with-replace branch from 6b2f09c to 0d244c2 Compare May 1, 2020 15:26
@kotamatsuoka kotamatsuoka force-pushed the original-dtype-with-replace branch from 0d244c2 to b227024 Compare May 1, 2020 15:42
pandas/core/internals/blocks.py Outdated Show resolved Hide resolved
@jreback jreback added replace replace method Strings String extension data type and string data Dtype Conversions Unexpected or buggy dtype conversions labels May 2, 2020
@kotamatsuoka kotamatsuoka force-pushed the original-dtype-with-replace branch 3 times, most recently from 2844c80 to 38a5c83 Compare May 3, 2020 06:11
@kotamatsuoka kotamatsuoka force-pushed the original-dtype-with-replace branch from 38a5c83 to a62c896 Compare May 3, 2020 10:50
@kotamatsuoka kotamatsuoka requested a review from jreback May 3, 2020 12:08
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also needs a whatsnew note, EA section, bug fixes for 1.1

if self.is_bool or is_object_dtype(dtype) or is_bool_dtype(dtype):
if (
is_extension_array_dtype(self.dtype)
and not is_categorical_dtype(self.dtype)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just use is_extension_array_dtype here

[
(
pd.Series(["one", "two"], dtype="string"),
{"one": "1", "two": "2"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a bunch more tests which can hit the other EA types, e.g. Intervval, Period, Datetime w/tz

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added test cases.

@kotamatsuoka kotamatsuoka force-pushed the original-dtype-with-replace branch 3 times, most recently from 6e507a4 to aa2f973 Compare June 6, 2020 09:04
@kotamatsuoka kotamatsuoka force-pushed the original-dtype-with-replace branch 2 times, most recently from 918b4e2 to a933ad8 Compare June 6, 2020 10:18
@kotamatsuoka kotamatsuoka force-pushed the original-dtype-with-replace branch from a933ad8 to b777345 Compare June 6, 2020 13:09
@MarcoGorelli
Copy link
Member

Hi @kotamatsuoka - is this still active? Looks like there's some test failures and linting errors (from black), if you fix them up we'll take another look

@github-actions github-actions bot added the Stale label Sep 23, 2020
@simonjayhawkins
Copy link
Member

@kotamatsuoka closing as stale. please comment if you want to continue and the PR will be reopened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions replace replace method Stale Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pd.Series.replace(to_replace=...) does not preserve the original dtype of the series
6 participants