Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: Regression in to_csv for ea dtype categorical #47347

Merged
merged 9 commits into from
Jun 16, 2022

Conversation

phofl
Copy link
Member

@phofl phofl commented Jun 14, 2022

@phofl phofl added Regression Functionality that used to work in a prior pandas version IO CSV read_csv, to_csv labels Jun 14, 2022
@phofl phofl added this to the 1.4.3 milestone Jun 14, 2022
@simonjayhawkins simonjayhawkins mentioned this pull request Jun 15, 2022
4 tasks
)
df["a"] = df["a"].astype("category") # astype("object") does not raise an error
result = df.to_csv()
expected_rows = [",a", '0,"[2020-01-01, 2020-01-02]"']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has the default closed argument changed on main?

on 1.3.5 and 1.4.2, the expected would be [",a", '0,"(2020-01-01, 2020-01-02]"'] so this test would fail on backport?

print(pd.__version__)
i = pd.Interval(pd.Timestamp("2020-01-01"), pd.Timestamp("2020-01-02"))
print(repr(i))
print(i)
1.4.2
Interval('2020-01-01', '2020-01-02', closed='right')
(2020-01-01, 2020-01-02]
1.5.0.dev0+958.gf7be58a477
Interval('2020-01-01', '2020-01-02', inclusive='both')
[2020-01-01, 2020-01-02]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it. Inclusive was added which defaults to both, when close is not given. I could not find this in the release notes, so we should either add it or change back to right. Probably would have to deprecate first.

For this pr, I will add closed as a keyword and catch the deprecation warning for main, then we can adjust the test as a follow up

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this pr, I will add closed as a keyword and catch the deprecation warning for main, then we can adjust the test as a follow up

sgtm

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opened #47365

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed here

df = DataFrame(
{"a": [pd.Interval(Timestamp("2020-01-01"), Timestamp("2020-01-02"))]}
)
df["a"] = df["a"].astype("category") # astype("object") does not raise an error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably can remove this comment that was in the issue OP

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx, should have paid more attention…

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


def test_to_csv_categorical_and_interval(self):
# GH#46297
with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we assert_produces_warning we will need to remove this on the backport. (which is fine imo)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah Right, this would raise without right?

then let’s change to inclusive immediately and change on the backport to right, this makes more sense imo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

res = cache.get(item)
# res = None
Copy link
Member

@simonjayhawkins simonjayhawkins Jun 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are not needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting, no definitely not. They got mixed in somehow. Removed

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @phofl lgtm pending green(ish)

@simonjayhawkins simonjayhawkins merged commit 89578fe into pandas-dev:main Jun 16, 2022
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jun 16, 2022
@phofl phofl deleted the 46812 branch June 16, 2022 19:45
simonjayhawkins added a commit that referenced this pull request Jun 17, 2022
… dtype categorical) (#47388)

* Backport PR #47347: REGR: Regression in to_csv for ea dtype categorical

* inclusive -> closed

Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com>
Co-authored-by: Simon Hawkins <simonjayhawkins@gmail.com>
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO CSV read_csv, to_csv Regression Functionality that used to work in a prior pandas version
Projects
None yet
2 participants