TST (string dtype): clean-up xpasssing tests with future string dtype #59323

jorisvandenbossche · 2024-07-26T11:24:25Z

Follow-up PR to #59320 that does some superficial clean-up of the tests, mostly removing xfail markers from tests that have started passing in the meantime.

xref #54792

WillAyd · 2024-07-26T21:50:57Z

pandas/tests/base/test_unique.py

    uval = "\ud83d"  # smiley emoji

-    obj = index_or_series([uval] * 2)
+    obj = index_or_series([uval] * 2, dtype=object)


Why is the solution here to add dtype=object? Shouldn't this just work naturally with the inferred string type?

jorisvandenbossche · 2024-07-27

This doesn't work with a string dtype because this test is about "bad unicode", and an actual string dtype cannot represent invalid unicode (at least not when using pyarrow under the hood; I assume our object-dtype-based one will be able to hold it).

To keep the spirit of the test (ensuring our unique implementation works with bad unicode in object dtype), I made it explicitly use object dtype.

See also the "Invalid unicode input" section in #59328 (the issue I opened yesterday to record breaking changes and things that are no longer supported with the string dtype).
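A stdlib-only sketch of why the test needs object dtype, assuming (as stated above for the pyarrow-backed dtype) that string storage requires valid UTF-8, which is what Arrow's string type holds. Despite the "smiley emoji" comment in the test, `"\ud83d"` is a lone high surrogate, i.e. only the first half of the UTF-16 pair that encodes an emoji, so it cannot be encoded to UTF-8 at all. A plain container of Python objects (the analogue of object dtype) can still hold and deduplicate it. The helper `is_utf8_encodable` is hypothetical.

```python
# "\ud83d" is a lone high surrogate: the first half of the UTF-16 surrogate
# pair for an emoji such as U+1F603, not a complete character on its own.
uval = "\ud83d"


def is_utf8_encodable(s: str) -> bool:
    """Return True if ``s`` can be encoded as UTF-8 (hypothetical helper)."""
    try:
        s.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


# The lone surrogate cannot be turned into UTF-8 bytes, which is what
# UTF-8-based string storage (such as Arrow's string type) needs.
lone_ok = is_utf8_encodable(uval)  # False

# Combined with its low surrogate, it decodes to a real emoji that is fine.
full = "\ud83d\ude03".encode("utf-16", "surrogatepass").decode("utf-16")
full_ok = is_utf8_encodable(full)  # True

# Arbitrary Python objects (the analogue of object dtype) can still hold the
# bad value and deduplicate it, mirroring what the unique() test checks.
uniques = list(dict.fromkeys([uval] * 2))
```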

WillAyd · 2024-07-27

Makes sense. It also makes me think about how we could leverage a BinaryDtype in the future, though that is a topic for a different day.

WillAyd · 2024-07-26T21:51:30Z

pandas/tests/frame/methods/test_info.py

    df.columns = dtypes

-    df_with_object_index = DataFrame({"a": [1]}, index=["foo"])
+    df_with_object_index = DataFrame({"a": [1]}, index=Index(["foo"], dtype=object))


Similar comment on all of these. I might be overlooking something simple, but I'm unsure why dtype=object is the solution.

jorisvandenbossche · 2024-07-27

Because what we are testing here is that with object dtype the full memory usage is not known, which is why the output shows the "+":

    In [1]: df = DataFrame({"a": ["a", "b"]})

    In [2]: df.info()
    <class 'pandas.DataFrame'>
    RangeIndex: 2 entries, 0 to 1
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   a       2 non-null      object
    dtypes: object(1)
    memory usage: 148.0+ bytes

    In [3]: pd.options.future.infer_string = True

    In [4]: df = DataFrame({"a": ["a", "b"]})

    In [5]: df.info()
    <class 'pandas.DataFrame'>
    RangeIndex: 2 entries, 0 to 1
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   a       2 non-null      string
    dtypes: string(1)
    memory usage: 150.0 bytes

So 148.0+ bytes vs 150.0 bytes.

Of course I could also update the expected result to be exact instead of an estimate, but 1) that would complicate the test (since we still have to account for both current and future behaviour), and 2) we still need to test the object-dtype case explicitly anyway.
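A minimal sketch of the behaviour the updated test relies on, assuming a pandas version where `DataFrame.info` accepts the long-standing `buf` and `memory_usage` parameters: an object-dtype index makes `info()` report a lower-bound estimate with a trailing "+", while `memory_usage="deep"` or purely numeric data gives an exact figure. The helper `memory_line` is hypothetical.

```python
import io

import pandas as pd


def memory_line(df: pd.DataFrame, memory_usage=True) -> str:
    """Return the 'memory usage: ...' line of df.info() (hypothetical helper)."""
    buf = io.StringIO()
    df.info(buf=buf, memory_usage=memory_usage)
    lines = buf.getvalue().splitlines()
    return next(line for line in lines if line.startswith("memory usage:"))


# Same construction as the updated test: an explicit object-dtype index, so the
# result does not depend on whether future.infer_string is enabled.
df_obj = pd.DataFrame({"a": [1]}, index=pd.Index(["foo"], dtype=object))
# Purely numeric frame with a default RangeIndex, for comparison.
df_num = pd.DataFrame({"a": [1]})

shallow_obj = memory_line(df_obj)          # estimate, ends in "+ bytes"
deep_obj = memory_line(df_obj, "deep")     # deep introspection: exact, no "+"
shallow_num = memory_line(df_num)          # nothing to estimate: exact, no "+"
```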

We should probably add a test specifically for the string dtype, though, asserting that with a proper string dtype the reported memory usage is always the exact number rather than a lower-bound estimate.

WillAyd · 2024-07-27

Cool, yeah, it would be nice to add a test for the exact memory representation with StringDtype + pyarrow, though that can be done separately.
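One possible shape for that separate follow-up test, as a sketch only: it assumes pyarrow is installed and uses the `"string[pyarrow]"` alias rather than whatever dtype `future.infer_string` selects. Because the pyarrow-backed array reports its buffer size directly, the deep and shallow memory figures should agree and `info()` should print no "+". The function name `check_string_dtype_memory_is_exact` is hypothetical, and the check is skipped when pyarrow is missing.

```python
import importlib.util
import io

import pandas as pd


def check_string_dtype_memory_is_exact() -> bool:
    """Hypothetical test body; returns False when pyarrow is unavailable."""
    if importlib.util.find_spec("pyarrow") is None:
        return False  # nothing to check without pyarrow

    df = pd.DataFrame({"a": ["a", "b"]}, dtype="string[pyarrow]")

    # The Arrow-backed column reports its buffer size directly, so asking for
    # a deep measurement should not add anything on top of the shallow one.
    assert df.memory_usage(deep=True).equals(df.memory_usage(deep=False))

    # Consequently info() should print an exact size without the "+".
    buf = io.StringIO()
    df.info(buf=buf)
    mem = [ln for ln in buf.getvalue().splitlines() if ln.startswith("memory usage:")]
    assert len(mem) == 1 and "+" not in mem[0]
    return True


ran = check_string_dtype_memory_is_exact()
```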

WillAyd

LGTM

jorisvandenbossche · 2024-07-27T15:13:52Z

The failures with python-dev / numpy-dev seem unrelated.

jorisvandenbossche added the Testing (pandas testing functions or related to the test suite) and Strings (String extension data type and string data) labels Jul 26, 2024

jorisvandenbossche requested a review from WillAyd as a code owner July 26, 2024 11:24

This was referenced Jul 26, 2024

TRACKER: new default String dtype (pyarrow-backed, numpy NaN semantics) #54792

Open

TST (string dtype): xfail all currently failing tests with future.infer_string #59329

Merged

TST / string dtype: clean-up xpasssing tests with future string dtype

dd174ef

jorisvandenbossche force-pushed the string-dtype-tests-initial-cleanup branch from b3b65a3 to dd174ef Compare July 26, 2024 18:17

jorisvandenbossche changed the title ~~TST / string dtype: clean-up xpasssing tests with future string dtype~~ TST (string dtype): clean-up xpasssing tests with future string dtype Jul 26, 2024

WillAyd requested changes Jul 26, 2024

xpass in categorical replace test

d67ea44

WillAyd approved these changes Jul 27, 2024

jorisvandenbossche merged commit 9b375be into pandas-dev:main Jul 27, 2024

jorisvandenbossche deleted the string-dtype-tests-initial-cleanup branch July 27, 2024 15:14

WillAyd pushed a commit that referenced this pull request Aug 13, 2024

TST (string dtype): clean-up xpasssing tests with future string dtype (#59323)

332624b

WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 14, 2024

TST (string dtype): clean-up xpasssing tests with future string dtype (pandas-dev#59323)

9523a7b

WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 15, 2024

TST (string dtype): clean-up xpasssing tests with future string dtype (pandas-dev#59323)

a04274b

WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 15, 2024

TST (string dtype): clean-up xpasssing tests with future string dtype (pandas-dev#59323)

a202688

WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 15, 2024

TST (string dtype): clean-up xpasssing tests with future string dtype (pandas-dev#59323)

3600907

jorisvandenbossche added this to the 2.3 milestone Aug 20, 2024

WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 22, 2024

TST (string dtype): clean-up xpasssing tests with future string dtype (pandas-dev#59323)

c94c33a

WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 22, 2024

TST (string dtype): clean-up xpasssing tests with future string dtype (pandas-dev#59323)

02df6b4

WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Aug 27, 2024

TST (string dtype): clean-up xpasssing tests with future string dtype (pandas-dev#59323)

ab9d1db

WillAyd pushed a commit to WillAyd/pandas that referenced this pull request Sep 20, 2024

TST (string dtype): clean-up xpasssing tests with future string dtype (pandas-dev#59323)

35aa9a2

jorisvandenbossche added a commit to WillAyd/pandas that referenced this pull request Oct 2, 2024

TST (string dtype): clean-up xpasssing tests with future string dtype (pandas-dev#59323)

3827a02

jorisvandenbossche added a commit to WillAyd/pandas that referenced this pull request Oct 2, 2024

TST (string dtype): clean-up xpasssing tests with future string dtype (pandas-dev#59323)

231f5ae

jorisvandenbossche added a commit to WillAyd/pandas that referenced this pull request Oct 3, 2024

TST (string dtype): clean-up xpasssing tests with future string dtype (pandas-dev#59323)

74e9460

jorisvandenbossche added a commit to WillAyd/pandas that referenced this pull request Oct 7, 2024

TST (string dtype): clean-up xpasssing tests with future string dtype (pandas-dev#59323)

06dbb7a

jorisvandenbossche added a commit that referenced this pull request Oct 9, 2024

TST (string dtype): clean-up xpasssing tests with future string dtype (#59323)

f5ca683

jorisvandenbossche added the backported label Oct 10, 2024
