Add `np.intc` to `_factorizers` in `pd.merge` #52478

hoxbro · 2023-04-06T07:06:48Z

Closes BUG: pd.merge fail with numpy.intc on Windows #52451
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
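
For context, here is a minimal sketch of the kind of call the linked issue title describes, assuming integer key columns stored as np.intc. The frames and column names are made up for illustration and are not taken from the issue. On platforms where np.intc is the same type as np.int32, the merge is expected to succeed regardless of this change.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
left = pd.DataFrame({"key": np.array([1, 2, 3], dtype=np.intc), "a": [10, 20, 30]})
right = pd.DataFrame({"key": np.array([2, 3, 4], dtype=np.intc), "b": [200, 300, 400]})

# Inner join on a key column whose dtype is np.intc.
merged = pd.merge(left, right, on="key")
print(merged)
```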

phofl · 2023-04-06T08:36:01Z

Did this fail on main as well? I couldn't reproduce it on main yesterday.

hoxbro · 2023-04-06T08:40:56Z

I did not test on main. I won't have access to a Windows computer until next week.

Did you test on Windows? On Linux I can see that np.intc == np.int32.
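
A quick way to check this on any machine is sketched below. The helper name `describe_intc` is hypothetical. It distinguishes between the two names referring to the same type object, which is what matters for a lookup table keyed on types, and the two dtypes merely being equivalent.

```python
import numpy as np

def describe_intc():
    """Report how np.intc relates to np.int32 on the current platform."""
    return {
        # True when both names refer to the very same scalar type object.
        "same_type_object": np.intc is np.int32,
        # True when the dtypes are equivalent, even if the type objects differ.
        "same_dtype": np.dtype(np.intc) == np.dtype(np.int32),
        # Width of a C int in bytes on this platform.
        "intc_itemsize": np.dtype(np.intc).itemsize,
    }

info = describe_intc()
print(info)
```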

jbrockmendel · 2023-04-06T14:33:17Z

pandas/core/reshape/merge.py

@@ -109,6 +109,7 @@
    np.int64: libhashtable.Int64Factorizer,
    np.longlong: libhashtable.Int64Factorizer,
    np.int32: libhashtable.Int32Factorizer,
+    np.intc: libhashtable.Int32Factorizer,


is this going to be platform-dependent?

hoxbro · 2023-04-06

As far as I can see, an actual np.intc is only available on Windows, and for Linux (and probably also Mac OS) it is np.int32.

TBH, I did not know about np.intc before some tests started failing.

WillAyd · 2023-04-07

Do we have a generic IntFactorizer? It is a bit pedantic, since I think most distributions will have an int be 32 bits, but that is definitely not a guarantee.
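
One way to read the "generic" idea is a lookup keyed on the dtype's kind and width rather than on the scalar type object. The sketch below is only an illustration of that idea, not how pandas implements it: `FACTORIZER_BY_WIDTH` and `pick_factorizer` are hypothetical names, and plain strings stand in for the libhashtable classes.

```python
import numpy as np

# Hypothetical table: strings stand in for the libhashtable factorizer classes.
FACTORIZER_BY_WIDTH = {
    ("i", 4): "Int32Factorizer",
    ("i", 8): "Int64Factorizer",
}

def pick_factorizer(dtype):
    """Choose a factorizer name from the dtype's kind and byte width."""
    dt = np.dtype(dtype)
    key = (dt.kind, dt.itemsize)
    try:
        return FACTORIZER_BY_WIDTH[key]
    except KeyError:
        raise TypeError(f"no factorizer registered for dtype {dt}") from None

# np.intc resolves by its actual width, whatever that is on this platform.
print(pick_factorizer(np.intc))
```

Keying on width avoids a missed lookup when two distinct type objects share the same size, at the cost of no longer matching on the exact type.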

jbrockmendel · 2023-04-07

iirc @seberg mentioned hoping to get rid of np.intc in numpy 2.0

seberg · 2023-04-09

I am not worried about intc, but it's system-dependent; it doesn't have to be 32-bit, though. cnp.int_t is what worries me a bit, because it's long, but if we want 64-bit on Windows it won't match up with np.array([1]) anymore.
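
The widths in question can be inspected from Python, as sketched below. This uses the Python-level np.int_ as a stand-in for the Cython-level cnp.int_t, which is an assumption; the helper name `integer_widths` is hypothetical, and the printed values depend on the platform and the NumPy version.

```python
import numpy as np

def integer_widths():
    """Return the bit width of several integer types on this platform."""
    return {
        "intc": np.dtype(np.intc).itemsize * 8,          # C int
        "int_": np.dtype(np.int_).itemsize * 8,          # NumPy's default integer type
        "longlong": np.dtype(np.longlong).itemsize * 8,  # C long long
        "default": np.array([1]).dtype.itemsize * 8,     # dtype inferred for np.array([1])
    }

widths = integer_widths()
print(widths)
```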

WillAyd · 2023-04-10

I assume this regression came as part of #49876.
To avoid going down a rabbit hole, I would be OK with just setting this specific case back to the Int64Factorizer. @phofl

github-actions · 2023-05-11T00:05:28Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

hoxbro · 2023-05-11T10:28:27Z

I accidentally force-pushed main into this branch, which is why this PR was closed. I will open a new PR with the changes, though I will change Int32Factorizer to Int64Factorizer.

jbrockmendel reviewed Apr 6, 2023

mroeschke added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Apr 7, 2023

github-actions bot added the Stale label May 11, 2023

hoxbro closed this May 11, 2023

hoxbro force-pushed the merge_intc branch from ec2ca66 to 3827caf Compare May 11, 2023 08:52

hoxbro deleted the merge_intc branch May 11, 2023 09:03

hoxbro mentioned this pull request May 11, 2023: Add np.intc to _factorizers in pd.merge #53175 (merged)
