BUG fix _Unstacker int32 limit in dataframe sizes (pandas-dev#26314) #34827
Please always add a test and make sure it fails first, before adding a patch.
How long does this take, and how much memory does it use? You can mark tests as well.
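The run-time and memory question can be answered with a small stdlib-only measurement. This is a minimal sketch: `measure` is a hypothetical helper, not part of pandas or pytest. It times a callable with `time.perf_counter` and records peak traced allocations with `tracemalloc`. NumPy registers its data buffers with `tracemalloc`, so array allocations should show up as well.

```python
import time
import tracemalloc
from typing import Callable, Tuple


def measure(fn: Callable[[], object]) -> Tuple[float, int]:
    """Hypothetical helper: return (seconds elapsed, peak traced bytes) for fn()."""
    tracemalloc.start()
    try:
        start = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) sizes in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if fn() raises.
        tracemalloc.stop()
    return elapsed, peak


# Usage: allocate a 10 MB buffer and report time and peak memory.
elapsed, peak = measure(lambda: bytes(10_000_000))
print(f"{elapsed:.4f} s, peak {peak / 1e6:.1f} MB")
```

The same call could wrap the reproduction from this thread, e.g. `measure(lambda: df.set_index(['a', 'b']).unstack())`. Tracing adds overhead, so the timings are only indicative.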
Can you merge master and see if you can get this passing?
@KaonToPion Closing as stale. Please let me know if you want to continue and the PR will be reopened.
@simonjayhawkins Could you reopen it? I can pick it up again.
I am having trouble with the fix. This is the code:
The issue right now is that `raise ValueError` causes problems in all the other systems, so there are two ways out of this:
My first commit tried the first option, but while trying to test it I found that the second option is probably the better one. Any opinions on this, @jreback @simonjayhawkins?
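To illustrate why an int32 limit bites at all: if the number of cells in the unstacked result is computed in 32-bit integers, large products silently wrap around to negative values. The sketch below is an illustration, not pandas' code. `cells_overflow_int32` is a hypothetical helper that compares the product as Python ints against the int32 bound. The level size of 87,000 is chosen for illustration, since it is roughly the number of distinct values each level gets in the reproduction from this thread.

```python
import numpy as np

# Largest value a signed 32-bit integer can hold: 2**31 - 1.
INT32_MAX = int(np.iinfo(np.int32).max)


def cells_overflow_int32(num_rows: int, num_columns: int) -> bool:
    """Hypothetical helper: True if num_rows * num_columns exceeds int32 range.

    The product is taken on Python ints, which have arbitrary precision,
    so the comparison itself cannot overflow.
    """
    return int(num_rows) * int(num_columns) > INT32_MAX


# For contrast: the same product computed in int32 arrays wraps around silently.
levels = np.array([87_000], dtype=np.int32)
wrapped = int((levels * levels)[0])

print(cells_overflow_int32(87_000, 87_000))  # True
print(wrapped < 0)  # True: 7_569_000_000 does not fit and wraps negative
```

Comparing in arbitrary-precision arithmetic keeps the check correct regardless of what the caller then does with the result (raise, warn, or proceed).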
`black pandas`
`git diff upstream/master -u -- "*.py" | flake8 --diff`
I have tested it with:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(low=0, high=1500000, size=(90000, 2)), columns=['a', 'b'])
df.set_index(['a', 'b']).unstack()
I am not sure whether I should add it as a test, since it requires quite a lot of memory. I am also unsure where the test should live if it is added.
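The memory concern can be gauged without running the expensive `unstack` itself, because building the MultiIndex is cheap. The sketch below assumes that unstacking the last level produces one row per combination of the remaining levels and one column per value of the last level. `unstacked_cell_count` is a hypothetical helper, not a pandas API. For more than two levels it gives an upper bound, since not every combination of row-level values has to occur.

```python
import math

import numpy as np
import pandas as pd

INT32_MAX = int(np.iinfo(np.int32).max)


def unstacked_cell_count(index: pd.MultiIndex) -> int:
    """Hypothetical helper: estimated rows * columns after unstacking the last level."""
    # levshape is a tuple with the number of distinct values in each level.
    *row_levels, col_level = index.levshape
    return math.prod(row_levels) * col_level


# Same shape of data as the reproduction above, with a seeded generator.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(low=0, high=1_500_000, size=(90_000, 2)), columns=["a", "b"]
)
cells = unstacked_cell_count(df.set_index(["a", "b"]).index)
print(cells, cells > INT32_MAX)
```

Each level ends up with roughly 87,000 distinct values, so the estimate lands in the billions of cells, well past the int32 bound. A check like this runs quickly and needs little memory.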