-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
TYP: libgroupby int64->intp #40635
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TYP: libgroupby int64->intp #40635
Conversation
Just out of curiosity what is the motivation behind this change? I think ideally would have been a better starting point, but feels a bit strange to go back from 64 bit integers to 32 bit integers on 32 bit platforms with some of these |
Hmm I see your response already here: Do we have tests that run on 32 bit platforms that use groups exceeding 2 ** 32 -1 in size? IIUC this might introduce a regression in that case, though I may also not be thinking through these sizes and how they are mapped in the entire ecosystem of things correctly |
IIUC a 32bit system should have no more than 4GB of memory, so shouldn't be able to create such a test |
32 bit systems can still represent 64 bit integers. The compiler will just have to generate extra instructions that makes it inefficient compared to using a 64 bit system for the same calculations, but it is still something you can do (much like performing complex128 math on a 64 bit system) https://stackoverflow.com/questions/28977587/is-it-ok-to-use-64bit-integers-in-a-32bit-application |
From some of these changes I don't think it's the ngroups actually that matters as much as the values being stored. Moving these from 64 down to 32 I think could introduce undefined behavior on those platforms |
Its not about being able to represent 64 bit integers. Its about how many groups you can possibly have. |
Oh OK I see that this is just labels that you've changed this for |
It wouldn't be a big deal to introduce an intp_t-HashTable, in a similar way to all other HashTables. But the question is, is it really needed? One could just use int64_t-HashTable on 64bit systems and int32_t-HashTable on 32bit system. That would have the advantage of so/pyd being smaller. If you think, that having a dedicated intp_t-HashTable is a cleaner way (which is probably true), I'll add it. |
thanks @jbrockmendel if you can do an asv since before you started moving things to intp to make sure things are about the same (maybe slightly better). |
i'll do that instead of turning on the heater this morning. i expect it to be swamped by unrelated changes in the interim |
This makes sense. Would that just be a matter of:
? |
@realead pushing this any further requires getting a HashTable for intp_t. Would that be difficult?