Replies: 2 comments 5 replies
-
I think the groupby needs to keep its working data in memory, hence the performance hit. The filtering step reduces the search space so that the multi-column groupby stays manageable.
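To illustrate the point about filtering first, here is a minimal sketch in plain Python (not vaex). The rows, the `group_count` helper, and the `value > 4` filter are all made up for illustration; the only claim is that a multi-column group-by keeps one entry per distinct key combination, so fewer rows in can mean fewer entries held.

```python
from collections import Counter

# Hypothetical rows: (col1, col2, value)
rows = [
    ("a", "x", 1),
    ("a", "y", 5),
    ("b", "x", 7),
    ("b", "y", 2),
    ("a", "x", 9),
]

def group_count(rows):
    # One counter entry per distinct (col1, col2) pair;
    # this table is the state the group-by has to hold.
    return Counter((r[0], r[1]) for r in rows)

unfiltered = group_count(rows)
# Filter first, then group: only the surviving rows create entries.
filtered = group_count(r for r in rows if r[2] > 4)

print(len(unfiltered), len(filtered))
```

With real data the reduction depends entirely on how selective the filter is.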
-
I am trying to pre-calculate the hash, so that groupby only needs to work on one column rather than ten, but am having challenges vectorising the following:
I get
I've looked into the vaex.hash code but couldn't see an easy way to use it for this purpose.
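For reference, one vectorised way to collapse several key columns into a single integer key, sketched with NumPy rather than vaex or `vaex.hash`. This is an assumption about what would suit the use case, not the code from my attempt: `combined_key` is a hypothetical helper, and the `city` / `year` columns are toy data. It encodes each column as integer codes and combines the codes, so it is an exact key rather than a hash.

```python
import numpy as np

def combined_key(*columns):
    """Encode each column as integer codes, then fold the codes into one key."""
    codes, sizes = [], []
    for col in columns:
        # uniques is sorted; inverse holds each row's index into uniques
        uniques, inverse = np.unique(np.asarray(col), return_inverse=True)
        codes.append(inverse.ravel())
        sizes.append(len(uniques))
    # Mixed-radix combination: unique per distinct combination of codes
    return np.ravel_multi_index(tuple(codes), tuple(sizes)), tuple(sizes)

# Toy data standing in for two of the ten columns
city = np.array(["NY", "LA", "NY", "LA", "NY"])
year = np.array([2020, 2020, 2021, 2020, 2020])

key, sizes = combined_key(city, year)
# Group-by count on the single key column
counts = np.bincount(key, minlength=int(np.prod(sizes)))
print(key, counts)
```

One caveat: the key space is the product of the per-column cardinalities, so with ten high-cardinality columns it may not fit in a 64-bit integer, and a real hash or a second round of encoding would be needed instead.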
-
Hi, I have a number of large datasets.
Grouping on two columns together appears to be much slower than grouping on each column individually.
I am doing it as in the Pandas example of
I saw the groupby code was added later, and perhaps I am pushing it further than is typically the case - as I didn't see any examples of this in the documentation.
Is there a better approach to tackling this problem?
Would converting to categorical improve performance?
Thanks!