Release GIL for Merge #13745
Comments
@mrocklin do you have an example case of such an intensive merge where you do not see a speedup by parallelizing? On some examples I tried, I already see some speedup (though it can probably be improved). For example, with
I already see some speedup:
When I profile this merge operation (prof_merge3.out), the main operations that take time are (the numbers are for this specific example, but I get similar trends with others):
So from a first quick exploration, there are certainly some small improvements to be made, but it seems the bigger ones are already done (though further analysis may well show that it can be improved further).
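For anyone who wants to repeat this kind of exploration, here is a minimal profiling sketch. It is not the script used above: the frame sizes, the column names `key`, `x`, `y`, and the top-10 cutoff are all assumptions chosen only for illustration.

```python
import cProfile
import io
import pstats

import numpy as np
import pandas as pd

# Hypothetical data: sizes and column names are made up for this sketch.
rng = np.random.default_rng(0)
n_rows, n_keys = 100_000, 10_000
left = pd.DataFrame({"key": rng.integers(0, n_keys, n_rows), "x": rng.random(n_rows)})
right = pd.DataFrame({"key": np.arange(n_keys), "y": rng.random(n_keys)})

# Profile a single merge call.
profiler = cProfile.Profile()
profiler.enable()
merged = pd.merge(left, right, on="key")
profiler.disable()

# Collect the ten entries with the largest cumulative time into a string.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
report = buf.getvalue()
print(report)
```

The report only attributes time to Python-level calls, so it says where the merge spends time but not which of those calls hold the GIL.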
OK. Let me come up with a few examples and get back to you. If, as you say, most of this is already done, then I'll be quite happy to be incorrect here :)
FYI: jreback@a295e83 this makes factorization about 30% faster and releases the GIL in the core parts (but this currently breaks other stuff).
But still, I only get a speed improvement of a factor of 1.5 on 4 cores, so it is also not that impressive.
@mrocklin I think to make this a truly parallel merge, you would need to change the problem a bit, e.g. partition across workers, replicate the DataFrame, then concat?
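One possible reading of the partition / replicate / concat idea, as a rough sketch. The helper name `partitioned_merge`, the choice to partition the left frame by row position, and copying the right frame once per worker are all assumptions made here, not something pandas provides.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def partitioned_merge(left, right, on, n_parts=4):
    """Hypothetical helper: split `left` by rows, merge each piece, concat."""
    # Partition the left frame into roughly equal row ranges.
    positions = np.array_split(np.arange(len(left)), n_parts)
    chunks = [left.iloc[idx] for idx in positions]

    def merge_chunk(chunk):
        # Each worker merges against its own copy of the right frame
        # (the "replicate" step; whether the copy helps is an open question).
        return pd.merge(chunk, right.copy(), on=on)

    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        parts = list(pool.map(merge_chunk, chunks))

    # Stitch the partial results back together.
    return pd.concat(parts, ignore_index=True)


# Made-up example data.
rng = np.random.default_rng(0)
left = pd.DataFrame({"key": rng.integers(0, 1000, 20_000), "x": rng.random(20_000)})
right = pd.DataFrame({"key": np.arange(1000), "y": rng.random(1000)})

result = partitioned_merge(left, right, on="key")
```

This only works directly for joins where each left row can be matched independently (inner or left joins); right and outer joins would need extra handling for unmatched right rows.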
@jreback yes, it could be that by operating on different dataframes we have less memory contention and would see larger speedups? @jorisvandenbossche I'm hearing two things:
This raises the question of why, fundamentally, something closer to a 4x speedup isn't possible. Is this a memory-hierarchy-bound operation?
@jorisvandenbossche is your test with processes or threads?
Yes, I was using the I don't have much experience with this, but the GIL-free operations are spread throughout the merge operation (the full merge separately releases the GIL in potentially 5 or 6 different algos). Is this a reason for overhead and less efficient use of multiple threads, and so less speedup?
allows releasing the GIL on these dtypes xref pandas-dev#13745
xref #13745. Provides a modest speedup for all string hashing. The key thing is that it releases the GIL on more operations where this is possible (mainly factorize). Can easily be extended to value_counts() and .duplicated() (for strings). Author: Jeff Reback <jeff@reback.net>. Closes #14859 from jreback/string and squashes the following commits: 98f46c2 [Jeff Reback] PERF: use StringHashTable for strings in factorizing
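For readers less familiar with the operations named in this commit message, a small illustration of what factorizing, `value_counts()` and `.duplicated()` do on string data. The sample values are made up, and the snippet shows only the public behaviour, not the hash table internals the commit changes.

```python
import numpy as np
import pandas as pd

# Made-up string data stored as an object array.
values = np.array(["a", "b", "a", "c", "b"], dtype=object)

# factorize maps each distinct string to an integer code,
# in order of first appearance.
codes, uniques = pd.factorize(values)
print(codes)    # [0 1 0 2 1]
print(uniques)  # ['a' 'b' 'c']

# The related string operations mentioned in the commit message.
series = pd.Series(values)
counts = series.value_counts()
dupes = series.duplicated()
```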
I think that the title says it all. The `pd.merge` function can be compute intensive and can benefit (I think) from parallel computing. It does not appear to currently release the GIL. I can easily push my CPU to 100% but no higher when performing parallel joins.
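A rough way to check this from the outside is to time several independent merges run sequentially and then in a thread pool. This is only a sketch: the helper names `make_frames` and `timed_merges`, the frame sizes, and the choice of four threads are assumptions, and the timings will vary by machine and pandas version.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def make_frames(n_rows=100_000, n_keys=20_000, seed=0):
    """Hypothetical helper: build one left/right pair with made-up sizes."""
    rng = np.random.default_rng(seed)
    left = pd.DataFrame({"key": rng.integers(0, n_keys, n_rows), "x": rng.random(n_rows)})
    right = pd.DataFrame({"key": np.arange(n_keys), "y": rng.random(n_keys)})
    return left, right


def merge_pair(pair):
    left, right = pair
    return pd.merge(left, right, on="key")


def timed_merges(pairs, n_threads):
    """Merge every pair, sequentially or in a thread pool, and time it."""
    start = time.perf_counter()
    if n_threads == 1:
        results = [merge_pair(p) for p in pairs]
    else:
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            results = list(pool.map(merge_pair, pairs))
    return time.perf_counter() - start, results


pairs = [make_frames(seed=i) for i in range(4)]
t_seq, res_seq = timed_merges(pairs, n_threads=1)
t_par, res_par = timed_merges(pairs, n_threads=4)
print(f"sequential: {t_seq:.3f}s, 4 threads: {t_par:.3f}s")
```

If the threaded run is not noticeably faster than the sequential one, that is consistent with the merge holding the GIL for most of its runtime, though memory bandwidth limits could produce a similar picture.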