Release GIL for Merge #13745

Open
mrocklin opened this issue Jul 22, 2016 · 9 comments
Labels
Performance (Memory or execution speed performance), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments

@mrocklin
Contributor

I think that the title says it all. The pd.merge function can be compute intensive and can benefit (I think) from parallel computing.

It does not currently appear to release the GIL: I can easily push my CPU to 100%, but no higher, when performing parallel joins.
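
As a rough sketch of the kind of workload I have in mind (frame sizes and names here are purely illustrative): running independent merges in threads still keeps total CPU pinned at about one core's worth.

from concurrent.futures import ThreadPoolExecutor
import pandas as pd

left = pd.DataFrame({'key': list(range(1000)) * 1000, 'x': 1.0})
right = pd.DataFrame({'key': range(1000), 'y': 2.0})

def do_merge(_):
    # independent merges; if pd.merge released the GIL these could
    # overlap across four cores instead of serializing on one
    return pd.merge(left, right, on='key', how='inner')

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(do_merge, range(4)))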

@sinhrks added the Performance (Memory or execution speed performance) and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Jul 22, 2016
@jorisvandenbossche
Member

@mrocklin do you have an example case of such an intensive merge where you do not see a speedup from parallelizing? On some examples I tried, I already see some speedup (though it can probably be improved).

For example, with

import pandas as pd

left = pd.DataFrame({'key': list(range(1, 11)) * 100000})   # 1,000,000 rows, 10 distinct keys
right = pd.DataFrame({'key': range(10), 'val': range(10)})  # small lookup table

I already see some speedup:

def f():
    left.merge(right, how='inner')

def g4():
    # serial baseline: the same merge, 4 times on one thread
    for i in range(4):
        f()

from pandas.util.testing import test_parallel

@test_parallel(num_threads=4)
def pg4():
    # parallel: the same merge, run simultaneously in 4 threads
    f()

In [21]: %timeit g4()
10 loops, best of 3: 149 ms per loop

In [22]: %timeit pg4()
10 loops, best of 3: 99.2 ms per loop

When I profile this merge operation (prof_merge3.out), the main operations that take time are (the numbers are for this specific example, but with others I see similar trends):

  • factorization (ca 36%) -> hashtable Factorizer -> this already releases the GIL where possible, I think
  • the actual inner join (ca 31%)
    • ca 2/3 of the time is spent in algos.groupsort_indexer -> this also already releases the GIL (code)
    • the remaining logic in the _join.inner_join function itself -> this could release the GIL further, but I think it is only ca 10% of the overall merge time
  • combining the results (ca 20%) -> comes down mainly to the take_1d/2d algos -> these also already release the GIL to some extent (at least the 1d ones; for some reason the 2d ones do not)

So from a first quick exploration, there are certainly some small improvements to be made, but it seems the bigger ones are already done (though with further analysis it may well be possible to improve things further).
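
For reference, a rough sketch of one way to reproduce this kind of profile yourself (the exact tool and output filename are just an example, not necessarily what I used):

import cProfile
import pstats
import pandas as pd

left = pd.DataFrame({'key': list(range(1, 11)) * 100000})
right = pd.DataFrame({'key': range(10), 'val': range(10)})

# profile a single merge and list the most expensive internal calls
cProfile.run("left.merge(right, how='inner')", 'prof_merge.out')
pstats.Stats('prof_merge.out').sort_stats('cumulative').print_stats(15)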

@mrocklin
Contributor Author

mrocklin commented Dec 2, 2016

OK. Let me come up with a few examples and get back to you. If, as you say, most of this is already done, then I'll be quite happy to be incorrect here :)

@jreback
Contributor

jreback commented Dec 2, 2016

FYI: jreback@a295e83

This makes factorization about 30% faster and releases the GIL in the core parts (but it currently breaks other stuff).
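
To sanity-check the effect, something along these lines should show whether threaded factorization scales (a rough sketch only; the array size is arbitrary and it reuses the test_parallel decorator from above):

import numpy as np
import pandas as pd
from pandas.util.testing import test_parallel

values = pd.Series(np.random.randint(0, 1000, size=1000000))

def fact():
    pd.factorize(values)

def serial4():
    # 4 factorizations back-to-back on one thread
    for _ in range(4):
        fact()

@test_parallel(num_threads=4)
def parallel4():
    # the same factorization, run concurrently in 4 threads
    fact()

# compare %timeit serial4() vs %timeit parallel4() in IPython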

@jorisvandenbossche
Member

But still, I only get a speedup of about a factor of 1.5 on 4 cores, so it is also not that impressive.

@jreback
Contributor

jreback commented Dec 2, 2016

@mrocklin I think that to make this a truly parallel merge, you would need to change the problem a bit, e.g. partition across workers, replicate the DataFrame, then concat?
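
Something like this sketch (threads here just for illustration; the chunk count and the assumption that right is small enough to replicate are mine):

from concurrent.futures import ThreadPoolExecutor
import numpy as np
import pandas as pd

def partitioned_merge(left, right, on, n_chunks=4):
    # partition the left frame, replicate the (small) right frame to each
    # worker, merge the pieces independently, then concat the results
    chunks = np.array_split(left, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        pieces = pool.map(lambda part: part.merge(right, on=on, how='inner'), chunks)
    return pd.concat(list(pieces), ignore_index=True)

With the GIL held inside each merge, threads gain little here; the same pattern with processes (or dask) sidesteps that.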

@mrocklin
Contributor Author

mrocklin commented Dec 2, 2016

@jreback yes, it could be that by operating on different dataframes we would have less memory contention and see larger speedups.

@jorisvandenbossche I'm hearing two things:

  1. We can get about a 50% speedup on 4 cores
  2. Most of the gains have already occurred

This raises the fundamental question: why isn't something closer to a 4x speedup possible? Is this a memory-hierarchy-bound operation?
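
Back-of-the-envelope with Amdahl's law, using a parallel fraction that is only my rough guess from the profile percentages above:

# Amdahl's law: speedup = 1 / ((1 - p) + p / n)
p = 0.6   # assumed fraction of the merge that actually runs with the GIL released
n = 4     # threads
print(1 / ((1 - p) + p / n))   # ~1.8x -- not far from the ~1.5x observed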

@jreback
Contributor

jreback commented Dec 2, 2016

@jorisvandenbossche is your test with processes? or threads?

@mrocklin
Contributor Author

mrocklin commented Dec 2, 2016

from pandas.util.testing import test_parallel

@test_parallel(num_threads=4)
def pg4():
    f()

@jorisvandenbossche
Member

Yes, I was using the test_parallel decorator, so I was testing with threads.

I don't have much experience with this, but the GIL-free operations are spread throughout the merge (the full merge operation separately releases the GIL in potentially 5 or 6 different algos). Could that be a reason for overhead, less efficient use of multiple threads, and hence less speedup?

jreback added a commit to jreback/pandas that referenced this issue Dec 12, 2016
allows releasing the GIL on these dtypes

xref pandas-dev#13745
jreback added a commit that referenced this issue Dec 15, 2016
xref #13745

provides a modest speedup for all string hashing. The key thing is, it will release the GIL on more operations where this is possible (mainly factorize). Can be easily extended to value_counts() and .duplicated() (for strings).

Author: Jeff Reback <jeff@reback.net>

Closes #14859 from jreback/string and squashes the following commits:

98f46c2 [Jeff Reback] PERF: use StringHashTable for strings in factorizing