Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store first value in Dict directly in innerjoin #29

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

non-Jedi
Copy link

@non-Jedi non-Jedi commented Aug 28, 2020

I saw some of the discussion in
JuliaData/DataFrames.jl#2340 and got
curious about what was possible.

This avoids allocating a Vector for the case where l does not have
multiple indices with the same value. For the smoke-test benchmark in
JuliaData/DataFrames.jl#2340 (comment),
this reduces allocations by half and overall runtime by 10%.

Most of the allocations still come from this
line

which it's much less clear how to reduce allocations in. I'm not sure
how much JuliaLang/julia#24909 affects
performance in this case. One option would be to heuristically
estimate the size of out based on the size of l and r and call
sizehint! on it; this didn't seem to help in my testing.

I realize I'm only optimizing a single method of innerjoin, but I'm
not super familiar with this field nor with the inner workings of this
package, so I leave it to you to decide if this is a worthwhile
complication and if it's relevant elsewhere in the package.

This avoids allocating a Vector for the case where l does not have
multiple indices with the same value. For the smoke-test benchmark in
<JuliaData/DataFrames.jl#2340 (comment)>,
this reduces allocations by half and overall runtime by 10%.
@andyferris
Copy link
Member

Ah thanks.

Another strategy I tried a while back was to first assume the join keys are distinct and then bail to a more general implementation when that's not the case. It would be awesome to compare the performance vesus this.

(Sorry @non-Jedi I haven't been keeping track of PRs...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants