Multicore collapsing #128
Conversation
I'm pretty sure Travis is failing only because I don't know how to get it to run.
Should just be a matter of adding `script: cargo test --no-fail-fast --verbose --all -- --test-threads=1` just below. I'll try to get around to reviewing this some time this week :)
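For context, the suggested line would sit under the `script:` key of the repository's `.travis.yml`. A minimal sketch (the surrounding `language` key is an assumption about the rest of the file, not taken from this PR):

```yaml
language: rust
# run the whole workspace's tests on a single thread so the
# global test logger isn't trampled by concurrent tests
script: cargo test --no-fail-fast --verbose --all -- --test-threads=1
```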
Good stuff. Definitely take your time. In the meantime I'll try to fix the …
force-pushed from 4f3d473 to 762e213
FYI - Don't take a look at this yet. Still working on getting the commits split out. Will shoot you a note once that's finished.
force-pushed from 355517b to 1e7277d
Ok - I think that should be much better (although probably not perfect). The substantive commit is …
I'm still seeing basically all of those code blocks moved around in 3a3e669. Am I missing something?
Ah - missed the bins, I think. One sec.
force-pushed from 1e7277d to d3429d5
Ok - This should be better.
This is looking really good now! I left a few inline comments, but nothing too bad I think!
Co-Authored-By: Jon Gjengset <jon@thesquareplanet.com>
...to separate line.
Ok. I think I'm done working on the most recent comments (assuming the CI keeps passing). There are still a couple of open conversations (see above), but hopefully I addressed everything else at least somewhat adequately. Looking forward to your feedback.
Loving the detailed commits -- makes it so easy to review 👍
Changes all look good to me. I think the one case highlighted above should be made an error as you suggest, and I think we should probably not implement `Clone` as discussed in the other comment chain, but beyond that I think this is ready to land 🎉
Pending CI, I think all is incorporated!
Using your machines for the benchmarks, you should probably update the …
Merged! 🎉
So after a bit more time working on this, I got `collapse-perf`, `collapse-dtrace`, and `collapse-guess` to all work with multiple cores.

The speedup is pretty good. The collapse/perf benchmark on my machine went from 186 MiB/s throughput to 421 MiB/s (roughly a 2.3× speedup).
All the tests pass. That said, because we're testing log messages, you need to run the test suite with `cargo test -- --test-threads=1` to ensure the global test logger is not trampled by multiple tests running at the same time.

There are a lot of changes here, so I'll give you a chance to look over the code. Of course, if you have any questions at any time, please don't hesitate to reach out.
At a high level, what has changed is that we now have two code paths for each folder: a "single-threaded" code path, which hasn't changed, and a new "multi-threaded" code path. Which one is taken depends on a configuration option (`nthreads`).

In the multi-threaded code path, we do an initial pass over the data in order to figure out how to chunk it up. I've tried to make this initial pass as efficient as possible, since it's extra work that isn't needed in the single-threaded case.
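As a rough sketch of what such a chunking pass could look like (the function name and the exact splitting policy here are illustrative assumptions, not the PR's actual code): split the input into roughly `nthreads` pieces, extending each piece forward to the next newline so no record straddles two chunks.

```rust
/// Split `input` into about `nthreads` chunks, only ever cutting at
/// newline boundaries so each chunk contains whole lines.
/// (Hypothetical sketch, not the crate's real API.)
fn chunk_at_newlines(input: &[u8], nthreads: usize) -> Vec<&[u8]> {
    let approx = (input.len() / nthreads.max(1)).max(1);
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < input.len() {
        // tentative chunk end, then advance until we sit just past a '\n'
        let mut end = (start + approx).min(input.len());
        while end < input.len() && input[end - 1] != b'\n' {
            end += 1;
        }
        chunks.push(&input[start..end]);
        start = end;
    }
    chunks
}

fn main() {
    let input = b"func_a;func_b 1\nfunc_a 2\nfunc_c 3\n";
    for chunk in chunk_at_newlines(input, 2) {
        println!("{:?}", std::str::from_utf8(chunk));
    }
}
```

Because only references into the original buffer are handed out, the pass itself allocates almost nothing beyond the chunk list.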
After that's done, we spin up several threads and send the chunked input data to each (actually, we just send a reference). Each thread then uses the single-threaded code path on its chunk.
To bring all the output together, instead of getting back output from each thread and then "reducing" those outputs, I elected to just use a concurrent hashmap, so there's no need to collate the results after each thread has done its work. It's sufficient to write out the contents of the shared hashmap in the same way we were writing out the contents of the single-threaded hashmap in the old code.
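A minimal sketch of this shared-map design, with two loud assumptions: a `Mutex<HashMap>` stands in for whatever concurrent hashmap the PR actually uses, and the per-line counting stands in for the real folding logic.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

/// Fold every chunk into one shared map. Each thread writes its own
/// chunk's counts directly into the map, so no merge step is needed
/// once the threads finish. (Illustrative stand-in, not the PR's code.)
fn collapse_chunks(chunks: &[&str]) -> HashMap<String, u64> {
    let occurrences = Mutex::new(HashMap::new());
    thread::scope(|s| {
        let occ = &occurrences;
        for &chunk in chunks {
            s.spawn(move || {
                for line in chunk.lines() {
                    // stand-in for the real per-chunk folding work
                    *occ.lock().unwrap().entry(line.to_string()).or_insert(0) += 1;
                }
            });
        }
    });
    occurrences.into_inner().unwrap()
}

fn main() {
    let counts = collapse_chunks(&["a\nb\na\n", "a\nc\n"]);
    println!("{counts:?}");
}
```

With a real concurrent map, threads updating distinct keys would not contend on a single lock the way this `Mutex` stand-in does.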
It might be interesting to experiment later with a model that doesn't use a concurrent hashmap, but instead gives each thread its own "regular" hashmap and then reduces them at the end. I'm not sure whether that would be faster or slower.
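The alternative could be sketched like this (again with hypothetical names and per-line counting standing in for the real folding logic): each thread builds a private map over its chunk, and the per-thread maps are merged afterwards.

```rust
use std::collections::HashMap;
use std::thread;

/// Each thread folds its chunk into a private map; the maps are then
/// "reduced" into one. (Illustrative sketch, not the PR's code.)
fn collapse_then_reduce(chunks: &[&str]) -> HashMap<String, u64> {
    let partials: Vec<HashMap<String, u64>> = thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|&chunk| {
                s.spawn(move || {
                    let mut local = HashMap::new();
                    for line in chunk.lines() {
                        *local.entry(line.to_string()).or_insert(0) += 1;
                    }
                    local
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // the reduce step: fold every per-thread map into one
    let mut merged = HashMap::new();
    for partial in partials {
        for (stack, count) in partial {
            *merged.entry(stack).or_insert(0) += count;
        }
    }
    merged
}

fn main() {
    let counts = collapse_then_reduce(&["a\nb\na\n", "a\nc\n"]);
    println!("{counts:?}");
}
```

The trade-off is lock-free writes during the parallel phase versus an extra sequential merge at the end, so which wins likely depends on how many distinct stacks the input contains.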