
Workers on same host: include code from filesystem, not node1 #6855

Closed
wants to merge 1 commit

Conversation

@timholy (Member) commented May 15, 2014

I've found that when using multiple processes, loading a large pile of code is particularly slow. This patch narrows the gap somewhat, for the case when all processes are on the same host. It works by avoiding interprocess communication and fetching the source files directly from the filesystem, presumably reducing the amount of time workers spend waiting for node 1 to get around to serving their requests.
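For illustration only, here is a minimal sketch of the idea (not the actual patch), written against the current Distributed API; `fetch_source` is a hypothetical helper:

```julia
using Distributed

# Hypothetical helper, for illustration only: if this worker shares a
# filesystem with node 1, read the source file directly; otherwise fall
# back to asking node 1 to serve it over the wire.
function fetch_source(path::AbstractString)
    if isfile(path)                        # same host / shared filesystem
        return read(path, String)
    else
        return remotecall_fetch(p -> read(p, String), 1, path)
    end
end
```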

Timing results:
For `using Distributions`:
Single-threaded (`julia`): ~15s
Multi-process, master (`julia -p 1`): ~24s
Multi-process, this patch (`julia -p 1`): ~20s

For `using Optim` (with its internal `using Distributions` commented out):
Single-threaded (`julia`): ~4.2s
Multi-process, master (`julia -p 1`): ~7.3s
Multi-process, this patch (`julia -p 1`): ~5.7s

If anyone has suggestions to reduce the overhead even further, I'm all ears.

@timholy (Member, Author) commented May 15, 2014

I noticed one place where this isn't "free": `push!(LOAD_PATH, newpath)` needs to become `@everywhere push!(LOAD_PATH, newpath)`.
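Concretely (a small sketch; the path is made up), the difference is roughly:

```julia
# Only node 1 sees the new entry:
push!(LOAD_PATH, "/path/to/mycode")

# With workers loading source files themselves, every process needs the
# entry on its own LOAD_PATH:
@everywhere push!(LOAD_PATH, "/path/to/mycode")
```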

@eschnett (Contributor) commented:

One approach would be to introduce a tree structure for these requests. That is, the nodes are grouped hierarchically, and instead of requesting things from node 1, they request them from the next layer up in the hierarchy. That layer may need to forward the request further up. This increases latency a bit, since requests need to be propagated, but with a bit of caching node 1 has much less work to do. My estimate is that somewhere between 10 and 100 nodes, such an approach becomes significantly faster.
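For concreteness, one way that idea might look (a rough sketch, not code from this PR; the binary-tree grouping and the `parent_of` helper are assumptions, and the calls follow the current Distributed API):

```julia
using Distributed   # remotecall_fetch, myid

# Per-node cache of files already fetched.
const file_cache = Dict{String,Vector{UInt8}}()

# Assumed grouping: processes form a binary tree rooted at node 1.
parent_of(id::Integer) = id <= 1 ? 1 : id ÷ 2

# Ask the parent node for a file instead of node 1 directly; intermediate
# nodes cache the answer, so node 1 serves each file only to its own
# children rather than to every worker.
# (In practice this would be defined on all processes with @everywhere.)
function request_file(path::AbstractString)
    get!(file_cache, path) do
        myid() == 1 ? read(path) :
            remotecall_fetch(request_file, parent_of(myid()), path)
    end
end
```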

@timholy (Member, Author) commented May 16, 2015

🎂 Happy 1 year old, PR! My, how you've grown!

I haven't been developing multiprocess stuff much recently, so comments from people who do are still desired. Is this still relevant? Any thoughts? I agree that @eschnett's idea might be better for a large cluster, but without such a cluster to test on I'd hesitate to develop a more complex approach. So if this seems like a step in the right direction, this is about as far as I want to take this now.

@tkelman added the "parallelism" (Parallel or distributed computation) label on May 16, 2015
@tkelman (Contributor) commented May 16, 2015

related, I think? #11093

@timholy (Member, Author) commented Apr 20, 2016

I suspect this is irrelevant now.

@timholy closed this on Apr 20, 2016
@timholy deleted the teh/multiprocloading branch on Apr 20, 2016