Setup worker-worker connections lazily #42
Comments
This would solve major connection time issues on large clusters that we have repeatedly seen.
Just wanted to mention that it also seemed that JuliaLang/julia#22588 made adding remote workers noticeably faster.
I wonder how and why JuliaLang/julia#22588 affected worker startup time. @vtjnash ?
@andreasnoack / @ViralBShah care to comment on the interface for lazy connection setup in JuliaLang/julia#22814?
Sorry for the noise here. Just did some more systematic timings and my previous impression must have been based on differences in the connection.
Bump – are we still planning on doing this?
bump
The default `all_to_all` topology connects all processes to each other. While this is fine for small clusters, the total number of TCP connections grows quadratically, as roughly (N^2)/2. Considering that a large class of parallel problems only needs master-worker connections, we should change the default topology to `all_to_all_lazy`, where worker-worker connections are set up only on the first request from one worker to another. We should also introduce another topology, `master_routed`, which only connects the master to the workers and, in case of a worker-worker call, routes the request through the master.

To summarize, implement 2 new topologies:

- `all_to_all_lazy`, where worker-worker connections are set up lazily, and which is the default for `addprocs`
- `master_routed`, in which only the master connects to workers and worker-worker messages are routed via the master
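
To make the connection counts concrete, here is a small Python sketch of the arithmetic behind the proposal. It is a toy model only, not Julia's implementation: `eager_all_to_all`, `master_only`, and `LazyCluster` are hypothetical names introduced for illustration, and process 1 is assumed to be the master.

```python
def eager_all_to_all(n_procs: int) -> int:
    """Links opened at startup when every process connects to every other."""
    return n_procs * (n_procs - 1) // 2


def master_only(n_procs: int) -> int:
    """Links opened at startup when only the master connects to the workers."""
    return n_procs - 1


class LazyCluster:
    """Toy model (hypothetical): master-worker links exist up front,
    worker-worker links are opened on first use."""

    def __init__(self, n_procs: int) -> None:
        self.n_procs = n_procs
        # Assumption: process 1 is the master; ids 2..n_procs are workers.
        self.links = {frozenset((1, w)) for w in range(2, n_procs + 1)}

    def send(self, src: int, dst: int) -> bool:
        """Record a message from src to dst.

        Returns True if a new link had to be opened for it.
        """
        link = frozenset((src, dst))
        if link in self.links:
            return False
        self.links.add(link)
        return True


cluster = LazyCluster(1000)
opened_first = cluster.send(2, 3)   # first worker-worker call opens a link
opened_second = cluster.send(3, 2)  # the reverse direction reuses that link

print(eager_all_to_all(1000))  # 499500
print(master_only(1000))       # 999
print(len(cluster.links))      # 1000
```

With 1000 processes, the eager topology opens 499500 links before any work starts, while the lazy model starts with 999 and adds one link per worker pair that actually communicates. Under the routed variant the count would stay at 999, at the cost of an extra hop through the master for every worker-worker message.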