Parallel / distributed training #1140
Conversation
I just want to quickly say kudos! This is surely a great improvement :)
Round of applause! This is an excellent PR of a long-awaited feature (and then some, since this covers CPU, GPU, and node-to-node distributed computation). Accomplishing this while insulating the core Caffe code and parallelizing models without modification is certainly a strong plus too. How about we promote this to a BVLC/caffe branch now to collaborate on the last steps to groom for a swift merge to dev?
Kudos! Long-awaited feature!
Great PR!
Sergio
Finally, a PR for one of the top items on the Caffe wishlist. IPython.parallel seems interesting in this scheme.
Sounds good! Ty for the PR! |
@cypof in lieu of merging I promoted your commit to a BVLC feature branch to collaborate on review, grooming, and merge to dev. The new branch is BVLC/caffe:parallel. Everyone please join #1148 to help prepare parallelism for merge! |
That's great news, thanks! @abhi2610 I would love to help benchmarking. |
Hi @cypof, thank you very much for this great PR!
Closing in favor of feature branch for review and collaboration: see #1148. |
A set of classes to synchronize SGD between multiple solvers. It is based on the Hogwild paper and on our work at Flickr extending that approach to GPUs and distributed configurations by streaming gradients between solvers.
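To illustrate the Hogwild idea referenced above (lock-free asynchronous SGD on shared parameters), here is a minimal, self-contained C++ sketch. It is not taken from this PR's code; all names (ToyGradient, Worker, shared_weights) and the toy objective are illustrative assumptions.

```cpp
// Minimal Hogwild-style sketch: several worker threads apply SGD updates to a
// shared weight vector without locking, accepting occasional lost updates as
// in the Hogwild paper. Illustrative only; names are not from this PR.
#include <cstdio>
#include <random>
#include <thread>
#include <vector>

static const int kDim = 1000;
static const int kItersPerWorker = 100000;
static const float kLearningRate = 0.01f;

// Weights shared by all workers in the same address space.
static std::vector<float> shared_weights(kDim, 0.0f);

// Toy gradient: pulls each weight toward 1.0, standing in for a real backward pass.
static inline float ToyGradient(float w) { return w - 1.0f; }

static void Worker(unsigned seed) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> pick(0, kDim - 1);
  for (int it = 0; it < kItersPerWorker; ++it) {
    const int i = pick(rng);
    // Unsynchronized read-modify-write on the shared buffer: benign races
    // are tolerated rather than locked out.
    shared_weights[i] -= kLearningRate * ToyGradient(shared_weights[i]);
  }
}

int main() {
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < 4; ++t) workers.emplace_back(Worker, t + 1);
  for (auto& w : workers) w.join();
  std::printf("w[0] after training: %f (expected to approach 1.0)\n",
              shared_weights[0]);
  return 0;
}
```

The PR generalizes this pattern: within one address space solvers share the weight buffers directly, and across address spaces gradients are streamed asynchronously instead.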
Features
Limitations
Tests
Early results on MNIST seem to show linear scaling. We tested on up to 6 machines with 4 CPU solvers each, and on 2 machines with 2 GPUs each. GPUs do not perform well on this small network but still seem to scale linearly.
In the weeks to come we plan to start testing on larger networks and clusters. Our GPU machines are currently connected through 1G Ethernet; please contact us if you are interested in helping benchmark on better hardware.
Architecture
We made the Caffe singleton thread-local so that multiple solvers can run in parallel, each on its own thread. Synchronization works by sharing the weight buffers between solvers in the same address space, and by asynchronously measuring and exchanging gradients between address spaces.
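As a rough illustration of the thread-local-context pattern described above, here is a small C++ sketch. It assumes C++11 thread_local and uses invented names (ThreadContext, SolverThread, SharedWeights) rather than the actual Caffe classes; it only shows the per-thread state plus shared-weight-buffer arrangement, not the asynchronous gradient exchange between address spaces.

```cpp
// Sketch: each solver thread gets its own per-thread context (e.g. device id),
// while the weight buffer is shared by all solvers in the same address space.
// Names are illustrative, not the PR's actual classes.
#include <cstdio>
#include <memory>
#include <thread>
#include <vector>

// Stand-in for the thread-local Caffe singleton: one instance per thread.
struct ThreadContext {
  int device_id = -1;
  static ThreadContext& Get() {
    static thread_local ThreadContext ctx;  // distinct instance per thread
    return ctx;
  }
};

// Weights live once per address space and are shared by all local solvers.
using SharedWeights = std::shared_ptr<std::vector<float>>;

static void SolverThread(int device_id, SharedWeights weights) {
  ThreadContext::Get().device_id = device_id;  // per-thread state, no locking
  // A real solver would run forward/backward here and apply updates to the
  // shared buffer; cross-machine gradient streaming is not shown.
  (*weights)[device_id] += 1.0f;  // placeholder in-place update
  std::printf("solver on device %d sees context device %d\n",
              device_id, ThreadContext::Get().device_id);
}

int main() {
  SharedWeights weights = std::make_shared<std::vector<float>>(10, 0.0f);
  std::vector<std::thread> solvers;
  for (int d = 0; d < 2; ++d) solvers.emplace_back(SolverThread, d, weights);
  for (auto& t : solvers) t.join();
  return 0;
}
```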
Bugs / Todos