MPI-enabled PaddlePaddle #9405
It looks to me that in order to utilize InfiniBand and GPUDirect via MPI, we need to call MPI_Allreduce. MPI_Allreduce is mutually exclusive with the parameter server, fault tolerance, and elastic scheduling. It would be important to draft a design doc to make sure that Fluid supports both modes -- AllReduce and ParameterServer.
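For reference, here is a minimal sketch of what gradient averaging with MPI_Allreduce looks like, written with mpi4py and NumPy stand-ins for a trainer's gradients; `local_grad` and `averaged_grad` are illustrative names, not PaddlePaddle APIs:

```python
# Minimal sketch of AllReduce-based gradient averaging (mpi4py + NumPy).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Each trainer computes its own gradient for a parameter.
local_grad = np.random.rand(1024).astype(np.float32)

# Allreduce sums the gradients across all trainers; dividing by the
# world size yields the average every trainer applies locally.
averaged_grad = np.empty_like(local_grad)
comm.Allreduce(local_grad, averaged_grad, op=MPI.SUM)
averaged_grad /= comm.Get_size()
```

In this mode every trainer keeps a full replica of the parameters and applies the averaged gradient locally, which is why AllReduce does not compose with the parameter-server design.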
Also, I noticed that the current distributed computing solution includes a transpiler that generates the ProgramDesc messages for trainers and parameter servers. If we are going to use MPI_Allreduce to replace parameter servers, do we need a new transpiler that generates the ProgramDesc for trainers in the AllReduce mode?
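Here is a hypothetical sketch of how such an AllReduce-mode transpiler could sit next to the existing one. `AllReduceTranspiler` and its `transpile()` call are assumptions for illustration only, and the exact `DistributeTranspiler` signature varies across Fluid versions:

```python
import paddle.fluid as fluid

mode = "allreduce"  # or "pserver"

if mode == "pserver":
    # Existing path: split the ProgramDesc into trainer and pserver programs.
    t = fluid.DistributeTranspiler()
    t.transpile(trainer_id=0, pservers="192.168.0.1:6174", trainers=2)
    trainer_prog = t.get_trainer_program()
else:
    # Hypothetical path: rewrite the ProgramDesc so that each gradient op
    # is followed by an allreduce op instead of send/recv to a pserver.
    t = AllReduceTranspiler()  # hypothetical class, does not exist yet
    trainer_prog = t.transpile(fluid.default_main_program())
```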
We currently target speeding up PaddlePaddle with distributed training, and for that we need to call the MPI API.
By using the MPI API, we enable PaddlePaddle to take advantage of high-performance, low-latency networks such as InfiniBand. There are two benefits: lower-latency, higher-throughput communication over InfiniBand, and direct GPU-to-GPU transfers via GPUDirect.
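Once trainers are launched as MPI ranks (for example with `mpirun -np 4 python train.py`), each process can derive its trainer id and GPU assignment from its rank. A minimal sketch, assuming one process per GPU and a fixed `gpus_per_node` on homogeneous nodes:

```python
# Minimal sketch: map each MPI rank to a trainer id and a local GPU.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()        # this trainer's id
world_size = comm.Get_size()  # total number of trainers

gpus_per_node = 8             # assumption: homogeneous nodes
local_gpu = rank % gpus_per_node

print("trainer %d/%d using GPU %d" % (rank, world_size, local_gpu))
```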