C vs C++ MPI Usage #127
As you know, the MPI standard deprecated the C++ bindings. Some implementations still offer them (e.g. OpenMPI) while others do not (e.g. MSMPI). Would you be receptive to a rewrite of the MPI-backend internals to use the MPI C API? Looking at #31, it seems you're open to the change at least in principle. I am willing to do this, but I wanted to make sure before I started.
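For readers unfamiliar with the two APIs: the C++ bindings were deprecated in MPI-2.2 and removed from the standard in MPI-3.0, and the translation to the C API is largely mechanical. A minimal illustrative sketch of the correspondence (not Rabit's actual code):

```cpp
#include <mpi.h>

// Deprecated C++ bindings (still shipped by some implementations):
//   MPI::Init(argc, argv);
//   int rank = MPI::COMM_WORLD.Get_rank();
//   MPI::COMM_WORLD.Allreduce(&sendbuf, &recvbuf, 1, MPI::INT, MPI::SUM);
//   MPI::Finalize();

// The same program against the C API:
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int sendbuf = rank, recvbuf = 0;
    // The communicator moves from object to final argument; datatypes
    // and ops become MPI_-prefixed constants.
    MPI_Allreduce(&sendbuf, &recvbuf, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```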
Comments

Thanks! PRs are welcome! It's just that the MPI backend currently doesn't support fault tolerance, so we don't really use it. @chenqin may provide more input here.
Thanks, Travis. MPI was not prioritized because it doesn't support the fault recovery we expect from the socket implementation. I put some thought into building an overlay on top and found it's not very straightforward. Meanwhile, we are happy to work with you on this if that's what you're passionate about.
If it's just a straightforward translation from the C++ calls to the C ones, I don't see why we wouldn't do it. Note that, as far as I understand, my research use cases still use the sockets version for communication; I'm only using MPI/SLURM as a tracker. Regarding what @chenqin suggested: I've recently started looking into LightGBM, and it has a nice thin layer between MPI and its sockets implementation, with collectives making heavy use of a sendrecv operation internally, meaning that it doesn't take too much duplication to support the two (see the sketch below). It might be worth taking a look for the future.
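A hypothetical sketch of what such a thin layer could look like; the class and method names below are invented for illustration and are not LightGBM's or Rabit's actual API:

```cpp
#include <cstddef>

// A transport abstraction: collectives are written once against SendRecv,
// and only this interface needs separate MPI and socket implementations.
class CommLayer {
 public:
  virtual ~CommLayer() = default;
  virtual int Rank() const = 0;
  virtual int WorldSize() const = 0;
  // The single transport-specific primitive: simultaneously send a buffer
  // to one peer and receive a buffer from another.
  virtual void SendRecv(int send_rank, const void* send_buf, std::size_t send_size,
                        int recv_rank, void* recv_buf, std::size_t recv_size) = 0;
};

// One subclass would wrap MPI_Sendrecv, another would wrap socket
// send()/recv(); allreduce, allgather, etc. become shared code on top.
```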
Great, thanks! One thing I would add is that changing to manual send/recv patterns will likely come at the cost of performance. I think most MPI implementations use recursive doubling for allreduce, which completes in only log2(p) communication rounds (a sketch follows below). And on HPC systems with boutique interconnects, collectives are optimized even further to take advantage of the network topology.
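For context, here is a minimal sketch of the recursive-doubling pattern built on MPI_Sendrecv, assuming a power-of-two number of ranks; production implementations also handle non-power-of-two sizes, large-message segmentation, and topology awareness, which is part of why hand-rolled versions tend to lose:

```cpp
#include <mpi.h>
#include <vector>

// Recursive-doubling allreduce (sum of doubles): log2(size) rounds, where
// each rank exchanges its partial result with a partner whose rank differs
// in exactly one bit, then combines the two partials.
void AllreduceSum(double* buf, int count, MPI_Comm comm) {
    int rank = 0, size = 0;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    std::vector<double> tmp(count);
    for (int mask = 1; mask < size; mask <<= 1) {
        int partner = rank ^ mask;
        MPI_Sendrecv(buf, count, MPI_DOUBLE, partner, 0,
                     tmp.data(), count, MPI_DOUBLE, partner, 0,
                     comm, MPI_STATUS_IGNORE);
        for (int i = 0; i < count; ++i) buf[i] += tmp[i];
    }
}
```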
Agreed, no point in reinventing the wheel; we should keep it as high-level as possible.
Closing, as Rabit has been moved into dmlc/xgboost. See the discussion in dmlc/xgboost#5995.