intra-node memory sharing? #22
I am interested to see whether it is possible to add intra-node memory sharing to rabit, that is, whether different processes on the same node can work with the same array. As long as the user guarantees in the design of the algorithm that data races will not happen, this can be quite useful. For example, a machine learning algorithm whose inputs are (SkipGram) word vectors may want to keep a single copy of the embedding dictionary per node, saving memory.
In the backend implementation, I believe it comes down to three calls: shm_open() - ftruncate() - mmap(), which I think would not be very difficult to implement in rabit.
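For reference, a minimal sketch of the shm_open() / ftruncate() / mmap() sequence described above. The segment name and size are hypothetical, and rabit itself does not expose anything like this; this is only the underlying POSIX mechanism.

```cpp
// Sketch of the shm_open() -> ftruncate() -> mmap() sequence from the issue.
// Segment name and size are illustrative only.
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>
#include <cstdio>

int main() {
  const char* name = "/rabit_embedding";  // hypothetical segment name
  const size_t bytes = 1 << 20;           // 1 MiB, for illustration

  // One process per node creates the segment; peer processes on the same
  // node would call shm_open with the same name (without O_CREAT) and
  // skip the ftruncate step.
  int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
  if (fd == -1) { perror("shm_open"); return 1; }
  if (ftruncate(fd, bytes) == -1) { perror("ftruncate"); return 1; }

  // Map the segment; every process mapping the same name sees one copy.
  void* addr = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
  if (addr == MAP_FAILED) { perror("mmap"); return 1; }

  float* embedding = static_cast<float*>(addr);
  embedding[0] = 1.0f;  // visible to every process that mapped the segment

  munmap(addr, bytes);
  close(fd);
  shm_unlink(name);     // the creating process removes the name when done
  return 0;
}
```

On older Linux systems this links with -lrt; coordinating who creates versus who attaches is exactly where a node-local rank (discussed below) would come in.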
This sounds like an interesting feature. To have different processes on the same node share memory, an easier way might be to enable multi-threading in each process and use OpenMP locally, which might be cleaner than using a memory map.
@tqchen I don't really like the OpenMP solution; it adds extra programming (as well as debugging) complexity for the user. It also forces the user to run one process per node (for the sake of saving memory) and therefore to spend a lot of time on the multi-threading part, even though the algorithm itself is as simple as message passing. I think it conflicts with the goal of rabit.
Just to clarify the OpenMP part: many parallel machine learning algorithms (written against MPI or a similar interface) can still benefit from vectorized computation (like BLAS), while OpenMP obviously cannot do that.
OpenMP can still benefit from vectorized code, because code in different threads can still use vectorized libraries like Eigen or mshadow. Personally I feel that having OpenMP code won't add too much complexity; it is usually a parallel for. But I agree that, as a general programming interface, pure message passing could be more appealing in some cases :)
Yes, I agree. If it is a for-loop, OpenMP can still recognize it and possibly optimize for speed. But the user has to be careful in implementing the for-loop, ensuring there are no dependencies between consecutive iterations (see the sketch below). That is relatively simple if the for-loop only does a few things, but it becomes rather tricky to write a complex for-loop and still expect OpenMP to optimize it. Mixing OpenMP with message passing is not a good idea in general. BTW, the newest version of macOS does not natively support OpenMP anymore; one has to rebuild the clang package to use it.
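To make the dependency point concrete, here is a small generic sketch (not rabit code): the first loop has independent iterations and parallelizes safely with OpenMP, while the second has a loop-carried dependency that a plain parallel for would silently break.

```cpp
// Compile with -fopenmp; without it the pragmas are ignored and the
// code runs serially, which is the usual OpenMP fallback behavior.
#include <vector>

void scale(std::vector<float>& v, float a) {
  // Safe: each iteration touches only v[i], no cross-iteration dependency.
  #pragma omp parallel for
  for (long i = 0; i < static_cast<long>(v.size()); ++i) {
    v[i] *= a;
  }
}

void prefix_sum(std::vector<float>& v) {
  // NOT safe to annotate with "omp parallel for": v[i] reads v[i - 1],
  // so iterations depend on each other and must run in order.
  for (size_t i = 1; i < v.size(); ++i) {
    v[i] += v[i - 1];
  }
}
```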
I agree that multi-threaded programs are easily done wrong, especially when ...
HONG Chuntao
@hjk41 That's possible. Then rabit would have to know the node-specific local rank of each process. Knowing the local rank, one could implement such a facility as a standalone module outside rabit.
As long as we need to do local optimization (local combine of gradients, for example), ...
HONG Chuntao
Not really, I think. MPI-3 adds the MPI_Comm_split_type function, which we can use to obtain the node-local rank. On the other hand, local optimization can be too much for the programming interface. See https://www.open-mpi.org/doc/v1.8/man3/MPI_Comm_split_type.3.php
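As a sketch of this suggestion (standard MPI-3 calls, not part of rabit): MPI_COMM_TYPE_SHARED splits MPI_COMM_WORLD into one communicator per shared-memory node, and the rank within that communicator is the node-local rank.

```cpp
// Derive the node-local rank with MPI-3's MPI_Comm_split_type.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

  // Split the world communicator into one communicator per node;
  // MPI_COMM_TYPE_SHARED groups exactly the processes that can
  // share memory with each other.
  MPI_Comm node_comm;
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                      MPI_INFO_NULL, &node_comm);

  int local_rank;
  MPI_Comm_rank(node_comm, &local_rank);
  std::printf("world rank %d has local rank %d\n", world_rank, local_rank);

  MPI_Comm_free(&node_comm);
  MPI_Finalize();
  return 0;
}
```

The process with local rank 0 on each node could then create the shared segment from the earlier shm_open sketch, while its node-local peers attach to it.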
This might be useful, as I need the local rank.
Closing, as rabit has been moved into dmlc/xgboost. See the discussion in dmlc/xgboost#5995.