intra-node memory sharing? #22

Closed
bobye opened this issue Jul 28, 2015 · 11 comments

@bobye

bobye commented Jul 28, 2015

I am interested in whether it is possible to add intra-node memory sharing to rabit. That is, can different processes on the same node work with the same array? As long as the user guarantees, through the design of the algorithm, that data races cannot happen, this can be quite useful. For example, a machine learning algorithm whose inputs are (SkipGram) word vectors may want to keep only one copy of the dictionary embedding per node, to save memory.

In the backend, I believe this comes down to three calls, shm_open(), ftruncate(), and mmap(), which I think would not be very difficult to implement in rabit.
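A minimal C sketch of those three calls, assuming a POSIX system (on older glibc, shm_open also needs linking with -lrt). The helper name map_shared_array and the object name are hypothetical, not part of rabit. The first process on a node creates and sizes the object; the others map the same pages, so the node holds a single copy of the array. Avoiding data races on it remains the caller's job.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* O_CREAT, O_RDWR */
#include <stddef.h>
#include <sys/mman.h>   /* shm_open, mmap, munmap, shm_unlink */
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* ftruncate, close */

/* Hypothetical helper: map a named POSIX shared-memory object of `bytes`
 * bytes. With create != 0 the object is created (if needed) and sized;
 * otherwise an existing object is opened. Other callers must not touch the
 * mapping before the creator's ftruncate has completed. Returns NULL on
 * failure. */
static void *map_shared_array(const char *name, size_t bytes, int create) {
    int flags = create ? (O_CREAT | O_RDWR) : O_RDWR;
    int fd = shm_open(name, flags, 0600);
    if (fd < 0) return NULL;
    if (create && ftruncate(fd, (off_t)bytes) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

Each process releases its view with munmap, and one process calls shm_unlink when the array is no longer needed.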

@tqchen
Member

tqchen commented Jul 28, 2015

This sounds like an interesting feature. To let different processes on the same node share memory, an easier way might be to enable multi-threading in each process and use OpenMP locally, which might be cleaner than using a memory map.
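To illustrate the OpenMP alternative, here is a small C sketch, assuming one process per node so that the table exists once per node. row_dots is a hypothetical example, not rabit code: all threads read the shared table in place, and each iteration writes only its own out[r], so no locking is needed. Compiled without -fopenmp, the pragma is ignored and the loop runs serially with the same result.

```c
#include <stddef.h>

/* One copy of `table` (rows x dim, row-major) per process; every OpenMP
 * thread reads it directly, with no per-thread copy. Each iteration writes
 * only out[r], so iterations are independent. */
static void row_dots(const double *table, long rows, long dim,
                     const double *query, double *out) {
    #pragma omp parallel for
    for (long r = 0; r < rows; ++r) {
        double d = 0.0;
        for (long k = 0; k < dim; ++k)
            d += table[r * dim + k] * query[k];
        out[r] = d;
    }
}
```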

@bobye
Author

bobye commented Jul 28, 2015

@tqchen I don't really like the OpenMP solution; it adds extra programming (and debugging) complexity for the user. It also forces the user to run one process per node (to save memory) and therefore to spend a lot of time on the multi-threading part, even when the algorithm itself is as simple as plain message passing. I think this conflicts with the goal of rabit.

@bobye
Author

bobye commented Jul 28, 2015

Just to clarify the OpenMP part: many parallel machine learning algorithms (written with MPI or a similar interface) can still benefit from vectorized computation (like BLAS), while OpenMP obviously cannot do that.

@tqchen
Member

tqchen commented Jul 28, 2015

OpenMP code can still benefit from vectorization, because the code running in different threads can still use vectorized libraries such as Eigen or mshadow. Personally, I feel OpenMP code won't add much complexity; it is usually just a parallel for.

But I agree that, as a general programming interface, pure message passing may be more appealing in some cases :)
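A short C sketch of the point that threading and vectorization compose, assuming an OpenMP 4.0+ compiler flag such as -fopenmp (without it the pragma is ignored and the loop runs serially). It uses a plain loop rather than Eigen or mshadow; saxpy is just an illustrative name here.

```c
/* y <- a*x + y. `parallel for simd` asks OpenMP to split the iterations
 * across threads and to vectorize the chunk each thread receives.
 * `restrict` promises x and y do not overlap, which helps vectorization. */
static void saxpy(long n, float a, const float *restrict x, float *restrict y) {
    #pragma omp parallel for simd
    for (long i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```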

@bobye
Author

bobye commented Jul 28, 2015

Yes, I agree. For a for-loop, OpenMP can parallelize it and possibly speed it up. But the user has to be careful when implementing the loop, ensuring there are no dependencies between consecutive iterations. This is relatively simple if the loop body only does a few things, but it becomes rather tricky to write a complex loop and still expect OpenMP to optimize it.

In general, mixing OpenMP with message passing is not a good idea. BTW, the newest version of Mac does not natively support OpenMP; one has to rebuild the clang package to use it...
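The dependency issue can be sketched in C as follows (illustrative functions, not from rabit). The first loop has independent iterations and is a safe candidate for `#pragma omp parallel for`; the second carries a running total from one iteration to the next, so annotating it the same way would introduce a data race and wrong results.

```c
/* Independent iterations: out[i] depends only on in[i]. */
static void scale(long n, const double *in, double *out, double c) {
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        out[i] = c * in[i];
}

/* Loop-carried dependency: each step needs `running` from the previous
 * iteration, so this loop is deliberately left serial. */
static void prefix_sum(long n, const double *in, double *out) {
    double running = 0.0;
    for (long i = 0; i < n; ++i) {
        running += in[i];
        out[i] = running;
    }
}
```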

@hjk41
Member

hjk41 commented Jul 29, 2015

I agree that multi-threaded programs are easy to get wrong, especially when multiple threads try to share writable data. A trade-off might be to provide shared data that is read-only. However, I don't think it is a good idea to put it in Rabit. Rabit is supposed to be a communication library, and I suggest keeping it that way. A better solution would be to build a standalone module that provides this functionality and have it work with Rabit.

HONG Chuntao
System Research Group
Microsoft Research Asia
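A sketch of the read-only variant as a standalone helper outside Rabit, using the same POSIX calls as the original proposal. publish_readonly and attach_readonly are hypothetical names, and the code assumes exactly one process per node publishes the data before the others attach; it does not coordinate that ordering itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Publisher: create the object, copy the data in (e.g. an embedding
 * table), then drop its own writable view. Returns 0 on success. */
static int publish_readonly(const char *name, const void *data, size_t bytes) {
    /* O_EXCL: fail rather than overwrite an object already published. */
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t)bytes) != 0) {
        close(fd);
        shm_unlink(name);
        return -1;
    }
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED) {
        shm_unlink(name);
        return -1;
    }
    memcpy(p, data, bytes);
    munmap(p, bytes); /* the contents persist in the named object */
    return 0;
}

/* Readers open O_RDONLY and map PROT_READ, so an accidental write
 * faults instead of silently racing with other processes. */
static const void *attach_readonly(const char *name, size_t bytes) {
    int fd = shm_open(name, O_RDONLY, 0);
    if (fd < 0) return NULL;
    void *p = mmap(NULL, bytes, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}
```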

@bobye
Author

bobye commented Jul 29, 2015

@hjk41 That's possible. Then Rabit will have to know the node-specific local rank of each process. Knowing the local rank, one can implement a standalone module outside Rabit.
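Given every process's host name (for example, collected with an allgather; that step is assumed here and not shown), the node-local rank reduces to a simple count. local_rank_from_hosts is a hypothetical helper, and it assumes host names uniquely identify nodes.

```c
#include <string.h>

/* hosts[r] is the host name reported by global rank r. The local rank of
 * `world_rank` is the number of lower global ranks on the same host. */
static int local_rank_from_hosts(const char *const *hosts, int world_rank) {
    int local = 0;
    for (int r = 0; r < world_rank; ++r)
        if (strcmp(hosts[r], hosts[world_rank]) == 0)
            ++local;
    return local;
}
```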

@hjk41
Member

hjk41 commented Jul 29, 2015

As long as we need to do local optimizations (local combining of gradients, local sharing of read-only data, etc.), we need to be aware of local processes. That is unavoidable.
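As a single-process illustration of "local combine of gradients" (a hypothetical helper with scalar gradients for brevity, not a rabit API): gradients are first summed within each node, and only one partial sum per node would then need to go through the global allreduce. The total equals the flat sum; what changes is how much data leaves each node.

```c
#include <stddef.h>

/* node_of[p] is the node id (0..n_nodes-1) of process p and grad[p] its
 * gradient. node_partial must have room for n_nodes values. */
static double two_level_sum(const double *grad, const int *node_of,
                            size_t n_procs, double *node_partial,
                            size_t n_nodes) {
    for (size_t n = 0; n < n_nodes; ++n)
        node_partial[n] = 0.0;
    for (size_t p = 0; p < n_procs; ++p)   /* step 1: combine within a node */
        node_partial[node_of[p]] += grad[p];
    double total = 0.0;
    for (size_t n = 0; n < n_nodes; ++n)   /* step 2: one value per node */
        total += node_partial[n];
    return total;
}
```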

@bobye
Author

bobye commented Jul 29, 2015

Not really, I think. MPI-3 adds the MPI_Comm_split_type function, so we could make the "shared memory" feature conditional on MPI-3. But this is fairly new, I guess, and I am not sure whether rabit, as a communication interface wrapper, can support it. I am not familiar with other platforms; maybe not all of them support this kind of feature.

On the other hand, local optimizations may require too much programming effort.

https://www.open-mpi.org/doc/v1.8/man3/MPI_Comm_split_type.3.php

@trivialfis
Member

This might be useful as I need the local rank.

@hcho3
Contributor

hcho3 commented Nov 5, 2020

Closing, as Rabit has been moved into dmlc/xgboost. See the discussion in dmlc/xgboost#5995.

hcho3 closed this as completed Nov 5, 2020

5 participants