intra-node memory sharing? #22
I am interested to see whether it is possible to add intra-node memory sharing to rabit, that is, whether different processes on the same node can work with the same array. As long as the user guarantees in the design of the algorithm that data races will not happen, this can be quite useful. For example, a machine learning algorithm whose inputs are (SkipGram) word vectors may want to keep a single copy of the embedding dictionary per node, saving memory.
In the backend implementation, I believe it comes down to three calls: shm_open() - ftruncate() - mmap(), which I think would not be very difficult to implement in rabit.
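For reference, a minimal sketch of the shm_open() / ftruncate() / mmap() sequence described above. The segment name and size are hypothetical, and rabit itself does not expose anything like this; this is only the underlying POSIX mechanism.

```cpp
// Sketch of the shm_open() -> ftruncate() -> mmap() sequence from the issue.
// Segment name and size are illustrative only.
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>
#include <cstdio>

int main() {
  const char* name = "/rabit_embedding";  // hypothetical segment name
  const size_t bytes = 1 << 20;           // 1 MiB, for illustration

  // One process per node creates the segment; peer processes on the same
  // node would call shm_open with the same name (without O_CREAT) and
  // skip the ftruncate step.
  int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
  if (fd == -1) { perror("shm_open"); return 1; }
  if (ftruncate(fd, bytes) == -1) { perror("ftruncate"); return 1; }

  // Map the segment; every process mapping the same name sees one copy.
  void* addr = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
  if (addr == MAP_FAILED) { perror("mmap"); return 1; }

  float* embedding = static_cast<float*>(addr);
  embedding[0] = 1.0f;  // visible to every process that mapped the segment

  munmap(addr, bytes);
  close(fd);
  shm_unlink(name);     // the creating process removes the name when done
  return 0;
}
```

On older Linux systems this links with -lrt; coordinating who creates versus who attaches is exactly where a node-local rank (discussed below) would come in.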
This sounds like an interesting feature. To have different processes on the same node share memory, an easier way might be to enable multi-threading in each process and use OpenMP locally, which might be cleaner than using a memory map.
@tqchen I don't really like the OpenMP solution; it adds extra programming (as well as debugging) complexity for the user. It also forces the user to run one process per node (for the sake of saving memory) and therefore to spend a lot of time on the multi-threading part, even though the algorithm itself is as simple as message passing. I think it conflicts with the goal of rabit.
Just to clarify the OpenMP part: many parallel machine learning algorithms (written against MPI or a similar interface) can still benefit from vectorized computation (like BLAS), while OpenMP obviously cannot do that.
OpenMP can still benefit from vectorized code, because code in different threads can still use vectorized libraries like Eigen or mshadow. Personally I feel that having OpenMP code won't add too much complexity; it is usually a parallel for. But I agree that, as a general programming interface, pure message passing could be more appealing in some cases :)
Yes, I agree. If it is a for-loop, OpenMP can still recognize it and possibly optimize for speed. But the user has to be careful in implementing the for-loop, ensuring there are no dependencies between consecutive iterations (see the sketch below). That is relatively simple if the for-loop only does a few things, but it becomes rather tricky to write a complex for-loop and still expect OpenMP to optimize it. Mixing OpenMP with message passing is not a good idea in general. BTW, the newest version of macOS does not natively support OpenMP anymore; one has to rebuild the clang package to use it.
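To make the dependency point concrete, here is a small generic sketch (not rabit code): the first loop has independent iterations and parallelizes safely with OpenMP, while the second has a loop-carried dependency that a plain parallel for would silently break.

```cpp
// Compile with -fopenmp; without it the pragmas are ignored and the
// code runs serially, which is the usual OpenMP fallback behavior.
#include <vector>

void scale(std::vector<float>& v, float a) {
  // Safe: each iteration touches only v[i], no cross-iteration dependency.
  #pragma omp parallel for
  for (long i = 0; i < static_cast<long>(v.size()); ++i) {
    v[i] *= a;
  }
}

void prefix_sum(std::vector<float>& v) {
  // NOT safe to annotate with "omp parallel for": v[i] reads v[i - 1],
  // so iterations depend on each other and must run in order.
  for (size_t i = 1; i < v.size(); ++i) {
    v[i] += v[i - 1];
  }
}
```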
I agree that multi-threaded programs are easily done wrong, especially when ...
HONG Chuntao
@hjk41 That's possible. Then rabit would have to know the node-specific local rank of each process. Knowing the local rank, one could implement such a facility as a standalone module outside rabit.
As long as we need to do local optimization (local combine of gradients, for example), ...
HONG Chuntao
Not really, I think. MPI-3 adds the MPI_Comm_split_type function, which we can use to obtain the node-local rank. On the other hand, local optimization can be too much for the programming interface. See https://www.open-mpi.org/doc/v1.8/man3/MPI_Comm_split_type.3.php
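As a sketch of this suggestion (standard MPI-3 calls, not part of rabit): MPI_COMM_TYPE_SHARED splits MPI_COMM_WORLD into one communicator per shared-memory node, and the rank within that communicator is the node-local rank.

```cpp
// Derive the node-local rank with MPI-3's MPI_Comm_split_type.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  int world_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

  // Split the world communicator into one communicator per node;
  // MPI_COMM_TYPE_SHARED groups exactly the processes that can
  // share memory with each other.
  MPI_Comm node_comm;
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                      MPI_INFO_NULL, &node_comm);

  int local_rank;
  MPI_Comm_rank(node_comm, &local_rank);
  std::printf("world rank %d has local rank %d\n", world_rank, local_rank);

  MPI_Comm_free(&node_comm);
  MPI_Finalize();
  return 0;
}
```

The process with local rank 0 on each node could then create the shared segment from the earlier shm_open sketch, while its node-local peers attach to it.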
This might be useful, as I need the local rank.
Closing, as rabit has been moved into dmlc/xgboost. See the discussion in dmlc/xgboost#5995.