Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] Merge rabit #6001

Closed
wants to merge 575 commits into from

Conversation

trivialfis
Copy link
Member

@trivialfis trivialfis commented Aug 11, 2020

See #5995 for detail. This PR handles merging RABIT with subtree merge. I'm not entirely sure how will github handle it so right now I created a branch call dmlc-merge-rabit on dmlc/xgboost.

This PR is for testing, for the real merge I plan to do it locally on my machine with subtree merge and push to master branch directly. This way we can merge it smoothly without creating a merge node in git. Also as the PR setting is squashing only so I can't use PR to merge a submodule.

trivialfis and others added 21 commits October 13, 2019 00:09
* Don't assert buffer size.
* Fix compilation failure on Windows

* Fix lint

* throw dmlc::Error handled by xgboost jni
* add unittests

* Expose RabitAllGatherRing and RabitGetRingPrevRank

* Enabled TCP_NODELAY to decrease latency
…#128)

* allow timeout to 0 to eanble immediate exit

* disable duplicated signature check, overwrite results with same key
* fix missing allgether rabit declaration

* fix allgather signature mismatch

* fix type conversion

* fix GetRingPrevRank
* fix hanging connections

* remove logging
* Add RABIT_DLL tag to definitions of rabit APIs.
* Fix Travis tests.
…dmlc#136)

* De-duplicate macro _CRT_SECURE_NO_WARNINGS / _CRT_SECURE_NO_DEPRECATE

* Move all macros to base.h

* Fix CI
* set a minimal reducer msg size. Receive the same data size from parent each time.

* When parent read from a child, check it receive minimal reduce size.
 fix bug. Rewrite the minimal reducer size check, make sure it's 1~N times of minimal reduce size

 Assume the minimal reduce size is X, the logic here is
 1: each child upload total_size of message
 2: each parent receive X message at least, up to total_size
 3: parent reduce X or NxX or total_size message
 4: parent sends X or NxX or total_size message to its parent
 4: parent's parent receive X message at least, up to total_size. Then reduce X or NxX or total_size message
 6: parent's parent sends X or NxX or total_size message to its children
 7: parent receives X or NxX or total_size message, sends to its children
 8: child receive X or NxN or total_size message.

 During the whole process, each transfer is (1~N)xX Byte message or up to total_size.

 if X is larger than total_size, then allreduce allways reduce the whole messages and pass down.

* Follow style check rule

* fix the cpplint check

* fix allreduce_base header seq

Co-authored-by: Philip Hyunsu Cho <chohyu01@cs.washington.edu>
@trivialfis
Copy link
Member Author

@dmlc/xgboost-committer I plan to create a fast forward merge using command line and push to master branch directly. Please feel free to share your suggestion.

@codecov-commenter
Copy link

codecov-commenter commented Aug 11, 2020

Codecov Report

Merging #6001 into dmlc-merge-rabit will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@                Coverage Diff                @@
##           dmlc-merge-rabit    #6001   +/-   ##
=================================================
  Coverage             78.52%   78.52%           
=================================================
  Files                    12       12           
  Lines                  3013     3013           
=================================================
  Hits                   2366     2366           
  Misses                  647      647           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6f7112a...b50de81. Read the comment docs.

@trivialfis trivialfis changed the title Merge rabit [DO NOT MERGE] Merge rabit Aug 11, 2020
@hcho3
Copy link
Collaborator

hcho3 commented Aug 14, 2020

@trivialfis Let us do a merge commit instead. We can temporarily enable option to merge pull request via a merge commit.

@trivialfis trivialfis closed this Aug 17, 2020
@trivialfis trivialfis deleted the merge-rabit branch August 17, 2020 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.