Use heuristic to select histogram node, avoid rabit call #4951
This PR considerably improves the performance of the distributed GPU algorithm, latency in particular. Instead of using rabit to sync and compute exactly which child node has fewer training instances, we estimate it from the sum of Hessians in the left and right nodes. The node with the smaller sum is the one selected for the histogram build.
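As a rough illustration of the heuristic, here is a minimal sketch, not the code in this PR. `GradStats`, `HistogramPlan`, and `ChooseHistogramNode` are hypothetical names invented for the example. It assumes the child Hessian sums are already identical on every worker (for instance because they come from the synchronized split statistics), so each worker reaches the same decision without an extra rabit call.

```cpp
// Hypothetical stand-in for a gradient/Hessian sum; not XGBoost's actual type.
struct GradStats {
  double sum_grad;
  double sum_hess;
};

// Which child gets its histogram built directly, and which one is
// derived by subtracting the sibling from the parent histogram.
struct HistogramPlan {
  int build_nidx;
  int subtraction_nidx;
};

// Pick the child with the smaller Hessian sum as a proxy for
// "fewer training instances". This is an estimate, not an exact count:
// for losses where the per-instance Hessian is not constant, the two
// can disagree, which only affects speed, not correctness.
inline HistogramPlan ChooseHistogramNode(int left_nidx, const GradStats& left,
                                         int right_nidx, const GradStats& right) {
  if (left.sum_hess <= right.sum_hess) {
    return {left_nidx, right_nidx};
  }
  return {right_nidx, left_nidx};
}
```

Because the decision depends only on values every worker already holds, no communication is needed to agree on which node to build.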
Performance numbers from training on a DGX-1 with https://github.com/NVIDIA/gbm-bench
I also removed the old HostAllReduce function, which is redundant now that we no longer manage multiple GPUs with threads.