Optimize cpu sketch allreduce for sparse data. #6009
Merged
Follow up on #5880. @ShvetsKS @SmirnovEgorRu Previously, sketching on the URL dataset in a distributed environment was not possible due to a memory usage blow-up. After this PR I can unify the sketching for `hist` and `approx` and proceed with other refactoring to unify the codebase. Also fixed a bug in distributed training caused by an incorrect allreduce call, and added tests.
Perf
Previous perf is from #5880.
Single Node
Before
URL
```
GmatInitialization: 10.3033s, 1 calls @ 10303323us
GmatInitialization: 10.2406s, 1 calls @ 10240576us
GmatInitialization: 10.224s, 1 calls @ 10224020us
```
HIGGS
```
GmatInitialization: 3.74322s, 1 calls @ 3743224us
GmatInitialization: 3.63079s, 1 calls @ 3630793us
GmatInitialization: 3.67905s, 1 calls @ 3679049us
```
After
HIGGS
```
GmatInitialization: 3.7794s, 1 calls @ 3779401us
GmatInitialization: 3.81085s, 1 calls @ 3810851us
GmatInitialization: 3.77258s, 1 calls @ 3772581us
GmatInitialization: 3.76844s, 1 calls @ 3768436us
```
URL
```
GmatInitialization: 10.315s, 1 calls @ 10314976us
GmatInitialization: 10.3202s, 1 calls @ 10320246us
GmatInitialization: 10.3059s, 1 calls @ 10305877us
```
Multi Node 4x4
Before
HIGGS
```
GmatInitialization: 3.7682s, 1 calls @ 3768198us
GmatInitialization: 3.707s, 1 calls @ 3707000us
GmatInitialization: 3.70828s, 1 calls @ 3708276us
```
URL
NAN (run not possible before this PR due to the memory blow-up)
After
HIGGS
```
GmatInitialization: 3.64623s, 1 calls @ 3646232us
GmatInitialization: 3.5942s, 1 calls @ 3594197us
GmatInitialization: 3.73079s, 1 calls @ 3730792us
```
URL
```
GmatInitialization: 6.53294s, 1 calls @ 6532936us
GmatInitialization: 6.43215s, 1 calls @ 6432153us
GmatInitialization: 6.67619s, 1 calls @ 6676192us
```
@ShvetsKS Also, I might have found the cause of the slowdown you mentioned. Here is the memory usage reported by massif when training on URL in a distributed environment, which should be related to the performance:
I can't get a complete run as my home machine only has 32 GB of memory. Hope that helps.