Optimize cpu sketch allreduce for sparse data. #6009
Merged
Follow up on #5880. @ShvetsKS @SmirnovEgorRu Previously, sketching on the URL dataset in a distributed environment was not possible due to a memory usage blow-up. After this PR I can unify the sketching for `hist` and `approx` and proceed with other refactoring to unify the codebase. Also fixed a bug in distributed training caused by an incorrect allreduce call, and added tests.
Perf
Previous perf is from #5880.
Single Node
Before
URL
```
GmatInitialization: 10.3033s, 1 calls @ 10303323us
GmatInitialization: 10.2406s, 1 calls @ 10240576us
GmatInitialization: 10.224s, 1 calls @ 10224020us
```
HIGGS
```
GmatInitialization: 3.74322s, 1 calls @ 3743224us
GmatInitialization: 3.63079s, 1 calls @ 3630793us
GmatInitialization: 3.67905s, 1 calls @ 3679049us
```
After
HIGGS
```
GmatInitialization: 3.7794s, 1 calls @ 3779401us
GmatInitialization: 3.81085s, 1 calls @ 3810851us
GmatInitialization: 3.77258s, 1 calls @ 3772581us
GmatInitialization: 3.76844s, 1 calls @ 3768436us
```
URL
```
GmatInitialization: 10.315s, 1 calls @ 10314976us
GmatInitialization: 10.3202s, 1 calls @ 10320246us
GmatInitialization: 10.3059s, 1 calls @ 10305877us
```
Multi Node 4x4
Before
HIGGS
```
GmatInitialization: 3.7682s, 1 calls @ 3768198us
GmatInitialization: 3.707s, 1 calls @ 3707000us
GmatInitialization: 3.70828s, 1 calls @ 3708276us
```
URL
NAN (run not possible before this PR due to the memory blow-up)
After
HIGGS
```
GmatInitialization: 3.64623s, 1 calls @ 3646232us
GmatInitialization: 3.5942s, 1 calls @ 3594197us
GmatInitialization: 3.73079s, 1 calls @ 3730792us
```
URL
```
GmatInitialization: 6.53294s, 1 calls @ 6532936us
GmatInitialization: 6.43215s, 1 calls @ 6432153us
GmatInitialization: 6.67619s, 1 calls @ 6676192us
```
@ShvetsKS Also, I might have found the cause of the slowdown you mentioned. Here is the memory usage reported by massif when training on URL in a distributed environment, which should be related to the performance:
I can't get a complete run as my home machine only has 32 GB of memory. Hope that helps.