
Add device memory usage profiling. #3795

Closed · wants to merge 1 commit

Conversation

trivialfis (Member)

This is another small step toward optimizing the memory usage of the gpu_hist method. There might be a better way to handle the backtrace information than printing everything; suggestions are welcome.

hcho3 (Collaborator) commented Oct 17, 2018

@trivialfis I'm curious, what is the benefit of building a custom monitor as opposed to using nvprof?

trivialfis (Member, Author)

@hcho3 It's strange: with nvprof I can find all kinds of detailed information, but not memory usage.

hcho3 (Collaborator) commented Oct 17, 2018

@trivialfis So nvprof doesn't give you memory usage information?

trivialfis (Member, Author)

@hcho3 As far as I know, it doesn't.

hcho3 (Collaborator) commented Oct 17, 2018

@trivialfis That's interesting. I now see why you are working on this pull request.

trivialfis (Member, Author) commented Oct 17, 2018

@hcho3 Hi, do you have any insight into how to force the linking in the makefile? Currently the R test fails with:

```
dlopen(/Users/travis/build/dmlc/xgboost/xgboost/xgboost.Rcheck/xgboost/libs/xgboost.so, 6): Symbol not found: __ZN7xgboost6common34__dmlc_registry_file_tag_monitor__Ev
```

Actually, the added dmlc tag was my attempt to solve the problem; previously the error was that some member functions' definitions went missing.

One way to solve it is to inline everything, but that would put a heavy burden on compilation, since the log messages of Monitor and DeviceMemStat change frequently; even adding a period to a message would force a recompilation of the whole project.

One odd thing I notice is that two underscores are prefixed to the missing symbol, where there should be only one. If there's something in the makefile I'm missing, please let me know; otherwise I'll have to inline all these functions before finishing this PR. Thanks!
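
For context, this is roughly how the dmlc-core tag pair is meant to work (macro names from dmlc-core's registry.h; the placement below is illustrative, not the actual patch):

```cpp
// In src/common/monitor.cc: defines a uniquely named int function; its
// mangled name is exactly the symbol reported missing by dlopen above.
DMLC_REGISTRY_FILE_TAG(monitor);

// In a translation unit that is always linked: declares that function and
// calls it from a static initializer, forcing the linker to keep monitor.o.
DMLC_REGISTRY_LINK_TAG(monitor);
```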

hcho3 (Collaborator) commented Oct 17, 2018

@trivialfis You should add your source file to amalgamation/xgboost-all0.cc, since this is how the R package links all source files.
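
An illustrative sketch of that fix, assuming the new file is src/common/monitor.cc (the path listed in the Codecov report below). The R build compiles this single amalgamated translation unit, so any source file missing from it never ends up in xgboost.so:

```cpp
// amalgamation/xgboost-all0.cc (illustrative excerpt)
#include "../src/common/common.cc"
#include "../src/common/monitor.cc"  // new file from this PR; without this
                                     // line the R package never links it
```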

trivialfis (Member, Author) commented Oct 17, 2018

@hcho3 Thanks! You just saved my day.

trivialfis force-pushed the profiling branch 2 times, most recently from e6c4758 to ebcdce0 on October 18, 2018 09:46.
trivialfis (Member, Author) commented Oct 19, 2018

@RAMitchell @hcho3
Initial results from simple testing with the epsilon dataset (500000 x 2000, dense), using:

```python
params = {'n_gpus': -1, 'debug_verbose': 5,
          'tree_method': 'gpu_hist',
          'objective': 'binary:logistic'}
num_boost_round = 256
```

```
Device memory usage summary: ===========
Peak usage: 9.177299 GB.
Number of allocations: 72 times.
```

The biggest memory consumer seems to be the single-GPU predictor, which also causes extremely imbalanced memory usage across GPUs. Here is the trace obtained for the related memory allocation:

```
Pointer: 0x7fedde000000
Peak usage: 5.963445 GB
Traces:
  XGBoosterUpdateOneIter+0x48
  dmlc::StackTrace[abi:cxx11]()+0x53
  ffi_call_unix64+0x4c
  xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix*)+0x220
  xgboost::common::DeviceMemoryStat::Allocate(void*, unsigned long)+0x1ec
  xgboost::gbm::GBTree::CommitModel(std::vector<std::vector<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> >, std::allocator<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> > > >, std::allocator<std::vector<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> >, std::allocator<std::unique_ptr<xgboost::RegTree, std::default_delete<xgboost::RegTree> > > > > >&&)+0x2d0
  xgboost::gbm::GBTree::DoBoost(xgboost::DMatrix*, xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::ObjFunction*)+0x85b
  xgboost::predictor::DeviceMatrix::DeviceMatrix(xgboost::DMatrix*, int, bool)+0x16f
  xgboost::predictor::GPUPredictor::DevicePredictInternal(xgboost::DMatrix*, xgboost::HostDeviceVector<float>*, xgboost::gbm::GBTreeModel const&, unsigned long, unsigned long)+0xef0
  xgboost::predictor::GPUPredictor::UpdatePredictionCache(xgboost::gbm::GBTreeModel const&, std::vector<std::unique_ptr<xgboost::TreeUpdater, std::default_delete<xgboost::TreeUpdater> >, std::allocator<std::unique_ptr<xgboost::TreeUpdater, std::default_delete<xgboost::TreeUpdater> > > >*, int)+0xad
```

We should try it again after #3738 is merged.

trivialfis (Member, Author)

New profiling result with the multi-GPU predictor in place:
usage.txt

codecov-io commented Nov 5, 2018

Codecov Report

Merging #3795 into master will decrease coverage by 0.19%.
The diff coverage is 20.51%.


```diff
@@             Coverage Diff             @@
##             master    #3795     +/-   ##
===========================================
- Coverage     51.95%   51.76%   -0.2%
+ Complexity      203      196      -7
===========================================
  Files           182      182
  Lines         14500    14479     -21
  Branches        495      489      -6
===========================================
- Hits           7534     7495     -39
- Misses         6728     6750     +22
+ Partials        238      234      -4
```
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| src/gbm/gblinear.cc | 14.45% <ø> (ø) | 0 <0> (ø) |
| src/gbm/gbtree.cc | 18.67% <ø> (ø) | 0 <0> (ø) |
| src/linear/updater_coordinate.cc | 100% <ø> (ø) | 0 <0> (ø) |
| src/learner.cc | 30.35% <100%> (+2.73%) | 0 <0> (ø) |
| src/common/transform.h | 88.88% <100%> (-0.4%) | 0 <0> (ø) |
| src/common/common.cc | 100% <100%> (ø) | 0 <0> (ø) |
| src/common/monitor.cc | 4.91% <4.91%> (ø) | 0 <0> (?) |
| src/common/monitor.h | 72.41% <72.41%> (ø) | 0 <0> (?) |
| tests/cpp/helpers.cc | 90.47% <0%> (-0.44%) | 0% <0%> (ø) |
| tests/cpp/data/test_simple_dmatrix.cc | 97.56% <0%> (-0.17%) | 0% <0%> (ø) |

... and 22 more

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.
Powered by Codecov. Last update 1698fe6...2164176.

hcho3 (Collaborator) commented Nov 12, 2018

@trivialfis FYI, I am going to update the dmlc-core submodule after dmlc/dmlc-core#481 is merged. Then we can set the stacktrace size at runtime.

trivialfis (Member, Author)

@hcho3 The code itself is ready now; we're just not sure whether this chunk of code is worth merging. The wrappers around memory allocations are not trivial.
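
To illustrate what "wrappers around memory allocations" means here, a minimal sketch of an allocation recorder in the spirit of the DeviceMemoryStat seen in the trace above. Only the names DeviceMemoryStat and Allocate appear in the actual trace; the class body and the ProfiledMalloc/ProfiledFree helpers are hypothetical:

```cpp
#include <cuda_runtime.h>
#include <dmlc/logging.h>  // dmlc::StackTrace(), as seen in the trace above

#include <algorithm>
#include <map>
#include <mutex>

// Hypothetical sketch: records every device allocation so that peak usage,
// allocation counts, and backtraces can be reported later.
class DeviceMemoryStat {
 public:
  static DeviceMemoryStat& Ins() {
    static DeviceMemoryStat inst;  // singleton shared by all wrappers
    return inst;
  }
  void Allocate(void* ptr, size_t size) {
    std::lock_guard<std::mutex> lock(mutex_);
    allocations_[ptr] = size;
    usage_ += size;
    peak_usage_ = std::max(peak_usage_, usage_);
    LOG(INFO) << "Allocated " << size << " bytes:\n" << dmlc::StackTrace();
  }
  void Deallocate(void* ptr) {
    std::lock_guard<std::mutex> lock(mutex_);
    usage_ -= allocations_[ptr];
    allocations_.erase(ptr);
  }
  size_t PeakUsage() const { return peak_usage_; }

 private:
  std::mutex mutex_;
  size_t usage_{0};
  size_t peak_usage_{0};
  std::map<void*, size_t> allocations_;  // live allocations by pointer
};

// The non-trivial part: every cudaMalloc/cudaFree (and every thrust/cub
// allocator) in the code base has to be routed through wrappers like these.
inline cudaError_t ProfiledMalloc(void** ptr, size_t size) {
  cudaError_t err = cudaMalloc(ptr, size);
  if (err == cudaSuccess) DeviceMemoryStat::Ins().Allocate(*ptr, size);
  return err;
}
inline cudaError_t ProfiledFree(void* ptr) {
  DeviceMemoryStat::Ins().Deallocate(ptr);
  return cudaFree(ptr);
}
```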

hcho3 (Collaborator) commented Nov 13, 2018

@trivialfis Okay, maybe we can come back to it after merging the other refactoring PRs.

trivialfis (Member, Author) commented Nov 13, 2018

@RAMitchell @hcho3 Good news: the large memory usage from GPUPredictor::UpdatePredictionCache in the previous profiling was mainly because I put too much data in evals. I don't know why those datasets have prediction caches, but for simple training without an eval set, the memory usage is quite modest.

trivialfis (Member, Author)

Closing for now.

trivialfis closed this on Dec 26, 2018.