Add device memory usage profiling. #3795
@trivialfis I'm curious, what is the benefit of building a custom monitor as opposed to using
@hcho3 It's really strange: with nvprof I can find all kinds of detailed information, except memory usage.
@trivialfis So
@hcho3 As far as I know, it doesn't.
@trivialfis That's interesting. I now see why you are working on this pull request.
@hcho3 Hi, do you have any insight on how to force the linking in the Makefile? Currently the R test fails with:
Actually, the added dmlc tag was my attempt to solve the problem. Previously the error was that some member functions' definitions went missing. One way to solve it is to inline everything, but that would put a huge burden on compilation, since the printing messages of An odd thing I noticed is that there are two "_" prefixed to the missing symbol, while there should be only one. If it's something in the Makefile I'm missing, please let me know. Otherwise I will have to inline all these functions before finishing this PR. Thanks!
@trivialfis You should add your source file to
@hcho3 Thanks! You just saved my day.
e6c4758 to ebcdce0
@RAMitchell, @hcho3
The biggest memory consumer seems to be the single-GPU predictor, which also causes extremely imbalanced memory usage across GPUs. Here is the trace obtained for the related memory allocation:
We should try it again after #3738 is merged.
New profiling result with the multi-GPU predictor in place:
Codecov Report
@@             Coverage Diff              @@
##             master    #3795     +/-   ##
===========================================
- Coverage     51.95%   51.76%   -0.2%
+ Complexity      203      196      -7
===========================================
  Files           182      182
  Lines         14500    14479     -21
  Branches        495      489      -6
===========================================
- Hits           7534     7495     -39
- Misses         6728     6750     +22
+ Partials        238      234      -4
@trivialfis FYI, I am going to update the dmlc-core submodule after dmlc/dmlc-core#481 is merged. Then we can set the stacktrace size at runtime.
@hcho3 The code itself is ready now. We are just not sure whether it's worth merging this chunk of code. The wrappers around memory allocations are not trivial.
@trivialfis Okay, maybe we can come back to it after merging the other refactoring PRs.
@RAMitchell @hcho3 Good news. The large memory usage from
Closing for now.
This is another small step toward optimizing the memory usage of the gpu_hist method. There might be a better way to handle the backtrace information than printing everything. Suggestions are welcome.
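For readers who want a concrete picture of what "wrapping memory allocations" for profiling can look like, here is a minimal sketch. It is not the code in this PR: `MemoryTracker`, `TrackedAlloc` and `TrackedFree` are hypothetical names, and `std::malloc`/`std::free` stand in for the device allocation calls so the example runs on the host. It only records current and peak usage and does not attempt to capture backtraces.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// Hypothetical tracker (illustrative only, not an XGBoost class).
// Records the size of every live allocation plus current and peak usage.
class MemoryTracker {
 public:
  void RegisterAlloc(void* ptr, std::size_t bytes) {
    sizes_[ptr] = bytes;
    current_ += bytes;
    peak_ = std::max(peak_, current_);
    ++n_allocs_;
  }

  void RegisterFree(void* ptr) {
    auto it = sizes_.find(ptr);
    if (it == sizes_.end()) return;  // unknown pointer: nothing to account for
    current_ -= it->second;
    sizes_.erase(it);
  }

  std::size_t Current() const { return current_; }
  std::size_t Peak() const { return peak_; }
  std::size_t NumAllocations() const { return n_allocs_; }

 private:
  std::unordered_map<void*, std::size_t> sizes_;
  std::size_t current_ = 0;
  std::size_t peak_ = 0;
  std::size_t n_allocs_ = 0;
};

// Assumption: host malloc/free are placeholders for the device calls
// that a real wrapper would intercept.
inline void* TrackedAlloc(MemoryTracker* tracker, std::size_t bytes) {
  void* ptr = std::malloc(bytes);
  if (ptr != nullptr) tracker->RegisterAlloc(ptr, bytes);
  return ptr;
}

inline void TrackedFree(MemoryTracker* tracker, void* ptr) {
  tracker->RegisterFree(ptr);
  std::free(ptr);
}
```

With a tracker like this, reporting only the peak and the allocation count at the end of training would be one alternative to printing every allocation as it happens.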