Move thread local entry into Learner. #5396

trivialfis · 2020-03-06T17:50:39Z

Extracted from #5389.

This is an attempt to work around a CUDA context issue with static variables, where the CUDA context can be released before the device vector is destroyed.

* Fix training with GPU data in a multi-threaded environment. Calling HostVector in Transform causes a race condition.
* Add PredictionEntry to the thread local entry. This eliminates one copy of the prediction vector.
* Don't define the CUDA C API in a namespace.
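The core idea of the PR is that the per-thread return buffers (XGBAPIThreadLocalEntry, which now also holds a PredictionEntry) are owned by the Learner and reached through GetThreadLocal(), instead of living in a static variable whose destruction order relative to the CUDA context is uncontrolled. The following is a minimal sketch of that idea in standard C++, assuming a hypothetical ThreadLocalEntry struct, a toy Predict method and a mutex-protected map keyed by thread id. It is not XGBoost's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for XGBAPIThreadLocalEntry: per-thread buffers whose
// pointers are handed back to C API callers.
struct ThreadLocalEntry {
  std::string ret_str;            // e.g. a dumped model string
  std::vector<float> prediction;  // plays the role of the PredictionEntry
};

class Learner {
 public:
  // Returns the calling thread's entry. The map and every buffer in it are
  // members of this Learner, so they are freed in ~Learner() rather than
  // during static / thread-local teardown at program exit.
  ThreadLocalEntry& GetThreadLocal() const {
    std::lock_guard<std::mutex> guard{mu_};
    // operator[] default-constructs the entry on first use; inserting into a
    // std::map never invalidates references to existing elements.
    return entries_[std::this_thread::get_id()];
  }

  // Hypothetical predict: writes directly into the thread's own buffer so a
  // C API wrapper can return a pointer to it without an extra copy.
  void Predict(std::size_t n_rows, float const** out,
               std::size_t* out_len) const {
    assert(out != nullptr && out_len != nullptr);
    ThreadLocalEntry& entry = GetThreadLocal();
    entry.prediction.assign(n_rows, 0.5f);  // placeholder for real predictions
    *out = entry.prediction.data();
    *out_len = entry.prediction.size();
  }

 private:
  mutable std::mutex mu_;
  mutable std::map<std::thread::id, ThreadLocalEntry> entries_;
};
```

In this sketch the returned pointer stays valid until the same thread calls Predict again or the Learner is destroyed. One trade-off of the design is that entries for threads that have already exited stay in the map until the Learner goes away, in exchange for a lifetime that no longer depends on static destruction order.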


mli · 2020-03-06T21:29:17Z

Codecov Report

Merging #5396 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #5396   +/-   ##
=======================================
  Coverage   84.07%   84.07%           
=======================================
  Files          11       11           
  Lines        2411     2411           
=======================================
  Hits         2027     2027           
  Misses        384      384

Last update 8d06878...f23b9c2.

RAMitchell · 2020-03-07T02:27:11Z

include/xgboost/learner.h

@@ -167,6 +184,8 @@ class Learner : public Model, public Configurable, public rabit::Serializable {
  virtual std::vector<std::string> DumpModel(const FeatureMap& fmap,
                                             bool with_stats,
                                             std::string format) const = 0;
+
+  virtual XGBAPIThreadLocalEntry& GetThreadLocal() const = 0;


We could make Learner a concrete class. I don't see us subclassing different learners any time soon.

trivialfis: Actually, I don't like Learner, as the linear and tree models are so different.

RAMitchell · 2020-03-07T02:29:48Z

src/common/transform.h

@@ -105,6 +105,17 @@ class Transform {
      return Span<T const> {_vec->ConstHostPointer(),
            static_cast<typename Span<T>::index_type>(_vec->Size())};
    }
+    // Recursive sync host


What made you do this? Just curious.

trivialfis: It is needed when training hist with CuPy data. I modified a test in this PR.
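The "Recursive sync host" helper and the race condition mentioned in the description fit a common pattern: a vector that lazily copies device data to the host mutates its internal state on first access, so that first access must not happen concurrently on several worker threads. One plausible reading is to sync every input on the calling thread before the workers start. A minimal sketch of that pattern, assuming a hypothetical LazyHostVector in place of XGBoost's HostDeviceVector, std::thread in place of the library's own threading, and hypothetical SyncHost and ParallelAdd helpers; it is not the code added in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a host/device vector: data starts on the "device"
// and is copied to host memory only when a host pointer is first requested.
template <typename T>
class LazyHostVector {
 public:
  explicit LazyHostVector(std::vector<T> device) : device_(std::move(device)) {}

  // Lazily copies device data to the host. The first call writes host_ and
  // on_host_, so two threads making that first call at once is a data race.
  T const* ConstHostPointer() const {
    if (!on_host_) {
      host_ = device_;
      on_host_ = true;
    }
    return host_.data();
  }
  std::size_t Size() const { return device_.size(); }

 private:
  std::vector<T> device_;
  mutable std::vector<T> host_;
  mutable bool on_host_{false};
};

// Base case: sync a single vector to the host.
template <typename T>
void SyncHost(LazyHostVector<T> const* v) {
  v->ConstHostPointer();
}

// Recursive case: sync the first vector, then recurse on the remaining ones.
template <typename Head, typename... Rest>
void SyncHost(LazyHostVector<Head> const* head,
              LazyHostVector<Rest> const*... rest) {
  head->ConstHostPointer();
  SyncHost(rest...);
}

// Element-wise a + b using n_threads workers. All inputs are synced once on
// the calling thread, so workers only ever read already-populated buffers.
template <typename T>
void ParallelAdd(LazyHostVector<T> const& a, LazyHostVector<T> const& b,
                 std::vector<T>* out, unsigned n_threads) {
  assert(a.Size() == b.Size() && n_threads > 0);
  SyncHost(&a, &b);
  std::size_t const n = a.Size();
  out->assign(n, T{});
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < n_threads; ++t) {
    workers.emplace_back([&, t] {
      T const* pa = a.ConstHostPointer();  // read-only after SyncHost
      T const* pb = b.ConstHostPointer();
      for (std::size_t i = t; i < n; i += n_threads) {
        (*out)[i] = pa[i] + pb[i];  // each worker writes distinct elements
      }
    });
  }
  for (std::thread& w : workers) {
    w.join();
  }
}
```

The ordering is what matters here: every lazy host copy happens on one thread before any worker exists. The variadic recursion lets the same call cover any number of inputs of mixed element types, e.g. SyncHost(&a, &b, &c).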

trivialfis added 4 commits March 6, 2020 14:34

* Find. (6ea0e1d)
* Remove temp preds. (868f9c1)
* Debug build. (419d318)

trivialfis closed this Mar 6, 2020

trivialfis reopened this Mar 6, 2020

trivialfis added 7 commits March 7, 2020 02:20

* Remove passing tests. (398b40c)
* Keep trying. (b44ca82)
* Long night. (20ed72d)
* Try dmlc thread local. (327cac7)
* Remove local function. (b18b916)
* Revert "Remove passing tests." (cc2a2cb). This reverts commit 398b40c.
* Revert debug test. (f23b9c2)

trivialfis changed the title from "Global return buffer" to "Move thread local entry into Learner." Mar 6, 2020

trivialfis requested a review from RAMitchell March 6, 2020 20:52

RAMitchell approved these changes Mar 7, 2020


trivialfis merged commit 0dd97c2 into dmlc:master Mar 7, 2020

trivialfis deleted the global-return-buffer branch March 7, 2020 07:38

trivialfis mentioned this pull request Apr 21, 2020 in [Roadmap] 1.1.0 Roadmap #5337 (closed, 12 tasks)

lock bot locked as resolved and limited conversation to collaborators Jun 24, 2020