Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support categorical split in tree model dump. #7036

Merged
merged 2 commits into from
Jun 18, 2021

Conversation

trivialfis
Copy link
Member

@trivialfis trivialfis commented Jun 11, 2021

Support categorical tree node for text dump.

Related #6503 .
Close #6298 .

@trivialfis trivialfis mentioned this pull request Jun 11, 2021
67 tasks
@codecov-commenter
Copy link

codecov-commenter commented Jun 11, 2021

Codecov Report

Merging #7036 (b223b53) into master (b56614e) will increase coverage by 0.00%.
The diff coverage is 50.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #7036   +/-   ##
=======================================
  Coverage   81.86%   81.87%           
=======================================
  Files          13       13           
  Lines        3916     3917    +1     
=======================================
+ Hits         3206     3207    +1     
  Misses        710      710           
Impacted Files Coverage Δ
python-package/xgboost/plotting.py 73.78% <50.00%> (+0.25%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1faad82...b223b53. Read the comment docs.

src/predictor/cpu_predictor.cc Show resolved Hide resolved
Comment on lines 100 to 102
auto name = split_index < fmap_.Size()
? fmap_.Name(split_index)
: "feat" + std::to_string(split_index);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since name is only needed for the error message, can you put this inside if (!is_categorical) block? Inside, use LOG(FATAL) to throw error

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, how come we are not writing anything to result for case FeatureMap::kCategorical:, like all the other cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept the possibility that users don't provide a feature map with categorical type. Categorical is a different split type, all the others are just variations of numerical splits. So no matter what the feature map says it has to go through a dedicated print procedure.

I can restrict it and ensure a correct feature map is provided.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remove this logic, now users either not provide a feature map or it has to be consistent with the training data.

@trivialfis
Copy link
Member Author

trivialfis commented Jun 12, 2021

@hcho3 @RAMitchell could you please take a look at how the output should look. I'm currently printing the categories as sets, but we don't have an \in operator here, so I used =, which is inaccurate. A possible alternative is something like 0 = 0 || 1 || 3. Here is an example text dump, notice the 0=[0, 14, 32] part:

0:[0={0,14,32}] yes=2,no=1,missing=1,gain=11,cover=2
        1:[f1<1] yes=3,no=4,missing=4,gain=0,cover=0
                3:leaf=0,cover=0
                4:leaf=0,cover=0
        2:[0={0,14,32}] yes=6,no=5,missing=5,gain=11,cover=2
                5:leaf=2,cover=3
                6:leaf=3,cover=4

and here's the graphviz output:
figure

@trivialfis
Copy link
Member Author

Changed to using colon:

Text dump:

0:[0:{0,14,32}] yes=2,no=1,missing=1,gain=11,cover=2
        1:[f1<1] yes=3,no=4,missing=4,gain=0,cover=0
                3:leaf=0,cover=0
                4:leaf=0,cover=0
        2:[0:{0,14,32}] yes=6,no=5,missing=5,gain=11,cover=2
                5:leaf=2,cover=3
                6:leaf=3,cover=4

Graphviz:
figure

@trivialfis trivialfis requested a review from hcho3 June 13, 2021 07:01
Copy link
Member

@RAMitchell RAMitchell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the format looks fine to me.

@trivialfis
Copy link
Member Author

Hmm .. gpu coordinate descent is failing non-deterministically..

@trivialfis trivialfis merged commit 29f8fd6 into dmlc:master Jun 18, 2021
@trivialfis trivialfis deleted the cat-tree-dump branch June 18, 2021 08:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test: TestGPUUpdaters::test_categorical
4 participants