[jvm-package] How to get the record count number of each leaves in a well trained model? #3419

Closed
USCYuandaDu opened this issue Jun 28, 2018 · 4 comments

Comments

@USCYuandaDu

USCYuandaDu commented Jun 28, 2018

Hi all!

Is it possible to get the number of records in each leaf node after training?
After training, we get a model that contains a certain number of decision trees. For each leaf node of each tree, I want to know how many records were routed into that leaf. If XGBoost doesn't have this feature, would the committers accept a contribution that adds it?

Thank you!
Bests,
Yuanda

@ywskycn

ywskycn commented Jul 2, 2018

cc @tqchen @CodingCat

@hcho3
Collaborator

hcho3 commented Jul 2, 2018

Currently, the XGBoost model embeds Hessian statistics at leaf nodes but not data counts. Since the leaf data counts are already known at training time, we could potentially embed that information inside the XGBoost model. I think the leaf_vector_ field in the tree model struct is unused; we could use it to store leaf node counts.
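
Until the counts are stored in the model, one possible workaround is to recompute them after training: pass the training data back through the model, ask for the leaf index each row lands in for every tree, and tally the results. The sketch below only shows the tallying step on a small hand-written matrix. It assumes the leaf indices come from something like `Booster.predict(dtrain, pred_leaf=True)` in the Python package or `Booster.predictLeaf` in xgboost4j; the helper name `leaf_record_counts` and the toy data are hypothetical, not part of XGBoost.

```python
from collections import Counter


def leaf_record_counts(leaf_indices):
    """Tally how many rows fall into each leaf of each tree.

    leaf_indices[i][t] is the leaf node index that row i reaches in tree t
    (assumed to be the shape a leaf-prediction call returns).
    Returns a list with one Counter per tree: {leaf_index: row_count}.
    """
    if len(leaf_indices) == 0:
        return []
    num_trees = len(leaf_indices[0])
    counts = [Counter() for _ in range(num_trees)]
    for row in leaf_indices:
        for tree_id, leaf_id in enumerate(row):
            # Leaf predictions may come back as floats; normalise to int.
            counts[tree_id][int(leaf_id)] += 1
    return counts


# Hypothetical leaf indices for 4 rows and 2 trees (not real model output).
toy_leaf_indices = [
    [1, 3],
    [1, 4],
    [2, 3],
    [1, 3],
]

counts = leaf_record_counts(toy_leaf_indices)
for tree_id, tree_counts in enumerate(counts):
    print(tree_id, dict(tree_counts))
```

This costs one extra prediction pass over the training data, and the counts reflect whatever data is passed in, so they only match the training-time counts if the same rows are used.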

@USCYuandaDu
Author

@hcho3 Sounds great! I'll have a look and give it a shot if this could work.

@tqchen tqchen closed this as completed Jul 4, 2018
@hcho3 hcho3 mentioned this issue Jul 4, 2018
32 tasks
@hcho3
Collaborator

hcho3 commented Jul 4, 2018

Consolidating to #3439. A new issue should be opened if someone decides to actively work on implementing this feature.
