[Feature] Add statistics to the plot from a dataset, its labels and predictions #6614

Open
wil70 opened this issue Aug 16, 2024 · 1 comment

Comments


wil70 commented Aug 16, 2024

Summary

Add statistics to the plot from a dataset, its labels and predictions

Motivation

lgb.create_tree_digraph and plot_tree are really great for visualizing the tree, but without statistics attached they are less useful.

Description

Be able to plot a tree annotated with statistics computed from a CSV or binary dataset (e.g. show_info=["statistics"]).
I would pass a large binary dataset file (with the correct labels), and either LGBM makes the predictions or I pass a file of predictions. The plot would then show, for each leaf:

  • the number of instances that reach this leaf (correctly and incorrectly classified)
  • the number of those instances that were correctly classified
  • the percentage "correctly classified / (correctly and incorrectly classified)"

I suspect there is no easy way to combine those trees to represent the classes and the iterations...

Later (lower priority), being able to generate this from CLI would be nice too.

Thanks
--w

wil70 changed the title from "Add statistics to the plot from a dataset, its labels and predictions" to "[Feature] Add statistics to the plot from a dataset, its labels and predictions" on Aug 16, 2024
jameslamb (Collaborator) commented

Thanks for using LightGBM, and for your interest in improving model inspection!

I'm not convinced that this is so generically useful that it should be in lightgbm directly, with all the documentation, testing, and maintenance work that entails. It sounds like this would be a better fit for your own custom code, and lightgbm (the Python package for LightGBM) provides all the core APIs to get the input data for it:

  • .dump_model() or .trees_to_dataframe() for the tree structure
  • predict(pred_leaf=True) to get leaf indices (which could be aggregated by leaf index to get those counts)
  • predict() to get the actual predictions (which could be used to compute the "% correctly classified" you refer to)
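A minimal sketch of the aggregation suggested above, assuming NumPy and that class-label predictions are already available. `leaf_statistics` is a hypothetical helper, not part of lightgbm. It only assumes `leaf_idx` is shaped like the output of `predict(pred_leaf=True)` (one row per sample, one column per tree) and that `y_true` and `y_pred` are 1-D arrays of class labels.

```python
import numpy as np


def leaf_statistics(leaf_idx, y_true, y_pred):
    """Hypothetical helper: per-(tree, leaf) counts of instances and correct predictions.

    leaf_idx: non-negative int array of shape (n_samples, n_trees), assumed to come
              from Booster.predict(X, pred_leaf=True); a 1-D array is treated as one tree.
    y_true, y_pred: 1-D arrays of class labels, shape (n_samples,).
    Returns {(tree_index, leaf_index): (n_total, n_correct, pct_correct)},
    listing only leaves that at least one instance reached.
    """
    leaf_idx = np.asarray(leaf_idx)
    if leaf_idx.ndim == 1:
        leaf_idx = leaf_idx.reshape(-1, 1)
    # 1.0 where the model's final prediction matches the label, else 0.0
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(np.float64)

    stats = {}
    for tree in range(leaf_idx.shape[1]):
        leaves = leaf_idx[:, tree]
        n_total = np.bincount(leaves)                    # instances per leaf
        n_correct = np.bincount(leaves, weights=correct)  # correct instances per leaf
        for leaf in np.flatnonzero(n_total):              # skip leaves nobody reached
            total = int(n_total[leaf])
            hits = int(n_correct[leaf])
            stats[(tree, int(leaf))] = (total, hits, 100.0 * hits / total)
    return stats
```

With a trained `Booster`, the inputs would presumably come from `booster.predict(X, pred_leaf=True)` and, for binary classification, `(booster.predict(X) > 0.5).astype(int)` (or `booster.predict(X).argmax(axis=1)` for multiclass). Note that "correctly classified" here refers to the whole model's prediction, not to any single tree. For multiclass models LightGBM stores one tree per class per iteration, so tree `j` is assumed to belong to class `j % num_class`. Joining these counts onto `.trees_to_dataframe()` or `.dump_model()` output to annotate a plot is left out of the sketch.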

If you want to see something like this in lightgbm, the best way to make that happen is to implement it yourself and try to contribute it. If you do that, please be ready to work with us on testing, documentation, etc. There is very limited maintainer availability in this project (as you may have noticed), and situations like #5488 take some of that maintainer availability away from other parts of the project.
