
LearnerRanker summary output as Pandas DataFrame #50

Closed
jason-bentley opened this issue Sep 8, 2020 · 3 comments · Fixed by #95
Labels
API New feature or request

jason-bentley (Contributor) commented Sep 8, 2020

Is your feature request related to a problem? Please describe.
When looking at the summary output from `LearnerRanker`, it would be great to have an alternative to `print(ranker.summary_report(5))`, which prints, for example, the top 5 models.

One option for enabling further user-generated summaries would be to output a Pandas DataFrame. This would allow flexibility for subsequent uses, for example exporting to CSVs for reports or creating summary figures of performance. The ability to store a DataFrame would also let users combine it with similar DataFrames from future runs to track how models change after updates.

Describe the solution you'd like
One option could be to export `ranker.summary_report(5)` as a Pandas DataFrame via something like `rank_summary = ranker.summary_report(as_dataframe=True)`, which I would expect to produce something along the lines of the following DataFrame:

| Rank | Learner                  | Ranking_score | Mean_score | SD_score | Tuned_parameters             | N_folds | Scoring_metric |
|------|--------------------------|---------------|------------|----------|------------------------------|---------|----------------|
| 1    | LGBMClassifierDF         | 0.656         | 0.680      | 0.0122   | classifier__n_estimators=400 | 10      | roc_auc        |
| 2    | LGBMClassifierDF         | 0.655         | 0.677      | 0.0111   | classifier__n_estimators=500 | 10      | roc_auc        |
| 3    | RandomForestClassifierDF | 0.650         | 0.695      | 0.0224   | classifier__n_estimators=200 | 10      | roc_auc        |
| 4    | RandomForestClassifierDF | 0.647         | 0.696      | 0.0244   | classifier__n_estimators=300 | 10      | roc_auc        |
| 5    | RandomForestClassifierDF | 0.646         | 0.697      | 0.0255   | classifier__n_estimators=400 | 10      | roc_auc        |
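A minimal sketch of how such a summary DataFrame could be assembled. Note `as_dataframe` is only a proposed parameter, and the rows below are illustrative values copied from the table above, not output of the actual facet API:

```python
import pandas as pd

# Hypothetical sketch: per-model ranking results assembled into the proposed
# summary DataFrame. Column names follow the issue's proposal.
rows = [
    {"Learner": "LGBMClassifierDF", "Ranking_score": 0.656,
     "Mean_score": 0.680, "SD_score": 0.0122,
     "Tuned_parameters": "classifier__n_estimators=400"},
    {"Learner": "RandomForestClassifierDF", "Ranking_score": 0.650,
     "Mean_score": 0.695, "SD_score": 0.0224,
     "Tuned_parameters": "classifier__n_estimators=200"},
]
rank_summary = pd.DataFrame(rows)
rank_summary.insert(0, "Rank", list(range(1, len(rank_summary) + 1)))
rank_summary["N_folds"] = 10
rank_summary["Scoring_metric"] = "roc_auc"

# One of the downstream uses mentioned above: export for reports.
rank_summary.to_csv("rank_summary.csv", index=False)
```

Once the summary is a DataFrame, the other uses mentioned (plots, comparing runs via `pd.concat`) come for free.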

Describe alternatives you've considered
Have not considered alternatives.

@jason-bentley jason-bentley added the API New feature or request label Sep 8, 2020
@jason-bentley jason-bentley self-assigned this Sep 8, 2020
j-ittner (Member) commented Oct 9, 2020

We might even choose to ditch the text format for summary reports - they are hard to read anyway given the long lines. And you can always print a data frame to stdout.

j-ittner commented Oct 9, 2020

We might even want to create one column per tuned parameter. The parameters do differ across learner types, but in that case the non-applicable columns would simply be NaN.
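This falls out of pandas naturally. A sketch with assumed data (not the facet API): learners tuned over different hyperparameter sets get NaN in the columns that do not apply to them, because the DataFrame constructor fills missing dict keys with NaN:

```python
import pandas as pd

# Illustrative rows only: LGBM here tunes n_estimators but not max_depth,
# so its max_depth cell becomes NaN automatically.
rows = [
    {"Learner": "LGBMClassifierDF", "n_estimators": 400},
    {"Learner": "RandomForestClassifierDF", "n_estimators": 200, "max_depth": 4},
]
params = pd.DataFrame(rows)
```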

jason-bentley (Contributor, Author) commented

Agreed. Something like the table below would be reasonable output. Two open questions:

  • Would we also want the ability to get this either as a summary or per fold?
  • Would we also want to include any default hyperparameters as well? That could be useful information but would also greatly expand the table.
| Rank | Learner                  | Ranking_score | Mean_score | SD_score | n_estimators | max_depth | N_folds | Scoring_metric |
|------|--------------------------|---------------|------------|----------|--------------|-----------|---------|----------------|
| 1    | LGBMClassifierDF         | 0.656         | 0.680      | 0.0122   | 400          | na        | 10      | roc_auc        |
| 2    | LGBMClassifierDF         | 0.655         | 0.677      | 0.0111   | 500          | na        | 10      | roc_auc        |
| 3    | RandomForestClassifierDF | 0.650         | 0.695      | 0.0224   | 200          | 4         | 10      | roc_auc        |
| 4    | RandomForestClassifierDF | 0.647         | 0.696      | 0.0244   | 300          | 4         | 10      | roc_auc        |
| 5    | RandomForestClassifierDF | 0.646         | 0.697      | 0.0255   | 400          | 4         | 10      | roc_auc        |
| 6    | RandomForestClassifierDF | 0.644         | 0.695      | 0.0224   | 200          | 8         | 10      | roc_auc        |
| 7    | RandomForestClassifierDF | 0.641         | 0.696      | 0.0244   | 300          | 8         | 10      | roc_auc        |
| 8    | RandomForestClassifierDF | 0.638         | 0.697      | 0.0255   | 400          | 8         | 10      | roc_auc        |
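On the per-fold question, one hedged sketch (column names and scores are hypothetical, not the facet API): keep a long-format frame with one row per (Rank, Fold), from which the summary's Mean_score / SD_score can be recomputed by aggregation:

```python
import pandas as pd

# Long format: one row per (model rank, CV fold), with made-up scores.
per_fold = pd.DataFrame({
    "Rank": [1, 1, 1, 2, 2, 2],
    "Learner": ["LGBMClassifierDF"] * 6,
    "Fold": [0, 1, 2, 0, 1, 2],
    "Test_score": [0.67, 0.69, 0.68, 0.66, 0.68, 0.70],
})

# Aggregate back to the summary-level columns proposed above.
summary = (
    per_fold.groupby(["Rank", "Learner"])["Test_score"]
    .agg(Mean_score="mean", SD_score="std")
    .reset_index()
)
```

With this shape, the summary table is just a `groupby` away, so exposing both views would not require storing the data twice.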
