
generate LearnerRanker summary reports as data frames, not text #95

Merged: 9 commits merged into develop from feature/ranker_summary_report_as_frame on Oct 12, 2020

Conversation

@j-ittner (Member) commented Oct 9, 2020

closes #50

@j-ittner j-ittner added the API New feature or request label Oct 9, 2020
@j-ittner j-ittner requested a review from jason-bentley October 9, 2020 20:53
@j-ittner j-ittner self-assigned this Oct 9, 2020
@jason-bentley (Contributor) left a comment


Changes look great! I checked expected results for the following scenarios using the getting started example as the basis:

  1. Single learner single hyperparameter
  2. Single learner multiple hyperparameters
  3. Two learners with distinct multiple hyperparameters
  4. Two learners with distinct multiple hyperparameters and a common one (n_estimators)

DF outputs were as expected. A couple of follow-up questions:

  1. Would it make sense to add the performance metric (e.g., accuracy or AUC) as a column to the output?
  2. Would it make sense to also add the number of folds, or something about the CV scheme, so it is clear whether the mean and SD are based on, say, 10 values or 25?

Once this PR is merged I will update all notebooks accordingly in a separate PR.

@j-ittner (Member, Author)

Good ideas!

On 1, I suggest we include the name of the metric in the relevant column headings, e.g. roc_auc_mean.

For 2 I am not so sure: this would create a column with the same value in every row, and it is an input to the ranker, not a result. However, we could reflect the number of splits through a derived metric, e.g. a standard error estimate as a percentage of the mean score (easy) and of the standard deviation (tricky). That would help folks determine the number of splits.

Thoughts?
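For illustration, a minimal sketch of both ideas: metric-prefixed column names and a standard error of the mean score expressed as a percentage of the mean. The names `metric`, `scores`, and `summary` are placeholders for this discussion, not LearnerRanker internals, and SE(mean) = std / sqrt(n_splits) is the usual estimate.

```python
import numpy as np

# hypothetical inputs: the metric name and one cross-validation score per split
metric = "roc_auc"
scores = np.array([0.91, 0.88, 0.93, 0.90, 0.89])

n_splits = len(scores)
mean, std = scores.mean(), scores.std(ddof=1)

summary = {
    f"{metric}_mean": mean,   # e.g. column "roc_auc_mean"
    f"{metric}_std": std,     # e.g. column "roc_auc_std"
    # standard error of the mean score, relative to the mean, in percent:
    # SE(mean) = std / sqrt(n_splits)
    f"{metric}_se_pct": 100 * std / np.sqrt(n_splits) / mean,
}
```

With the five splits above the relative standard error works out to roughly 1%, which is the kind of signal that could guide the choice of split count.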

@jason-bentley (Contributor)

On the proposed solution for 1, completely agree!

On 2, I think if we do try to add this information we need to be direct and clear, so the user doesn't need to interpret or calculate anything further. Perhaps we either (1) don't add anything additional, or (2) create a column with the CV object string (if possible): the learner ranker will always have something passed to the CV argument, so we could just take that argument as a string and drop it into a column. What do you think?
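As a rough sketch of option (2), assuming nothing beyond a standard scikit-learn splitter, whose repr is already informative; the `cv_label` name is only for illustration:

```python
from sklearn.model_selection import RepeatedKFold

cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=42)

# The splitter's repr already documents the scheme, so a single constant
# string column would carry the CV context into an exported table.
cv_label = repr(cv)  # e.g. "RepeatedKFold(n_repeats=2, n_splits=5, random_state=42)"
```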

@j-ittner (Member, Author) commented Oct 10, 2020 via email

@jason-bentley (Contributor)

For 2 I was thinking more of the use case where someone exports the table and someone else then looks at it without the context of the code. However, I agree that it could just bloat the table. In that scenario the users could add this type of information to the table themselves if needed, so perhaps it is best not to add anything explicitly for 2.

@j-ittner (Member, Author)

Agree. I have pushed updates, obviously your approval can wait until Monday!

@jason-bentley (Contributor) left a comment


Changes look great, and I get the expected output when the scoring metric is specified:

[screenshot: summary frame with metric-named columns]

However, when it is not specified (i.e., left as the default), the output does not match (see image below). Can we cover this edge case as well? Thanks!

[screenshot: summary frame with default scoring, missing metric names]

@j-ittner (Member, Author)

Hmmm... is it safe to assume that the default scorer for regression is always r2 and that the default scorer for classification is always accuracy? See the sklearn docs for RegressorMixin and ClassifierMixin.

@jason-bentley (Contributor)

Hmmm, good point. In that case let's keep things simple and perhaps note in the docstring that the columns are only named after the model performance metric when scoring is explicitly specified. Good practice is to always specify scoring anyway.

@j-ittner (Member, Author)

I had a closer look at the sklearn docs and code. There is a very clear default behaviour so I will use that for naming.

The regressor score method uses r2_score
The classifier score method uses accuracy_score

Let me make this change to the code.

Meanwhile could you check if you get meaningful names when you pass a scoring function (as a callable) to the ranker, instead of a string?
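For context, one possible shape of that logic, using scikit-learn's is_classifier / is_regressor helpers; this is an illustrative sketch rather than the code in this PR, and the handling of a callable scorer is an assumption:

```python
from sklearn.base import is_classifier, is_regressor


def scoring_name(estimator, scoring=None) -> str:
    """Best-effort display name for the scoring metric (illustrative only)."""
    if isinstance(scoring, str):
        # explicit metric name, e.g. "roc_auc" -> "roc_auc_mean" column
        return scoring
    if callable(scoring):
        # fall back to the callable's name, if it has one (assumption)
        return getattr(scoring, "__name__", "score")
    # scoring not specified: sklearn's default score() methods use
    # accuracy_score for classifiers and r2_score for regressors
    if is_classifier(estimator):
        return "accuracy"
    if is_regressor(estimator):
        return "r2"
    return "score"
```

A scorer passed as a plain function would then contribute its __name__, while an unspecified scoring falls back to accuracy or r2, matching the defaults noted above.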

@j-ittner (Member, Author)

Ok I made the changes required to identify the default scoring function for regressors and classifiers - could you please have a look? Thanks!

@jason-bentley (Contributor)

Can confirm I get r2 and accuracy as the metric names in the DF output when not specifying a value for scoring for regression and classification, respectively.

@jason-bentley (Contributor) left a comment


Looks great! Thanks so much!

@j-ittner j-ittner merged commit 05ce3d3 into develop Oct 12, 2020
@j-ittner j-ittner deleted the feature/ranker_summary_report_as_frame branch October 20, 2020 15:57
@j-ittner j-ittner added this to the 1.0.1 milestone Mar 3, 2021
Labels: API New feature or request
Projects: None yet
Development: Successfully merging this pull request may close these issues:
LearnerRanker summary output as Pandas DataFrame
2 participants