generate LearnerRanker summary reports as data frames, not text #95
Conversation
Changes look great! I checked expected results for the following scenarios using the getting started example as the basis:
- Single learner single hyperparameter
- Single learner multiple hyperparameters
- Two learners with distinct multiple hyperparameters
- Two learners with distinct multiple hyperparameters and a common one (n_estimators)
The DataFrame outputs were as expected. A couple of follow-up questions:
- Would it make sense to add the performance metric (e.g., accuracy or AUC) as a column to the output?
- Would it make sense to also add the number of folds, or something about the CV scheme, so I know whether the mean and SD are based on, say, 10 values or 25?
Once this PR is merged I will update all notebooks accordingly in a separate PR.
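As a point of reference for the scenarios above, here is a rough sklearn-only sketch, deliberately not the FACET LearnerRanker API itself (whose exact signature is not shown in this thread), of ranking two learners with distinct hyperparameter grids and tabulating the results as a DataFrame:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# two learners with distinct hyperparameter grids, mirroring the scenarios above
candidates = [
    (DecisionTreeClassifier(random_state=0), {"max_depth": [3, 5, 10]}),
    (RandomForestClassifier(random_state=0), {"n_estimators": [50, 100]}),
]

rows = []
for estimator, grid in candidates:
    search = GridSearchCV(estimator, grid, cv=10, scoring="accuracy").fit(X, y)
    results = search.cv_results_
    for params, mean, std in zip(
        results["params"], results["mean_test_score"], results["std_test_score"]
    ):
        rows.append(
            {
                "learner": type(estimator).__name__,
                "params": params,
                "mean_score": mean,
                "std_score": std,
            }
        )

summary_df = pd.DataFrame(rows).sort_values("mean_score", ascending=False)
print(summary_df)
```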
Good ideas! On 1, I suggest we include the name of the metric in the relevant column headings. For 2 I am not so sure: this would create a column with the same value in every row, and it is an input to the ranker, not a result. However, we could think about the meaning of the number of splits and add a derived metric, e.g. a standard error estimate in % of the mean score (easy) and of the standard deviation (tricky). That would help folks determine the number of splits. Thoughts?
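A minimal sketch of the derived metric floated in point 2, assuming the per-split CV scores are available as an array; the `accuracy__`-prefixed names merely illustrate the column-heading convention proposed for point 1:

```python
import numpy as np

# per-split CV scores for one learner (illustrative values)
scores = np.array([0.91, 0.93, 0.90, 0.94, 0.92, 0.89, 0.95, 0.93, 0.91, 0.92])

mean = scores.mean()
std = scores.std(ddof=1)
# standard error of the mean, expressed in % of the mean score (the "easy" case)
sem_pct = std / np.sqrt(len(scores)) / mean * 100

print(f"accuracy__mean={mean:.3f}  accuracy__std={std:.3f}  sem={sem_pct:.2f}% of mean")
```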
On the proposed solution for 1, completely agree! On 2, I think if we do try to add this information we need to be direct and clear, so the user doesn't need to further interpret or calculate from it. Perhaps we either (1) don't add anything additional, or (2) create a column with the CV object string (if possible): the learner ranker will always have something passed to the CV argument, so just take that argument as a string and drop it into a column. What do you think?
My only worry with adding the CV object to a column is that it will take up a lot of space for a constant that is repeated in every row.
What use did you have in mind (as opposed to getting the CV object directly from the ranker object rather than from the summary report table)?
For 2 I was thinking more of the use case where someone might export the table and then someone else looks at it without the context of the code. However, I agree that it could just bloat the table. In my example, the users themselves could choose to add this type of information to the table if needed, so perhaps it is best not to add anything explicitly for 2.
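Should a user want that context in an exported table, a one-line addition along these lines would do it (`summary_df` and `ranker.cv` are assumed names here, not confirmed API):

```python
# hypothetical names: summary_df is the summary report DataFrame,
# ranker.cv the CV object originally passed to the learner ranker
summary_df["cv"] = str(ranker.cv)
```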
Agree. I have pushed updates; obviously your approval can wait until Monday!
Hmmm... is it safe to assume that the default scorer for regression is always r2, and the default scorer for classification is always accuracy? See the sklearn docs for RegressorMixin and ClassifierMixin.
Hmmm, good point. In that case let's keep things simple, and maybe note in the docstring that the columns are named after the model performance metric only when scoring is explicitly specified. Good practice is to always specify scoring anyway.
I had a closer look at the sklearn docs and code. There is a very clear default behaviour, so I will use that for naming: the regressor score method uses r2_score, and the classifier score method uses accuracy_score. Let me make this change to the code. Meanwhile, could you check whether you get meaningful names when you pass a scoring function (as a callable) to the ranker, instead of a string?
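A sketch of that default-name lookup, based on the documented behaviour of sklearn's mixins (RegressorMixin.score computes R², ClassifierMixin.score computes accuracy); the function name is illustrative, not the actual change in this PR:

```python
from sklearn.base import BaseEstimator, ClassifierMixin, RegressorMixin

def default_scorer_name(estimator: BaseEstimator) -> str:
    """Name of the metric used by the estimator's built-in score method."""
    if isinstance(estimator, ClassifierMixin):
        # ClassifierMixin.score delegates to accuracy_score
        return "accuracy"
    if isinstance(estimator, RegressorMixin):
        # RegressorMixin.score delegates to r2_score
        return "r2"
    raise TypeError(f"unsupported estimator type: {type(estimator).__name__}")
```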
OK, I made the changes required to identify the default scoring function for regressors and classifiers. Could you please have a look? Thanks!
Can confirm I get meaningful names when passing a scoring function as a callable.
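For context on the callable case: one plausible way to recover a readable name from a scorer built with sklearn's make_scorer is via its wrapped score function. Note that `_score_func` is private sklearn API, so this is a fragile sketch rather than a recommended approach:

```python
from sklearn.metrics import f1_score, make_scorer

scorer = make_scorer(f1_score)
# _score_func is private sklearn API; fall back to the callable's own name
name = getattr(scorer, "_score_func", scorer).__name__
print(name)  # f1_score
```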
Looks great! Thanks so much!
closes #50