Competitions feature #121
Conversation
* call hub evaluate endpoint from client evaluate_competitions method
* add super basic test for evaluating competitions
* be more specific in evaluate_benchmark signature
* Update polaris/hub/client.py (Co-authored-by: Andrew Quirke <75542075+Andrewq11@users.noreply.github.com>)
* start refactoring object dependencies out of evaluation logic
* refactor test subset object out of evaluation logic
* clean up as much as possible for now
* updating date serializer
* call hub evaluate endpoint from client evaluate_competitions method
* Update polaris/competition/_competition.py (Co-authored-by: Andrew Quirke <75542075+Andrewq11@users.noreply.github.com>)
* updating date serializer
* call hub evaluate endpoint from client evaluate_competitions method
* add super basic test for evaluating competitions
* comp wip
* updating date serializer
* call hub evaluate endpoint from client evaluate_competitions method
* fix bad merge resolution
* only send competition artifact ID to hub

Co-authored-by: Andrew Quirke <75542075+Andrewq11@users.noreply.github.com>
Co-authored-by: Andrew Quirke <andrewquirke99@gmail.com>
* use evaluation logic directly in hub, no need for wrapper
* include evaluate_benchmark in package
* remove unnecessary imports
* read incoming scores sent as json
* integrating results for comps
* Update polaris/hub/client.py (Co-authored-by: Cas Wognum <caswognum@outlook.com>)
* addressing comments & adding CompetitionResults class
* test competition evaluation works for multi-column dataframes
* add single column test to competition evaluation
* fix multitask-single-test-set cases
* fix bug with multi-test-set benchmarks
* adding functions to serialize & deserialize pred objs for external eval
* updating return for evaluate_competition method in client
* updating evaluate_competition method to pass additional result info to hub

Co-authored-by: Cas Wognum <caswognum@outlook.com>
Co-authored-by: Kira McLean <email@kiramclean.com>
* integrating results for comps
* Update polaris/hub/client.py (Co-authored-by: Cas Wognum <caswognum@outlook.com>)
* addressing comments & adding CompetitionResults class
* test competition evaluation works for multi-column dataframes
* add single column test to competition evaluation
* fix multitask-single-test-set cases
* fix bug with multi-test-set benchmarks
* adding functions to serialize & deserialize pred objs for external eval
* updating return for evaluate_competition method in client
* updating evaluate_competition method to pass additional result info to hub
* refuse early to upload a competition with a zarr-based dataset
* removing merge conflicts

Co-authored-by: Andrew Quirke <andrewquirke99@gmail.com>
Co-authored-by: Andrew Quirke <75542075+Andrewq11@users.noreply.github.com>
Co-authored-by: Cas Wognum <caswognum@outlook.com>
I'll let Cas comment on the domain-specific changes; he's better placed to evaluate those. It looks pretty good from my side. I did have some questions and comments, and one concern in particular: the competition upload user experience might not be great.
Monumental work and a long time coming! Thank you @Andrewq11 and @kiramclean ! 🚀
Thanks once more @Andrewq11 and @kiramclean ! 🚀
I'm just left with some minor comments and questions. The main difficulty may come from the checksum. Since I made those changes to the checksum, I'm of course happy to help if needed! 😄
I did rack my brain over the predictions type while reviewing this. It is becoming increasingly complex, and having a single interface where predictions are standardized would give me peace of mind. In another PR though!
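For illustration only, a standardized predictions container along the lines suggested above might normalize the different shapes the code currently accepts (a single array, a per-target dict, or a per-test-set dict of per-target dicts) into one nested structure. This is a hypothetical sketch, not something included in this PR.

```python
import numpy as np

class CompetitionPredictions:
    """Hypothetical container that normalizes accepted prediction shapes into
    a nested {test_set_name: {target_col: ndarray}} structure."""

    def __init__(self, predictions, target_cols, test_set_names=("test",)):
        self.predictions = self._normalize(predictions, target_cols, test_set_names)

    @staticmethod
    def _normalize(predictions, target_cols, test_set_names):
        # A bare array/list -> single test set, single target column.
        if isinstance(predictions, (list, np.ndarray)):
            return {test_set_names[0]: {target_cols[0]: np.asarray(predictions)}}
        # A dict keyed by target column -> single test set.
        if all(key in target_cols for key in predictions):
            return {test_set_names[0]: {k: np.asarray(v) for k, v in predictions.items()}}
        # Otherwise assume a dict keyed by test-set name.
        return {
            ts: {k: np.asarray(v) for k, v in targets.items()}
            for ts, targets in predictions.items()
        }

# All of these normalize to the same nested structure.
p1 = CompetitionPredictions([0.1, 0.2], target_cols=["y"])
p2 = CompetitionPredictions({"y": [0.1, 0.2]}, target_cols=["y"])
print(p1.predictions)
print(p2.predictions)
```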
This is good stuff, very nice! A few things here and there, nothing major. I do think with the multiplication of artifact types, we should start to centralize their relevant attributes in one place, rather than having conditionals all over the code.
Changelogs
This branch contains all of the changes to the Polaris client required to support competitions in the hub, including:
Some key notes about the structure of the whole system (i.e. including Polaris Hub and supporting services):
Corresponding changes to the hub and the implementation of these new external services will be discussed in separate PRs in the relevant repos.
Checklist:

* Label this PR with feature, fix or test (or ask a maintainer to do it for you).

Discussion
The original design proposal is discussed in this doc, with further details in this initial project proposal doc. This implementation closely follows the agreed-upon design in the first linked doc.
Briefly, this feature allows users to launch a competition on Polaris Hub. A competition is essentially a regular benchmark with hidden test labels. Predictions and evaluation results associated with a particular competition can be uploaded to the hub for a given period of time, after which the competition's benchmark labels are made available and the dataset is converted to a "regular" dataset on Polaris Hub. A typical user workflow involves uploading a dataset to Polaris Hub, launching the competition, and then collecting results.
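To make that workflow concrete, here is a minimal sketch from a participant's perspective. The method names (`get_competition`, `get_train_test_split`, `evaluate_competition`) are taken from the changes described in this PR but should be read as assumptions about the eventual public API, not a definitive reference.

```python
from polaris.hub.client import PolarisHubClient

with PolarisHubClient() as client:
    client.login()

    # Fetch a competition that has been launched on Polaris Hub (ID is illustrative).
    competition = client.get_competition("my-org/my-competition")

    # Test inputs are available to participants; the labels stay hidden on the hub.
    train, test = competition.get_train_test_split()

    # Produce predictions with your own model (a constant placeholder here).
    predictions = [0.0] * len(test)

    # Submit predictions; evaluation runs hub-side against the hidden labels,
    # and a CompetitionResults object describing the scores is returned.
    results = client.evaluate_competition(competition, predictions)
    print(results)
```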
Competition evaluation logic is shared with regular benchmarks, and there has been some refactoring to facilitate re-using this code. Note: currently, competition benchmarks accept only predictions, not probabilities. Support for probabilities was added partway through the development of the competitions feature and will need to be implemented in the evaluation service before competitions can support probabilities.
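For illustration, once the test-subset objects are refactored out of the evaluation logic, the shared entry point can be called with plain arrays keyed by target column, which is what allows the hub to reuse it directly. The import path and signature below are assumptions inferred from the commit messages above, not the exact code in this PR.

```python
import numpy as np

# Assumed import path; the PR only states that evaluate_benchmark is now part of the package.
from polaris.evaluate import evaluate_benchmark

# Ground truth and predictions passed as plain arrays keyed by target column,
# rather than wrapped in Subset objects.
y_true = {"LOG_SOLUBILITY": np.array([0.1, 0.5, 0.9])}
y_pred = {"LOG_SOLUBILITY": np.array([0.2, 0.4, 1.0])}

scores = evaluate_benchmark(
    target_cols=["LOG_SOLUBILITY"],
    metrics=["mean_absolute_error"],
    y_true=y_true,
    y_pred=y_pred,
)
print(scores)
```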
Other large and potentially interesting changes to discuss are highlighted below in comments in the relevant context.
Deployment
We will need to coordinate deployment of these changes to make sure the corresponding changes to the hub and other services are available before we release a version of the client containing these changes. No existing functionality should break, but the newly added competition-related APIs will not work until the updated hub and other services are live.