Competitions feature #121

Merged: 48 commits merged into main from feat/competitions on Aug 19, 2024

Conversation

@kirahowe (Contributor) commented on Jun 28, 2024

Changelogs

This branch contains all of the changes to Polaris client required to support competitions in the hub, including:

  • uploading new competition datasets and benchmarks to the hub
  • retrieving competition datasets and benchmarks
  • evaluating competition benchmarks

Some key notes about the structure of the whole system (i.e. including Polaris hub and supporting services):

  • Competition datasets and competition benchmarks are new entities in the system. They behave very similarly to current datasets and benchmarks except for how results are evaluated.
  • Evaluation logic is shared with regular benchmarks (see the sketch after this list).
  • Preparing competitions (splitting the target labels into a separate dataset) is done in a separate, standalone service.
  • Evaluating competition benchmarks is done in a separate, standalone service.
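Since the evaluation logic is shared with regular benchmarks, here is a minimal sketch of what a standalone evaluation helper might look like once it no longer depends on the benchmark or test-subset objects. Only the name `evaluate_benchmark` and the module `polaris/evaluate/utils.py` come from this PR's commit log; the signature, the metric handling, and the use of scikit-learn below are assumptions made for illustration, not the actual Polaris implementation.

```python
# Hypothetical sketch of a standalone evaluation helper, decoupled from the
# benchmark and test-subset objects so that both regular benchmarks and the
# external competition evaluation service can call it.
import numpy as np
from sklearn.metrics import mean_absolute_error


def evaluate_benchmark(y_true: dict, y_pred: dict, metrics: dict) -> dict:
    """Compute per-target scores from plain arrays, without a benchmark object.

    y_true / y_pred map target column names to arrays; metrics maps metric
    names to callables of (y_true, y_pred).
    """
    scores = {}
    for target, truth in y_true.items():
        preds = np.asarray(y_pred[target])
        scores[target] = {
            name: float(fn(np.asarray(truth), preds)) for name, fn in metrics.items()
        }
    return scores


# Example: a single regression target scored with MAE.
scores = evaluate_benchmark(
    y_true={"logS": [0.1, 0.5, 0.9]},
    y_pred={"logS": [0.2, 0.4, 1.0]},
    metrics={"mae": mean_absolute_error},
)
```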

Corresponding changes to the hub and the implementation of these new external services will be discussed in separate PRs in the relevant repos.


Checklist:

  • Was this PR discussed in an issue? It is recommended to first discuss a new feature in a GitHub issue before opening a PR.
  • Add tests to cover the fixed bug(s) or the newly introduced feature(s) (if appropriate).
  • Update the API documentation if a new function is added, or an existing one is deleted.
  • Write concise and explanatory changelogs above.
  • If possible, assign one of the following labels to the PR: feature, fix or test (or ask a maintainer to do it for you).

Discussion

The original design proposal is discussed in this doc, with further details in this initial project proposal doc. This implementation closely follows the agreed-upon design in the first linked doc.

Briefly, this feature allows users to launch competitions on Polaris Hub. A competition is essentially a regular benchmark with hidden test labels. Predictions and evaluation results for a given competition can be uploaded to the hub for a set period of time, after which the test labels are made available and the dataset is converted to a "regular" dataset on Polaris Hub. A typical user workflow involves uploading a dataset to Polaris Hub, launching the competition, and then collecting results.
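As a rough illustration of that workflow, the sketch below assumes the `PolarisHubClient` from `polaris/hub/client.py` and the `evaluate_competition` method mentioned in the commit log; the competition-loading call, the artifact slug, and the placeholder predictions are hypothetical and may not match the final API.

```python
# Sketch of the participant side of the workflow described above.
from polaris.hub.client import PolarisHubClient

with PolarisHubClient() as client:
    # Retrieve the competition benchmark; its test labels stay hidden on the hub.
    competition = client.get_competition("my-org/my-competition")  # hypothetical name

    # Produce predictions locally. Competitions currently accept predictions
    # only, not probabilities.
    predictions = {"my-target": [0.0] * 10}  # placeholder predictions

    # Evaluation happens hub-side: the client sends the predictions along with
    # the competition artifact ID and receives the scored results back.
    results = client.evaluate_competition(competition, predictions)
```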

Competition evaluation logic is shared with regular benchmarks, and there has been some refactoring to facilitate reusing this code. Note: competition benchmarks currently accept only predictions, not probabilities. Support for probabilities was added partway through the development of the competitions feature and will need to be implemented in the evaluation service before competitions can support probabilities.
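The commit log mentions helpers to serialize and deserialize prediction objects for the external evaluation service, with scores sent back as JSON. Below is a minimal sketch of that round trip; the nesting (test set → target column → array) and the function names are assumptions for illustration, not the actual Polaris implementation.

```python
# Hypothetical sketch of serializing prediction objects for an external
# evaluation service and rebuilding them on the other side.
import json
import numpy as np


def serialize_predictions(predictions: dict) -> str:
    """Convert {test_set: {target: array}} into a JSON string for the hub."""
    return json.dumps(
        {ts: {col: np.asarray(vals).tolist() for col, vals in cols.items()}
         for ts, cols in predictions.items()}
    )


def deserialize_predictions(payload: str) -> dict:
    """Rebuild numpy arrays from the JSON payload on the evaluation side."""
    return {ts: {col: np.array(vals) for col, vals in cols.items()}
            for ts, cols in json.loads(payload).items()}


round_tripped = deserialize_predictions(
    serialize_predictions({"test": {"logS": [0.1, 0.2]}})
)
```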

Other large and potentially interesting changes to discuss are highlighted below in comments in the relevant context.

Deployment

We will need to coordinate deployment of these changes to make sure the corresponding changes to the hub and other services are available before we release a version of the client containing these changes. No existing functionality should break, but the newly added competition-related APIs will not work until the updated hub and other services are live.

Andrewq11 and others added 20 commits April 11, 2024 14:40
* call hub evaluate endpoint from client evaluate_competitions method

* add super basic test for evaluating competitions

* be more specific in evaluate_benchmark signature

* Update polaris/hub/client.py

Co-authored-by: Andrew Quirke <75542075+Andrewq11@users.noreply.github.com>

* start refactoring object dependencies out of evaluation logic

* refactor test subset object out of evaluation logic

* clean up as much as possible for now

* updating date serializer

* call hub evaluate endpoint from client evaluate_competitions method

* Update polaris/competition/_competition.py

Co-authored-by: Andrew Quirke <75542075+Andrewq11@users.noreply.github.com>

* updating date serializer

* call hub evaluate endpoint from client evaluate_competitions method

* add super basic test for evaluating competitions

* comp wip

* updating date serializer

* call hub evaluate endpoint from client evaluate_competitions method

* fix bad merge resolution

* only send competition artifact ID to hub

---------

Co-authored-by: Andrew Quirke <75542075+Andrewq11@users.noreply.github.com>
Co-authored-by: Andrew Quirke <andrewquirke99@gmail.com>
* use evaluation logic directly in hub, no need for wrapper

* include evaluate_benchmark in package

* remove unnecessary imports

* read incoming scores sent as json
* integrating results for comps

* Update polaris/hub/client.py

Co-authored-by: Cas Wognum <caswognum@outlook.com>

* addressing comments & adding CompetitionResults class

* test competition evaluation works for multi-column dataframes

* add single column test to competition evaluation

* fix multitask-single-test-set cases

* fix bug with multi-test-set benchmarks

* adding functions to serialize & deserialize pred objs for external eval

* updating return for evaluate_competition method in client

* updating evaluate_competition method to pass additional result info to hub

---------

Co-authored-by: Cas Wognum <caswognum@outlook.com>
Co-authored-by: Kira McLean <email@kiramclean.com>

* refuse early to upload a competition with a zarr-based dataset

* removing merge conflicts

---------

Co-authored-by: Andrew Quirke <andrewquirke99@gmail.com>
Co-authored-by: Andrew Quirke <75542075+Andrewq11@users.noreply.github.com>
Co-authored-by: Cas Wognum <caswognum@outlook.com>
@kirahowe added the feature label (Annotates any PR that adds new features; used in the release process) on Jun 28, 2024
@kirahowe requested a review from cwognum as a code owner on June 28, 2024 16:33
@jstlaurent (Contributor) left a comment

I'll let Cas comment on the domain-specific changes; he's better placed to evaluate them. It looks good from my side. I did have some questions and comments. I have one concern in particular: the competition upload user experience might not be great.

Review threads (resolved): polaris/benchmark/_base.py; polaris/hub/client.py (5)
@cwognum (Collaborator) left a comment

Monumental work and a long time coming! Thank you @Andrewq11 and @kiramclean! 🚀

Review threads (resolved): polaris/benchmark/_base.py (2); polaris/competition/_competition.py (3); polaris/hub/client.py (5)
@cwognum (Collaborator) left a comment

Thanks once more @Andrewq11 and @kiramclean ! 🚀

I'm just left with some minor comments and questions. The main difficulty may come from the checksum. Since I made those changes to the checksum, I'm of course happy to help if needed! 😄

I did rack my brain over the predictions type while reviewing this. It is becoming increasingly complex, and having a single interface where predictions are standardized would give me peace of mind. In another PR, though!

Review threads (resolved): docs/tutorials/competition.participate.ipynb; polaris/competition/_competition.py (2); polaris/dataset/_competition_dataset.py; polaris/evaluate/_results.py; polaris/benchmark/_base.py; polaris/evaluate/utils.py (3); tests/test_integration.py
@jstlaurent (Contributor) left a comment

This is good stuff, very nice! A few things here and there, nothing major. I do think that with the growing number of artifact types, we should start to centralize their relevant attributes in one place rather than having conditionals all over the code.
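As an illustration of that suggestion, per-artifact-type attributes could live in a single registry that call sites look up instead of branching on the artifact type. All names below are invented for illustration and do not reflect Polaris internals.

```python
# Hypothetical registry of per-artifact-type attributes, replacing scattered
# conditionals on the artifact type.
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtifactSpec:
    url_prefix: str
    supports_zarr: bool


ARTIFACT_SPECS = {
    "dataset": ArtifactSpec(url_prefix="v1/dataset", supports_zarr=True),
    "benchmark": ArtifactSpec(url_prefix="v1/benchmark", supports_zarr=True),
    "competition": ArtifactSpec(url_prefix="v1/competition", supports_zarr=False),
}

# Call sites look up the spec instead of branching on the artifact type:
spec = ARTIFACT_SPECS["competition"]
if not spec.supports_zarr:
    ...  # refuse early, as this PR does for zarr-based competition datasets
```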

Review threads (resolved): polaris/benchmark/_base.py (4); polaris/competition/_competition.py; polaris/evaluate/_results.py (2); polaris/evaluate/utils.py; polaris/hub/client.py (2)
@Andrewq11 merged commit 2c785a5 into main on Aug 19, 2024
4 checks passed
@cwognum deleted the feat/competitions branch on September 11, 2024 at 20:57