Competitions feature #121
Conversation
* call hub evaluate endpoint from client evaluate_competitions method
* add super basic test for evaluating competitions
* be more specific in evaluate_benchmark signature
* Update polaris/hub/client.py (Co-authored-by: Andrew Quirke <75542075+Andrewq11@users.noreply.github.com>)
* start refactoring object dependencies out of evaluation logic
* refactor test subset object out of evaluation logic
* clean up as much as possible for now
* updating date serializer
* call hub evaluate endpoint from client evaluate_competitions method
* Update polaris/competition/_competition.py (Co-authored-by: Andrew Quirke <75542075+Andrewq11@users.noreply.github.com>)
* updating date serializer
* call hub evaluate endpoint from client evaluate_competitions method
* add super basic test for evaluating competitions
* comp wip
* updating date serializer
* call hub evaluate endpoint from client evaluate_competitions method
* fix bad merge resolution
* only send competition artifact ID to hub

Co-authored-by: Andrew Quirke <75542075+Andrewq11@users.noreply.github.com>
Co-authored-by: Andrew Quirke <andrewquirke99@gmail.com>
* use evaluation logic directly in hub, no need for wrapper
* include evaluate_benchmark in package
* remove unnecessary imports
* read incoming scores sent as json
* integrating results for comps
* Update polaris/hub/client.py (Co-authored-by: Cas Wognum <caswognum@outlook.com>)
* addressing comments & adding CompetitionResults class
* test competition evaluation works for multi-column dataframes
* add single column test to competition evaluation
* fix multitask-single-test-set cases
* fix bug with multi-test-set benchmarks
* adding functions to serialize & deserialize pred objs for external eval
* updating return for evaluate_competition method in client
* updating evaluate_competition method to pass additional result info to hub

Co-authored-by: Cas Wognum <caswognum@outlook.com>
Co-authored-by: Kira McLean <email@kiramclean.com>
* integrating results for comps
* Update polaris/hub/client.py (Co-authored-by: Cas Wognum <caswognum@outlook.com>)
* addressing comments & adding CompetitionResults class
* test competition evaluation works for multi-column dataframes
* add single column test to competition evaluation
* fix multitask-single-test-set cases
* fix bug with multi-test-set benchmarks
* adding functions to serialize & deserialize pred objs for external eval
* updating return for evaluate_competition method in client
* updating evaluate_competition method to pass additional result info to hub
* refuse early to upload a competition with a zarr-based dataset
* removing merge conflicts

Co-authored-by: Andrew Quirke <andrewquirke99@gmail.com>
Co-authored-by: Andrew Quirke <75542075+Andrewq11@users.noreply.github.com>
Co-authored-by: Cas Wognum <caswognum@outlook.com>
I'll let Cas comment on the domain-specific changes; he's better placed to evaluate those. It looks pretty good from my side. I did have some questions and comments, and one concern in particular: the competition upload user experience might not be great.
Monumental work and a long time coming! Thank you @Andrewq11 and @kiramclean ! 🚀
Thanks once more @Andrewq11 and @kiramclean ! 🚀
I'm just left with some minor comments and questions. The main difficulty may come from the checksum. Since I made those changes to the checksum, I'm of course happy to help if needed! 😄
I did rack my brain over the predictions type while reviewing this. It is becoming increasingly complex, and having a single interface where predictions are standardized would give me peace of mind. In another PR though!
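For illustration only, a standardized predictions container along the lines suggested above might normalize the different shapes the code currently accepts (a single array, a per-target dict, or a per-test-set dict of per-target dicts) into one nested structure. This is a hypothetical sketch, not something included in this PR.

```python
import numpy as np

class CompetitionPredictions:
    """Hypothetical container that normalizes accepted prediction shapes into
    a nested {test_set_name: {target_col: ndarray}} structure."""

    def __init__(self, predictions, target_cols, test_set_names=("test",)):
        self.predictions = self._normalize(predictions, target_cols, test_set_names)

    @staticmethod
    def _normalize(predictions, target_cols, test_set_names):
        # A bare array/list -> single test set, single target column.
        if isinstance(predictions, (list, np.ndarray)):
            return {test_set_names[0]: {target_cols[0]: np.asarray(predictions)}}
        # A dict keyed by target column -> single test set.
        if all(key in target_cols for key in predictions):
            return {test_set_names[0]: {k: np.asarray(v) for k, v in predictions.items()}}
        # Otherwise assume a dict keyed by test-set name.
        return {
            ts: {k: np.asarray(v) for k, v in targets.items()}
            for ts, targets in predictions.items()
        }

# All of these normalize to the same nested structure.
p1 = CompetitionPredictions([0.1, 0.2], target_cols=["y"])
p2 = CompetitionPredictions({"y": [0.1, 0.2]}, target_cols=["y"])
print(p1.predictions)
print(p2.predictions)
```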
This is good stuff, very nice! A few things here and there, nothing major. I do think with the multiplication of artifact types, we should start to centralize their relevant attributes in one place, rather than having conditionals all over the code.
Changelogs
This branch contains all of the changes to the Polaris client required to support competitions in the hub, including:
Some key notes about the structure of the whole system (i.e. including Polaris Hub and supporting services):
Corresponding changes to the hub and the implementation of these new external services will be discussed in separate PRs in the relevant repos.
Checklist:

* Label this PR with feature, fix or test (or ask a maintainer to do it for you).

Discussion
The original design proposal is discussed in this doc, with further details in this initial project proposal doc. This implementation closely follows the agreed-upon design in the first linked doc.
Briefly, this feature allows users to launch a competition on Polaris Hub. A competition is essentially a regular benchmark with hidden test labels. Predictions and evaluation results associated with a particular competition can be uploaded to the hub for a given period of time, after which the competition's benchmark labels are made available and the dataset is converted to a "regular" dataset on Polaris Hub. A typical user workflow involves uploading a dataset to Polaris Hub, launching the competition, and then collecting results.
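To make that workflow concrete, here is a minimal sketch from a participant's perspective. The method names (`get_competition`, `get_train_test_split`, `evaluate_competition`) are taken from the changes described in this PR but should be read as assumptions about the eventual public API, not a definitive reference.

```python
from polaris.hub.client import PolarisHubClient

with PolarisHubClient() as client:
    client.login()

    # Fetch a competition that has been launched on Polaris Hub (ID is illustrative).
    competition = client.get_competition("my-org/my-competition")

    # Test inputs are available to participants; the labels stay hidden on the hub.
    train, test = competition.get_train_test_split()

    # Produce predictions with your own model (a constant placeholder here).
    predictions = [0.0] * len(test)

    # Submit predictions; evaluation runs hub-side against the hidden labels,
    # and a CompetitionResults object describing the scores is returned.
    results = client.evaluate_competition(competition, predictions)
    print(results)
```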
Competition evaluation logic is shared with regular benchmarks, and there has been some refactoring to facilitate re-using this code. Note: currently, competition benchmarks accept only predictions, not probabilities. Support for probabilities was added partway through the development of the competitions feature and will need to be implemented in the evaluation service before competitions can support probabilities.
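For illustration, once the test-subset objects are refactored out of the evaluation logic, the shared entry point can be called with plain arrays keyed by target column, which is what allows the hub to reuse it directly. The import path and signature below are assumptions inferred from the commit messages above, not the exact code in this PR.

```python
import numpy as np

# Assumed import path; the PR only states that evaluate_benchmark is now part of the package.
from polaris.evaluate import evaluate_benchmark

# Ground truth and predictions passed as plain arrays keyed by target column,
# rather than wrapped in Subset objects.
y_true = {"LOG_SOLUBILITY": np.array([0.1, 0.5, 0.9])}
y_pred = {"LOG_SOLUBILITY": np.array([0.2, 0.4, 1.0])}

scores = evaluate_benchmark(
    target_cols=["LOG_SOLUBILITY"],
    metrics=["mean_absolute_error"],
    y_true=y_true,
    y_pred=y_pred,
)
print(scores)
```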
Other large and potentially interesting changes to discuss are highlighted below in comments in the relevant context.
Deployment
We will need to coordinate deployment of these changes to make sure the corresponding changes to the hub and other services are available before we release a version of the client containing these changes. No existing functionality should break, but the newly added competition-related APIs will not work until the updated hub and other services are live.