
[Test] fix test_evaluator.py #675

Merged

Conversation

@baberabb (Contributor)

linked to #656

I got the test running at least. Using a single task for the time being, not sure how extensive you guys want the tests to be.

I was thinking for the CI, rather than using the dummy, we could use a small model from HF on a single task and assert that the results are as expected. Thoughts?
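
A rough sketch of what such a CI test could look like (not code from this PR; the simple_evaluate entry point, the tiny model sshleifer/tiny-gpt2, the task name, and the limit are assumptions or placeholders that would need checking against the refactor):

```python
# Hypothetical sketch, not code from this PR: evaluate one small task on a tiny
# HF model and assert the run produced sane results.
from lm_eval import evaluator


def test_tiny_hf_model_single_task():
    results = evaluator.simple_evaluate(
        model="hf",  # assumed model type string; may differ by branch/version
        model_args="pretrained=sshleifer/tiny-gpt2",  # placeholder tiny model
        tasks=["lambada_openai"],  # placeholder task
        limit=10,  # keep CI cheap: only a handful of documents
    )
    # Don't pin an exact score; just check the task reported numeric metrics.
    metrics = results["results"]["lambada_openai"]
    assert any(isinstance(value, float) for value in metrics.values())
```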

@StellaAthena (Member)

This is excellent!

Would it be possible to parameterize the tests so that we can launch only the ones corresponding to tasks that have changed? That would massively cut down on the numbers of tests that need to be run each time and would make running more than cursory correctness testing plausible.

> I was thinking for the CI, rather than using the dummy, we could use a small model from HF on a single task and assert that the results are as expected. Thoughts?

I am weakly in favor of this, assuming it’s not computationally prohibitive. We could also potentially connect it to a Kubernetes cluster so that we can run tests on actual GPUs. Not sure how to do that, but I’m 90% sure it’s possible for someone who knows what they’re doing.

@baberabb (Contributor, Author)

Thanks! Most of the implementation was there already. Pretty minor edits.

> Would it be possible to parameterize the tests so that we can launch only the ones corresponding to tasks that have changed? That would massively cut down on the numbers of tests that need to be run each time and would make running more than cursory correctness testing plausible.

Yeah, that shouldn't be too hard, just need to get the logic straight. Maybe check which task folder(s) the changes were made in and get the task name(s) from there. Will see if I can come up with something.
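
One possible shape for that logic, purely as a sketch: map the changed file paths reported by CI to the task folders they belong to. The lm_eval/tasks/<task_name>/ layout and the helper below are assumptions for illustration, not code from this PR.

```python
# Hypothetical helper: turn the list of changed file paths (e.g. the output of
# a changed-files action) into the names of the task folders they live in.
from pathlib import PurePosixPath


def changed_task_folders(changed_files):
    folders = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Only count .yaml / .py edits that sit inside a task subfolder.
        if (
            len(parts) >= 4
            and parts[:2] == ("lm_eval", "tasks")
            and parts[-1].endswith((".yaml", ".py"))
        ):
            folders.add(parts[2])
    return folders


# Example: an edit to lm_eval/tasks/arc/arc_easy.yaml maps to {"arc"}.
print(changed_task_folders(["lm_eval/tasks/arc/arc_easy.yaml", "README.md"]))
```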

> I am weakly in favor of this, assuming it’s not computationally prohibitive. We could also potentially connect it to a Kubernetes cluster so that we can run tests on actual GPUs. Not sure how to do that, but I’m 90% sure it’s possible for someone who knows what they’re doing.

Yeah, you could definitely link up a custom runner. I was thinking of something fast and quick here to run on every PR, just to check that the API is running as it should.

@haileyschoelkopf (Collaborator)

If you have any experience setting up such a runner for GPU tests, that'd be very helpful to discuss! I'm sure it can be done but I don't currently have the knowledge of github actions to know how yet.

@StellaAthena added the "bug: Something isn't working." label on Jul 14, 2023
@baberabb marked this pull request as draft on July 14, 2023 21:13
@baberabb marked this pull request as ready for review on July 17, 2023 03:05
@baberabb (Contributor, Author)

Hey @haileyschoelkopf @lintangsutawika. Got a working implementation ready. Right now this runs on every PR/push:

  • the evaluator
  • checks that the task methods are working properly
  • if a .yaml or .py file is amended in a tasks folder, it also checks that the class methods of those tasks are working (see the sketch after this list)
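
For illustration, a minimal version of that per-task check might look like the sketch below; the method names (doc_to_text, doc_to_target, validation_docs) and the placeholder task list are assumptions about the task API, not the test actually added in this PR.

```python
# Hypothetical smoke test: load each task to check and call a couple of its
# methods on a few documents. Method names assume the harness task API and
# may differ on the big-refactor branch.
from itertools import islice

import pytest

from lm_eval import tasks

TASKS_TO_CHECK = ["arc_easy"]  # placeholder; CI would fill this from changed files


@pytest.mark.parametrize("task_name", TASKS_TO_CHECK)
def test_task_methods(task_name):
    task = tasks.get_task_dict([task_name])[task_name]
    assert task.has_validation_docs() or task.has_test_docs()
    docs = task.validation_docs() if task.has_validation_docs() else task.test_docs()
    for doc in islice(docs, 5):
        assert task.doc_to_text(doc) is not None
        assert task.doc_to_target(doc) is not None
```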

@haileyschoelkopf (Collaborator) left a comment


This looks really really great, thanks so much for working on it!

I think, if you're alright with it, we can merge this for now so that it can run on big-refactor PRs and open another PR for continued testing work--we should also fix the errors raised by the linter but that doesn't have to delay this PR.

id: changed-tasks
uses: tj-actions/changed-files@v37.1.2
with:
  files_yaml: |
Collaborator

We should consider selecting a set of tasks that we'll treat as "modified" if lm_eval/api/task.py is edited, perhaps.

Contributor Author

Added! It checks if any files changed in api/*, then runs a couple more tasks. Feel free to change them. I feel like it should check different kinds of tasks, but I mostly know multiple choice.

Collaborator

Thanks, looks good! Will want to add 1 greedy_until task that's suitable at some point but those choices look good for now.

@baberabb (Contributor, Author) commented on Jul 17, 2023

> we should also fix the errors raised by the linter but that doesn't have to delay this PR.

lol. That's mostly type checking from mypy. That's all you 😂. The pre-commit failing was weird though.

Looks like the tests are all working!

@haileyschoelkopf (Collaborator) commented on Jul 18, 2023

> lol. That's mostly type checking from mypy. That's all you 😂. The pre-commit failing was weird though.

Yes, this was the royal we--I'll have to go fix those errors :) Looks like pre-commit passes, merging this! Thanks a bunch, this is really great progress!

@haileyschoelkopf merged commit a8bdf20 into EleutherAI:big-refactor on Jul 18, 2023
3 of 4 checks passed
@baberabb deleted the big-refactor_testeval branch on August 2, 2023 13:11