Add new evaluation metrics #934

jainapurva · 2024-09-24T20:04:45Z

Added new tests to llama/eval.sh for more extensive testing across a larger set of metrics.

pytorch-bot · 2024-09-24T20:04:49Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/934

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d0743e1 with merge base 72d2518:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jerryzh168 · 2024-09-24T21:08:34Z

I'm not sure whether all the eval metrics are relevant for llama quant; cc @HDCharles to take a look.

HDCharles · 2024-09-25T19:25:10Z

We can already specify any number of tasks with the --tasks argument. Normally, if I wanted to make it easier to run a small, hard-coded set of experiments, I would write an sh file that specifies them, not modify the lower-level eval runner to have multiple ways of specifying the tasks.

The current solution is to specify all tests explicitly in benchmarks.sh, and we made evals.sh to do the same, though I haven't added all the tests there yet. If you want to make it easier to run those sets, I'd maybe add them there?

Is this a larger suggestion that the benchmarks we list in the README should be changed as well?
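The wrapper-script approach described in this comment could look roughly like the sketch below. It is only an illustration, not the contents of evals.sh or benchmarks.sh: the `python eval.py` invocation, the task-set names, and the task names are assumptions, and only the `--tasks` argument comes from the discussion. The script prints the commands by default and runs them only when `RUN=1` is set.

```shell
#!/usr/bin/env bash
# Sketch of a wrapper script that hard-codes sets of eval tasks,
# leaving the lower-level eval runner untouched.
# Assumptions (not taken from the PR): the runner is invoked as
# `python eval.py`, and the task names below are placeholders.
set -euo pipefail

# Hypothetical named task sets.
QUICK_TASKS=(wikitext)
EXTENDED_TASKS=(wikitext hellaswag winogrande)

# Print the eval command for the given list of tasks.
build_eval_cmd() {
  echo "python eval.py --tasks $*"
}

# Print the command; execute it only when RUN=1 is set.
run_set() {
  local cmd
  cmd="$(build_eval_cmd "$@")"
  echo "$cmd"
  if [ "${RUN:-0}" = "1" ]; then
    $cmd
  fi
}

run_set "${QUICK_TASKS[@]}"
run_set "${EXTENDED_TASKS[@]}"
```

Keeping the task lists in the script means a new set of experiments is a one-line change there, while the runner keeps a single way of receiving tasks.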

HDCharles

see comment

jainapurva · 2024-09-26T23:42:23Z

> the current solution is specifying all tests explicitly in benchmarks.sh and we made evals.sh to do the same though I haven't added all the tests there yet. If you want to make it easier to run those sets i'd maybe add them there?

I think we need more eval metrics to test the different techniques on llama, though that could be done by updating eval.sh instead. We can also add more benchmarks to the README in the future.

drisspg · 2024-09-27T23:39:02Z

@HDCharles just to confirm, this is what you have in mind, right?

HDCharles

this looks better

facebook-github-bot added the CLA Signed label Sep 24, 2024

jainapurva force-pushed the add_eval_metrics branch from 5a1803e to 130bfc4 September 24, 2024 20:43

jerryzh168 requested a review from HDCharles September 24, 2024 21:08

HDCharles requested changes Sep 25, 2024

jainapurva force-pushed the add_eval_metrics branch 3 times, most recently from 5776bc0 to c87a56d September 27, 2024 02:49

Add new evaluation metrics

d0743e1

jainapurva force-pushed the add_eval_metrics branch from c87a56d to d0743e1 September 27, 2024 02:50

jainapurva requested a review from HDCharles September 27, 2024 16:03

jainapurva marked this pull request as ready for review September 27, 2024 16:05

HDCharles approved these changes Sep 28, 2024

jainapurva merged commit ae49375 into main Sep 30, 2024
17 checks passed

melvinebenezer pushed a commit to melvinebenezer/ao that referenced this pull request Oct 7, 2024

Add new evaluation metrics (pytorch#934)

c0d9817

yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024

Add more logging for missing argument for tokenizer artifact (pytorch…

ef5f365

…#934)
