Add openbuddy-llama-30b-v7.1 to AlpacaEval #108

Merged on Aug 3, 2023 (5 commits)

44670
Contributor

@44670 44670 commented Aug 2, 2023

We are pleased to submit our results to the leaderboard, with a win_rate of 81.55.

However, we encountered a problem. The evaluation run did not update the leaderboard CSV file in the repo; instead, it left a one-sample CSV file in the results/openbuddy-llama-30b-v7.1 directory.

Before finishing, the program printed the following:

INFO:root:drop 3 outputs that are not[0, 1, 2]
INFO:root:Saving all results to results/openbuddy-llama-30b-v7.1
INFO:root:Not saving the result to the cached leaderboard because precomputed_leaderboard is not a path but <class 'NoneType'>.
                          win_rate  standard_error  n_total  avg_length
openbuddy-llama-30b-v7.1     81.55            1.37      802         968

For your reference, the attachment below contains all the files we found in the results/openbuddy-llama-30b-v7.1 directory.

openbuddy-llama-30b-v7.1.zip
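
As a quick sanity check on the table in the log above, the reported standard_error appears consistent with a simple binomial estimate computed from win_rate and n_total. This is only a sketch: it assumes each of the 802 comparisons is an independent win/loss, which may not be exactly how alpaca_eval derives the figure, and the function name is made up for illustration.

```python
import math


def binomial_standard_error(win_rate_pct: float, n_total: int) -> float:
    """Standard error (in percentage points) of a win rate, treating each
    comparison as an independent Bernoulli trial (an assumption)."""
    p = win_rate_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_total)


# Values taken from the log output in this thread.
se = binomial_standard_error(81.55, 802)
print(f"{se:.2f}")  # prints 1.37
```

The match with the logged value of 1.37 suggests the numbers in the table are internally consistent, even though the cached leaderboard was not written.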

@YannDubs
Collaborator

YannDubs commented Aug 2, 2023

Thanks @44670, those are some cool results, especially for this length!

Sorry for the leaderboard issue. It was a known issue (#77), which is now solved.

Can you please run

alpaca_eval --model_outputs="results/openbuddy-llama-30b-v7.1/outputs.json"

That will generate the cached leaderboard, which you should add to the PR. Note that annotations are cached, so you will not actually reannotate anything.

@44670
Contributor Author

44670 commented Aug 2, 2023

Hi, it looks like the file "results/openbuddy-llama-30b-v7.1/outputs.json" does not exist.

ls results/openbuddy-llama-30b-v7.1
annotations.json  leaderboard.csv  model_outputs.json  reference_outputs.json

@YannDubs
Collaborator

YannDubs commented Aug 2, 2023

Sorry, I meant
alpaca_eval --model_outputs="results/openbuddy-llama-30b-v7.1/model_outputs.json"
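
The mix-up between outputs.json and model_outputs.json can be avoided by checking which file is actually present before invoking the command. The sketch below is a hypothetical helper, not part of alpaca_eval; only the `--model_outputs` flag and the two file names come from this thread.

```python
from pathlib import Path
from typing import List, Optional

# Candidate file names; both appear in this thread, but only
# model_outputs.json was actually present in the results directory.
CANDIDATES = ("model_outputs.json", "outputs.json")


def find_model_outputs(results_dir: str) -> Optional[Path]:
    """Return the first candidate outputs file that exists, else None."""
    for name in CANDIDATES:
        path = Path(results_dir) / name
        if path.is_file():
            return path
    return None


def build_command(results_dir: str) -> List[str]:
    """Build the alpaca_eval command line for an existing outputs file."""
    outputs = find_model_outputs(results_dir)
    if outputs is None:
        raise FileNotFoundError(f"no model outputs found in {results_dir}")
    return ["alpaca_eval", f"--model_outputs={outputs}"]
```

The returned list could then be passed to `subprocess.run`, for example with `build_command("results/openbuddy-llama-30b-v7.1")`.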

@44670
Contributor Author

44670 commented Aug 2, 2023

It works!

I have just pushed the updated alpaca_eval_gpt4_leaderboard.csv file.
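
Before pushing, it may be worth confirming that the regenerated leaderboard file really contains a row for the new model. The following is a minimal sketch that assumes the CSV stores the model name in its first column, followed by the columns shown in the log output earlier in this thread; the helper name and the sample text are illustrative, not taken from the repository.

```python
import csv
import io


def leaderboard_has_model(csv_text: str, model_name: str) -> bool:
    """Return True if any data row's first column equals model_name."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    return any(bool(row) and row[0] == model_name for row in reader)


# Illustrative sample; the column layout is an assumption that mirrors
# the table printed in the log, and the values are copied from that log.
sample = (
    ",win_rate,standard_error,n_total,avg_length\n"
    "openbuddy-llama-30b-v7.1,81.55,1.37,802,968\n"
)
print(leaderboard_has_model(sample, "openbuddy-llama-30b-v7.1"))  # prints True
```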

@44670
Contributor Author

44670 commented Aug 2, 2023

I pushed a commit to https://github.com/OpenBuddy/alpaca_eval, but it looks like it doesn't show up on this page.

Should I do anything more on my side in GitHub?

EDIT: It was probably a GitHub bug. I have added one more commit and things work now.

@@ -0,0 +1,13 @@
openbuddy-falcon-7b-v6:
Collaborator

I don't see the evaluations for this model. Can you remove this config from the PR or evaluate the model?

@@ -0,0 +1,13 @@
openbuddy-llama-65b-v8:
Collaborator

I don't see the evaluations for this model. Can you remove this config from the PR or evaluate the model?

@YannDubs
Collaborator

YannDubs commented Aug 2, 2023

Thanks @44670.
Small comment: either evaluate the model configs you added or remove them. Once that's done, I can merge the PR and it will show up on the leaderboard!
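
One way to catch configs that were added without evaluations is to compare the config directories against the results directories before opening the PR. The sketch below assumes a layout with one sub-directory per model under both a configs directory and a results directory; both paths are hypothetical parameters and the helper is not part of alpaca_eval.

```python
from pathlib import Path
from typing import List


def configs_without_results(configs_dir: str, results_dir: str) -> List[str]:
    """List model names that have a config directory but no results directory.

    Assumes one sub-directory per model in both locations (an assumption
    about the layout, not something confirmed in this thread).
    """
    results = Path(results_dir)
    return sorted(
        entry.name
        for entry in Path(configs_dir).iterdir()
        if entry.is_dir() and not (results / entry.name).is_dir()
    )
```

An empty list would mean that every config has a matching results directory under the assumed layout.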

@44670
Contributor Author

44670 commented Aug 2, 2023

Thanks!
For these two models, the evaluations are still running; I will let you know when they are done.

@44670
Contributor Author

44670 commented Aug 2, 2023

Hi! All the results have been pushed.

Please let me know if you need further assistance.

@YannDubs
Collaborator

YannDubs commented Aug 3, 2023

Great, thanks @44670!

@YannDubs YannDubs merged commit ce1123b into tatsu-lab:main Aug 3, 2023