
Leaderboard Update April 1 #299

Merged
merged 11 commits on Apr 1, 2024
Conversation

HuanzhiMao
Collaborator

@HuanzhiMao HuanzhiMao commented Mar 31, 2024

This PR is for the Leaderboard April 1 update.
This update comes with new models (Claude-3-Haiku, Databrick-DBRX-Instruct), a more advanced AST evaluation process, and updated evaluation datasets. Cost and latency statistics are also measured during evaluation. We also released the manual that our evaluation is based on.

Does this affect the leaderboard score?
Yes! Read the updated blog post 8 - leaderboard to learn more!


Co-authored-by: Charlie Cheng-Jie Ji <charliechengjieji@berkeley.edu>
Co-authored-by: Fanjia Yan <fanjiayan@berkeley.edu>

@HuanzhiMao HuanzhiMao changed the title Leaderboard V2 release Leaderboard Update April 1 Apr 1, 2024
Owner

@ShishirPatil ShishirPatil left a comment


LGTM

@ShishirPatil ShishirPatil merged commit 6971033 into ShishirPatil:main Apr 1, 2024
@HuanzhiMao HuanzhiMao mentioned this pull request Apr 9, 2024
ShishirPatil pushed a commit that referenced this pull request Apr 11, 2024
This PR is for the leaderboard April 8th release:

1. Fixed an oversight introduced in #299. For function-calling (FC)
models that cannot accept the `float` type as input, when a parameter's
type is `float`, the evaluation procedure now converts that type to
`number` in the model input and notes in the parameter description
that `This is a float type value.` An additional field `format: float`
is also included in the model input to make the type explicit.
2. Updated the model handlers for Claude, Mistral, and OSS models to
better parse model output. This patches the handlers released in #299,
which sometimes failed to parse output even when it was valid. This
affects only the prompting models; the FC models are unaffected.
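A minimal sketch of the float-to-number rewrite described in item 1. The function name, the document shape, and the field names other than `type`, `format`, and `description` are illustrative assumptions, not the repository's actual code:

```python
def convert_float_params(function_doc: dict) -> dict:
    """Hypothetical sketch: for FC models that lack a `float` type,
    rewrite float-typed parameters as `number`, add a `format: float`
    hint, and append an explanatory note to the description."""
    params = function_doc.get("parameters", {}).get("properties", {})
    for spec in params.values():
        if spec.get("type") == "float":
            spec["type"] = "number"
            spec["format"] = "float"
            spec["description"] = (
                spec.get("description", "") + " This is a float type value."
            ).strip()
    return function_doc


# Usage on a toy function document (names are made up for illustration):
doc = {
    "name": "get_temperature",
    "parameters": {
        "properties": {
            "celsius": {"type": "float", "description": "Temp."}
        }
    },
}
converted = convert_float_params(doc)
# converted["parameters"]["properties"]["celsius"] is now:
# {"type": "number", "format": "float",
#  "description": "Temp. This is a float type value."}
```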


This PR **DOES** change the leaderboard score. We will update the
leaderboard website shortly, in a different PR.

---------

Co-authored-by: Charlie Cheng-Jie Ji <charliechengjieji@berkeley.edu>
Co-authored-by: Fanjia Yan <fanjiayan@berkeley.edu>
devanshamin pushed a commit to devanshamin/gorilla that referenced this pull request Jul 9, 2024
devanshamin pushed a commit to devanshamin/gorilla that referenced this pull request Jul 9, 2024
devanshamin pushed a commit to devanshamin/gorilla that referenced this pull request Jul 9, 2024
aw632 pushed a commit to vinaybagade/gorilla that referenced this pull request Aug 22, 2024
aw632 pushed a commit to vinaybagade/gorilla that referenced this pull request Aug 22, 2024