Tool call performs worse on v2.2.0 as compared to latest #2413
Hi @varad0309, thanks for opening this issue. We will be publishing a newer release in the coming weeks, and it should include these fixes along with many other improvements! For now I'd recommend using `latest` or a pinned commit to ensure you are using a version with the tool fixes. Thanks again!
@drbh thanks for the quick reply. I did try a commit from a few hours back (more specifically, …). My observation: the list of available tools is still not getting passed appropriately.
Oh, apologies, I must have misunderstood the issue. It sounds like tool responses have regressed starting at version 2.2.0 and onwards? Would you be able to share an example of the input and expected output? Additionally, do you know when the tools were last working as you expected (maybe a version, or best case the last commit sha)? Thanks!
Sure, here are a few examples. I unfortunately don't know the last version after which it starts breaking. The versions I am comparing are via docker images:

Examples:
Hi @varad0309, I believe these issues should be resolved by the recent improvements and bug fixes to grammars and tool calling (#2463, #2454, #2391, etc.). Would you kindly try the most recent container image?
@drbh thanks for working on this!! I just ran some tests on different versions (older to newer as you go from column 1 to column 3) on a benchmark, setting temperature 0 and using the above OpenAI chat completion script. The function calling performance still seems to be dropping, though the models' ability to filter out irrelevant tools is pretty good (the benchmark is BFCL-style).
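For context, a BFCL-style tool-call check like the one described above might look as follows. This is a minimal sketch, not the reporter's actual benchmark script: the `get_weather` tool, the `build_request` helper, and the endpoint URL in the comment are all illustrative assumptions; only the temperature-0 setting and the use of the OpenAI-compatible chat completion API come from the thread.

```python
# Hypothetical single tool definition in the OpenAI function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not from the benchmark
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def build_request(prompt: str) -> dict:
    """Assemble a chat-completion payload; temperature 0 as in the benchmark."""
    return {
        "model": "tgi",
        "temperature": 0,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "tool_choice": "auto",
    }

payload = build_request("What is the weather in Paris?")
print(sorted(payload))

# The actual call would go through an OpenAI-compatible client pointed at the
# TGI server, e.g. (endpoint is an assumption):
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:8080/v1", api_key="-")
# resp = client.chat.completions.create(**payload)
# print(resp.choices[0].message.tool_calls)
```

Scoring would then compare `tool_calls` in the response (function name and arguments) against the expected call for each benchmark case.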
System Info
OS: Ubuntu Linux
Model:
meta-llama/Meta-Llama-3.1-8B-Instruct
meta-llama/Meta-Llama-3-8B-Instruct
Hardware: A100 80G
Version with issue: v2.2.0
Compared with: latest
Information
Tasks
Reproduction
Expected behavior
Hey @drbh @ErikKaum, did you try benchmarking the performance of `v2.2.0` against `latest` on tool calling? I am getting dramatically worse performance on `v2.2.0` as compared to previous versions on some tool-call benchmarks I have created. Just changing the version causes the performance of `meta-llama/Meta-Llama-3-8B-Instruct` to drop from 0.66 to 0.08 on the same script and data. Of course, I can't evaluate the performance of Llama-3.1 on previous versions, but its performance is similarly close to 0.