
[BFCL] Fix Irrelevance Category Performance for DeepSeek Coder Handler #796

Merged (2 commits) on Nov 28, 2024


@HuanzhiMao (Collaborator) commented Nov 28, 2024

This PR updates the decoding logic for the DeepSeek-Coder handler (introduced in #697) to fix its poor performance in the irrelevance category.
In the irrelevance category, an entry is counted as correct if either `decode_ast` fails (raises an error) or the decoded output is empty (e.g., an empty list or an empty string).
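The pass condition for the irrelevance category can be sketched as follows. This is a minimal illustration, assuming the decoder is any Python callable; `passes_irrelevance_check` is a hypothetical name, not a BFCL function.

```python
from typing import Any, Callable


def passes_irrelevance_check(decode_ast: Callable[[Any], Any], model_response: Any) -> bool:
    """Hypothetical sketch of the irrelevance metric.

    The entry passes when decoding raises an error, or when decoding
    succeeds but yields an empty result (e.g. [] or "").
    """
    try:
        decoded = decode_ast(model_response)
    except Exception:
        # Decoding failed: the model did not produce a parsable function call.
        return True
    # Decoding succeeded: pass only if the result is empty (any falsy value).
    return not decoded
```

With an identity decoder, a free-text message such as `"I cannot help with that."` is neither an error nor empty, so it fails the check; this is the behavior this PR addresses.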

For the DeepSeek-Coder model:
When it outputs a valid function call, the model response is a list of dictionaries `[{func1:{param1:val1,...}},{func2:{param2:val2,...}}]`, so `decode_ast` can simply return it without further processing.
However, when the output is a plain message (not a valid function call), the `_parse_query_response_prompting` logic returns that message string as the model response. The current `decode_ast` implementation then treats the string as the decoded output, which is neither an error nor empty, so the entry fails the irrelevance metric.
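A minimal sketch of the intended decoding behavior, assuming the handler receives either the already-parsed list of dictionaries or the raw message string. `decode_ast_sketch` is a hypothetical standalone function, not the handler's actual code.

```python
from typing import Any


def decode_ast_sketch(result: Any) -> list:
    """Hypothetical stand-in for the DeepSeek-Coder handler's decode_ast.

    A valid function call arrives as a list of dicts such as
    [{"func1": {"param1": "val1"}}] and is returned unchanged.
    A plain message arrives as a string; returning it would count as a
    non-empty decoded output, so it is mapped to an empty list instead.
    """
    if isinstance(result, list):
        # Already in the expected [{func: {param: value}}] shape.
        return result
    # Any non-list response (e.g. free text) is treated as "no function call".
    return []
```

Raising an error for non-list input would satisfy the irrelevance metric equally well; returning an empty list is simply one way to make a message response count as "no call".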

@HuanzhiMao HuanzhiMao added the BFCL-General General BFCL Issue label Nov 28, 2024
@HuanzhiMao HuanzhiMao marked this pull request as ready for review November 28, 2024 00:39
@HuanzhiMao HuanzhiMao merged commit 7cec275 into ShishirPatil:main Nov 28, 2024
VishnuSuresh27 pushed a commit to VishnuSuresh27/gorilla that referenced this pull request Nov 28, 2024
HuanzhiMao added a commit that referenced this pull request Dec 7, 2024
This PR updates the leaderboard to reflect the score changes resulting from the following merged PRs:

1. #747 
2. #770 
3. #768 
4. #750 
5. #763 
6. #772 
7. #777 
8. #778 
9. #786 
10. #787 
11. #697 
12. #718 
13. #755 
14. #796 
15. #789 
16. #804 
17. #808 
18. #809
19. #811 
20. #810 

Models were evaluated using checkpoint commit d7e52e5.
Labels
BFCL-General General BFCL Issue

2 participants