[BFCL] Fix Irrelevance Category Performance for DeepSeek Coder Handler #796
This PR updates the decoding logic of the DeepSeek-Coder handler (introduced in #697) to fix its performance in the irrelevance category.
The irrelevance category metric we use passes a response if either `decode_ast` fails (raises an error) or the decoded output is empty (e.g., an empty list or an empty string).

For the DeepSeek-Coder model:

- When it outputs a valid function call, the model response is a list of dictionaries, `[{func1:{param1:val1,...}},{func2:{param2:val2,...}}]`, so it's fine for `decode_ast` to return it without any processing.
- However, when the output is a message (not a valid function call), under the `_parse_query_response_prompting` logic the model response is that message string. The current `decode_ast` implementation treats that string as the decoded output, so decoding neither fails nor returns an empty result. The response therefore fails both conditions of the irrelevance metric, which is not ideal.
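To make the failure mode concrete, here is a minimal sketch of the irrelevance check and of one way a decoder could reject plain-message responses. The names `decode_ast_sketch` and `passes_irrelevance_check` are hypothetical and are not taken from the BFCL codebase, and the parsing strategy (`ast.literal_eval` plus a shape check) is an assumption for illustration, not necessarily what this PR implements.

```python
import ast


def decode_ast_sketch(result):
    """Hypothetical decoder: return a list of function-call dicts, or raise."""
    if isinstance(result, list):
        # Already a decoded list of calls: pass it through.
        calls = result
    elif isinstance(result, str):
        # A plain message is normally not a Python literal, so this raises
        # SyntaxError or ValueError instead of returning the message.
        calls = ast.literal_eval(result)
    else:
        raise TypeError(f"unsupported response type: {type(result).__name__}")

    # Reject anything that is not a list of dictionaries.
    if not isinstance(calls, list) or not all(isinstance(c, dict) for c in calls):
        raise ValueError("response is not a list of function-call dictionaries")
    return calls


def passes_irrelevance_check(result, decoder=decode_ast_sketch):
    """Pass if decoding fails or the decoded output is empty."""
    try:
        decoded = decoder(result)
    except Exception:
        return True  # decoding failed: counts as correctly irrelevant
    return isinstance(decoded, (list, str)) and len(decoded) == 0


message = "Sorry, none of the provided functions apply."

# A pass-through decoder (the behavior described above) fails the check,
# because the message is returned as a non-empty decoded output.
print(passes_irrelevance_check(message, decoder=lambda r: r))

# The stricter sketch raises on the message, so the check passes.
print(passes_irrelevance_check(message))
```

With a pass-through decoder the message string counts as a non-empty decoded output, so the irrelevance check fails. Once the decoder raises on anything that is not a list of dictionaries, the same response passes.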