I believe this should close #398. It would be good if @sadransh could confirm this though.
I think what was happening there was that the API response was sending the tool call data in an unexpected format (already a dict rather than a JSON string), while our `OpenAIModel` implementation assumed it would be a string and didn't check that at runtime.
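To make that failure mode concrete (this is a minimal illustration, not the exact code path in `OpenAIModel`): if the server hands back arguments that are already parsed, naively decoding them as JSON blows up at runtime:

```python
import json

# The OpenAI API documents tool call arguments as a JSON string, but an
# "API-compatible" server might return them already parsed as a dict:
raw_args = {"city": "London"}  # instead of '{"city": "London"}'

json.loads(raw_args)  # TypeError: the JSON object must be str, bytes or bytearray, not dict
```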
In practice, claiming to be API-compatible but returning the tool call data as a string where an object is expected (or vice versa) seems like an easy mistake for LLM implementers to make, so it feels reasonable to me to be more defensive about the value we get back. In this PR, I removed the `from_json` and `from_dict` methods on `ToolCallPart` and replaced them with a single `from_raw_args` that always checks the type and produces the appropriate kind of `Args` object.

I also added methods to get the args object as a JSON string or as a dict. While you'd expect most models to do the right thing internally, it's feasible that a user might generate some messages with one model and then run further requests against another model while reusing the original messages. If the other model's implementation expects the tool calls to be in a different format, we'd unnecessarily raise errors when we could just convert the data from dict to JSON (or vice versa) at runtime.
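For reference, here's a rough sketch of the shape of the change; assume the `ArgsJson`/`ArgsDict` names, field names, and conversion method names below are illustrative rather than the exact code in the diff:

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass
class ArgsJson:
    """Tool call arguments held as a raw JSON string."""
    args_json: str


@dataclass
class ArgsDict:
    """Tool call arguments held as an already-parsed dict."""
    args_dict: dict


@dataclass
class ToolCallPart:
    tool_name: str
    args: Union[ArgsJson, ArgsDict]

    @classmethod
    def from_raw_args(cls, tool_name: str, args: Union[str, dict]) -> "ToolCallPart":
        # Check the runtime type instead of trusting the vendor API to
        # always return one format or the other.
        if isinstance(args, str):
            return cls(tool_name, ArgsJson(args))
        if isinstance(args, dict):
            return cls(tool_name, ArgsDict(args))
        raise TypeError(f"Expected str or dict for tool call args, got {type(args)}")

    def args_as_json_str(self) -> str:
        # Return the args as a JSON string, converting if they were stored as a dict.
        if isinstance(self.args, ArgsJson):
            return self.args.args_json
        return json.dumps(self.args.args_dict)

    def args_as_dict(self) -> dict:
        # Return the args as a dict, converting if they were stored as a JSON string.
        if isinstance(self.args, ArgsDict):
            return self.args.args_dict
        return json.loads(self.args.args_json)
```

With something like that in place, a model implementation can call `from_raw_args` on whatever the API returned, and use the conversion methods when reserializing messages for a vendor that expects the other format.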
The overhead of checking the type of the tool call data when parsing or generating vendor-compatible messages is negligible in the grand scheme, so I think the improved handling of mis-typed responses is more beneficial than harmful.