Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow more input types to functions, fix tests #377

Merged
merged 23 commits into from
Jun 21, 2023

Conversation

stillmatic
Copy link
Contributor

@stillmatic stillmatic commented Jun 15, 2023

See discussion in #360 -- having just the bytes here is much more maintainable than trying to maintain the entire JSONSchema spec. I have tested that the OpenAI model accepts additional jsonschema params (of which there can be many).

Also, when verifying the behavior, I found that the ChatCompletion and Completion servers weren't actually trying to make completions. Converting req.N to a minimum of 1 should fix that.

@jmacwhyte
Copy link
Contributor

I think this is a much cleaner solution! I will test it out and give feedback (I realize this is kind of a pointless comment, but I want you to know there is interest!)

@codecov
Copy link

codecov bot commented Jun 15, 2023

Codecov Report

Merging #377 (df71769) into master (b095938) will increase coverage by 0.05%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #377      +/-   ##
==========================================
+ Coverage   95.26%   95.32%   +0.05%     
==========================================
  Files          17       17              
  Lines         676      684       +8     
==========================================
+ Hits          644      652       +8     
  Misses         22       22              
  Partials       10       10              
Impacted Files Coverage Δ
chat.go 100.00% <ø> (ø)

... and 2 files with indirect coverage changes

chat.go Show resolved Hide resolved
@stillmatic
Copy link
Contributor Author

I've added back the removed interfaces to the Params field and renamed the json.RawMessage field to ParametersRaw. I feel like this is an anti-pattern though and greatly complicates the library -- I barely understand what's going on in the marshal/unmarshal functions. While the marshaled struct is byte compatible with the OpenAI spec, it's quite unclear how we handle unmarshal, as there are two possible fields where that can be set (and if we want to maintain strict compatibility, we also need to maintain this unmarshal function, not just the marshal function).

The PR now also contains essentially the same fix as #375 and #373 -- my view is that so long as the option to use the structs is around, users will gradually request more and more functionality and it's better to draw a line in the sand sooner. Those 2 PR's also suggest that this code path was already broken for anyone using it, so breaking compatibility this time should not be an issue -- itwas already broken.

@jmacwhyte
Copy link
Contributor

@stillmatic I agree with the ease-of-use issues. If there was only a RawMessage field, it's pretty easy for the docs to say "put your schema (marshalled into []byte) here!"

If a struct and the rawmessage are both populated, which takes precedence when the request is submitted to the API? How do we convey that to users of the libray? I'm a big fan of a "single source of truth" instead of having two copies of the same data, just in different formats.

chat.go Outdated Show resolved Hide resolved
@stillmatic stillmatic force-pushed the master branch 2 times, most recently from abba18d to 9fc4bb9 Compare June 16, 2023 20:59
chat.go Outdated Show resolved Hide resolved
chat.go Outdated Show resolved Hide resolved
@jmacwhyte
Copy link
Contributor

@stillmatic I've submitted a PR to your repo with some improvements (including the comment @vvatanabe suggested!)

@stillmatic
Copy link
Contributor Author

thanks @jmacwhyte ! my only issue with that is that FunctionDefine -> FunctionDefinition also breaks compatibility :)

@jmacwhyte
Copy link
Contributor

thanks @jmacwhyte ! my only issue with that is that FunctionDefine -> FunctionDefinition also breaks compatibility :)

How long has FunctionDefine even existed? One day? XD

I agree we don't want to break anything, but it's frustrating when poor word choices are rushed out the door... it makes it much harder for people to understand the code in the future.

@stillmatic
Copy link
Contributor Author

That's basically how I feel - the Parameters implementation with a custom struct adds quite a lot of complexity and is quite hard for future users to understand (and deprecation without removal is also confusing).

If we're going to break compatibility with the FunctionDefinition change (which I agree is much easier to read), we might as well break compatibility and focus only on the json.RawMessage implementation. It's only been around for 2 days, and for 1 of those days, it was broken without the items field.

@jmacwhyte
Copy link
Contributor

@stillmatic Should we just put together what we think is the optimal solution, and let @sashabaranov (or another maintainer) pull it in if they want? I don't know who is making the decisions around here, I just want a library that makes my project easy to do.

@stillmatic
Copy link
Contributor Author

stillmatic commented Jun 17, 2023

@sashabaranov - my proposal is that we merge this PR as of 345214c -- that diff converts the Parameters object from a custom JSON Schema implementation to a json.RawMessage object, which is more generally supported and relieves the library maintainers from the need to implement another JSON Schema definition. It's closer to what OpenAI expects and much easier to work with using external libraries. I've written a testing and working implementation https://github.com/stillmatic/gollum/blob/main/functions.go#L16 using the github.com/invopop/jsonschema library but have confirmed that it is also compatible with the github.com/santhosh-tekuri/jsonschema/v5 library/. Tests and checks also pass completely, as the implementation is extremely simple.

This change is the most compatible with the rest of the golang ecosystem and the least confusing for users - a single code path with no required dependencies and clear debugging paths. It technically breaks compatibility, but I believe it is not a big deal - as discussed above, the release has not been out for very long and it was bugged most of that time. The future maintenance burden of keeping a deprecated tag around is quite high in this case and should outweigh the disruption to workflows right now.

Note that 345214c is basically a pure revert commit - the backwards compatibility support requires 200+ LOC. Previous changes got borked from rebasing.

Copy link
Contributor

@jmacwhyte jmacwhyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also change FunctionCall.Arguments to type json.RawMessage.

@stillmatic
Copy link
Contributor Author

@jmacwhyte is the change you're referring to covered by https://github.com/sashabaranov/go-openai/pull/388/files?

@jmacwhyte
Copy link
Contributor

jmacwhyte commented Jun 17, 2023

@jmacwhyte is the change you're referring to covered by https://github.com/sashabaranov/go-openai/pull/388/files?

No, it is not touched by that one.

Above I was refering to chat.go L41

@sashabaranov
Copy link
Owner

Folks, thank you so much for the great effort you are putting in here! Please excuse me if I've missed some part of the discussion. Here are a couple of thoughts after briefly studying the discussion:

  1. I'm fine with breaking API in this case
  2. I can see the appeal of both simplified in-library JSON Schema and json.RawMessage. Could we have a type that can take either of them and do the proper thing during marshaling? Or would it break during unmarshalling? If this approach is feasible — let's also move JSON schema stuff into a separate directory.

@jmacwhyte
Copy link
Contributor

jmacwhyte commented Jun 18, 2023

I wonder if we could change Parameters to type any, and then during submission:

  1. If Parameters is a []byte or string, treat it like json.RawMessage.
  2. If it is anything else, run it through json.Marshal().

That way, users can pass in a preformatted schema as text, but they can also use any library that provides nested structs that are JSON compliant (and we don't have to know anything about which libraries they are using).

@stillmatic Let me know if you want me to work on this, I'm holding off for now so we don't come up with conflicting commits.

@stillmatic
Copy link
Contributor Author

I've pushed up a sample commit now that converts Parameters to interface{} and relies on the JSON package to handle marshaling/unmarshaling here. Tests pass using both json.RawMessage and custom structs that serialize to JSON here. This is very similar to @jmacwhyte suggestion but does not require a custom marshaller.

If you want to add back the JSONSchema implementation, ok, we can add that back, preferably it should note the limitations on it, but I think the PR should be good to go right afterwards.

chat.go Outdated Show resolved Hide resolved
chat.go Show resolved Hide resolved
@jmacwhyte
Copy link
Contributor

@sashabaranov Sounds good to me, as long as it's simple and easy to maintain.

@stillmatic I submitted another PR to your fork, it's quite minor but I would like it included before the final merge.

@jmacwhyte
Copy link
Contributor

Regarding @sashabaranov 's point number 2 here, I have submitted a PR to stillmatic's fork which moves all the json schema stuff to a separate directory.

chat.go Outdated Show resolved Hide resolved
@vvatanabe
Copy link
Collaborator

@sashabaranov Sorry for the late reply.

  1. I'm fine with breaking API in this case.

I agree. Please note: if the API is to be changed in a way that breaks compatibility, it should be announced that these breaking changes will be made in a forthcoming minor update. Subsequently, these changes should be implemented and released in a major update. Make sure to document the breaking changes in the release notes of the major update. go-github is a good reference for this process, as it frequently makes breaking changes while following these steps.

  1. I can see the appeal of both simplified in-library JSON Schema and json.RawMessage.

I agree. Changing the Parameters field to the type any would allow for a minor update to be released. When marshalling the Parameters field, we can use type assertions to differentiate []byte(json.RawMessage) or struct. The next major update should then support only []byte(json.RawMessage).

added test for compatibility as well
@stillmatic
Copy link
Contributor Author

Thanks for the tip on defined type vs type alias -- with that, I've added a test that covers the old behavior as well. So we should actually have 100% compatibility.

I haven't added any custom marshaling code since this works now. I can also see the appeal of leaving it as an any field i nstead of json.RawMessage. This is not the greatest example, but imagine if msg here was a properly formed JSONSchema struct, e.g. from a custom JSONSchema library.

image

Then, this code will marshal properly as is. The advantage is that you save a marshal step (instead of marshal struct to get bytes to pass to parameters, then marshal again). The disadvantage is that you add an unmarshal step (need to reflect to get the type). However, most people will never unmarshal this particular Request struct as they don't run their own servers, so probably better this way. We can revisit that in the future though :)

should be good to merge?

@jmacwhyte
Copy link
Contributor

Thanks for the comments @vvatanabe , I'm learning a lot from them!

I really like the solution we've come up with to use any for the schema. It lets developers use whatever they want, and as far as I can see it's pretty hard to use incorrectly, so there aren't really any downsides.

@vvatanabe
Copy link
Collaborator

My apologies, I overlooked the TestChatCompletionsFunctions test. There's no need to implement a custom marshalling function. I believe the solution proposed in this PR is both simple and fitting. LGTM !

Thanks to the change to the any type, there's no need to specifically plan for a major update. This change can simply be released as part of a minor update.

Copy link
Owner

@sashabaranov sashabaranov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 Thank you for the PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants