suggestion: implement jsonformer for generating JSON #1300
Comments
Great task for a Btw, this is along the lines of the constrained Whisper sampling idea for chess moves: https://twitter.com/ggerganov/status/1640441536403116032 |
This is something I've been working on. Using the llama.cpp Python bindings, I have constrained JSON parsing implemented, but not the full JSONSchema spec. I wrote a custom tree-sitter parser that can parse partial JSON files and sample tokens accordingly. The tree-sitter parser generates a single C file that I believe should be easy to use in a C++ example if anyone's interested in taking that approach. Validating against the JSONSchema may be harder to do in C++; I'm not sure if there are any good libraries. |
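To make the "parse the partial output, then sample accordingly" idea concrete, here is a toy sketch. It is not the tree-sitter parser described above: `is_valid_prefix` is a hypothetical hand-written state machine for a tiny language (a JSON list of non-negative integers with no whitespace), and `pick_token` with its score table stands in for filtering a real model's logits.

```python
def is_valid_prefix(text):
    """Return True if text could still grow into a list such as [1,20,3]."""
    state = "start"
    for ch in text:
        if state == "start" and ch == "[":
            state = "open"
        elif state in ("open", "comma") and ch == "0":
            state = "zero"  # a lone 0 is fine, but 01 is not valid JSON
        elif state in ("open", "comma") and ch in "123456789":
            state = "int"
        elif state == "int" and ch in "0123456789":
            state = "int"
        elif state in ("int", "zero") and ch == ",":
            state = "comma"
        elif state in ("open", "int", "zero") and ch == "]":
            state = "done"
        else:
            return False  # no legal transition: the output is already broken
    return True


def pick_token(prefix, scores):
    """Greedy choice among the tokens that keep prefix + token parseable."""
    allowed = {tok: s for tok, s in scores.items()
               if is_valid_prefix(prefix + tok)}
    if not allowed:
        raise ValueError("no candidate token keeps the output valid")
    return max(allowed, key=allowed.get)


# Made-up scores: the two highest-scoring tokens would break the format.
scores = {"hello": 5.0, ", ": 3.0, ",": 1.0, "]": 0.5, "2": 0.2}
print(pick_token("[1", scores))  # prints ,
```

A real implementation would run the same check over the model's whole vocabulary at every step and would sample from the surviving tokens instead of always taking the maximum.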
Looking great! |
#1397 looks like it could address this |
#1397 is related, but doesn't (currently) do what this issue is asking for. |
More recently about JSONformer |
I've found the docs about this and am very interested. However, I'm really not sure how to write a grammar for generating JSON... Does anyone have an example to provide? As JSON is given as an example of a possible thing to do in the grammar docs, it'd be great if an example of how to do that was provided. Thanks. |
For generating arbitrary JSON, there's a JSON grammar provided in grammars/json.gbnf.
For conforming to a JSON schema, there's examples/json-schema-to-grammar.py.
|
Thanks a lot. I actually figured this out in the meantime. Just in case somebody finds this while looking for answers about grammars, I'd also like to point out this really cool tool that lets you generate custom ones:
https://grammar.intrinsiclabs.ai/
…On Fri, Aug 25, 2023 at 1:15 AM Evan Jones ***@***.***> wrote:
For generating arbitrary JSON, there's a JSON grammar provided in
grammars/json.gbnf:
% ./main -m $L13B -p 'The weather for today: ' --grammar-file grammars/json.gbnf
...
The weather for today: {"temp":450, "pressure":36.0, "humidity":890}
For conforming to a JSON schema, there's
examples/json-schema-to-grammar.py :
% cat ../schemas/student.json
{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "number"},
    "is_student": {"type": "boolean"},
    "courses": {
      "type": "array",
      "items": {"type": "string"}
    }
  }
}
% ./main -m $L13B -p 'Hermione Granger ' --grammar "$(python3 examples/json-schema-to-grammar.py ../schemas/student.json --prop-order 'is_student,name,age,courses')"
...
Hermione Granger {"is_student":true, "name":"Hermione","age":12,"courses":[ "Arithmancy", "Defense Against the Dark Arts", "Divination", "Muggle Studies", "Herbology", "Potions" ]}
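One way to sanity-check output like the above is to parse it and compare it with the schema. The sketch below is illustrative and not part of llama.cpp: `conforms` is a hypothetical helper that handles only the keywords used in student.json (`type`, `properties`, `items`), and the schema and output are copied from the example.

```python
import json

SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "is_student": {"type": "boolean"},
        "courses": {"type": "array", "items": {"type": "string"}},
    },
}

# The JSON part of the sample output, without the leading prompt text.
OUTPUT = ('{"is_student":true, "name":"Hermione","age":12,"courses":'
          '[ "Arithmancy", "Defense Against the Dark Arts", "Divination", '
          '"Muggle Studies", "Herbology", "Potions" ]}')


def conforms(value, schema):
    """Check value against the small schema subset used in this example."""
    kind = schema.get("type")
    if kind == "object":
        if not isinstance(value, dict):
            return False
        props = schema.get("properties", {})
        # Keys that the schema does not list are left unconstrained here.
        return all(conforms(item, props[key])
                   for key, item in value.items() if key in props)
    if kind == "array":
        return isinstance(value, list) and all(
            conforms(item, schema.get("items", {})) for item in value)
    if kind == "string":
        return isinstance(value, str)
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "boolean":
        return isinstance(value, bool)
    return True  # keywords outside this sketch are not checked


student = json.loads(OUTPUT)
print(conforms(student, SCHEMA))
print(list(student))  # key order as generated, matching --prop-order
```

For anything beyond a quick check, a complete JSON Schema validator would be the safer choice, since this helper ignores keywords such as `required`.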
|
This is a neat idea: basically, constrain the output to a particular subset of tokens so that you are guaranteed to generate data of a particular format, and also fill in other context after each piece of output automatically.
In this specific example the format is "JSON with a particular schema", and that's a good place to start, although the technique obviously generalizes.
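As a rough sketch of the idea, assuming a flat or nested object schema with only string, number, and boolean leaves: the program writes every brace, key, and separator itself, and the model is asked only for the values. `build`, `fill_value`, and `fake_model` are hypothetical names for illustration; `fake_model` returns canned answers where a real implementation would run constrained sampling on an actual model.

```python
import json

# Python types accepted for each JSON Schema primitive in this sketch.
PY_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def build(schema, prompt, fill_value, out):
    """Append JSON text for schema to out; structure is never model-generated."""
    kind = schema["type"]
    if kind == "object":
        out.append("{")
        for i, (key, sub) in enumerate(schema["properties"].items()):
            if i > 0:
                out.append(", ")
            out.append(json.dumps(key) + ": ")  # the key is filled in by us
            build(sub, prompt, fill_value, out)
        out.append("}")
    elif kind in PY_TYPES:
        # The model sees the prompt plus everything emitted so far.
        value = fill_value(kind, prompt + "".join(out))
        ok = isinstance(value, PY_TYPES[kind])
        if kind == "number" and isinstance(value, bool):
            ok = False  # bool is a subclass of int in Python
        if not ok:
            raise TypeError(f"model returned {value!r} for type {kind}")
        out.append(json.dumps(value))
    else:
        raise NotImplementedError(f"type {kind!r} is outside this sketch")


def fake_model(kind, context):
    """Stand-in for a real model call (assumption: returns a typed value)."""
    return {"string": "Hermione", "number": 12, "boolean": True}[kind]


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "is_student": {"type": "boolean"},
    },
}
parts = []
build(schema, "Hermione Granger ", fake_model, parts)
result = "".join(parts)
print(result)
```

Arrays are left out because they need an extra decision from the model (when to stop adding items), which this sketch does not attempt.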