Resync llama_grammar with llama.cpp implementation and use curly braces quantities instead of repetitions #1721

gbloisi-openaire · 2024-08-31T20:39:14Z

This PR resyncs recent changes from llama.cpp's json_schema_to_grammar.py.

I originally came to this patch because I noticed that generated JSON strings sometimes contained ASCII control characters, which caused json.loads to fail.
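For illustration, here is a minimal sketch of that failure, independent of this patch. The sample strings and the `parses` helper are made up for the example; only the standard `json` module is used.

```python
import json

# A raw newline (an ASCII control character) inside a JSON string value.
# The Python escape "\n" below becomes a real newline character in the text.
bad = '{"title": "line one\nline two"}'

# The same content with the newline properly escaped at the JSON level.
escaped = '{"title": "line one\\nline two"}'


def parses(text: str) -> bool:
    """Return True if json.loads accepts the text with default settings."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses(bad))      # rejected: unescaped control character in a string
print(parses(escaped))  # accepted: the newline is escaped

# strict=False makes the parser tolerate control characters inside strings,
# which can serve as a workaround on the consumer side.
print(json.loads(bad, strict=False)["title"])
```

Passing `strict=False` only hides the symptom; constraining the characters allowed in the grammar keeps the output valid for any JSON parser.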
Then I noticed that unterminated JSON was sometimes generated because llama.cpp was producing very long, incorrect replies. I believe this is caused by a bug in grammar management. However, using curly-brace quantifiers instead of code-generated repetitions alleviates the problem considerably, and it is fixed entirely by providing max_length for string fields in the JSON schema.
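As a rough sketch of the difference between the two styles: the helper names below are hypothetical and are not the functions in json_schema_to_grammar.py, and the expanded form is only one plausible way to spell out a bounded repetition. It assumes the `{m,n}` quantifier syntax accepted by llama.cpp grammars.

```python
def expanded_repetition(item: str, min_items: int, max_items: int) -> str:
    """Spell out a bounded repetition as required items plus nested optional groups."""
    required = " ".join([item] * min_items)
    optional = ""
    # Wrap one more optional level around the previous one for each extra item.
    for _ in range(max_items - min_items):
        optional = f"({item} {optional})?" if optional else f"({item})?"
    return " ".join(part for part in (required, optional) if part)


def braces_repetition(item: str, min_items: int, max_items: int) -> str:
    """Express the same bound with a curly-brace quantifier."""
    return f"{item}{{{min_items},{max_items}}}"


print(expanded_repetition("char", 1, 3))  # char (char (char)?)?
print(braces_repetition("char", 1, 3))    # char{1,3}
```

The expanded form grows with the upper bound, while the quantifier form stays the same size regardless of the bound.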
I also noticed that, without this change, the generated JSON could contain newlines and tabs/spaces as separators between JSON elements, whereas the grammar would impose a single whitespace character; that looks like another sign of a bug in grammar management. Perhaps the grammar was silently ignored and JSON was generated anyway because the JSON schema was part of the prompt as well.

Commit c771334: Resync llama_grammar with llama.cpp implementation and use curly braces quantifiers instead of repetitions
