
Misc. bug: JSON schema that defines array with 0 elements generates un-parseable GBNF #13116

@rick-github

Description

Name and Version

$ ./build/bin/llama-cli --version
version: 5188 (514c456)
built with cc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-cli

Command line

./build/bin/llama-cli -m ./tmp/mnt/models/quantize/gemma-1.1-2b-it.Q8_0.gguf -j "$(python -c $'import json, pydantic\nclass Result(pydantic.BaseModel):  colours:list[str]=pydantic.Field(max_length=0)\nprint(json.dumps(Result.model_json_schema()))')" --no-display-prompt  -p "Here are some colours: " -no-cnv

Problem description & steps to reproduce

A syntactically correct (albeit nonsensical) JSON schema that defines an element as a zero-length array produces a GBNF grammar that llama-cli cannot parse, and it exits with a "failed to parse grammar" error.

For example, this code:

import json, pydantic
class Result(pydantic.BaseModel):
    colours: list[str] = pydantic.Field(max_length=0)
print(json.dumps(Result.model_json_schema()))

creates this schema:

{
    "properties": {
        "colours": {
            "items": {
                "type": "string"
            },
            "maxItems": 0,
            "title": "Colours",
            "type": "array"
        }
    },
    "required": [
        "colours"
    ],
    "title": "Result",
    "type": "object"
}

and passing it to llama-cli results in an error:

$ ./build/bin/llama-cli -m ./tmp/mnt/models/quantize/gemma-1.1-2b-it.Q8_0.gguf -j "$(python -c $'import json, pydantic\nclass Result(pydantic.BaseModel):  colours:list[str]=pydantic.Field(max_length=0)\nprint(json.dumps(Result.model_json_schema()))')" --no-display-prompt  -p "Here are some colours: " -no-cnv
...

system_info: n_threads = 8 (n_threads_batch = 8) / 24 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | 

parse: error parsing grammar: expecting '}' at -1})? "]" space
colours-kv ::= "\"colours\"" space ":" space colours
root ::= "{" space colours-kv "}" space
space ::= | " " | "\n"{1,2} [ \t]{0,20}
string ::= "\"" char* "\"" space


char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
colours ::= "[" space (string ("," space string){0,-1})? "]" space
colours-kv ::= "\"colours\"" space ":" space colours
root ::= "{" space colours-kv "}" space
space ::= | " " | "\n"{1,2} [ \t]{0,20}
string ::= "\"" char* "\"" space

llama_grammar_init_impl: failed to parse grammar
main: failed to initialize sampling subsystem
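The malformed rule is the {0,-1} repetition in the colours rule above: the converter appears to derive the separator repetition's upper bound as maxItems - 1, which goes negative when maxItems is 0. A minimal Python sketch (a hypothetical helper mirroring the shape of the emitted grammar, not llama.cpp's actual converter code) reproduces the bad rule:

```python
def build_array_rule(item_rule: str, min_items: int, max_items: int) -> str:
    # Mirrors the shape of the emitted grammar: an optional first item
    # followed by ("," space item) repeated {min-1, max-1} times. With
    # maxItems == 0 the upper bound becomes -1, producing the {0,-1}
    # range that the GBNF parser rejects.
    rep = "{%d,%d}" % (max(min_items - 1, 0), max_items - 1)
    return '"[" space (%s ("," space %s)%s)? "]" space' % (item_rule, item_rule, rep)

print(build_array_rule("string", 0, 0))
# "[" space (string ("," space string){0,-1})? "]" space
```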

First Bad Commit

55b2d08
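For reference, one plausible direction for a fix (a sketch only, not the actual llama.cpp change) is to special-case maxItems == 0 so the array rule collapses to an empty-array literal instead of emitting a negative repetition bound:

```python
def build_array_rule_fixed(item_rule: str, min_items: int, max_items: int) -> str:
    # An array capped at 0 items can only ever be "[]", so skip the
    # item/separator machinery entirely and emit a literal empty array.
    if max_items == 0:
        return '"[" space "]" space'
    rep = "{%d,%d}" % (max(min_items - 1, 0), max_items - 1)
    return '"[" space (%s ("," space %s)%s)? "]" space' % (item_rule, item_rule, rep)

print(build_array_rule_fixed("string", 0, 0))
# "[" space "]" space
```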

Relevant log output

common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 8

system_info: n_threads = 8 (n_threads_batch = 8) / 24 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 | 

parse: error parsing grammar: expecting '}' at -1})? "]" space
colours-kv ::= "\"colours\"" space ":" space colours
root ::= "{" space colours-kv "}" space
space ::= | " " | "\n"{1,2} [ \t]{0,20}
string ::= "\"" char* "\"" space


char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
colours ::= "[" space (string ("," space string){0,-1})? "]" space
colours-kv ::= "\"colours\"" space ":" space colours
root ::= "{" space colours-kv "}" space
space ::= | " " | "\n"{1,2} [ \t]{0,20}
string ::= "\"" char* "\"" space

llama_grammar_init_impl: failed to parse grammar
main: failed to initialize sampling subsystem
