Grammar Splitting words into characters #5925

UYousafzai · 2024-03-07T16:32:31Z

I have been getting a strange-looking grammar object, and I'm not sure whether this is how I should be receiving it.

## My original grammar string is as follows
root ::= (" "| "\n") grammar-models
grammar-models ::= more
more ::= "{" "\n" ws "\"title\"" ":" ws string "," "\n" ws "\"author\"" ":" ws string "\n" ws "}"
boolean ::= "true" | "false"
null ::= "null"
string ::= "\"" (
[^"\\] |
"\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
)* "\"" ws
ws ::= ([ \t\n] ws)?
float ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? ws
integer ::= [0-9]+

## Grammar object construction printout

from_string grammar:
root ::= root_1 grammar-models
root_1 ::= [ ] | [<U+000A>]
grammar-models ::= more
more ::= [{] [<U+000A>] ws ["] [t] [i] [t] [l] [e] ["] [:] ws string [,] [<U+000A>] ws ["] [a] [u] [t] [h] [o] [r] ["] [:] ws string [<U+000A>] ws [}]
ws ::= ws_12
string ::= ["] string_10 ["] ws
boolean ::= [t] [r] [u] [e] | [f] [a] [l] [s] [e]
null ::= [n] [u] [l] [l]
string_8 ::= [^"\] | [\] string_9
string_9 ::= ["\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]
string_10 ::= string_8 string_10 |
ws_11 ::= [ <U+0009><U+000A>] ws
ws_12 ::= ws_11 |
float ::= float_14 float_20 float_24 ws
float_14 ::= float_15 float_16
float_15 ::= [-] |
float_16 ::= [0-9] | [1-9] float_17
float_17 ::= [0-9] float_17 |
float_18 ::= [.] float_19
float_19 ::= [0-9] float_19 | [0-9]
float_20 ::= float_18 |
float_21 ::= [eE] float_22 float_23
float_22 ::= [-+] |
float_23 ::= [0-9] float_23 | [0-9]
float_24 ::= float_21 |
integer ::= integer_26
integer_26 ::= [0-9] integer_26 | [0-9]

I'm not sure whether this is meant to be this way. I tried running the same script with strings and the output was different; however, when I changed the data type to float to play around, it formatted the grammar this way.

The real problem is that after reverting back to the string type, it still prints these float objects out.
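The printout follows a regular pattern: each parenthesized group and each `*`, `+` or `?` is turned into a helper rule named after the rule it appears in plus a number. The Python sketch below is a hypothetical model of that behavior, inferred from the printout rather than taken from llama.cpp's source. It assumes the number is simply the next free id in one symbol table that named rules also draw from, keeps terminals exactly as written instead of splitting them into single characters, ignores comments, and handles only the constructs this grammar uses. The names `desugar`, `NAME` and `TERM` are made up for illustration.

```python
import re

# The grammar from the original post, with its backslash escapes restored.
GRAMMAR = r'''
root ::= (" "| "\n") grammar-models
grammar-models ::= more
more ::= "{" "\n" ws "\"title\"" ":" ws string "," "\n" ws "\"author\"" ":" ws string "\n" ws "}"
boolean ::= "true" | "false"
null ::= "null"
string ::= "\"" (
  [^"\\] |
  "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
)* "\"" ws
ws ::= ([ \t\n] ws)?
float ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? ws
integer ::= [0-9]+
'''

NAME = re.compile(r"[a-zA-Z0-9-]+")  # rule names such as grammar-models
TERM = re.compile(r'"(?:\\.|[^"\\])*"|\[(?:\\.|[^\]\\])*\]')  # "..." or [...]


def desugar(src):
    """Hypothetical model: flatten a GBNF subset into {rule: [alternative, ...]}.
    Groups and *, +, ? become helper rules named <parent>_<N>, N = next free id."""
    ids, rules, pos = {}, {}, 0
    def get_id(name):  # a named rule gets an id the first time it is seen
        ids.setdefault(name, len(ids))
        return name

    def gen_id(base):  # helper rules draw from the same counter
        name = f"{base}_{len(ids)}"
        ids[name] = len(ids)
        return name

    def space(newline_ok):  # a newline ends a top-level rule, so skip it only if allowed
        nonlocal pos
        while pos < len(src) and (src[pos] in " \t" or (newline_ok and src[pos] in "\r\n")):
            pos += 1

    def sequence(rule, nested):
        nonlocal pos
        seq = []
        while pos < len(src) and src[pos] not in "|)\r\n":
            c = src[pos]
            if c in '"[':  # a literal or character class, kept exactly as written
                term = TERM.match(src, pos).group()
                pos += len(term)
                seq.append(term)
            elif c == "(":
                pos += 1
                group = gen_id(rule)  # the group's id is taken before its contents
                space(True)
                rules[group] = alternatives(rule, True)
                assert src[pos] == ")", f"unclosed group at {pos}"
                pos += 1
                seq.append(group)
            elif c in "*+?":
                pos += 1
                last, rep = seq.pop(), gen_id(rule)
                rules[rep] = {"*": [[last, rep], []],      # S* -> S rep | (empty)
                              "+": [[last, rep], [last]],  # S+ -> S rep | S
                              "?": [[last], []]}[c]        # S? -> S | (empty)
                seq.append(rep)
            else:
                name = NAME.match(src, pos).group()
                pos += len(name)
                seq.append(get_id(name))
            space(nested)
        return seq

    def alternatives(rule, nested):
        nonlocal pos
        alts = [sequence(rule, nested)]
        while pos < len(src) and src[pos] == "|":
            pos += 1
            space(True)
            alts.append(sequence(rule, nested))
        return alts

    while True:  # one top-level rule per iteration: name ::= alternatives
        space(True)
        if pos >= len(src):
            return rules
        name = get_id(NAME.match(src, pos).group())
        pos += len(name)
        space(False)
        assert src.startswith("::=", pos), f"expected ::= at {pos}"
        pos += 3
        space(True)
        rules[name] = alternatives(name, False)


for name, alts in desugar(GRAMMAR).items():
    print(name, "::=", " | ".join(" ".join(alt) for alt in alts))
```

Run on this grammar, the sketch produces the same helper names as the printout (`root_1`, `string_8` to `string_10`, `ws_11`, `ws_12`, `float_14` to `float_24`, `integer_26`), only listed in a different order. The `float_N` rules show up because `float` is defined in the grammar string, not because any field uses it.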

HanClinto · 2024-03-14T13:38:57Z

I just read your original bug report over on abetlen/llama-cpp-python#1261, and I think I might have been confused about the bug you were reporting here.

It sounds like your issue is with the Pydantic grammar generator script pydantic_models_to_grammar.py -- can you please give the exact commands you're using and the steps to reproduce?

UYousafzai · 2024-03-15T13:43:22Z

@HanClinto thank you for responding.
I am not even sure this is a bug. I presumed the floats were actually a float data type, but I now think it is instead just showing part of the grammar (decomposing the entire grammar into pieces).

My question now is: given the grammar text above, is the grammar object consistent with the behavior that should be expected? I saw floats and presumed they were float data types, which I couldn't find anywhere in my grammar text itself.

HanClinto · 2024-03-15T18:57:00Z

> I am not even sure this is a bug. I presumed the floats were actually a float data type, but I now think it is instead just showing part of the grammar (decomposing the entire grammar into pieces).

That is correct -- that's exactly what it's doing.

> My question now is: given the grammar text above, is the grammar object consistent with the behavior that should be expected? I saw floats and presumed they were float data types, which I couldn't find anywhere in my grammar text itself.

Yes -- that's correct. Even though "float" isn't referenced anywhere, you're still defining it in your grammar, so the parser breaks it up into its component pieces. If you took your input grammar string and renamed "float" to something like "uyousafzai", you would see everything renamed from "float_14" and "float_16" to "uyousafzai_14" and "uyousafzai_16". Everything about this parsed grammar looks correct to me.

float ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? ws
to
uyousafzai ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? ws
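To check which rules are actually used, here is a small Python sketch that walks rule references starting from `root`. It is a hypothetical helper, not part of llama.cpp or llama-cpp-python: it relies on a plain regex scan that is only good enough for simple grammars like this one, and the names `rule_bodies`, `references` and `reachable` are made up for illustration.

```python
import re

# The grammar from the original post, with its backslash escapes restored.
GRAMMAR = r'''
root ::= (" "| "\n") grammar-models
grammar-models ::= more
more ::= "{" "\n" ws "\"title\"" ":" ws string "," "\n" ws "\"author\"" ":" ws string "\n" ws "}"
boolean ::= "true" | "false"
null ::= "null"
string ::= "\"" (
  [^"\\] |
  "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
)* "\"" ws
ws ::= ([ \t\n] ws)?
float ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? ws
integer ::= [0-9]+
'''

HEADER = re.compile(r"^([a-zA-Z0-9-]+)\s*::=", re.MULTILINE)
TERMINAL = re.compile(r'"(?:\\.|[^"\\])*"|\[(?:\\.|[^\]\\])*\]')  # "..." or [...]


def rule_bodies(src):
    """Map each rule name to the text after its '::=' (up to the next rule)."""
    parts = HEADER.split(src)  # [preamble, name1, body1, name2, body2, ...]
    return dict(zip(parts[1::2], parts[2::2]))


def references(body):
    """Rule names used in a body: blank out literals and classes, keep identifiers."""
    return set(re.findall(r"[a-zA-Z0-9-]+", TERMINAL.sub(" ", body)))


def reachable(bodies, start="root"):
    """Depth-first walk over rule references, beginning at the start rule."""
    seen, stack = set(), [start]
    while stack:
        name = stack.pop()
        if name in seen or name not in bodies:
            continue
        seen.add(name)
        stack.extend(references(bodies[name]))
    return seen


bodies = rule_bodies(GRAMMAR)
used = reachable(bodies)
unused = [name for name in bodies if name not in used]
print("reachable from root:", sorted(used))
print("defined but unused:", unused)
```

For this grammar it reports `boolean`, `null`, `float` and `integer` as defined but unreachable. They should be harmless where they are, since matching starts from `root`; removing them from the grammar string would also remove their helper rules (such as `float_14` or `integer_26`) from the printout.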

UYousafzai added the bug-unconfirmed label Mar 7, 2024

UYousafzai mentioned this issue Mar 7, 2024

Always returning Cache'd Grammar Object abetlen/llama-cpp-python#1261


UYousafzai closed this as completed Mar 18, 2024
