
Grammar Splitting words into characters #5925


Closed
UYousafzai opened this issue Mar 7, 2024 · 3 comments

@UYousafzai

I have been getting a strange-looking grammar object, and I am not sure whether this is the output I should expect.

## My original grammar string is as follows
root ::= (" "| "\n") grammar-models
grammar-models ::= more
more ::= "{" "\n" ws "\"title\"" ":" ws string "," "\n" ws "\"author\"" ":" ws string "\n" ws "}"
boolean ::= "true" | "false"
null ::= "null"
string ::= "\"" (
[^"\\] |
"\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
)* "\"" ws
ws ::= ([ \t\n] ws)?
float ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? ws
integer ::= [0-9]+

## Grammar object construction printout

from_string grammar:
root ::= root_1 grammar-models
root_1 ::= [ ] | [<U+000A>]
grammar-models ::= more
more ::= [{] [<U+000A>] ws ["] [t] [i] [t] [l] [e] ["] [:] ws string [,] [<U+000A>] ws ["] [a] [u] [t] [h] [o] [r] ["] [:] ws string [<U+000A>] ws [}]
ws ::= ws_12
string ::= ["] string_10 ["] ws
boolean ::= [t] [r] [u] [e] | [f] [a] [l] [s] [e]
null ::= [n] [u] [l] [l]
string_8 ::= [^"\] | [\] string_9
string_9 ::= ["\/bfnrt] | [u] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]
string_10 ::= string_8 string_10 |
ws_11 ::= [ <U+0009><U+000A>] ws
ws_12 ::= ws_11 |
float ::= float_14 float_20 float_24 ws
float_14 ::= float_15 float_16
float_15 ::= [-] |
float_16 ::= [0-9] | [1-9] float_17
float_17 ::= [0-9] float_17 |
float_18 ::= [.] float_19
float_19 ::= [0-9] float_19 | [0-9]
float_20 ::= float_18 |
float_21 ::= [eE] float_22 float_23
float_22 ::= [-+] |
float_23 ::= [0-9] float_23 | [0-9]
float_24 ::= float_21 |
integer ::= integer_26
integer_26 ::= [0-9] integer_26 | [0-9]

I am not sure whether this is intended. When I first ran the same script with string fields, the output looked different. After I changed the data type to float to experiment, the grammar was printed this way.

The real problem is that after reverting to the string type, it still prints these float rules.

@HanClinto
Collaborator

I just read your original bug report over on abetlen/llama-cpp-python#1261, and I think I might have been confused about the bug you were reporting here.

It sounds like your issue is with the Pydantic grammar generator script pydantic_models_to_grammar.py. Can you please list the exact commands you're using and the steps to reproduce?

@UYousafzai
Author

@HanClinto thank you for responding.
I am not even sure this is a bug. I assumed the "float" entries were an actual float data type, but I now think the printout is just describing part of the grammar (decomposing the entire grammar into pieces).

My question now is: given the grammar text above, is the grammar object consistent with the expected behavior? I ask because I saw the "float" entries and assumed they were float data types, which I could not find anywhere in my grammar text.

@HanClinto
Collaborator

> I am not even sure this is a bug. I assumed the "float" entries were an actual float data type, but I now think the printout is just describing part of the grammar (decomposing the entire grammar into pieces).

That is correct -- that's exactly what it's doing.
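
To make the decomposition concrete, here is a toy Python sketch of the two effects visible in the printout: quoted literals are shown one character at a time, and each `?`, `*` or `+` becomes a numbered helper rule named after the rule it came from. `split_literal` and `RuleExpander` are hypothetical names made up for this illustration. This is not the llama.cpp parser, and its numbering will not match the real one.

```python
import itertools


def split_literal(literal):
    """Show a quoted literal as one single-character class per character."""
    return " ".join(f"[{ch}]" for ch in literal)


class RuleExpander:
    """Toy model of numbered helper rules (an illustration, not llama.cpp code)."""

    def __init__(self):
        self.rules = {}                 # rule name -> right-hand side
        self._ids = itertools.count(1)  # one counter shared by every rule

    def helper(self, base, item, op):
        """Create a helper rule for `item op` and return the helper's name."""
        name = f"{base}_{next(self._ids)}"
        if op == "?":                   # optional: item or nothing
            self.rules[name] = f"{item} |"
        elif op == "*":                 # zero or more: recurse or nothing
            self.rules[name] = f"{item} {name} |"
        elif op == "+":                 # one or more: recurse or a single item
            self.rules[name] = f"{item} {name} | {item}"
        else:
            raise ValueError(f"unsupported operator: {op}")
        return name


expander = RuleExpander()
expander.rules["boolean"] = f"{split_literal('true')} | {split_literal('false')}"
sign = expander.helper("float", "[-]", "?")
digits = expander.helper("float", "[0-9]", "+")
expander.rules["float"] = f"{sign} {digits}"

for name, rhs in expander.rules.items():
    print(f"{name} ::= {rhs}")
# boolean ::= [t] [r] [u] [e] | [f] [a] [l] [s] [e]
# float_1 ::= [-] |
# float_2 ::= [0-9] float_2 | [0-9]
# float ::= float_1 float_2
```

The helper names start with the name of the rule they were generated from, which is why a rule called "float" produces entries such as "float_14" in the real printout.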

> My question now is: given the grammar text above, is the grammar object consistent with the expected behavior? I ask because I saw the "float" entries and assumed they were float data types, which I could not find anywhere in my grammar text.

Yes, that's correct. Even though "float" isn't referenced anywhere, you still define it in your grammar, so the parser breaks it up into its component pieces. If you edited your input grammar string to rename "float" to something like "uyousafzai", the generated helper rules would be renamed as well, from "float_14" and "float_16" to "uyousafzai_14" and "uyousafzai_16". Everything about this parsed grammar looks correct to me.

float ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? ws
to
uyousafzai ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? ws
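
If it helps to see which rules are defined but never reached from `root`, a rough check can be sketched in Python. The function `unreachable_rules` and the shortened `GRAMMAR` string are assumptions made up for this example, not part of llama.cpp or llama-cpp-python. The sketch does not handle `#` comments, and it does not validate the grammar.

```python
import re

# A quoted literal or a character class; both are removed before looking for
# rule names so that their contents are not mistaken for references.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|\[(?:\\.|[^\]\\])*\]')
RULE_HEAD = re.compile(r'^\s*([A-Za-z][A-Za-z0-9-]*)\s*::=', re.MULTILINE)
NAME = re.compile(r'[A-Za-z][A-Za-z0-9-]*')


def unreachable_rules(grammar, root="root"):
    """Return rules that are defined but not reachable from `root`."""
    stripped = TOKEN.sub(" ", grammar)
    heads = list(RULE_HEAD.finditer(stripped))
    refs = {}
    for i, head in enumerate(heads):
        # A rule body runs until the next rule head (or the end of the text).
        end = heads[i + 1].start() if i + 1 < len(heads) else len(stripped)
        refs[head.group(1)] = set(NAME.findall(stripped[head.end():end]))

    seen, stack = set(), [root]
    while stack:
        name = stack.pop()
        if name in seen or name not in refs:
            continue
        seen.add(name)
        stack.extend(refs[name])
    return [name for name in refs if name not in seen]


# Shortened, hypothetical version of the grammar from this issue.
GRAMMAR = r'''
root ::= (" " | "\n") grammar-models
grammar-models ::= more
more ::= "{" ws "\"title\"" ":" ws string "}"
string ::= "\"" ( [^"\\] | "\\" ["\\/bfnrt] )* "\"" ws
ws ::= ([ \t\n] ws)?
boolean ::= "true" | "false"
float ::= "-"? [0-9]+ ws
'''

print(unreachable_rules(GRAMMAR))  # prints ['boolean', 'float']
```

Rules reported here are still parsed and printed, as described above. They just cannot be produced when generation starts from `root`.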
