This issue was moved to a discussion.

Repetition with whitespaces as separator #524

Closed
sunng87 opened this issue Jul 3, 2021 · 1 comment

Comments

@sunng87
Contributor

sunng87 commented Jul 3, 2021

This might be related to #271, and it is a real-world case from handlebars.

Consider the rules:

WHITESPACE = _{ " " }
name = @{ "_"? ~ ('a'..'z')+ }
param = { name }
hash = { name ~ "=" ~ name }
v = { hash | param }
exp = { "{{" ~ name ~ v+ ~ "}}"}

The rule exp works well for expressions like {{echo a b c d=1}} until one of them contains an invalid name, such as {{echo a b c_d}}. Note that the name rule allows an underscore only as the leading character. Even so, this input still matches exp, with c and _d parsed as two separate params. I expected the match to fail.
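To see why c_d slips through, it may help to model what implicit whitespace does. The following Rust sketch is not pest itself; it is a hand-written approximation that assumes, per the pest book, that a repetition inside a normal rule skips zero or more WHITESPACE between items. The function names (match_name, names_with_implicit_ws) are made up for illustration.

```rust
/// Matches `"_"? ~ ('a'..'z')+` starting at `pos`.
/// Returns the end offset on success. (Hypothetical helper, not a pest API.)
fn match_name(input: &[u8], pos: usize) -> Option<usize> {
    let mut i = pos;
    // Optional leading underscore.
    if input.get(i) == Some(&b'_') {
        i += 1;
    }
    // One or more lowercase letters.
    let letters_start = i;
    while i < input.len() && input[i].is_ascii_lowercase() {
        i += 1;
    }
    if i > letters_start {
        Some(i)
    } else {
        None
    }
}

/// Models `name+` inside a normal (non-atomic) rule: between repetitions,
/// spaces are skipped if present but are never required.
fn names_with_implicit_ws(text: &str) -> Vec<&str> {
    let input = text.as_bytes();
    let mut pos = 0;
    let mut out = Vec::new();
    loop {
        // Implicit whitespace: zero or more spaces.
        let mut start = pos;
        while start < input.len() && input[start] == b' ' {
            start += 1;
        }
        match match_name(input, start) {
            Some(end) => {
                out.push(&text[start..end]);
                pos = end;
            }
            None => break,
        }
    }
    out
}
```

Under those assumptions, names_with_implicit_ws("a b c_d") yields a, b, c, _d: because the space between items is optional, _d is accepted as a fresh name immediately after c.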

So I changed exp as follows to require whitespace between params:

exp = { "{{" ~ name ~ v ~ (WHITESPACE+ ~ v)* ~ "}}"}

However, it now fails to match a basic case like {{echo a b}}, reporting:

 --> 1:8
  |
1 | {{echo a b}}
  |        ^---
  |
  = expected hash

which is confusing. Until #271 lands (which seems unlikely to happen), is there a workaround for my case?
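A likely explanation, sketched below in Rust under the assumption that pest inserts an implicit WHITESPACE* between every pair of ~ operands in a normal rule: the implicit skip consumes the space first, so the explicit WHITESPACE+ finds nothing and the repetition matches zero times. The sketch simplifies v to a run of lowercase letters and models only the v ~ (WHITESPACE+ ~ v)* ~ "}}" tail. The function names and the implicit_ws flag are hypothetical; setting implicit_ws to false approximates an atomic rule (@ or $), where no implicit whitespace is inserted.

```rust
/// Skips zero or more spaces starting at `pos`.
fn skip_spaces(input: &[u8], mut pos: usize) -> usize {
    while pos < input.len() && input[pos] == b' ' {
        pos += 1;
    }
    pos
}

/// Simplified stand-in for `v`: one or more lowercase letters.
fn match_word(input: &[u8], pos: usize) -> Option<usize> {
    let mut i = pos;
    while i < input.len() && input[i].is_ascii_lowercase() {
        i += 1;
    }
    if i > pos {
        Some(i)
    } else {
        None
    }
}

/// Models `v ~ (WHITESPACE+ ~ v)* ~ "}}"`.
/// Returns Ok(()) on a match, or Err(offset) where "}}" was expected but not found.
fn match_params_then_close(text: &str, implicit_ws: bool) -> Result<(), usize> {
    let input = text.as_bytes();
    // The implicit skip between `~` operands; a no-op when modelling an atomic rule.
    let skip = |pos: usize| if implicit_ws { skip_spaces(input, pos) } else { pos };

    let mut pos = match_word(input, 0).ok_or(0usize)?;
    loop {
        let after_implicit = skip(pos);
        // Explicit WHITESPACE+ needs at least one space left after the implicit skip.
        let after_explicit = skip_spaces(input, after_implicit);
        if after_explicit == after_implicit {
            break; // repetition body failed; `pos` stays where it was
        }
        match match_word(input, skip(after_explicit)) {
            Some(end) => pos = end,
            None => break,
        }
    }
    let pos = skip(pos);
    if input[pos..].starts_with(b"}}") {
        Ok(())
    } else {
        Err(pos)
    }
}
```

With implicit_ws set to true, "a b}}" stops at offset 2 (the b). With it set to false, the same input matches, and "a b c_d}}" is rejected at the underscore. This suggests that explicit separators only behave as intended inside an atomic rule. The 1:8 position in the error is probably just where the last named rule (hash) was attempted, since pest error messages list rules rather than literals such as "}}".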

@xelivous

This is more or less how it works as I see it:

WHITESPACE = _{ " " }

name = _{ (ASCII_ALPHANUMERIC | "_")+ }

command = @{ name }
param = @{ name }
value = @{ name }
hash = { param ~ "=" ~ value }

parameter = _{ hash | param }

exp = { "{{" ~ command ~ parameter+? ~ "}}" }

all = { (exp ~ NEWLINE?)+ }

If you use "all" as the entry rule, it will match all of these expressions:

{{ echo a b c d=1 }}
{{echo a b=2 c_d = 39 }}
{{tes__t}}

The resulting parse tree is:

- all
  - exp
    - command: "echo"
    - param: "a"
    - param: "b"
    - param: "c"
    - hash
      - param: "d"
      - value: "1"
  - exp
    - command: "echo"
    - param: "a"
    - hash
      - param: "b"
      - value: "2"
    - hash
      - param: "c_d"
      - value: "39"
  - exp > command: "tes__t"

If you add a $ before the braces for hash (hash = ${ param ~ "=" ~ value }), making it compound-atomic, then whitespace is no longer allowed around the = sign. That is:

{{echo a b=2 c_d = 39 }} <- this fails
{{echo a b=2 c_d=39 }} <- this is fine

You can customize it further with hash = ${ param ~ WHITESPACE? ~ "=" ~ WHITESPACE? ~ value }, which allows optional whitespace on both sides of the = sign. If you only want specific kinds of whitespace on a specific side, that is possible as well. With this change, all of the examples above parse again.
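The three spacing behaviours described above can be compared in a small Rust sketch. It is a hand-rolled approximation, not pest; EqSpacing, match_ident, spacing, and matches_hash are hypothetical names. One detail it makes visible: WHITESPACE? admits at most one space per side, whereas implicit whitespace in a normal rule admits any number.

```rust
/// How whitespace around `=` is treated (hypothetical model of the three variants).
#[derive(Clone, Copy)]
enum EqSpacing {
    /// Normal rule `{ ... }`: any number of spaces is skipped implicitly.
    Implicit,
    /// Compound-atomic rule `${ param ~ "=" ~ value }`: no spaces allowed.
    Forbidden,
    /// `${ param ~ WHITESPACE? ~ "=" ~ WHITESPACE? ~ value }`: zero or one space.
    AtMostOne,
}

/// Matches `(ASCII_ALPHANUMERIC | "_")+` at `pos`; returns the end offset.
fn match_ident(input: &[u8], pos: usize) -> Option<usize> {
    let mut i = pos;
    while i < input.len() && (input[i].is_ascii_alphanumeric() || input[i] == b'_') {
        i += 1;
    }
    if i > pos {
        Some(i)
    } else {
        None
    }
}

/// Consumes as many spaces at `pos` as the given mode permits.
fn spacing(input: &[u8], pos: usize, mode: EqSpacing) -> usize {
    let run = input[pos..].iter().take_while(|&&b| b == b' ').count();
    match mode {
        EqSpacing::Implicit => pos + run,
        EqSpacing::Forbidden => pos,
        EqSpacing::AtMostOne => pos + run.min(1),
    }
}

/// Returns true if the whole of `text` is `ident = ident` under the given mode.
fn matches_hash(text: &str, mode: EqSpacing) -> bool {
    let input = text.as_bytes();
    let end = match_ident(input, 0)
        .map(|p| spacing(input, p, mode))
        .filter(|&p| input.get(p) == Some(&b'='))
        .map(|p| spacing(input, p + 1, mode))
        .and_then(|p| match_ident(input, p));
    end == Some(input.len())
}
```

In this model, "c_d = 39" is accepted by Implicit and AtMostOne but rejected by Forbidden, while "c_d  =  39" (two spaces on each side) is accepted only by Implicit.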

@tomtau tomtau converted this issue into discussion #625 Jul 10, 2022
