Brace use in productions. #108

Open
kfsone opened this issue Feb 26, 2021 · 9 comments

@kfsone
Contributor

kfsone commented Feb 26, 2021

Tripped myself up on a subtle difference between token and production definitions :)

identifier: 'a'-'z' { 'a'-'z' };

File: Enum;
Enum: "enum" identifier "{" { identifier } "}";

->

Parse error: Error: {(13) { @ 4:29, expected one of: | ; g_sdt_lit prodId tokId string_lit

which is the unquoted open brace:

Enum: "enum" identifier "{" { identifier } "}";
                             ^
12345678901234567890123456789

Obviously, I need to use the alternate structure here, but I'm curious: wouldn't it make sense to allow '{' in productions anyway, so that it achieves the same effect?

@awalterschulze
Collaborator

Good question :)

Adding a zero-or-more operator to the parser production rules was actually considered, but it made the syntax-directed translation (SDT) rules more complicated to specify, so it was left out for the sake of simplicity.

@kfsone
Contributor Author

kfsone commented Feb 27, 2021

Aye, in the process of converting my grammar over from my homebrew parser, I found the absence of square brackets a little more frustrating. It felt like something that ought to be feasible as syntactic sugar, i.e. writing:

    Element: Value [ "," ];

could be treated as

    Element: Value "," | Value;

@awalterschulze
Collaborator

This is a great example of where the SDT rules would be awkward.
Try to include SDT rules in your examples; maybe I am wrong.

@kfsone
Contributor Author

kfsone commented Feb 28, 2021

[edit: after reading the gocc2.bnf I'm guessing 'sdt' specifically refers to the <<...>> directives; I'll write a follow-up]

Sure, something like this?

List: Element | List Element;
Element: Value "," | Value;

I'll try to swing back to this and look at the code to see whether the implementation I have in mind is feasible, but

R: P [ n ];

would effectively be internally mapped to

R: __R0 | __R1;
__R0: P n;
__R1: P;

// so my example
List: Element | List Element;
Element: Value "," | Value;

// becomes
List: Element | List Element;
Element: Value [ "," ];

// produces the same result as
List: Element | List Element;
Element: __Element0 | __Element1;
__Element0: Value ",";
__Element1: Value;
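
A minimal Go sketch of that mapping, to make the idea concrete. It is not gocc code: `expandOptional` and `desugar` are hypothetical names, the body is assumed to be whitespace-separated symbols with flat (non-nested) `[ ... ]` groups, and it emits the alternatives inline (`Value "," | Value`) rather than through the `__R0`/`__R1` helper productions, which accepts the same input.

```go
package main

import (
	"fmt"
	"strings"
)

// expandOptional turns a production body containing flat "[ ... ]" groups
// into the plain alternatives it stands for. Hypothetical helper: symbols
// are assumed to be whitespace-separated, with no spaces inside literals.
func expandOptional(body string) ([]string, error) {
	alts := [][]string{{}} // alternatives built so far
	var group []string     // symbols of the currently open group
	inGroup := false

	for _, sym := range strings.Fields(body) {
		switch {
		case sym == "[":
			if inGroup {
				return nil, fmt.Errorf("nested optional groups are not supported")
			}
			inGroup, group = true, nil
		case sym == "]":
			if !inGroup {
				return nil, fmt.Errorf("unmatched ]")
			}
			// Every alternative forks: one copy with the group, one without.
			var next [][]string
			for _, a := range alts {
				with := append(append([]string{}, a...), group...)
				without := append([]string{}, a...)
				next = append(next, with, without)
			}
			alts, inGroup = next, false
		case inGroup:
			group = append(group, sym)
		default:
			for i := range alts {
				alts[i] = append(alts[i], sym)
			}
		}
	}
	if inGroup {
		return nil, fmt.Errorf("unmatched [")
	}

	out := make([]string, len(alts))
	for i, a := range alts {
		if len(a) == 0 {
			out[i] = "empty" // an alternative with no symbols
			continue
		}
		out[i] = strings.Join(a, " ")
	}
	return out, nil
}

// desugar renders the expanded alternatives as a single production.
func desugar(name, body string) (string, error) {
	alts, err := expandOptional(body)
	if err != nil {
		return "", err
	}
	return name + ": " + strings.Join(alts, " | ") + ";", nil
}

func main() {
	out, err := desugar("Element", `Value [ "," ]`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // Element: Value "," | Value;
}
```

Note that k optional groups in one body expand to 2^k alternatives, and each alternative has a different symbol count, which is where the SDT attribute numbering gets awkward.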

Pardon my oafishness: I'm self-taught, and aside from toy parsers/compilers for small DSLs I haven't worked on a real parser in anger since I wrote a MUD language and engine, where the compiler produced an abstract grammar that the engine then used to drive a bottom-up parser to interpret player input ('plant the big plant in the little plant pot and pot the little plant with the big potted plant' [spot the catch :)]).

@kfsone
Contributor Author

kfsone commented Feb 28, 2021

After reading the gocc2, I think you're referring to trying to capture the "optional" field in a production:

ClassDef: "class" identifier OptionalParent Body << ast.NewClass($1, $2, $3) >>;
OptionalParent : ":" identifier | empty;

vs

ClassDef: "class" identifier [ Parent ] Body << ast.NewClass($1, $??, $??) >>

If "[...]" is replaced with a logical substitute, then "[ Parent ]" would remain $2 regardless, it would just have a nil value when none was provided, so it would still be treated exactly as

ClassDef : "class" identifier __optional__Parent Body  <<  ast.NewClass($1, $2, $3) >>;

__optional__Parent : Parent << $0, nil >> | empty << nil, nil >>;

The precedent for this is "anonymous terminals", where gocc allows

ClassDef: "class" ...

instead of requiring

class_keyword: "class";

ClassDef: class_keyword identifier ...
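
The substitution described above could be sketched in Go roughly as follows. This is an illustration under assumptions, not how gocc works: `liftOptional` is a hypothetical name, only the simplest shape (`[`, one non-terminal, `]`) is handled, and the `__optional__` prefix is taken from the example rather than from any real gocc naming scheme.

```go
package main

import (
	"fmt"
	"strings"
)

// liftOptional replaces each single-symbol "[ X ]" group with a synthetic
// non-terminal "__optional__X" and returns the rewritten body together
// with the helper productions. Because one symbol replaces one group, the
// positions of the remaining symbols (and so their $n indexes) are unchanged.
// Hypothetical helper: X is assumed to be an identifier, not a literal.
func liftOptional(body string) (string, []string, error) {
	syms := strings.Fields(body)
	var out, helpers []string
	for i := 0; i < len(syms); i++ {
		if syms[i] != "[" {
			out = append(out, syms[i])
			continue
		}
		// Only "[", one symbol, "]" is handled in this sketch.
		if i+2 >= len(syms) || syms[i+2] != "]" {
			return "", nil, fmt.Errorf("only single-symbol optional groups are supported")
		}
		inner := syms[i+1]
		name := "__optional__" + inner
		out = append(out, name)
		helpers = append(helpers,
			fmt.Sprintf("%s : %s << $0, nil >> | empty << nil, nil >>;", name, inner))
		i += 2 // skip the inner symbol and the closing bracket
	}
	return strings.Join(out, " "), helpers, nil
}

func main() {
	rewritten, extra, err := liftOptional(`"class" identifier [ Parent ] Body`)
	if err != nil {
		panic(err)
	}
	fmt.Println("ClassDef : " + rewritten + " << ast.NewClass($1, $2, $3) >>;")
	for _, h := range extra {
		fmt.Println(h)
	}
}
```

With this shape the optional symbol stays at `$2` whether or not a parent was given, so the SDT expression does not need to change.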

@kfsone
Contributor Author

kfsone commented Feb 28, 2021

I can see cases where a naive approach would cause problems:

// looking at you, Guido.
import : "import" [ identifier string_lit | string_lit "as" identifier ];

obvious but flawed workarounds:

  • pad the attrib count to match worst case, let the user figure out the context themselves: lots of surprises for beginners :(
  • require each branch have the same attrib count: will have arbitrary usage feel and still surprise users with order of params,

or:

  • disallow | in Lexical []s: it's a small but incredibly useful convenience for a lot of super-common cases, the effect on attribs is relatively predictable for learners.
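
To show why the `import` example is awkward, and what the third option might look like, here is a rough Go sketch. `branchSizes` and `checkNoAlternation` are hypothetical names, and the input is again assumed to be whitespace-separated symbols with flat `[ ... ]` groups.

```go
package main

import (
	"fmt"
	"strings"
)

// branchSizes returns, for every flat "[ ... ]" group in body, the number
// of symbols in each "|"-separated branch of that group.
func branchSizes(body string) ([][]int, error) {
	var groups [][]int
	var cur []int
	inGroup := false
	for _, sym := range strings.Fields(body) {
		switch {
		case sym == "[":
			if inGroup {
				return nil, fmt.Errorf("nested optional groups are not supported")
			}
			inGroup, cur = true, []int{0}
		case sym == "]":
			if !inGroup {
				return nil, fmt.Errorf("unmatched ]")
			}
			groups = append(groups, cur)
			inGroup = false
		case sym == "|" && inGroup:
			cur = append(cur, 0) // start counting a new branch
		case inGroup:
			cur[len(cur)-1]++
		}
	}
	if inGroup {
		return nil, fmt.Errorf("unmatched [")
	}
	return groups, nil
}

// checkNoAlternation sketches the "disallow | inside []" rule.
func checkNoAlternation(body string) error {
	groups, err := branchSizes(body)
	if err != nil {
		return err
	}
	for i, g := range groups {
		if len(g) > 1 {
			return fmt.Errorf("optional group %d has %d alternatives; | is not allowed inside [ ]", i+1, len(g))
		}
	}
	return nil
}

func main() {
	body := `"import" [ identifier string_lit | string_lit "as" identifier ]`
	counts, err := branchSizes(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(counts)                   // [[2 3]]
	fmt.Println(checkNoAlternation(body)) // reports the rejected group
}
```

The two branches contribute 2 and 3 attributes respectively, so no single `$n` numbering fits both; rejecting `|` inside the brackets avoids the question entirely.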

@awalterschulze
Collaborator

I'm guessing 'sdt' specifically refers to the <<...>> directives

Yes exactly

@awalterschulze
Collaborator

After reading the gocc2, I think you're referring to trying to capture the "optional" field in a production:

ClassDef: "class" identifier OptionalParent Body << ast.NewClass($1, $2, $3) >>;
OptionalParent : ":" identifier | empty;

vs

ClassDef: "class" identifier [ Parent ] Body << ast.NewClass($1, $??, $??) >>

If "[...]" is replaced with a logical substitute, then "[ Parent ]" would remain $2 regardless, it would just have a nil value when none was provided, so it would still be treated exactly as

ClassDef : "class" identifier __optional__Parent Body  <<  ast.NewClass($1, $2, $3) >>;

__optional__Parent : Parent << $0, nil >> | empty << nil, nil >>;

The precedent for this is "anonymous terminals", where gocc allows

ClassDef: "class" ...

instead of requiring

class_keyword: "class";

ClassDef: class_keyword identifier ...

I think this might work for the optional case, not sure about all implications, but at least SDT rules look nice.

@awalterschulze
Collaborator

I can see cases where a naive approach would cause problems:

// looking at you, Guido.
import : "import" [ identifier string_lit | string_lit "as" identifier ];

obvious but flawed workarounds:

  • pad the attrib count to match worst case, let the user figure out the context themselves: lots of surprises for beginners :(
  • require each branch have the same attrib count: will have arbitrary usage feel and still surprise users with order of params,

or:

  • disallow | in Lexical []s: it's a small but incredibly useful convenience for a lot of super-common cases, the effect on attribs is relatively predictable for learners.

Yes, I think | is already only allowed at the top level in the parser part of the BNF, so this shouldn't be a problem.
