Brace use in productions. #108

kfsone · 2021-02-26T20:53:33Z

Tripped myself up on a subtle difference between token and production definitions :)

identifier: 'a'-'z' { 'a'-'z' };

File: Enum;
Enum: "enum" identifier "{" { identifier } "}";

->

Parse error: Error: {(13) { @ 4:29, expected one of: | ; g_sdt_lit prodId tokId string_lit

which is the unquoted open brace:

Enum: "enum" identifier "{" { identifier } "}";
                            ^
12345678901234567890123456789

Obviously, I need to use the alternate structure here, but I'm curious: wouldn't it make sense to allow '{' in productions and have it produce the same effect?

awalterschulze · 2021-02-27T15:55:07Z

Good question :)

Adding a zero-or-more operator to the parser production rules was actually considered, but it made the syntax-directed translation (SDT) rules more complicated to specify, so it was left out for the sake of simplicity.
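
For anyone landing here with the same parse error, a minimal sketch of the recursive form that the grammar from the first post needs today. It assumes the `identifier` token defined there; `IdentifierList` is a name made up for illustration and does not come from gocc itself.

```
File : Enum ;

Enum : "enum" identifier "{" IdentifierList "}" ;

/* IdentifierList is a hypothetical helper: zero or more identifiers,
   written as left recursion because { ... } is only available in
   token (lexical) definitions. */
IdentifierList
    : empty
    | IdentifierList identifier
    ;
```

Left recursion is a natural fit here because gocc generates an LR parser.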

kfsone · 2021-02-27T19:55:09Z

Aye. In the process of converting my grammar over from my homebrew parser, I found the absence of square brackets a little more frustrating. It felt like something that ought to be feasible as syntactic sugar, i.e. writing:

    Element: Value [ "," ];

could be treated as

    Element: Value "," | Value;

awalterschulze · 2021-02-28T08:14:46Z

This is a great example of where the SDT rules would be awkward.
Try to include SDT rules in your examples, maybe I am wrong.

kfsone · 2021-02-28T19:05:47Z

[edit: after reading gocc2.bnf I'm guessing 'sdt' specifically refers to the <<...>> directives; I'll write a follow-up]

Sure, something like this?

List: Element | List Element;
Element: Value "," | Value;

I'll try to swing back to this and look at the code to see whether the implementation I have in mind is feasible, but

R: P [ n ];

would effectively be internally mapped to

R: __R0 | __R1;
__R0: P n;
__R1: P;

// so my example
List: Element | List Element;
Element: Value "," | Value;

// becomes
List: Element | List Element;
Element: Value [ "," ];

// produces the same result as
List: Element | List Element;
Element: __Element0 | __Element1;
__Element0: Value ",";
__Element1: Value;
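
To bring SDT rules into the picture, here is a sketch of what such an expansion might carry for `Element`, with the helper productions inlined as plain alternatives. The `<< $0, nil >>` actions are an assumed default (forward the `Value` attribute and drop the comma), not something gocc generates.

```
List : Element | List Element ;

// Hypothetical expansion of: Element : Value [ "," ] ;
Element
    : Value ","   << $0, nil >>   // the comma is parsed but dropped
    | Value       << $0, nil >>
    ;
```

Because the optional part comes last, `Value` is `$0` in both alternatives, so the numbering question only arises when something follows the optional element.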

Pardon my oafishness: I'm self-taught, and aside from toy parsers/compilers for small DSLs I haven't worked on a real parser in anger since I wrote a MUD language+engine, where the compiler produced an abstract grammar that the engine subsequently used to drive a bottom-up parser to interpret player input ('plant the big plant in the little plant pot and pot the little plant with the big potted plant' [spot the catch :)]).

kfsone · 2021-02-28T20:42:46Z

After reading the gocc2, I think you're referring to trying to capture the "optional" field in a production:

ClassDef: "class" identifier OptionalParent Body << ast.NewClass($1, $2, $3) >>;
OptionalParent : ":" identifier | empty;

vs

ClassDef: "class" identifier [ Parent ] Body << ast.NewClass($1, $??, $??) >>

If "[...]" is replaced with a logical substitute, then "[ Parent ]" would remain $2 regardless, it would just have a nil value when none was provided, so it would still be treated exactly as

ClassDef : "class" identifier __optional__Parent Body  <<  ast.NewClass($1, $2, $3) >>;

__optional__Parent : Parent << $0, nil >> | empty << nil, nil >>;

The precedent for this is "anonymous terminals", where gocc allows

ClassDef: "class" ...

instead of requiring

class_keyword: "class";

ClassDef: class_keyword identifier ...
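
For comparison, a sketch of the hand-written form with explicit actions on the optional production, so it is unambiguous what `$2` holds in `ClassDef`. It assumes `Body` is defined elsewhere and that `ast.NewClass` (from the example above) returns a value and an error; gocc numbers attributes from `$0`, so `$1` is the class name.

```
ClassDef
    : "class" identifier OptionalParent Body
        << ast.NewClass($1, $2, $3) >>
    ;

OptionalParent
    : ":" identifier   << $1, nil >>    // pass the parent name up
    | empty            << nil, nil >>   // no parent: $2 in ClassDef is nil
    ;
```

A generated `__optional__Parent` would presumably behave like `OptionalParent` here, which is why the attribute positions in `ClassDef` stay the same whether or not a parent is given.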

kfsone · 2021-02-28T20:57:00Z

I can see cases where a naive approach would cause problems:

// looking at you, Guido.
import : "import" [ identifier string_lit | string_lit "as" identifier ];

obvious but flawed workarounds:

- pad the attrib count to match the worst case and let users figure out the context themselves: lots of surprises for beginners :(
- require each branch to have the same attrib count: usage will feel arbitrary and the order of params will still surprise users,

or:

- disallow | in Lexical []s: it's a small but incredibly useful convenience for a lot of super-common cases, and the effect on attribs is relatively predictable for learners.

awalterschulze · 2021-03-14T12:30:39Z

> I'm guessing 'sdt' specifically refers to the <<...>> directives

Yes exactly

awalterschulze · 2021-03-14T12:33:23Z

> After reading the gocc2, I think you're referring to trying to capture the "optional" field in a production:
>
> ClassDef: "class" identifier OptionalParent Body << ast.NewClass($1, $2, $3) >>;
> OptionalParent : ":" identifier | empty;
>
> vs
>
> ClassDef: "class" identifier [ Parent ] Body << ast.NewClass($1, $??, $??) >>
>
> If "[...]" is replaced with a logical substitute, then "[ Parent ]" would remain $2 regardless, it would just have a nil value when none was provided, so it would still be treated exactly as
>
> ClassDef : "class" identifier __optional__Parent Body  <<  ast.NewClass($1, $2, $3) >>;
>
> __optional__Parent : Parent << $0, nil >> | empty << nil, nil >>;
>
> The precedent for this is "anonymous terminals", where gocc allows
>
> ClassDef: "class" ...
>
> instead of requiring
>
> class_keyword: "class";
>
> ClassDef: class_keyword identifier ...

I think this might work for the optional case, not sure about all implications, but at least SDT rules look nice.

awalterschulze · 2021-03-14T12:34:20Z

> I can see cases where a naive approach would cause problems:
>
> // looking at you, Guido.
> import : "import" [ identifier string_lit | string_lit "as" identifier ];
>
> obvious but flawed workarounds:
>
> - pad the attrib count to match the worst case and let users figure out the context themselves: lots of surprises for beginners :(
> - require each branch to have the same attrib count: usage will feel arbitrary and the order of params will still surprise users,
>
> or:
>
> - disallow | in Lexical []s: it's a small but incredibly useful convenience for a lot of super-common cases, and the effect on attribs is relatively predictable for learners.

Yes, I think | is already only allowed at the top level in the parser part of the BNF, so this shouldn't be a problem.
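
A sketch of why top-level-only alternation sidesteps the attribute-count problem: written with top-level alternatives, each branch of the import example gets its own SDT rule and its own `$n` numbering, so nothing needs padding. The `ast.NewImport*` constructors are hypothetical, and `identifier` and `string_lit` are assumed to be defined in the lexical part.

```
// ast.NewBareImport, ast.NewImport and ast.NewImportAs are
// hypothetical constructors used only for illustration.
Import
    : "import"
        << ast.NewBareImport() >>
    | "import" identifier string_lit
        << ast.NewImport($1, $2) >>
    | "import" string_lit "as" identifier
        << ast.NewImportAs($1, $3) >>
    ;
```

The production is named `Import` rather than `import` because gocc treats identifiers starting with a lowercase letter as tokens and those starting with an uppercase letter as productions.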
