-
-
Notifications
You must be signed in to change notification settings - Fork 6.6k
Railroad, Syntax diagrams, EBNF
This is an extension and partially a compilation of what has been discussed here
We would like to add diagrams displaying syntax structure by parsing formal grammar definitions.
The idea is not new, there are many different tools and alternatives
Kind | Source code or link to the page | Live |
---|---|---|
Visualizer | https://github.com/jacquev6/DrawGrammar | https://jacquev6.github.io/DrawGrammar/ |
Visualizer | https://github.com/GuntherRademacher/rr | https://bottlecaps.de/rr/ui |
Visualizer | http://dotnet.jku.at/applications/visualizer/ | |
Parser | https://www.icosaedro.it/bnf_chk/ | |
Renderer | https://www.npmjs.com/package/railroad-diagrams |
We have a couple of standards and different notations out there, so we should probably support as many as we can at once
It does not perform left-reduction
In RR 'a'+?
and 'a'?+
are effectively reduced to 'a'*
, and ('a'|)+
reduced to ('a'|)+
but not to the 'a'*
, although they are effectively the same. It seems that RR needs like 2 steps for that reduction.
It also cannot compress a|a*
although it obviously equals to a+
Instead of binary concatenation operator
graph
c1[,] --> A
c2[,] --> B
c2[,] --> C
c1 --> c2
Lets use multiple operands
graph
c1[,]
c1 --> A
c1 --> B
c1 --> C
Author of RR suggest to reduce AST down to two operators.
- alternative
- one or more
See the longer example:
rule ::= 'a'? 'b'+ 'c'* 'd'?+ 'e'?* 'f'+* ('g'|) ('h'|)+ ('i'|)* ('w'|)+* 'x' y 'z'
becomes
[rule]::= 'a'? 'b'+ 'c'* 'd'* 'e'* 'f'* 'g'? 'h'?+ 'i'?* 'w'?* 'x' [y] 'z'
I have not come to a conclusion yet, there are some cases in this implementation that are really tricky, but some, that are relatively simple are not processed.
Currently I am trying to figure out, should "one or many" be a node or a "transition" just like empty transition.
May be it'll make something simpler, may be not. In that case we probably could use only alternative, and definition representing "list of consequent terms and non terms".
%e
- epsilon, empty transition from the start
state to the end
state
%r
- repetition, empty transition from the end
to the start
state
x
or 1
- only one
x?
or x|%e
- zero or one
x+
or x|%r
- one or many
x*
or x|%e|%r
- zero or many
uncompressed => compressed
We can represent quantifiers as alternatives, or as alternatives with repetitions
a? => a | %e
a+ => a | %r
a* => a | %e | %r
Two quantifiers one after another does not meed being 'greedy' (like in regexp)
(a?)? => a | %e | %e => a?
(a?)+ => a | %e | %r) => a*
(a?)* => a | %e | %e | %r => a?
(a+)? => a | %r | %e => a*
(a+)+ => a | %r | %r => a+
(a+)* => a | %r | %r | %e => a*
// because a* contains %e and %r it always produces a*
a*? => a*
a*+ => a*
a** => a*
Alternatives can unite its elements
a | a? => a | a | %e => a?
a | a+ => a | a | %r => a+
a | a* => a | a | %e | %r => a*
Concatenation can unite its elements single element + quantifier
a a? a => a a? a
a a* a => a a+
a a+ a => a a a+
quantifier + quantifier
a? a? => a{0,2} # we do not support this
a? a? => a? a?
a? a+ => a? a+
a? a* => (a|%e) (a|%e)* => a*
a* a* => (a|%e)* (a|%e)* => a*
a+ a+ => a a* a a* => a a+
a+ a* => a*
a* a+ => a*
Unwrap and unite alternatives
(a) => a
a|(b|c) => a|b|c
Common prefixes
TODO
We need to
- eliminate recursion if possible
- apply left-reduction, search for common prefixes and suffixes in long rules e.g.
a b c d | a b
becomesa b (c d)?
rule: 'a' rule | %e => rule: a*
rule ::= 'a' rule | 'a' => rule: a+
alt ::= a(b a)*
alt ::= a b c d | b | c d