Why is the start rule feature undocumented? #77
Sorry for responding slowly. It's documented that the first rule is the "start" rule. Can you spell out a bit more what problem you're facing? |
Thanks for answering.
Looking at the code I found that Pegex::Grammar has an undocumented
attribute `start_rules` which takes an arrayref of alternative start rules,
and the `parse` method of Pegex::Parser takes the name of one of these
rules as an undocumented third (second not counting the invocant) argument
and will use the named rule as start rule if that argument is present.
This feature is useful for me because the DSL I'm working on takes a path
through a data structure as part of its main input but also allows
bash-like indirection where a path can be fetched from a value in the data
structure itself, parsed and resolved while the AST is being evaluated.
Since the syntax for these dynamically obtained paths is the same as for
paths in the main input, and hence a subset of the main grammar can be used
to parse them, it makes sense to use the undocumented start rule feature
for this. It also makes development and maintenance a lot easier since I
can keep the whole grammar in a single file and a single module rather than
having the subset for parsing paths in a separate file and concatenating it
with the rest in order to parse the main grammar.
I have used this feature for several days now and it seems to be fully
functional. The only problem is that it is not part of the documented API
and so I'm worried that it might go away and that it may not be safe to
rely on it.
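A minimal sketch of the call described above, assuming Pegex is installed. The grammar, the rule names (`document`, `entry`, `key`, `path`, `segment`) and the sample inputs are hypothetical and only stand in for the real DSL; the second argument to `parse` is the feature as described in this comment, not something taken from the documentation.

```perl
use strict;
use warnings;
use Pegex::Grammar;
use Pegex::Parser;
use Pegex::Tree;

# Hypothetical grammar: 'document' is the first rule and therefore the
# default start rule; 'path' is reachable from it.
my $grammar_text = <<'END_GRAMMAR';
document: entry+
entry: key /BLANK* EQUAL BLANK*/ path /BLANK* EOL/
key: /( WORD+ )/
path: segment+
segment: /SLASH ( WORD+ )/
END_GRAMMAR

my $parser = Pegex::Parser->new(
    grammar  => Pegex::Grammar->new(text => $grammar_text),
    receiver => Pegex::Tree->new,
);

# Default behaviour: parsing starts at the first rule, 'document'.
my $whole = $parser->parse("home = /users/alice\n");

# Start-rule argument as described in this thread: parse a bare path
# with the same grammar by naming 'path' as the start rule.
my $segments = $parser->parse('/biz/buz/quux', 'path');
```

Both calls share one grammar object, so the path syntax is defined in a single place.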
On Tue, 24 March 2020 at 08:45, mohawk2 <notifications@github.com> wrote:
… Sorry for responding slowly. It's documented that the first rule is the
"start" rule. Can you spell out a bit more what problem you're facing?
|
Why not make a PR to document it? @ingydotnet any thoughts? @bpj As an alternative thought, it sounds like you're taking quite a code-driven approach to your parsing. Have you considered a more data-driven approach whereby you produce the entire parse-tree (which means you'd only need to call |
@mohawk2 I think you don't understand. It has everything to do with my "approach" to the data.
There simply is no way both paths can be parsed during the same phase, because the secondary/indirect path does not yet "exist" during the main parsing phase. As a concrete example suppose the main text contains a path
foo:
  - some value
  - bar: '/biz/buz/quux'
  # presumably more data here
biz:
  buz:
    quux: The actual data
    # presumably more data here
  # presumably more data here
# presumably more data here
When the evaluator sees the piece of the AST representing the primary path
Since the syntax for specifying paths is the same in both cases it is only natural to parse the "secondary" path using the same grammar, but using the
Please see perlre and perluniprops documentation if you don't know what these escapes mean. Basically "A 'word- starts with a letter in the Unicode sense, followed by zero or more underscores/letters/numerics in the Unicode sense, possibly with following combining diacritical marks and possibly separated by dashes in the Unicode sense". While this is a regex it is complicated enough that I don't want to have to maintain it in more than one place!
I hope this explains better what I mean. Note that English isn't my native language, which unfortunately may mean that I don't know the right words to use for some concepts.
I'll be happy to take a stab at documenting the alternative start rule syntax if there is an interest. |
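The two-phase resolution described in this comment can be sketched in core Perl without Pegex. Everything here is an illustrative assumption: `parse_path` is a hypothetical stand-in for the real path grammar, the primary path `/foo/1/bar` and the use of a numeric step to index a list are invented for the example, and only the data structure follows the one quoted above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the path grammar: split '/a/b/c' into steps.
sub parse_path {
    my ($text) = @_;
    die "Not a path: $text\n" unless $text =~ m{\A(?:/\w+)+\z};
    return [ grep { length } split m{/}, $text ];
}

# Walk nested hashes and arrays along the parsed steps.
sub resolve {
    my ($data, $steps) = @_;
    my $node = $data;
    for my $step (@$steps) {
        if    (ref $node eq 'HASH')  { $node = $node->{$step} }
        elsif (ref $node eq 'ARRAY') { $node = $node->[$step] }
        else  { die "Cannot descend into a scalar at '$step'\n" }
    }
    return $node;
}

# Resolve the primary path, then parse and resolve the value found there.
# The inner path text only becomes available at evaluation time.
sub resolve_indirect {
    my ($data, $text) = @_;
    my $inner = resolve($data, parse_path($text));
    return resolve($data, parse_path($inner));
}

my $data = {
    foo => [ 'some value', { bar => '/biz/buz/quux' } ],
    biz => { buz => { quux => 'The actual data' } },
};

print resolve_indirect($data, '/foo/1/bar'), "\n";   # prints "The actual data"
```

The point of the sketch is that `parse_path` runs twice, and the second input cannot be known when the main text is parsed, which is why a separately callable path rule is needed.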
I don't understand why you wouldn't have a rule called something like
That way the original AST would contain the pathspec already parsed. Are you sure you're not solving the wrong problem here? :-) |
Of course the grammar for the whole language would contain the rule, and an AST from a parse of a whole text would contain the paths contained in that text, but some paths are fetched from elsewhere after the whole text/program has already been parsed. Now how would I parse a string containing a path fetched from elsewhere, which is not embedded in any other text without specifying the rule for parsing a path as the start rule instead of the top rule used when parsing a whole program? The problem is that I need to parse some strings using a subset of the grammar. I can't see how I can do that without either
I can't see what would be wrong with the second approach. I could of course set things up so that the grammar always parses either a whole text or a bare path, but that seems wrong, since sometimes I want a whole text and sometimes I want a bare path, but never either/or. |
My gut says that if you provide a suitable subset of your program, I can provide an answer. Please prove me wrong so we can justify this API change :-) |
I think you both misunderstand what It is a set of rules passed to the Pegex compiler. The compiler takes a textual Pegex grammar and turns it into a grammar object. That's phase 1.
Then it does a combinate phase. It takes the starting rule and follows all the rule references and does certain combining effects. Any rule that is not reached in this process is removed from the grammar object. Note: they don't need to be removed but currently that's what happens. So
Now there is a related concept in Pegex::Parser of a starting rule. Look in Pegex/Parser.pm and you'll see:
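You can do a parse with the grammar using an alternate starting rule. This sounds like what you are trying to do. You only need the `starting_rules` attribute if the compiler is throwing out the non-default rules you need during its compile/combinate phase. It doesn't sound like you need it. I hope I understood things right, and that this is helpful. |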
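A hedged sketch of the case where the compiler would discard a rule. It assumes the attribute is spelled `start_rules` and takes an arrayref, as reported earlier in this thread, and that the default top rule should be listed alongside the extra one so both survive the combinate phase. The grammar and rule names are hypothetical, and none of this is taken from the documentation.

```perl
use strict;
use warnings;
use Pegex::Grammar;
use Pegex::Parser;
use Pegex::Tree;

# Hypothetical grammar: 'words' is not referenced from 'document', so the
# combinate phase would normally remove it from the compiled grammar.
my $grammar_text = <<'END_GRAMMAR';
document: number+
number: /( DIGIT+ ) BLANK* EOL/
words: word+
word: /( ALPHA+ ) BLANK*/
END_GRAMMAR

my $grammar = Pegex::Grammar->new(
    text        => $grammar_text,
    # Assumption: name every rule that must stay reachable,
    # including the default top rule.
    start_rules => [ 'document', 'words' ],
);

my $parser = Pegex::Parser->new(
    grammar  => $grammar,
    receiver => Pegex::Tree->new,
);

# Parse with the otherwise unreachable rule as the start rule.
my $words = $parser->parse('alpha beta gamma', 'words');
```

When the alternate start rule is already reachable from the top rule, as with path specs in this thread, the attribute can be left out and only the argument to `parse` is needed.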
You can do a parse with the grammar using an alternate starting rule. This
sounds like what you are trying to do.
Yes, that's what I'm doing, successfully. It's only that since the
alternate start rule feature is undocumented I was concerned that it may
not be fully functional — although after 2+ weeks of using it that concern
is gone — or that it might go away, so I'm mostly looking for assurance
that it won't go away before I depend on it. As I said I'd prefer not to
have to keep the subset of rules used for parsing path specifications in a
separate file/string since (a) keeping track of what should go where is an
extra hassle, and (b) keeping everything in one place makes inlining the
compiled grammar much less problematic.
You only need the starting_rules attribute if the compiler is throwing out
the non-default rules you need during its compile/combinate phase.
I understand that. I've been using the start rule feature for testing some
subsets of the grammar during development, and sometimes I had to use the
`starting_rules` attribute, but it's correct that I don't need it for path
specs, the subset that will be used as alternate starting point during
production.
|
The optional start rule feature will not go away. It should be documented. Feel free to make a pull request if you'd like to do that. |
Thanks! I'll look into making a pull request.
|
If it's OK I'll leave this issue open as a reminder until I've made that documentation PR. |
Why is the start rule feature undocumented? I have a very good use case for it. Should I refrain from using it?