-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Context/State? #164
Comments
In my specific case, I have a language which has "statements" - all of which follow the same structure: kind, argument, then children. But which children (and how many of those children) are allowed depends on the kind of statement. For example, not all statements allow a It would be nice if the parsing process was able to keep track of how many statements of that type it has encountered in this scope, and produce an "unexpected" error if it encounters an extra one. |
If this is not currently possible, would you be opposed if it was added? (I'd be willing to do most of the work) In general, it would consist of:
public struct Result<TUserState, TResult> { /* Implementation ommitted */ }
public delegate Result<TUserState, TResult> TextParser<TUserState, TResult>(TextSpan input, TUserState userState); In an ideal world, the non-user state methods would simply redirect to the user state methods. For example (assuming the necessary methods exist): static TextParser<char> Matching(Func<char, bool> predicate, string[] expectations)
=> Matching<Unit>(predicate, expectations).WithoutState();
static TextParser<char> Matching(Func<char, bool> predicate, string[] expectations)
{
if (predicate == null) throw new ArgumentNullException(nameof(predicate));
if (expectations == null) throw new ArgumentNullException(nameof(expectations));
return input =>
{
var next = input.ConsumeChar();
if (!next.HasValue || !predicate(next.Value))
return Result.Empty<char>(input, expectations);
return next;
};
} But, if performance reasons dictate that doesn't happen (e.g., extra closures), then a copy/paste with the necessary modifications would also work, albeit with more code. |
Hi! Currently there are a few ways to implement this, I'm not sure they're quite as good as a built-in feature, but I don't think the gap is enough to warrant broad modifications at this point so hoping this helps:
If you declare your parsers as instance fields or properties of a class (instead of statics), you can use other fields in the class to hold state - in plain variables, or stacks, if needed. A bit disruptive in how the parser is implemented, and needs some additional per-parse allocation, but pretty flexible. A variation on this is to use
Linq expressions (nested closures) make it fairly easy to carry state from one parser to another, later one: ... MyParser { get; } =
from a in ParserWithStateInReturn
// ... more parsers in sequence
from z in ParserThatNeedsState(a.State)
// ...
This is probably the nicest option in the long term, if your grammar is not truly context-sensitive: just allow the parse, and then validate the resulting AST before moving on to further processing. Might take some work to produce good error messages (things that may fail validation will need to be annotated with positions/line numbers). When I say "context-sensitive" here I mean whether the syntax structure actually changes depending on context. If your parser needs to handle one language within another, or something of similar ferocious complexity, you can't usually get around that without a context-dependent parser. But, if the grammatical structure is still essentially the same in the child parsers you mention, regardless of their parent, then semantic analysis is the best thing to reach for. HTH! |
Ah! That's an interesting idea. Not sure how I feel about this though... not to mention any implications that may arise from it.
Then I'd essentially end up allocating 60-100 objects for parsing a single file, because each of my statement parsers would be an allocation (there's ~60), plus any ancillary parsers that I make to assist. Which can be avoided by just passing a state value throughout. Yes, I'd have one allocation (the state value), and even then, that could possibly be avoided if it's a struct.
Both of my use cases involve a language within a language. Yes, I can work around it, but it's annoying, knowing that a simple state value passed through would solve the problem. IMO, without a state capability, post-parse semantic analysis is the only option. And I've got enough semantic analysis I gotta deal with, I was hoping to have less passes, not more. |
Seems like the one to explore, though - no added overhead for existing parsers, and no code changes required, mean it'll be a lot quicker to pull together. Built-in user parsing state is a useful feature, but with other options within reach I don't think it'd motivate a change this broad at this stage. |
FParsec (an F# parser combinator library) has the concept of "user state", which can be used to maintain state between parsers.
Is there any such concept in Superpower? If not, it makes it harder to do grammars that aren't context-free.
The text was updated successfully, but these errors were encountered: