-
-
Notifications
You must be signed in to change notification settings - Fork 642
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extract directive parsing into subparser #3020
Extract directive parsing into subparser #3020
Conversation
As a first step into migrating Directive handling to a subparser, I have created the Subparser but limited its interactions to the creation/init part. Since the implementation currently creates a Directive Handler and assigns that to a property in the DocumentParser, I can get away with this slim step. Subsequent commits will migrate more into the Subparser.
ContentBlocks In this change, I had to make a couple more extensive changes in order for Directives to work. Directives are treated as a sort of pushdown system where as a property directiveParser has a non-null value; then the next Code node is added as a ContentBlock. This too should be changed in a future commit to properly support nested / tree parsing but we need to take small steps.
After having reworked a series of Node types into substates, I have been able to get a glimpse of the larger system. It seems, the original author attempted a line-based parser that could backtrack (hence, why it could re-parse every line until a parse was successful). During research on how others have approached and fixed this issue, I found references to LL(1) or Recursive Descend Parsers without backtracking and with 1 token look ahead. When I compare that method, it seems very suitable in this situation and requires a bit of rework to introduce "productions". This term is taken from LL(1) parser theory to indicate a means to start at a given token (line in our case) and construct a product or AST node by moving the cursor forward. This deviates from the original implementation where the movement of the cursor was driven by a foreach statement. By changing this to a while statement it is not a problem is productions themselves move the cursor forward.
The title is tightly coupled to sections, which are structural elements, and through this it means that we have to rework the parser a bit. In this change, I have promoted the DocumentParser to a Production Rule as a first step towards nested production rules and I have reworked parts so that section handling is actually done through the TitleRule. This is not the final phase, but I had to reduce the complexity of the refactoring. After this commit, the parser is in a broken state unfortunately because it is no longer possible to use a strangler pattern to keep it working with old and new components. Subsequent commits will restore functionality by moving more and more pieces of the parser into production rules.
During implementation, I found out I had to tweak parts of the recognition of characters for quote rules because of issues with whitelines. Since a QuoteBlock recreates a parser and passes its contents through it (to be changed), it ended up in an infinite loop when the content merely contained an empty string.
As the next step in finalizing the design of the parser for Restructured Text, I am migrating directive handling to a production rule. This is also a good example how we can re-use other Production Rules to get a compound rule. The implementation is a bit rough, and there are one or two bugs remaining (I lost the last line of text in a note?) but it is a good starting point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't look for bugs but just at the bigger picture. And I think I can follow what you are doing here.
Having a production for each state which is able to move the iterator forward makes a lot of sense to me. A minor issue would be that a production rule is able to step back. I would not allow this in the iterator at all. Each line must be peeked by a look ahead or consumed. If we would allow rules to move before they were triggered we could get some odd bugs.
Interesting! I'll have a closer look at these changes this week. |
No description provided.