Extract directive parsing into subparser #3020

mvriel · 2021-09-24T15:20:59Z

No description provided.

As a first step toward migrating Directive handling to a subparser, I have created the Subparser but limited its interactions to the creation/init part. Since the implementation currently creates a Directive Handler and assigns it to a property in the DocumentParser, I can get away with this slim step. Subsequent commits will migrate more into the Subparser.

ContentBlocks: In this change, I had to make a couple more extensive changes in order for Directives to work. Directives are treated as a sort of pushdown system: as long as the property directiveParser has a non-null value, the next Code node is added as a ContentBlock. This, too, should be changed in a future commit to properly support nested / tree parsing, but we need to take small steps.
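The pending-directive behaviour described above can be sketched roughly as follows. This is an illustration in Python rather than the project's PHP; `DirectiveState`, `on_directive` and `on_code` are hypothetical names, and `pending_directive` merely stands in for the `directiveParser` property.

```python
from typing import List, Optional


class DirectiveState:
    """Hypothetical stand-in for the parser state described in the commit."""

    def __init__(self) -> None:
        # Plays the role of the `directiveParser` property: non-None means
        # "a directive is waiting for its content".
        self.pending_directive: Optional[dict] = None
        self.nodes: List[dict] = []

    def on_directive(self, name: str) -> None:
        self.pending_directive = {"type": "directive", "name": name, "content": None}
        self.nodes.append(self.pending_directive)

    def on_code(self, text: str) -> None:
        if self.pending_directive is not None:
            # The next Code node becomes the directive's content block.
            self.pending_directive["content"] = text
            self.pending_directive = None
        else:
            self.nodes.append({"type": "code", "text": text})
```

Because only one directive can be pending at a time, this shape cannot express nested directives, which is the limitation the commit message points at.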

After having reworked a series of Node types into substates, I have been able to get a glimpse of the larger system. It seems the original author attempted a line-based parser that could backtrack (hence why it could re-parse every line until a parse was successful). During research on how others have approached and fixed this issue, I found references to LL(1) or Recursive Descent parsers without backtracking and with one token of look-ahead. When I compare that method, it seems very suitable in this situation and requires a bit of rework to introduce "productions". This term is taken from LL(1) parser theory to indicate a means to start at a given token (a line in our case) and construct a product or AST node by moving the cursor forward. This deviates from the original implementation, where the movement of the cursor was driven by a foreach statement. By changing this to a while statement, it is not a problem if productions themselves move the cursor forward.
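A minimal sketch of that idea, written in Python rather than the project's PHP. The names `Lines`, `ParagraphRule` and `parse` are assumptions made for illustration and are not taken from the pull request.

```python
from typing import List, Optional


class Lines:
    """Forward-only cursor over the lines of a document (hypothetical helper)."""

    def __init__(self, text: str) -> None:
        self._lines = text.split("\n")
        self._pos = 0

    def at_end(self) -> bool:
        return self._pos >= len(self._lines)

    def current(self) -> str:
        return self._lines[self._pos]

    def peek(self) -> Optional[str]:
        """Look one line ahead without consuming it."""
        nxt = self._pos + 1
        return self._lines[nxt] if nxt < len(self._lines) else None

    def advance(self) -> None:
        self._pos += 1


class ParagraphRule:
    """Production: consume consecutive non-blank lines into one node."""

    def applies(self, lines: Lines) -> bool:
        return lines.current().strip() != ""

    def apply(self, lines: Lines) -> dict:
        buffer = [lines.current()]
        # One line of look-ahead decides whether the paragraph continues.
        while lines.peek() is not None and lines.peek().strip() != "":
            lines.advance()
            buffer.append(lines.current())
        lines.advance()  # step past the last line of the paragraph
        return {"type": "paragraph", "text": " ".join(buffer)}


def parse(text: str) -> List[dict]:
    lines = Lines(text)
    rules = [ParagraphRule()]
    nodes: List[dict] = []
    # A while loop instead of a for-each: each production moves the cursor itself.
    while not lines.at_end():
        for rule in rules:
            if rule.applies(lines):
                nodes.append(rule.apply(lines))
                break
        else:
            lines.advance()  # no rule matched (here: a blank line), skip it
    return nodes
```

With a for-each over the lines, a rule that consumed three lines would see two of them again on the next iterations; with the while loop the driver simply continues from wherever the production left the cursor.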

The title is tightly coupled to sections, which are structural elements, and this means that we have to rework the parser a bit. In this change, I have promoted the DocumentParser to a Production Rule as a first step towards nested production rules, and I have reworked parts so that section handling is actually done through the TitleRule. This is not the final phase, but I had to reduce the complexity of the refactoring. After this commit, the parser is unfortunately in a broken state because it is no longer possible to use a strangler pattern to keep it working with old and new components. Subsequent commits will restore functionality by moving more and more pieces of the parser into production rules.
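To illustrate why titles and sections are coupled, here is a rough sketch in Python (not the project's PHP). It assumes reStructuredText-style underlined titles and ignores nesting levels and overlines; `is_underline` and `parse_sections` are hypothetical names.

```python
from typing import List, Optional

# A small, assumed subset of characters accepted as title underlines.
UNDERLINE_CHARS = "=-~^\"'`#*+"


def is_underline(line: Optional[str], title: str) -> bool:
    """True if `line` is a run of one punctuation character at least as long as the title."""
    if line is None:
        return False
    stripped = line.rstrip()
    return (
        stripped != ""
        and stripped[0] in UNDERLINE_CHARS
        and len(set(stripped)) == 1
        and len(stripped) >= len(title.rstrip())
    )


def parse_sections(lines: List[str]) -> dict:
    document = {"type": "document", "children": []}
    container = document  # node that receives non-title lines
    pos = 0
    while pos < len(lines):
        line = lines[pos]
        following = lines[pos + 1] if pos + 1 < len(lines) else None
        if line.strip() and is_underline(following, line):
            # Recognising a title is what opens a new section.
            section = {"type": "section", "title": line.strip(), "children": []}
            document["children"].append(section)
            container = section
            pos += 2  # consume the title and its underline
        else:
            if line.strip():
                container["children"].append(line)
            pos += 1
    return document
```

In this sketch the title rule is the only place where the container changes, which mirrors the statement that section handling goes through the title rule.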

During implementation, I found out I had to tweak parts of the recognition of characters for quote rules because of issues with blank lines. Since a QuoteBlock recreates a parser and passes its contents through it (to be changed), it ended up in an infinite loop when the content merely contained an empty string.
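The failure mode can be sketched as follows, in Python rather than the project's PHP. The commit message does not spell out the actual fix, so the guard in `is_quote_line` is an assumption about the kind of check involved; `INDENT`, `is_quote_line` and `parse` are hypothetical names.

```python
from typing import List

INDENT = "    "


def is_quote_line(line: str) -> bool:
    """An indented line opens or continues a quote; a blank line never does."""
    return line.strip() != "" and line.startswith(INDENT)


def parse(text: str) -> List[dict]:
    lines = text.split("\n")
    nodes: List[dict] = []
    pos = 0
    while pos < len(lines):
        if is_quote_line(lines[pos]):
            quoted = []
            while pos < len(lines) and is_quote_line(lines[pos]):
                quoted.append(lines[pos][len(INDENT):])
                pos += 1
            # The quote body is handed to a nested run of the same parser.
            nodes.append({"type": "quote", "children": parse("\n".join(quoted))})
        elif lines[pos].strip() == "":
            pos += 1  # blank lines separate blocks and produce no node
        else:
            nodes.append({"type": "text", "value": lines[pos]})
            pos += 1
    return nodes
```

If `is_quote_line` accepted an empty string, `parse("")` would treat its only line as a quote, strip nothing from it, and call `parse("")` again without end. Rejecting blank lines in the recognition step breaks that cycle.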

As the next step in finalizing the design of the parser for Restructured Text, I am migrating directive handling to a production rule. This is also a good example of how we can re-use other Production Rules to get a compound rule. The implementation is a bit rough, and there are one or two bugs remaining (I lost the last line of text in a note?), but it is a good starting point.
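A compound rule of this kind might look roughly like the sketch below, in Python rather than the project's PHP. The function names, the three-space body indentation and the node shapes are assumptions for illustration only.

```python
import re
from typing import List, Optional, Tuple

# Matches lines such as ".. note::" or ".. image:: picture.png".
DIRECTIVE = re.compile(r"^\.\. ([\w-]+)::\s*(.*)$")


def paragraph_rule(lines: List[str], pos: int) -> Optional[Tuple[dict, int]]:
    """Consume consecutive non-blank lines; return (node, new position)."""
    if pos >= len(lines) or lines[pos].strip() == "":
        return None
    start = pos
    while pos < len(lines) and lines[pos].strip() != "":
        pos += 1
    return {"type": "paragraph", "text": " ".join(lines[start:pos])}, pos


def directive_rule(lines: List[str], pos: int) -> Optional[Tuple[dict, int]]:
    """Compound production: a directive marker followed by an indented body."""
    match = DIRECTIVE.match(lines[pos]) if pos < len(lines) else None
    if match is None:
        return None
    pos += 1
    # Collect the body: indented lines plus any blank lines between them.
    body: List[str] = []
    while pos < len(lines) and (lines[pos].startswith("   ") or lines[pos].strip() == ""):
        body.append(lines[pos][3:])
        pos += 1
    # Re-use the paragraph production on the dedented body.
    children: List[dict] = []
    inner = 0
    while inner < len(body):
        result = paragraph_rule(body, inner)
        if result is None:
            inner += 1  # blank line inside the body
            continue
        node, inner = result
        children.append(node)
    node = {
        "type": "directive",
        "name": match.group(1),
        "argument": match.group(2),
        "children": children,
    }
    return node, pos
```

Each rule returns the position it stopped at, so the directive rule can delegate the body to the paragraph rule without either of them knowing about the outer loop.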

jaapio

I didn't look for bugs but just at the bigger picture. And I think I can follow what you are doing here.
Having a production for each state which is able to move the iterator forward makes a lot of sense to me. A minor issue would be that a production rule is able to step back. I would not allow this in the iterator at all. Each line must either be peeked at by a look-ahead or consumed. If we allowed rules to move back to a point before they were triggered, we could get some odd bugs.
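One way to make stepping back impossible by construction is an iterator that exposes only a look-ahead and a consume operation. The sketch below is in Python rather than the project's PHP, and `ForwardLines` is a hypothetical name, not the iterator from this pull request.

```python
from typing import Iterable, Iterator, Optional


class ForwardLines:
    """Hypothetical line iterator exposing only peek() and consume()."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._source: Iterator[str] = iter(lines)
        # Single-slot buffer holding the next unconsumed line, if any.
        self._lookahead: Optional[str] = next(self._source, None)

    def peek(self) -> Optional[str]:
        """Return the next line without consuming it, or None at the end."""
        return self._lookahead

    def consume(self) -> str:
        """Return the next line and move forward; there is no way back."""
        if self._lookahead is None:
            raise EOFError("no more lines")
        line = self._lookahead
        self._lookahead = next(self._source, None)
        return line
```

Because consumed lines are not retained anywhere, a rule cannot rewind past the point where it was triggered, even by accident.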

wouterj · 2021-09-29T08:40:24Z

Interesting! I'll have a closer look at these changes this week.
This is getting closer to the state machine parser used by docutils, which should make it easier to follow the specs.

mvriel added 19 commits September 24, 2021 13:50

Moved more of the Directive parsing's state into a Subparser

67ff5a1

Extract Separator into a Subparser

08b89f3

Simplified BEGIN and NORMAL state

484a50c

Migrate BlockQuote to new LL(1) based parsing

d260e08

Convert list parsing into a Production

a17f5b4

Migrate comment parsing onto new parser design

74f418e

Move Link creation to new LL(1) structure

e7ec6f3

Migrate DefinitionList to a Production Rule

56c336d

Clean up DocumentParser a bit more

9b5890c

Re-attach the last line if a Paragraph does not trigger a code block

95618c9

Add support for parsing tables

a2c6582

Clean up and add links

afbc5c3

Reintroduce transitions / separators

556c8b0

mvriel marked this pull request as ready for review September 28, 2021 19:39

jaapio reviewed Sep 28, 2021

src/Guides/RestructuredText/Parser/DocumentIterator.php (outdated, resolved)

jaapio reviewed Sep 28, 2021

src/Guides/RestructuredText/Parser/DocumentParser.php (outdated, resolved)

Quality of life improvements

c5fc6b3

jaapio reviewed Sep 28, 2021

mvriel added 2 commits September 28, 2021 22:32

Rename DocumentIterator to LinesIterator

1257110

Extract Document interpretation to a Rule

d9cd803

jaapio approved these changes Sep 29, 2021

jaapio merged commit 79bee4b into master Sep 29, 2021

mvriel mentioned this pull request Sep 29, 2021

Let's talk about joined effort doctrine/rst-parser#156

Open

jaapio deleted the feature/extract-directive-parsing-into-subparser branch September 29, 2021 09:02
