Streaming Parsing (from Lexer to Parser) #288

Closed
jasonwilliams opened this issue Mar 30, 2020 · 9 comments · Fixed by #559
Labels
enhancement New feature or request help wanted Extra attention is needed parser Issues surrounding the parser

@jasonwilliams
Member

jasonwilliams commented Mar 30, 2020

Right now, Boa's lexer runs to completion. Once the array of tokens is filled, it is sent over to the parser.

However, the parser doesn't need to sit idle whilst the lexer is running; it can begin working through the tokens as they come through.

This is completely inspired by how Go parses its templates. Rob talks about it in more detail here:
https://youtu.be/HxaD_trXwRE?t=960
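
A minimal sketch of the idea, assuming a standard-library channel between a lexer thread and the parser. The `Token` type, the whitespace-splitting `lex` function and `collect_streamed` are hypothetical stand-ins for illustration, not Boa's actual lexer or parser:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical token type for illustration; Boa's real tokens carry more detail.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Number(f64),
    Punct(char),
}

// Toy lexer: splits on whitespace and sends each token as soon as it is ready.
fn lex(source: String, tx: mpsc::Sender<Token>) {
    for word in source.split_whitespace() {
        let token = if let Ok(n) = word.parse::<f64>() {
            Token::Number(n)
        } else if word.len() == 1 && !word.chars().all(char::is_alphanumeric) {
            Token::Punct(word.chars().next().unwrap())
        } else {
            Token::Ident(word.to_string())
        };
        // A send error means the receiving side hung up, so stop lexing.
        if tx.send(token).is_err() {
            return;
        }
    }
}

// Stand-in for the parser: it handles tokens as they arrive instead of
// waiting for a complete token array.
fn collect_streamed(source: &str) -> Vec<Token> {
    let (tx, rx) = mpsc::channel();
    let owned = source.to_string();
    let lexer = thread::spawn(move || lex(owned, tx));

    let mut seen = Vec::new();
    for token in rx {
        // A real parser would build AST nodes here.
        seen.push(token);
    }
    lexer.join().expect("lexer thread panicked");
    seen
}
```

Calling `collect_streamed("x = 1")` yields an identifier, a punctuator and a number in source order. Whether the extra thread pays for itself on small scripts is an open question in this sketch.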

Parser Cursor

Does the cursor need to change?
Can it work in its current state? @Razican is working on the updated cursor right now.
jasonwilliams#287

Can Rust's Async/Await help here?

Maybe, I'm not sure.

Diagram

[Image: Blank Diagram (1)]

@Razican @HalidOdat thoughts?

@jasonwilliams jasonwilliams added enhancement New feature or request help wanted Extra attention is needed labels Mar 30, 2020
@Razican
Member

Razican commented Mar 30, 2020

The cursor might need to return a Result on each call, but otherwise the external API could stay much the same.
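
As a rough illustration of a cursor that returns a Result on each call: the `Cursor`, `Token` and `LexError` types below are hypothetical and much simpler than anything in Boa. `Ok(Some(..))` is a token, `Ok(None)` is end of input, and `Err(..)` is a lexing failure discovered only when the parser asks for that token:

```rust
// Hypothetical types for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Punct(char),
}

#[derive(Debug, PartialEq)]
enum LexError {
    // The offending character and its byte offset in the source.
    UnexpectedChar(char, usize),
}

struct Cursor<'a> {
    chars: std::iter::Peekable<std::str::CharIndices<'a>>,
}

impl<'a> Cursor<'a> {
    fn new(source: &'a str) -> Self {
        Self { chars: source.char_indices().peekable() }
    }

    // Lexes exactly one token per call, so errors surface lazily.
    fn next(&mut self) -> Result<Option<Token>, LexError> {
        // Skip leading whitespace.
        while let Some(&(_, c)) = self.chars.peek() {
            if c.is_whitespace() {
                self.chars.next();
            } else {
                break;
            }
        }
        let (pos, c) = match self.chars.next() {
            Some(pair) => pair,
            None => return Ok(None),
        };
        if c.is_alphabetic() {
            let mut ident = String::from(c);
            while let Some(&(_, n)) = self.chars.peek() {
                if n.is_alphanumeric() {
                    ident.push(n);
                    self.chars.next();
                } else {
                    break;
                }
            }
            Ok(Some(Token::Ident(ident)))
        } else if matches!(c, '=' | ';' | '+' | '(' | ')') {
            Ok(Some(Token::Punct(c)))
        } else {
            Err(LexError::UnexpectedChar(c, pos))
        }
    }
}
```

With this shape, parser code can use `?` on every `cursor.next()` call, and callers that only construct the cursor and drive the parser would not need to change much.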

@Razican
Member

Razican commented Mar 31, 2020

We might need to add an EOF token to the stream if we don't want to keep the parser waiting for more tokens indefinitely. On the other hand, we might want exactly that.

This could be done in multiple threads by lexing in one thread, parsing in another and executing in another. A concurrent parser might prove to be more difficult, though.
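
To make the EOF token idea concrete, here is a small sketch. `Token`, `lex_with_eof` and `count_words` are made-up names for illustration. The consumer treats `Eof` as the only valid way for the stream to finish, so a stream that simply stops is reported as an error rather than being mistaken for a complete program:

```rust
// Hypothetical token type with an explicit end-of-file marker.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Word(String),
    Eof,
}

// Toy lexer: yields one token per whitespace-separated word, then `Eof`.
fn lex_with_eof(source: &str) -> impl Iterator<Item = Token> + '_ {
    source
        .split_whitespace()
        .map(|w| Token::Word(w.to_string()))
        .chain(std::iter::once(Token::Eof))
}

// Stand-in for a parser loop: it stops cleanly only when it sees `Eof`.
fn count_words(tokens: impl Iterator<Item = Token>) -> Result<usize, &'static str> {
    let mut count = 0;
    for token in tokens {
        match token {
            Token::Word(_) => count += 1,
            Token::Eof => return Ok(count),
        }
    }
    // The stream ran dry without the marker: the input was cut short.
    Err("token stream ended without an EOF token")
}
```

For example, `count_words(lex_with_eof("let x"))` returns `Ok(2)`, while a hand-built stream without the marker returns the error.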

@jasonwilliams jasonwilliams added the parser Issues surrounding the parser label Mar 31, 2020
@jasonwilliams
Member Author

The lexer can close the channel once it's finished. This will signal to the parser that there is nothing more to come down the pipe.
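
A short sketch of how the receiving side can tell "no token yet" apart from "the lexer is done", assuming a `std::sync::mpsc` channel. The `Poll` enum and `poll_token` helper are hypothetical names:

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

// Hypothetical result of asking the channel for a token without blocking.
#[derive(Debug, PartialEq)]
enum Poll<T> {
    Token(T),
    NotYet,   // the lexer is still running but has nothing ready
    Finished, // the lexer dropped its sender: nothing more will arrive
}

fn poll_token<T>(rx: &Receiver<T>) -> Poll<T> {
    match rx.try_recv() {
        Ok(token) => Poll::Token(token),
        Err(TryRecvError::Empty) => Poll::NotYet,
        Err(TryRecvError::Disconnected) => Poll::Finished,
    }
}
```

Tokens already sent before the sender is dropped are still delivered, so the parser only sees `Finished` after it has drained everything the lexer produced.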

@jasonwilliams
Member Author

Due to the way parsing and lexing are related (goal symbols), this may not be possible.

Related:
jasonwilliams#294

@Razican
Member

Razican commented Apr 13, 2020

Due to the way parsing and lexing are related (goal symbols), this may not be possible.

I think that what can be done is for the parser to request new tokens from the lexer via calls to next() in the cursor.
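
A pull-based sketch of that arrangement. Everything here is hypothetical (`Lexer`, `Parser`, the `produced` counter and the toy "statement ends at `;`" grammar); it only shows that the lexer does work when the parser calls `next()` and not before:

```rust
// Toy lexer: one token per whitespace-separated word, produced on demand.
struct Lexer<'a> {
    words: std::str::SplitWhitespace<'a>,
    // How many tokens have been lexed so far, to make the laziness visible.
    produced: usize,
}

impl<'a> Iterator for Lexer<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let word = self.words.next()?;
        self.produced += 1;
        Some(word)
    }
}

struct Parser<'a> {
    lexer: Lexer<'a>,
}

impl<'a> Parser<'a> {
    fn new(source: &'a str) -> Self {
        Self { lexer: Lexer { words: source.split_whitespace(), produced: 0 } }
    }

    // Pulls tokens until the next ";" and returns them as one "statement".
    fn parse_statement(&mut self) -> Option<Vec<&'a str>> {
        let mut statement = Vec::new();
        for word in self.lexer.by_ref() {
            if word == ";" {
                return Some(statement);
            }
            statement.push(word);
        }
        if statement.is_empty() { None } else { Some(statement) }
    }
}
```

After the first `parse_statement()` call on `"let x ; let y ;"`, only three tokens have been lexed; the rest of the source is untouched until the parser asks again.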

@jasonwilliams
Member Author

I may close this, seeing as we can't do it that way due to the lexer needing context.

@Razican
Member

Razican commented May 2, 2020

I may close this, seeing as we can't do it that way due to the lexer needing context.

Actually, I think this is now more important than ever!

Instead of "streaming" we could do "lazy" lexing and make the parser iterate over it, changing context as needed. @maciejhirsz proposed using Logos for this, and I think it makes sense. I'm not sure whether there has been any progress on that front.

@maciejhirsz

I'm on holidays since yesterday, so I should have time to dig into Boa at last :)

Streaming (iterating) the tokens during parsing, if anything, makes things easier, since the parser can supply the lexer with context if need be. The main culprit in JS is the regex literal, and the rule here is fairly simple: any expression that begins with the division token / (that is, a unary prefix expression with operator /) should switch the lexer into regex mode. It's fairly easy to integrate that into standard recursive Pratt parsing for nested expressions.
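
A toy sketch of the parser supplying context to the lexer. The `Goal` names loosely mirror the ECMAScript goal symbols InputElementDiv and InputElementRegExp; `Lexer`, `Token` and `lex_expression` are hypothetical, and the regex scanning ignores escapes, character classes and flags. `lex_expression` stands in for the parser: it expects an operand at the start and after an operator, and only there does it allow `/` to begin a regex literal:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Goal {
    Div,    // a leading '/' is the division operator
    RegExp, // a leading '/' starts a regex literal
}

#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Punct(char),
    Regex(String),
}

struct Lexer<'a> {
    rest: &'a str,
}

impl<'a> Lexer<'a> {
    fn new(source: &'a str) -> Self {
        Self { rest: source }
    }

    // The caller (the parser) decides how a leading '/' is interpreted.
    fn next(&mut self, goal: Goal) -> Option<Token> {
        self.rest = self.rest.trim_start();
        let first = self.rest.chars().next()?;
        if first == '/' && goal == Goal::RegExp {
            // Toy rule: the body runs to the next '/'. An unterminated
            // literal simply ends the stream in this sketch.
            let body_end = self.rest[1..].find('/')?;
            let body = self.rest[1..1 + body_end].to_string();
            self.rest = &self.rest[body_end + 2..];
            return Some(Token::Regex(body));
        }
        if first.is_alphanumeric() {
            let end = self
                .rest
                .find(|c: char| !c.is_alphanumeric())
                .unwrap_or(self.rest.len());
            let (word, rest) = self.rest.split_at(end);
            self.rest = rest;
            return Some(Token::Ident(word.to_string()));
        }
        self.rest = &self.rest[first.len_utf8()..];
        Some(Token::Punct(first))
    }
}

// Stand-in for the parser's expression loop.
fn lex_expression(source: &str) -> Vec<Token> {
    let mut lexer = Lexer::new(source);
    let mut goal = Goal::RegExp; // an expression may begin with a regex
    let mut tokens = Vec::new();
    while let Some(token) = lexer.next(goal) {
        goal = match token {
            // After an operand (or a closing paren), '/' means division.
            Token::Ident(_) | Token::Regex(_) | Token::Punct(')') => Goal::Div,
            // After any other punctuator, an operand is expected again.
            Token::Punct(_) => Goal::RegExp,
        };
        tokens.push(token);
    }
    tokens
}
```

With this, `a / b` lexes as identifier, division, identifier, while `x = /ab/` lexes the right-hand side as a single regex token. A real Pratt parser would make the same choice at the point where it asks for a prefix expression.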

@Razican
Member

Razican commented May 9, 2020

I'm on holidays since yesterday, so I should have time to dig into Boa at last :)

Streaming (iterating) the tokens during parsing, if anything, makes things easier, since the parser can supply the lexer with context if need be. The main culprit in JS is the regex literal, and the rule here is fairly simple: any expression that begins with the division token / (that is, a unary prefix expression with operator /) should switch the lexer into regex mode. It's fairly easy to integrate that into standard recursive Pratt parsing for nested expressions.

Let us know if we can be of any help :)

@Lan2u Lan2u mentioned this issue Jul 26, 2020
@Razican Razican linked a pull request Aug 16, 2020 that will close this issue
@Razican Razican added this to the v0.10.0 milestone Sep 2, 2020