-
-
Notifications
You must be signed in to change notification settings - Fork 482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(parser/html): fix whitespace being lexed as html literal #3908
fix(parser/html): fix whitespace being lexed as html literal #3908
Conversation
272e65f
to
fb07a79
Compare
CodSpeed Performance ReportMerging #3908 will not alter performanceComparing Summary
|
@@ -150,17 +153,11 @@ fn element() { | |||
} | |||
|
|||
#[test] | |||
fn element_with_text() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this test changed for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test was removed because lexing it correctly requires changing contexts, which is not supported by the assert_lex!
macro. It was replaced with other tests that specify the context upfront.
/// Consume HTML text literals outside of tags. | ||
/// | ||
/// This includes text and single spaces between words. If newline or a second | ||
/// consecutive space is found, this will stop consuming and to allow the lexer to | ||
/// switch to `consume_whitespace`. | ||
fn consume_html_text(&mut self) -> HtmlSyntaxKind { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a link to the spec that justify having newlines and double spaces parsed differently? It would be useful to have that link in the docs of the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found these and added them to the doc comment:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reasoning is that trivia means that the token doesn't affect the functionality of the code. The primary function of an HTML document is to render in a web browser. If you remove a space between words in HTML text, it does not render the document invalid, and it changes how the document is rendered (aka its functionality), and is therefore not trivia. Multiple spaces are collapsed into a single space, so all those extra spaces are trivia.
fb07a79
to
96929a5
Compare
96929a5
to
7df8723
Compare
Summary
This fixes the HTML lexer treating whitespace trivia as
HTML_LITERAL
when it shouldn't.We end up creating less syntax nodes by treating all HTML text that would affect what is rendered on the screen as part of the same
HTML_LITERAL
. See the updated snapshots for examples of how this affects lexing and parsing.This also significantly restructures the HTML lexer, and it should be a bit more straightforward.
Test Plan
Added/updated tests