feat(rome_css_parser): CSS lexer #4682 #4684
✅ Deploy Preview for docs-rometools canceled.
Parser conformance results on ubuntu-latest
js/262
jsx/babel
symbols/microsoft
ts/babel
ts/microsoft
// Interpret the hex digits as a hexadecimal number. If this number is zero, or
// is for a surrogate, or is greater than the maximum allowed code point, return
// U+FFFD REPLACEMENT CHARACTER (�).
let hex = match hex {
Here, we need to convert the escape sequence to a char. swc uses two buffers for this: raw and buf.
https://github.com/swc-project/swc/blob/e4f9f734ad1f92c6f05e8cb4c2d799679cca9f79/crates/swc_css_parser/src/lexer/mod.rs#L692-L723
I'm not sure if we can implement this with our CST, but we can potentially implement such logic at the syntax level.
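The conversion rule quoted in the code comment (from the CSS Syntax "consume an escaped code point" algorithm) can be sketched as a small standalone helper. This is a hypothetical `hex_escape_to_char` function, not the code in this PR; it assumes the lexer has already collected the hex digits of the escape.

```rust
/// Hypothetical helper (not this PR's code): turns the hex digits of a CSS
/// escape such as `\26` into a char, per "consume an escaped code point".
fn hex_escape_to_char(hex_digits: &str) -> char {
    const REPLACEMENT: char = '\u{FFFD}';
    // An escape carries one to six hex digits; anything else is rejected here.
    let valid = (1..=6).contains(&hex_digits.len())
        && hex_digits.bytes().all(|b| b.is_ascii_hexdigit());
    if !valid {
        return REPLACEMENT;
    }
    // Six hex digits always fit in a u32, so parsing cannot overflow.
    let value = u32::from_str_radix(hex_digits, 16).unwrap_or(0);
    match value {
        // Zero is mapped to U+FFFD.
        0 => REPLACEMENT,
        // char::from_u32 rejects surrogates (U+D800..=U+DFFF) and values above
        // U+10FFFF, which covers the remaining two cases of the rule.
        _ => char::from_u32(value).unwrap_or(REPLACEMENT),
    }
}
```

The lexer still has to consume the digits and the optional single whitespace after them; this sketch only covers the final number-to-char step.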
True, in CSS we need two values 🤔 We can think about it later, as you suggested.
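A minimal sketch of the "two values" idea, assuming a hypothetical `IdentValue` type that keeps the raw source slice next to the unescaped value, similar in spirit to swc's raw/buf split. The names and the simplified escape handling are illustrative only and are not this PR's design.

```rust
/// Hypothetical sketch (not this PR's code): keep the raw source text of an
/// identifier alongside its unescaped value, like swc's `raw`/`buf` pair.
struct IdentValue<'a> {
    /// Exactly as written in the source, e.g. `\41 bc`.
    raw: &'a str,
    /// After escape processing, e.g. `Abc`.
    value: String,
}

fn unescape_ident(raw: &str) -> IdentValue<'_> {
    let mut value = String::with_capacity(raw.len());
    let mut chars = raw.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            value.push(c);
            continue;
        }
        // Collect up to six hex digits after the backslash.
        let mut hex = String::new();
        while hex.len() < 6 {
            match chars.next_if(|d| d.is_ascii_hexdigit()) {
                Some(d) => hex.push(d),
                None => break,
            }
        }
        if hex.is_empty() {
            // Non-hex escape: take the next character literally
            // (U+FFFD if the backslash is the last character).
            value.push(chars.next().unwrap_or('\u{FFFD}'));
            continue;
        }
        // One whitespace character right after a hex escape belongs to the escape.
        let _ = chars.next_if(|w| matches!(*w, ' ' | '\t' | '\n'));
        let code = u32::from_str_radix(&hex, 16).unwrap_or(0);
        // Zero, surrogates and values above U+10FFFF become U+FFFD.
        let ch = if code == 0 { '\u{FFFD}' } else { char::from_u32(code).unwrap_or('\u{FFFD}') };
        value.push(ch);
    }
    IdentValue { raw, value }
}
```

Keeping `raw` as a borrowed slice means the CST can stay lossless while consumers that need the semantic value read `value`.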
COMMENT
}
}
Some(b'/') => {
// comments are forbidden in CSS, but I prefer to keep supporting them.
We can implement an option to turn such comments on or off.
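One way such a toggle could look, as a sketch: a hypothetical `LexerOptions { allow_line_comments }` flag that the comment lexer checks before accepting `//`. Block comments (/* ... */) are always accepted, as the CSS spec requires. The option name, `CommentKind`, and `lex_comment` are assumptions for illustration, not Rome's API.

```rust
#[derive(Debug, PartialEq)]
enum CommentKind {
    Block,
    Line,
}

/// Hypothetical lexer options; the field name is illustrative only.
#[derive(Default)]
struct LexerOptions {
    allow_line_comments: bool,
}

/// Tries to lex a comment at the start of `input`, returning its kind and byte length.
fn lex_comment(input: &[u8], options: &LexerOptions) -> Option<(CommentKind, usize)> {
    match input {
        [b'/', b'*', rest @ ..] => {
            // Standard CSS block comment: scan to the closing `*/`;
            // an unterminated comment runs to the end of the input.
            let end = rest
                .windows(2)
                .position(|w| w == b"*/")
                .map_or(input.len(), |p| p + 4);
            Some((CommentKind::Block, end))
        }
        [b'/', b'/', rest @ ..] if options.allow_line_comments => {
            // Non-standard `//` comment: runs until the next newline (not included).
            let len = rest
                .iter()
                .position(|&b| b == b'\n' || b == b'\r')
                .unwrap_or(rest.len());
            Some((CommentKind::Line, len + 2))
        }
        // With the option off, `//` is not a comment and is lexed as two delimiters.
        _ => None,
    }
}
```

Keeping the check in the lexer means the parser never sees `//` comments unless the option is enabled.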
5771f83 to f71436e
f71436e to ae13e2a
A great start! Thank you @denbezrukov
crates/rome_css_parser/Cargo.toml (outdated)
quickcheck = "1.0.3"
quickcheck_macros = "1.0.0"
At this point we could move this dependency into the workspace. What do you think?
/// Lexes the next token.
///
/// ## Return
/// Returns its kind and any potential error.
Suggested change:
- /// Returns its kind and any potential error.
+ /// Returns its kind
It doesn't seem to return any error.
Is there a reason why you are creating a CSS lexer from scratch? Why not use an existing one? This feels like a LOT of work.
The lexers that already exist are not suitable for CSTs, and the tree-sitter APIs are very poor.
Summary
Lexes string literals and comments.
https://drafts.csswg.org/css-syntax/#consume-string-token
https://github.com/swc-project/swc/blob/main/crates/swc_css_parser/src/lexer/mod.rs
https://github.com/servo/rust-cssparser/blob/master/src/tokenizer.rs
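For reference, the "consume a string token" algorithm linked above can be sketched roughly as follows. This is a simplified, hypothetical `consume_string` helper, not the lexer in this PR: it works on `&str`, assumes the input is already preprocessed (CR, CRLF and form feed normalized to LF), starts just after the opening quote, and returns an owned value instead of a CST token.

```rust
#[derive(Debug, PartialEq)]
enum StringToken {
    /// A closed (or EOF-terminated) string with its unescaped value.
    String(String),
    /// A string interrupted by an unescaped newline.
    BadString,
}

/// Returns the token and the number of bytes consumed (excluding the opening quote).
fn consume_string(input: &str, quote: char) -> (StringToken, usize) {
    let mut value = String::new();
    let mut iter = input.char_indices().peekable();
    while let Some((i, c)) = iter.next() {
        match c {
            // Ending quote: the token includes the quote itself.
            c if c == quote => return (StringToken::String(value), i + c.len_utf8()),
            // Unescaped newline: bad string; the newline is left for the next token.
            '\n' => return (StringToken::BadString, i),
            '\\' => match iter.next() {
                // Backslash at EOF: do nothing.
                None => break,
                // Escaped newline: a line continuation, nothing is appended.
                Some((_, '\n')) => {}
                // Hex escape: up to six digits plus one optional whitespace.
                Some((_, d)) if d.is_ascii_hexdigit() => {
                    let mut hex = String::from(d);
                    while hex.len() < 6 {
                        match iter.next_if(|&(_, h)| h.is_ascii_hexdigit()) {
                            Some((_, h)) => hex.push(h),
                            None => break,
                        }
                    }
                    let _ = iter.next_if(|&(_, w)| matches!(w, ' ' | '\t' | '\n'));
                    let code = u32::from_str_radix(&hex, 16).unwrap_or(0);
                    // Zero, surrogates and out-of-range values become U+FFFD.
                    let ch = if code == 0 { '\u{FFFD}' } else { char::from_u32(code).unwrap_or('\u{FFFD}') };
                    value.push(ch);
                }
                // Any other escaped character is taken literally.
                Some((_, escaped)) => value.push(escaped),
            },
            other => value.push(other),
        }
    }
    // EOF before the ending quote is a parse error, but still yields a string token.
    (StringToken::String(value), input.len())
}
```

A CST-oriented lexer would likely only record the token's kind and range here and defer building the unescaped value, which is the raw/value question raised in the review.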
Test Plan
cargo test -p rome_css_parser