-
Notifications
You must be signed in to change notification settings - Fork 146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate name-start code points for identifier #185
base: main
Are you sure you want to change the base?
Conversation
@sabberworm How's this look? |
This doesn't seem to recognize some escape sequences. For example: body { background: height: ca\lc(100% - 1px); } The proposed changes cause the background rule here to get dropped. |
The CSS I meant to post in my previous message was actually: body { height: ca\lc(100% - 1px); } And the dropped rule is the |
It looks like this will introduce breaking changes (at least in lenient mode). I am not fully on board with the idea of running this kind of validation while parsing, at least not without introducing something like a new parsing mode. The syntax of the example The proposed validation can also be done post parse. Are there cases where doing it while parsing is truly necessary? It would definitely be nice to have this validation as part of the parser, so maybe it can be implemented as a method of |
I agree with @raxbg. We throw errors when we can‘t make sense of the input (and we try to recover in lenient mode). But throwing errors when we can understand the structure and it just happens to be invalid is a different thing entirely. Yes, a validation after parsing would be cool though. It could also catch tons of other issues. There‘s lots of CSS that‘s syntactically valid but semantically incorrect. I would not put it on the Document class though. I‘ve put too many extraneous stuff on the main classes already. It should be a separate |
Yes that would the case. Adding a check for lenient mode would resolve that and allow the parser to recover: --- lib/Sabberworm/CSS/Parsing/ParserState.php (revision 8fbd0fe82aa08ad2650def1b44f2f77154211e30)
+++ lib/Sabberworm/CSS/Parsing/ParserState.php (date 1580502992173)
@@ -72,7 +72,7 @@
$sResult = $this->parseCharacter(true);
}
- if ($sResult === null) {
+ if (!$this->oParserSettings->bLenientParsing && $sResult === null) {
throw new UnexpectedTokenException($sResult, $this->peek(5), 'identifier', $this->iLineNo);
}
$sCharacter = null; Validation of the CSS after parsing would be ideal, but that would be for another PR. |
The thrown exception will be caught when in lenient mode
When given the following CSS:
The parser does not remove or throw an exception for the malformed rule
-0-transition
, as it is a syntax error:This occurs because the check to validate if the rule is an identifier does not validate if the first 3 code points are valid.
This PR adds a check to validate the first 3 code points in an identifier. If a malformed identifier is given, an
UnexpectedTokenException
exception will be thrown before the identifier is consumed.