Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSS content property and unicode escape sequences #399

Closed
wants to merge 8 commits into from

Conversation

Dakror
Copy link
Contributor

@Dakror Dakror commented Dec 28, 2022

This PR adds the content property in a very simple first version. Only text literals are supported, and will be instantiated as inner RML. Additionally, unicode unescaping according to the CSS spec is added as well.

I wanted this feature in order to be able to use icon web-fonts out of the box using their stylesheet with as little modification as possible (i think we currently don't support ::before, do we?)

After implementing the backslash escaping, i noticed however that currently backslashes are handled like windows path delimiters, which would of course clash with the escaping syntax. Thus the current escaping tests fail.

So maybe we need to introduce a spec-diverging escape sequence like \u or something. But that would still cause unwanted behavior in the rare cases where Windows folders would start with a u.

@Dakror
Copy link
Contributor Author

Dakror commented Dec 28, 2022

image

Speaking of my desired usecase. Now that i have the (giant) stylesheet loaded, i noticed it took significantly longer to boot up. Measuring it shows that the StyleSheetNode CompoundSelector equals check takes long, which makes sense considering that all nodes lie in a single level in the hierarchy, with different class names.

@mikke89 Do you know how other engines deal with this? I've tried googling possible data structures or approaches to hash the compound selector into some non-string based form that would be easier to check against? Just plainly hashing the compound selector components would probably be very collision-prone.

@Dakror Dakror mentioned this pull request Dec 28, 2022
@mikke89 mikke89 added the enhancement New feature or request label Dec 30, 2022
@mikke89
Copy link
Owner

mikke89 commented Dec 30, 2022

Thanks for the PR!

From what I can tell, it seems like CSS doesn't actually support the content property for normal elements, except to add images. So this would be non-standard use of it, which isn't ideal for a new property. And I don't think it's supposed to affect the DOM like here. We don't currently support ::before and ::after which would be the proper place for generated content, but I do have some larger plans for improving the node tree, which would serve as a foundation for supporting exactly these kind of features.

I wonder if we really need the unicode escape support? Our assumption is that users have complete control over their assets and environment, so it's really just a matter of storing the style sheets or documents as UTF-8? I think the backward incompatibility is an issue.

By the way, we already have a UTF8-encoder in StringUtilities.


We've made a lot of effort in optimizing the element style rule selection, see a detailed discussion here: #293. However, we've probably made less effort for the style sheet merging, which seems to be the case here. I guess it's just because I haven't really seen this pop up in any benchmarks before.

I did some more benchmarking, and I thought at least it would appear in the ElementDocument benchmark. But for me it is only around 3%. Perhaps you could design a benchmark better suited for this situation?
I guess we might get far by some hashing? At least if there are a lot of false negatives (calls to operator== that return false), perhaps measure that first.

It's hard to find detailed, good references for how other browsers deal with these things, but at least here's a pretty good reference for how they apply style rules in general. Otherwise, you'll probably have to dive into their source code.

@Dakror
Copy link
Contributor Author

Dakror commented Dec 30, 2022

Oh damn i completely oversaw the fact that content does not work for real dom elements.
I'm definitely advocating for some form of unicode unescaping because otherwise from a development perspective, pasting utf-8 codepoints for some icon font icon into a document is a worse experience than using a human readable escape sequence representing the code point entry.

Yes the performance issue is at the stage of merging the style sheet and upon media-query re-evaluation.

@Dakror
Copy link
Contributor Author

Dakror commented Dec 30, 2022

I've switched to use the proper library utf-8 encoder function

@mikke89
Copy link
Owner

mikke89 commented Dec 30, 2022

Oh damn i completely oversaw the fact that content does not work for real dom elements. I'm definitely advocating for some form of unicode unescaping because otherwise from a development perspective, pasting utf-8 codepoints for some icon font icon into a document is a worse experience than using a human readable escape sequence representing the code point entry.

Yeah, that's a good argument for unicode escapes. We'll have to wait for this until the next major version, due to the backward incompatibility. The content property also needs ::before and ::after which won't happen until a new major version.

With that said, I think the unescape conversion logic should happen at an earlier stage in the pipeline, perhaps in the StyleSheetParser or its utilities?

Also, while we're at it, we might want to add support for html unicode escapes for the same reason?

Yes the performance issue is at the stage of merging the style sheet and upon media-query re-evaluation.

Yeah, we should continue performance discussion in #400.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants