Slow when HTML + CSS is large #287
One thing I noticed is that there is a lot of low-hanging fruit in reusing the "decision cache" if we just try to be a little smarter. For example, that monster of a CSS file contains this:
.highlight .hll
.highlight
.highlight .c
.highlight .err
.highlight .k
.highlight .o
.highlight .cm
.highlight .cp
.highlight .c1
.highlight .cs
.highlight .gd
.highlight .ge
.highlight .gr
.highlight .gh
.highlight .gi
.highlight .go
.highlight .gp
.highlight .gs
...
There are about 60 of those. I know for a fact that there is
One way to figure that out would be something like this:
if (selectorAsString.includes(' ')) {
  // Everything before the last space-separated part is the parent.
  const selectorParentAsString = selectorAsString.split(' ').slice(0, -1).join(' ');
  if (selectorParentAsString in decisionsCache) {
    if (!decisionsCache[selectorParentAsString]) {
      // No point continuing on!
      return false; // or whatever we do when we decide NOT to include it.
    }
  }
} |
Another thing I noticed was that, for this CSS, the function |
Bah! I implemented a solution. It looks like this:
const selectorParentString = utils.selectorParentString(
  selectorString
);
if (selectorParentString !== null) {
  // Only proceed if the selector did have a parent.
  // E.g. selectorString was '.foo .bar'.
  // Now we can see if the decision cache has already concluded
  // that there is no '.foo' because if that's the case there's
  // no point proceeding to checking '.foo .bar'.
  if (!(selectorParentString in decisionsCache)) {
    decisionsCache[
      selectorParentString
    ] = isSelectorMatchToAnyElement(selectorParentString);
  }
  if (!decisionsCache[selectorParentString]) {
    // Indeed! The parent was in the cache and it was concluded
    // that it is not matched to any element.
    decisionsCache[selectorString] = false;
  }
}
I measured how many times the critical call gets called and the number went from 3,281 down to 2,793, but that didn't appear to make any difference to the total processing time. |
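The parent short-circuit described in this comment can be sketched as follows. This is a minimal, self-contained illustration and not the minimalcss code: `selectorParentString` here is a hypothetical stand-in for `utils.selectorParentString` that naively splits on whitespace (so it would mishandle selectors written without spaces, such as `a>b`, or attribute values containing spaces), and `matchesAnyElement` is an assumed stub for the expensive DOM lookup.

```javascript
// Hypothetical stand-in: return the selector without its last
// whitespace-separated part, or null if there is no parent.
function selectorParentString(selector) {
  const parts = selector.trim().split(/\s+/);
  if (parts.length < 2) return null;
  let parent = parts.slice(0, -1);
  // Drop a trailing combinator so '.a > .b' yields '.a', not '.a >'.
  while (parent.length && /^[>+~]$/.test(parent[parent.length - 1])) {
    parent = parent.slice(0, -1);
  }
  return parent.length ? parent.join(' ') : null;
}

// matchesAnyElement is an assumed stub for the expensive DOM lookup.
function createChecker(matchesAnyElement) {
  const decisionsCache = new Map();
  let expensiveCalls = 0;

  function lookup(selector) {
    if (!decisionsCache.has(selector)) {
      expensiveCalls += 1;
      decisionsCache.set(selector, matchesAnyElement(selector));
    }
    return decisionsCache.get(selector);
  }

  function isUsed(selector) {
    const parent = selectorParentString(selector);
    if (parent !== null && !lookup(parent)) {
      // The parent matches nothing, so the full selector cannot match.
      decisionsCache.set(selector, false);
      return false;
    }
    return lookup(selector);
  }

  return { isUsed, callCount: () => expensiveCalls };
}

// Usage with a fake "DOM" that contains nothing matching '.highlight'.
const checker = createChecker((selector) => selector === 'p');
['.highlight .c', '.highlight .k', '.highlight .o'].forEach((s) =>
  checker.isUsed(s)
);
console.log(checker.callCount());
```

The saving only shows up when many selectors share a parent that matches nothing. When the parent does match, this approach costs one extra lookup for the first selector in the group.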
So I experimented with three different scenarios. Ran it a bunch of times: 1.
|
Actually, I found that I can avoid even more expensive calls by splitting on ancestor selector too! E.g. from |
Hoping that |
@lahmatiy Hey, chatting here instead of Twitter. I did some experiments and noticed that for large CSS frameworks like Semantic UI there are a LOT of CSS selectors like this:
.ui.comments > .reply.form {
margin-top: 1em;
}
.ui.comments .comment .reply.form {
width: 100%;
margin-top: 1em;
}
.ui.comments .reply.form textarea {
font-size: 1em;
height: 12em;
}
So instead of doing I store all lookups in an object. I call it the "decision cache". That helps me avoid doing too many
// Pseudo code
const decisions = {};
if (!document.querySelector('.ui.comments')) {
decisions['.ui.comments > .reply.form'] = false;
decisions['.ui.comments .comment .reply.form'] = false;
decisions['.ui.comments .reply.form textarea'] = false;
}
At first I just did this: |
) * Optimize DOM selector lookups by pre-warming by selectors' parents Part of #287 * feedbacked * Update tests/utils.test.js Co-Authored-By: peterbe <peterbe@mozilla.com>
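The pre-warming idea discussed in this thread (one lookup for a shared first ancestor, then marking every dependent selector as unused) might look roughly like the sketch below. It is an illustration under assumptions, not the code from the commit: `firstCompound` and `prewarm` are hypothetical names, `matchesAnyElement` stands in for something like `document.querySelector`, and the split is deliberately naive.

```javascript
// Naive first-compound extraction: cut at the first combinator
// (whitespace, '>', '+' or '~'). It does not handle combinator
// characters inside attribute values or :not(...) arguments.
function firstCompound(selector) {
  return selector.trim().split(/\s*[>+~]\s*|\s+/)[0];
}

// Returns a Map of selectors already known to match nothing.
// matchesAnyElement is an assumed stub for the expensive DOM lookup.
function prewarm(selectors, matchesAnyElement) {
  const decisions = new Map();
  const rootDecisions = new Map();
  for (const selector of selectors) {
    const root = firstCompound(selector);
    if (!rootDecisions.has(root)) {
      // One expensive lookup per distinct first ancestor.
      rootDecisions.set(root, matchesAnyElement(root));
    }
    if (!rootDecisions.get(root)) {
      decisions.set(selector, false);
    }
  }
  return decisions;
}

// Usage: three Semantic UI style selectors share '.ui.comments'.
const selectors = [
  '.ui.comments > .reply.form',
  '.ui.comments .comment .reply.form',
  '.ui.comments .reply.form textarea',
];
const decisions = prewarm(selectors, () => false);
console.log(decisions.size);
```

Selectors whose first ancestor does match are left out of the returned map, so they still need their own full lookup afterwards.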
hey @peterbe thought you might be interested in a similar project of mine: https://github.com/leeoniya/dropcss
I also originally started with CSSTree, plus css-select and node-html-parser but could not get the performance i was expecting - a lot more back-story here: https://old.reddit.com/r/javascript/comments/bb7im2/dropcss_v100_an_exceptionally_fast_thorough_and/
it doesn't require Puppeteer, though you can easily use it if you need JS execution. i tried sending your html [1] and css [2] through and it finished in 130ms (no, that's not a typo). i originally came to give some advice, but then realized that there are optimizations all the way through the whole architecture of DropCSS (for which i had to ditch all deps), so i'm not sure how much it would help here : (
cheers!
[1] https://www.peterbe.com/plog-original.html
[2] https://www.peterbe.com/static/css/base.min.79787297dbf5.css |
I've been thinking, the puppeteer piece is ultimately just to get the DOM in the form of HTML. Suppose you already have the HTML as a string, how would minimalcss stack up?
|
@peterbe You already parse a selector to AST, so you can split the selector (node list) by the first combinator if any, and then translate it to a string. You also can normalize a compound selector (e.g. @leeoniya Did you try to switch off detailed parsing (anything except selectors)? It can boost CSS parsing. |
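Splitting a selector at its first combinator is easy to get wrong with plain string splitting, because `~`, `+` and spaces can also appear inside attribute selectors and functional pseudo-classes. The sketch below is a hand-written character scan that stands in for walking the parsed selector's node list. It does not use CSSTree, `splitAtFirstCombinator` is a hypothetical name, and it ignores quoted strings and escaped characters.

```javascript
// Split a selector into its first compound selector and the rest.
// Combinator characters inside [...] or (...) are not treated as
// combinators. Quoted strings and escapes are NOT handled.
function splitAtFirstCombinator(selector) {
  const s = selector.trim();
  let depth = 0; // nesting level of [] and ()
  for (let i = 0; i < s.length; i++) {
    const ch = s[i];
    if (ch === '[' || ch === '(') {
      depth++;
    } else if (ch === ']' || ch === ')') {
      depth--;
    } else if (depth === 0 && /[\s>+~]/.test(ch)) {
      // Skip the whole combinator, including surrounding whitespace.
      const rest = s.slice(i).replace(/^[\s>+~]+/, '');
      return { first: s.slice(0, i), rest };
    }
  }
  return { first: s, rest: null };
}

console.log(splitAtFirstCombinator('div[id~=root] p'));
console.log(splitAtFirstCombinator('p:not(:nth-child(n+3))'));
```

Working from an already parsed AST, as suggested in the comment, avoids these edge cases altogether because the combinators are separate nodes.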
yes i turned off a bunch of prelude parsing, and i used only
i'll have to dig through that list and see what's worth handling, thanks!
the goal of DropCSS was to support the 99% use case, as is stated in the README. i've run it through my own CSS/HTML for several sizeable projects in addition to Bootstrap, Materialize, Semantic UI, Bulma, (and Tailwind with some meh pre-processing). That's a pretty wide swath of real-world cases. certainly it's easy to construct artificial but valid cases that trip it up, but this is a design choice rather than an oversight.
for CSSTree - a CSS parser - of course correctness is number one, no one is arguing that :) i'm excited to see how fast, thorough and compact you guys can get minimalcss while being fully spec compliant (both for html and css parsing). |
quick rundown of https://codepen.io/anon/pen/zXdjNY
*|p{font-size:25px}
super obscure - like 0.0001% usage at best. probably won't bother.
@media all { p { color: blue }}
@media none { p { color: purple }}
this is left in by design. DropCSS does not evaluate media queries, and will leave any
p{color:red;animation:2s foo infinite}
@keyframes infinite {
0% { background: red }
100% { background: white }
}
@keyframes foo {
0% { background: green }
100% { background: white }
}
div[id~=root]{font-weight:bold}
p:not(:nth-child(n+3)) { color: green }
[1] https://github.com/leeoniya/dropcss/blob/master/src/find.js#L23 |
@leeoniya One thing to keep in mind here is that there's a difference between performance and performance :) Suppose csstree has some inefficiencies that could be made faster. I think that would potentially take it from 0.002s to 0.001s, since it's mostly string work in Node, and that's almost always fast enough. Also, 10+ years ago I wrote a tool called
Another thing to ponder is that puppeteer does take a lot of time. Second to that is the work of doing |
that's what I do here: Lines 62 to 70 in 3a4b93f
I'm pretty pleased with the result, from the outside, of that function. It gets what I need. I'm curious if you can see anything horrible about it from the inside. |
I strongly doubt that's worth it. It would mean, if there's no DOM node called
Not sure what you mean there. |
thankfully, we don't have to "suppose" when we can measure. here's what i get on my laptop (prior numbers were from desktop):
if i do a full "proper" parse using CSSTree:
const fs = require('fs');
const csstree = require('css-tree');
const css = fs.readFileSync('base.min.79787297dbf5.css', 'utf8');
const start = +new Date();
let ast = csstree.parse(css);
console.log(+new Date() - start);
180ms
fully processing your CSS plus HTML with DropCSS:
const puppeteer = require('puppeteer');
const fetch = require('node-fetch');
const dropcss = require('../../dist/dropcss.cjs.js');
const fs = require('fs');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.peterbe.com/plog-original.html');
  const html = await page.content();
  const styleHrefs = await page.$$eval('link[rel=stylesheet]', els => Array.from(els).map(s => s.href));
  await browser.close();
  await Promise.all(styleHrefs.map(href =>
    fetch(href).then(r => r.text()).then(css => {
      let start = +new Date();
      let clean = dropcss({
        css,
        html,
      });
      console.log({
        stylesheet: href,
        cleanCss: clean.css,
        elapsed: +new Date() - start,
      });
      fs.writeFileSync('out.css', clean.css, 'utf8');
    })
  ));
})();
214ms
of this 214ms, CSS processing (which includes tokenization/parse) is 35ms.
DropCSS does not use only regular expressions. also worth mentioning that most parsers will use numerous regular expressions in their tokenizers. I would never write a parser using just regexes especially when matching nested opening/closing braces is needed.
Again, this is easily quantifiable. Here is what Chrome's perf timeline shows (with JS disabled) when simply loading https://www.peterbe.com/plog-original.html. This excludes any I/O and is purely the work needed for parsing/layout/render. I'm not sure how much of this work is done in headless mode but i suspect all of it unless you have some very specific options set, assuming they exist at all.
if cheerio is doing the selector parsing every time you use it, you're already starting considerably behind, since CSSTree would already have parsed it. this is why i rewrote DropCSS from its previous architecture, you need to have CSS and HTML systems that share information rather than being isolated. without having this you're going to be repeating the same work all over the place, even with aggressive caching strategies. |
looks like cheerio uses the same underlying if you're using cheerio you should be using querySelector, not querySelectorAll so it bails after the first match. |
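The difference between a first-match lookup and a match-all lookup can be illustrated without any DOM library. The sketch below is a toy model, not cheerio's or css-select's API: the tree shape, `findAll`, `findFirst` and the `stats` counter are all hypothetical. It only shows why bailing out at the first hit visits fewer nodes when the question is just "does anything match?".

```javascript
// Tiny stand-in for a DOM: each node has a class list and children.
const tree = {
  cls: ['root'],
  children: [
    { cls: ['a'], children: [{ cls: ['hit'], children: [] }] },
    { cls: ['b'], children: [{ cls: ['hit'], children: [] }] },
  ],
};

// Match-all: always walks the entire tree.
function findAll(node, cls, stats) {
  stats.visited++;
  const found = node.cls.includes(cls) ? [node] : [];
  for (const child of node.children) {
    found.push(...findAll(child, cls, stats));
  }
  return found;
}

// First-match: stops as soon as one node matches.
function findFirst(node, cls, stats) {
  stats.visited++;
  if (node.cls.includes(cls)) return node;
  for (const child of node.children) {
    const hit = findFirst(child, cls, stats);
    if (hit) return hit;
  }
  return null;
}

const allStats = { visited: 0 };
const firstStats = { visited: 0 };
findAll(tree, 'hit', allStats);
findFirst(tree, 'hit', firstStats);
console.log(allStats.visited, firstStats.visited);
```

For a selector that matches nothing, both functions walk the whole tree, so the early exit only helps for selectors that are actually in use.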
That's so cool. I'd love to experiment with making minimalcss depend on DropCSS. I love the idea of how minimalcss can get you the rendered DOM as an HTML string, including any DOM mutations (first) by things like
I'd love to see a rough patch to drop in DropCSS (no pun intended) and it could be a flag like
$ ./bin/minimalcss.js --use-dropcss https://www.url.com
Then you could do:
$ time ./bin/minimalcss.js --use-dropcss https://www.peterbe.com/plog-original.html -o /tmp/with.min.css
$ time ./bin/minimalcss.js https://www.peterbe.com/plog-original.html -o /tmp/without.min.css
$ prettier /tmp/without.min.css > /tmp/without.css
$ prettier /tmp/with.min.css > /tmp/with.css
$ diff -w /tmp/without.css /tmp/with.css
The crucial test would obviously be to run this against the big well-known CSS frameworks and see what their quirks do to the minimal CSS and what difference it ultimately makes. |
Hint hint @leeoniya ^ :) Truth be told, I got started making a hack to
Mind you, when the existing code opens pages, the HTML strings aren't kept. Instead, they're put straight into |
what would be the value-add of
i'm not sure i got this one :) did you want me to compare the diff between dropcss and minimalcss outputs of your site? |
What I'm saying is that In other words, today
I do. But struggling to find the time. It's easier to think and comment about it rather than writing a patch. It was my optimistic hope that you'd have the inclination but I totally understand if you don't. |
When I run:
It steadily takes about 6-7 seconds. That's way too slow. Let's see what we can do.