Define speculative HTML parsing #5959
7a81521 to 122cff2
Overall quite solid, and I'd be pretty happy with merging it as-is. I tried to find ways to polish it and commented on those.
More implementer weigh-in would be ideal (specifically @hsivonen would be great), since I don't feel confident about that aspect of the review.
source
Outdated
<p class="note">It is possible that the same markup is seen multiple times from the
<span>speculative HTML parser</span> and then the normal HTML parser. It is expected that
duplicated fetches will be prevented by normal caching rules.</p>
"normal caching rules" sounds like it's referring to well-specified caching rules. But I think in reality they're prevented by unspecified memory caches.
For better or worse, Gecko at present tries a speculative fetch at most once for each unique URL seen during a page load regardless of "normal caching rules".
Belated +1 to what @domenic said - at the very least, we need to acknowledge that the rules are currently unspecified.
@hsivonen do you think the spec should suggest that strategy for speculative fetches?
source
Outdated
<var>speculativeParser</var>.</p></li>
<li><p><span>In parallel</span>, run <var>speculativeParser</var> until it is stopped or until it
reaches the end of its <span>input stream</span>.</p></li>
Do speculative parsers stop? I thought they'd just kind of try to keep going, ignoring scripts or similar...
Or, can speculative parsers have their own speculative parsers? That might be worth calling out explicitly.
I think when the normal parser starts parsing again, the speculative parser is stopped. At least I think this is true for Gecko, less sure about Chromium and WebKit. cc @hsivonen @mfreed7
In Gecko, I think a speculative parser can have a speculative parser, but I didn't do that in the spec, to keep the model a bit simpler. If a parser-blocking script document.writes another parser-blocking script, the spec restarts speculative parsing altogether. The main "win" for the spec is that it doesn't need to check whether the document.write did something that invalidates existing speculations. This can still be added in the future if implementers would like to have it specified.
source
Outdated
<li>
<p>If the <span>speculative HTML parser</span> encounters one of the following elements, then
act as if that element is processed for the purpose of its effect of speculative fetches for
Nit: Chromium and WebKit's implementations deal with tags (and their tokens), not elements. Dunno if it matters.
Tokens or tags seems slightly more correct from the spec perspective too; WDYT @zcorpan?
@domenic @yoavweiss I've tried to address the feedback by being a bit more specific and inventing a concept of "speculative mock elements", which can't cause things to happen.
Wow, this is pretty inventive. After staring at it for a bit, I think it works, and I like it!!
Still some other minor code review comments to resolve, but I think this is a very clean way of semi-rigorously building up a tree structure, while making it clear none of the normal mechanisms happen.
@domenic thanks! Another reason for this approach is that e.g. the Adoption Agency Algorithm can mutate parts of the DOM tree before the parser-blocking script, which would be bad to do speculatively on the real DOM tree.
Only a couple of minor things!
707607d to 565b295
@domenic thanks, fixed!
I'd ideally like an LGTM from at least two implementers here.
source
Outdated
<ul>
<li>
<p>The state of the normal HTML parser and the document itself must not be affected.</p>
The behavior of the input stream and the input byte stream probably needs to be specified with more detail – since tokens pushed into the input (byte) stream must also be pushed into the speculative parser's stream, but tokens read from the streams should be independent.
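One way to picture this requirement is an append-only buffer shared by both parsers, with each parser keeping its own read position. The following is only a minimal sketch under that assumption; `SharedInput`, `Reader`, and the chunk-level granularity are hypothetical simplifications for illustration, not something the spec text or any engine defines.

```python
from typing import List, Optional


class SharedInput:
    """Hypothetical append-only input shared by the normal and speculative parsers."""

    def __init__(self) -> None:
        self._chunks: List[str] = []

    def push(self, chunk: str) -> None:
        # Anything pushed here becomes visible to every reader.
        self._chunks.append(chunk)

    def reader(self) -> "Reader":
        # Each reader gets its own cursor, so reads are independent.
        return Reader(self)


class Reader:
    """Hypothetical per-parser cursor over a SharedInput."""

    def __init__(self, source: SharedInput) -> None:
        self._source = source
        self._pos = 0

    def read(self) -> Optional[str]:
        # Return the next unread chunk for this reader only, or None if drained.
        if self._pos >= len(self._source._chunks):
            return None
        chunk = self._source._chunks[self._pos]
        self._pos += 1
        return chunk


stream = SharedInput()
normal = stream.reader()
speculative = stream.reader()
stream.push("<script src=a.js></script>")
stream.push("<img src=b.png>")

# The speculative reader runs ahead without consuming the normal reader's input.
ahead = [speculative.read(), speculative.read()]
first_for_normal = normal.read()
```

Here the speculative reader can drain both chunks while the normal reader still starts from the first one, which is the independence the comment asks for.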
Thanks! 6a20e19
This list is hand-wavy, certainly, but the feedback from implementers is to not specify the speculative parser in precise detail as the approaches used in different engines are different and currently we're mostly concerned about observable differences and that this optimization is specified at all.
So overall I like this spec change. In essence, the preload scanner shouldn't load something that the "real" parser wouldn't eventually load also, assuming no TL;DR, I'm supportive of landing this PR along with the test PR.
Thanks for working on this.
The spec seems to allow speculative fetches only when a script is blocking the parser. In Firefox, speculative fetches can happen even when the parser is not blocked by a script: DOM building actions are accumulated into a larger batch of work, and speculative fetches are started during that accumulation. I think this is relevant to whether Firefox complies with the spec text in the as-if sense when DOM changes come from sources other than parser-blocking scripts (e.g. async scripts or timeouts set by a same-origin parent). Specifically, if something else disconnects a node such that the parser keeps inserting nodes that are not in the document, and this causes some non-speculative fetches not to occur, Firefox still performs the speculative fetches. (Images are fetched even if disconnected, but I think there are other fetch types where the non-speculative case requires the node to be in the document.) I think it should be conforming to perform speculative fetches at any time on the assumption that no scripted action, regardless of whether it comes from a parser-blocking script, does anything.
In Firefox, duplicate request avoidance doesn't go all the way to the cache; instead, the speculative load machinery refuses to speculatively fetch the same URL twice.
I don't see spec text connecting the creation of speculative mock elements to speculative fetches.
Is it keyed off of only the URL, or a tuple of URL, |
I'm happy to allow it. I think we need to separate speculative parsing and speculative fetching a bit more in the spec, so normal parsing can also cause speculative fetches...
Yes, for example
The spec allows fetching "For performance reasons, user agents may start fetching the classic script or module graph (as defined above) as soon as the src attribute is set, instead, in the hope that the element will be inserted into the document (and that the crossorigin attribute won't change value in the meantime)." I can't tell for
Yeah, this is maybe too handwavily implied currently. I'll try to address it along with allowing regular-parsing speculative fetches.
6a20e19 to d65ab07
I've rebased on current
d65ab07 to 0dce7bb
URL only with the twist that if there is a media query that doesn't apply, the URL is ignored instead of listed as already loaded. This might not be a good idea. This code predates the introduction of the
@hsivonen I've addressed your comments, except I haven't specified the media query twist. I specified the duplicate fetch prevention to be URL only for now.
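To make the two behaviors discussed here concrete, the following is a rough sketch of URL-keyed duplicate prevention with the media query twist as an optional extra. It is an illustration only: `SpeculativeFetchList`, `should_fetch`, and the `media_applies` flag are hypothetical names, and neither Gecko's code nor the spec text is claimed to look like this.

```python
from typing import Set


class SpeculativeFetchList:
    """Hypothetical per-page-load record of URLs already fetched speculatively."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def should_fetch(self, url: str, media_applies: bool = True) -> bool:
        # Media query twist: a URL whose media query does not apply is
        # ignored outright and is NOT recorded as already loaded.
        if not media_applies:
            return False
        # URL-only key: a second speculative request for the same URL is refused.
        if url in self._seen:
            return False
        self._seen.add(url)
        return True


fetches = SpeculativeFetchList()
first = fetches.should_fetch("https://example.com/a.css")
repeat = fetches.should_fetch("https://example.com/a.css")
skipped = fetches.should_fetch("https://example.com/print.css", media_applies=False)
later = fetches.should_fetch("https://example.com/print.css", media_applies=True)
```

Leaving `media_applies` at its default gives the plain URL-only behavior; passing `False` shows the twist, where the skipped URL can still be fetched later because it was never recorded.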
Arguably, "Let url be the URL that element would fetch if it was processed normally" covers the media query aspect (as well as
Thanks. I think the spec should allow speculation while looking for meta charset, and the spec should allow fetches that the normal parser will see in the future to start speculatively. This could be explained by starting a parallel speculative parser at the start of the document with the HTTP-layer encoding, if there is one, or, otherwise, the inherited encoding, if there is one, or, otherwise, UTF-8.
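The fallback order proposed in this comment can be written down as a small function. This is a sketch of the suggestion only, not of any spec algorithm; `speculative_encoding` and its parameter names are made up for illustration.

```python
from typing import Optional


def speculative_encoding(http_encoding: Optional[str],
                         inherited_encoding: Optional[str]) -> str:
    """Pick the encoding a speculative parser would start with (hypothetical helper)."""
    # 1. Prefer the HTTP-layer encoding, if there is one.
    if http_encoding:
        return http_encoding
    # 2. Otherwise use the inherited encoding, if there is one.
    if inherited_encoding:
        return inherited_encoding
    # 3. Otherwise fall back to UTF-8.
    return "UTF-8"


from_http = speculative_encoding("windows-1252", "Shift_JIS")
from_parent = speculative_encoding(None, "Shift_JIS")
default = speculative_encoding(None, None)
```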
This looks pretty good to me. I like the "speculative mock element" concept - makes the spec pretty straightforward here. I added a few small comments, but overall LGTM.
Fixes #5624.
(See WHATWG Working Mode: Changes for more details.)
/index.html ( diff )
/parsing.html ( diff )
/references.html ( diff )