-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Footnotes: Add block-level caching when parsing content for footnotes #52577
Conversation
Size Change: +70 B (0%) Total Size: 1.53 MB
ℹ️ View Unchanged
|
Flaky tests detected in 4382c18. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/5544335427
|
ae61c23
to
4382c18
Compare
while ( ( match = regex.exec( content ) ) !== null ) { | ||
newOrder.push( match[ 1 ] ); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given that we have easy access to a DOM here, why parse the HTML with the regular expressions? would it be just the same if we did this methodically instead? particularly since we're caching the values?
if ( ! content.includes( 'data-fn' ) ) {
cache.set( block, [] );
}
const dom = doTheStuffWeDoToParseHTML( content );
dom.querySelectorAll( '[data-fn]' ).forEach( ( node ) => {
newOrder.push( node.getAttribute( 'data-fn' ) );
} )
|
||
export function updateFootnotesFromMeta( blocks, meta ) { | ||
const output = { blocks }; | ||
if ( ! meta ) return output; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this syntax makes me so nervous. it's been the vector of some of the biggest attacks and looks otherwise so innocent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify? Is it the lack of braces around the THEN block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that's right. it's essentially the same discussion as with automatic semicolon insertion, which the industry has overwhelmingly rejected. I thought we overwhelmingly rejected code blocks without braces after major exploits (notably the OpenSSL-bypass).
even if ( ! meta ) { return output; }
retains the safety that if ( ! meta ) return output;
eschews, so giving up the safety doesn't even avoid any lines.
there are some trivial cases, particularly with comments, that can fool auto-formatters into creating the bugs. formatters move comments around, and the newlines and comments themselves can rip apart these coupled blocks. the braces tie them together.
curious what the advantages are of doing it this way
} | ||
|
||
// When we store rich text values, this would no longer | ||
// require a regex. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no regex is required today, no? and it's JS so it's easy to build a DOM of the HTML and then modify it without risking more corruption.
it's also unclear to me what our purpose is here, partially because of the obscurity of the regex patterns themselves. regex
is moving the footnote id into the link text?
const dom = parseHTML( value );
// Set the footnote id as the link's text.
dom.querySelectorAll( 'sup[data-fn] > a' ).forEach( ( node ) => {
const fnId = node.parentNode.getAttribute( 'data-fn' );
node.innerText = newOrder.indexOf( fnId ) + 1;
} );
// Replace footnotes that are only anchor tags with the <sup>-wrapped version.
// From: <a data-fn="a431c">a431c</a>
// To: <sup data-fn="a431c" class="fn"><a href="#a431c" id="a431c-link">a431c</a></sup>
dom.querySelectorAll( 'a[data-fn]' ).forEach( ( node ) => {
const fnId = node.parentNode.getAttribute( 'data-fn' );
const sup = dom.createElement( 'sup' );
sup.setAttribute( 'data-fn', fnId );
sup.classList.add( 'fn' );
const link = dom.createElement( 'a' );
link.setAttribute( 'href', `#${ fnId }` );
link.setAttribute( 'id', `${ fnId }-link` );
link.innerText = fnId + 1;
sup.appendChild( link );
node.parentNode.replaceNode( node, sup );
} );
attributes[ key ] = dom.outerHTML;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These regexes are just temporary until we get rich text values in the store. If we can ever do that at least :(
We could cache the rich text values instead somewhere, but we can't use a weak map since the key would be a string... Any ideas?
I would prefer if we could cache the rich text values themselves somewhere instead of caching rich text values for a block.
Thanks for the feedback, @dmsnell, but this PR is only concerned with bringing block-level caching, not changing or questioning the overall approach established in #51201. Most of the diff is only moving existing code, which is why I recommend reading its three non-refactoring commits. As far as I know, Ella's plan is to get rich-text values exposed throughout the footnotes logic, eliminating the need for any parsing, but this is something to discuss with her once she's back. In the meantime, what I propose in this PR — if shown to perform better — should still be useful in the future, because my point was to split the work into cacheable bits, regardless of implementation details. |
roger. I missed that when looking at the green lines. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I can make a PR to get rid of the regexes in a follow up.
Fixes unit tests failing at packages/editor/src/components/document-outline
a1ceee2
to
0020e54
Compare
Experimental: This works, but the experiment needs to be validated with performance benchmarks.
This pull request explores the idea that I originally explained at #52241 (review).
What?
The current situation with footnotes is that, if a post has any footnotes in meta, then a subroutine called
updateFootnotes
will parse the current document for any footnotes on every change.(The reason for this is that footnotes need to be kept in order in the Footnotes block (as well as in the indices in the document itself), and any change to the document can affect that order: moving blocks around, deleting RichText containing a footnote anchor, inserting a footnote via the Formats menu.)
In practice, though, most changes to a document only affect one block at a time (e.g. typing, setting attributes) or a small number thereof (e.g. moving a selection of blocks). So a straightforward performance improvement to the footnote parsing process is to cache the parsing at the block level. That way, whenever a user is typing inside a block, the entire parsing looks like this:
Note that caching was added in two components:
getFootnotesOrder
(higher level) andgetRichTextValuesCached
(lower level). I am not sure this is necessary.whereas before the process looked like:
Why?
How?
To help with the review, I took care to separate refactoring commits from functional commits in the commit history. The most important commits are probably:
Testing Instructions
Testing Instructions for Keyboard
Screenshots or screencast