Block API: Consider encoding-normalized text as equivalent #11771
The current test failures are legitimate. In disabling I'll need to think more on how best to address this, because the normalizations aren't strictly the same for text and attributes. Further, we have a few specific handlers on attribute value equivalence (e.g. One option may be to switch back to relying on |
This is what I decided to do in the latest commit.
LGTM 👍
Related to this change in Gutenberg: WordPress/gutenberg#11771
Fixes #9906
This pull request seeks to improve the block validation step to allow more leniency for effectively equivalent text encoded in varying forms.
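To illustrate the kind of leniency meant here, the following is a minimal sketch and not the code from this pull request. The names `decodeEntities`, `normalizeWhitespace`, `isEquivalentText` and the small `NAMED_ENTITIES` table are hypothetical, and the order of the normalization steps is an assumption. It compares the raw strings first and only normalizes when they differ.

```javascript
// Hypothetical, deliberately tiny entity table; a real validator would
// need the full set of HTML named character references.
const NAMED_ENTITIES = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'" };

// Decode numeric (&#38;, &#x26;) and a few named (&amp;) character
// references in a single pass. Unknown references are left untouched.
function decodeEntities( text ) {
	return text.replace(
		/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi,
		( match, body ) => {
			if ( body[ 0 ] === '#' ) {
				const isHex = body[ 1 ] === 'x' || body[ 1 ] === 'X';
				const codePoint = parseInt(
					body.slice( isHex ? 2 : 1 ),
					isHex ? 16 : 10
				);
				return codePoint > 0 && codePoint <= 0x10ffff
					? String.fromCodePoint( codePoint )
					: match;
			}
			return Object.prototype.hasOwnProperty.call( NAMED_ENTITIES, body )
				? NAMED_ENTITIES[ body ]
				: match;
		}
	);
}

// Collapse runs of ASCII whitespace to one space and trim the ends.
function normalizeWhitespace( text ) {
	return text.replace( /[\t\n\r\f ]+/g, ' ' ).replace( /^ | $/g, '' );
}

// Assumed comparison strategy: identical strings return early, so the
// normalization cost is only paid when the raw strings differ.
function isEquivalentText( actual, expected ) {
	if ( actual === expected ) {
		return true;
	}
	const normalize = ( text ) => normalizeWhitespace( decodeEntities( text ) );
	return normalize( actual ) === normalize( expected );
}
```

Under these assumptions, `isEquivalentText( 'Fish &amp; Chips', 'Fish &#38; Chips' )` treats the two encodings as the same text, while `isEquivalentText( 'a &lt; b', 'a &gt; b' )` still reports a difference.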
The changes were authored in a way that may offer a slight performance benefit over master, both by reducing bundle size (approximately an 18% reduction, gzipped, in the `blocks` module) and by returning early with equality when no normalization (whitespace or encoding) is needed to determine that two text sequences are equivalent.
Implementation notes:
While implementing further text normalization, it was discovered that the underlying `simple-html-tokenizer` performs its own entity substitution when it encounters text tokens in an HTML string. For the purposes of validation this was considered redundant, so the included changes swap it for a stub entity parser. Note that this is the change that enables the significant drop in bundle size. As an aside, there is a desire to consolidate the blocks parser and the validator parser into a single parser, so the use of `simple-html-tokenizer` may or may not persist far into the future.
Testing instructions:
Verify that block invalidation is not triggered by encoding variations.
For example, inserting the following HTML as the contents of a post (in Text Mode, Classic Editor Text tab, or directly in the database) should not be presented as an invalid block when next viewing the Visual Mode of the editor:
cc @MarkRH