-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add greater than symbol to attribute escaping #9963
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR fixed the problem in my tests, and now it looks like we are doing the same the classic editor does 👍
I think we can merge this fix for now, but I wonder if this is not a bug that should be fixed in wptexturize.
packages/element/src/serialize.js
Outdated
@@ -295,13 +306,17 @@ export function escapeLessThan( value ) { | |||
* "[...] the text cannot contain an ambiguous ampersand [...] must not contain | |||
* any literal U+0022 QUOTATION MARK characters (")" | |||
* | |||
* Additionally we escape the greater than symbol, as this is used by wptexturize to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as @wordpress/element
is concerned, it should have no knowledge of a wptexturize
existing. Further, there's nothing about this module which implies any relationship to being used in the context of a post (i.e. it can be used to produce HTML which would be injected in an admin screen). These modules are meant to be independently useful without consideration being strongly bound to any one specific context. Holding ourselves honest to this guideline will prevent biasing implementation toward our specific needs.
Now, with that said, it may still be our best option moving forward, though I'm also wondering a few things:
- Can this be fixed from
wptexturize
? - Can this be fixed by running a string replace on the result generated by a serialize call? Or do we need to be granular about the replacement occurring within an attribute value?
- And if so, should we consider extension points to our handling of various parts of the serialization? An attribute value visitor, for example?
Considering WordPress' needs as foreign can help surface where these sorts of extension points might be broadly useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these modules are meant to be independently useful
I wasn't aware of that, but it's good to know. This did seem like a surely-this-is-too-easy kind of fix and I was expecting it to raise a question or two!
My first thought was to look at wptexturize
. It's a monster of an old function and seems complicated to fix without triggering consequences somewhere. Saying that, I'm unfamiliar with the process, and don't know if Gutenberg is going to be tied to a particular base WP version etc. A fix there is certainly ideal, but tackling it in Gutenberg seems more immediately practical.
I then looked at fixing it inside the blocks themselves, but this places the burden on the block creator.
I looked at the serialize code. I did consider doing a string replace, but fixing a regex bug with another regex seemed like double trouble! It should be possible though, and we don't need to be granular about it - any unencoded >
in an attribute will break the post. This could leave the core Gutenberg code to produce clean HTML, and we then have a 'texturizing' layer.
I also looked further into the serialization code and wondered if we could do it at a higher level - somehow traverse the block output and add the extra encoding to attributes. It got complicated, and I then came across escapeAttribute
, which as solutions go seemed ideally magical.
By extension points do you mean hooks? That would work, and we could target different environments. If wptexturize
is fixed in a future WordPress, the >
encoding could be removed etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By extension points do you mean hooks? That would work, and we could target different environments. If wptexturize is fixed in a future WordPress, the > encoding could be removed etc
In my mind I had pictured something like an AST / other tree visitor, which could be passed as an option to serialize, or in the creation of a custom serializer (e.g. expose serialize factory method to create / override default behavior).
Prior art:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WordPress/gutenberg-core I know there is a core ticket for this https://core.trac.wordpress.org/ticket/45387 It doesn't seem like it has progressed much. Shall we land this fix as a preventive measure while the core fix is ongoing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These modules are meant to be independently useful without consideration being strongly bound to any one specific context.
Using >
in an attribute is valid HTML. Using >
in an attribute is also valid HTML. Important to this discussion: the two have the same meaning.
If we're aiming to be as general as possible, we should pick >
since it's useable in more contexts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using
>
in an attribute is valid HTML. Using>
in an attribute is also valid HTML. Important to this discussion: the two have the same meaning.
I don't disagree, but:
- The expressed intention of this function was to apply the minimal transformations necessarily to transform an attribute value to be conformant as per the specification of what constitutes a valid attribute value. Exceptions muddy expectations, both for the consumer and for the future maintainer.
- The bug still remains in WordPress that valid HTML becomes mangled upon save (whether from Gutenberg or elsewhere), but because there are few who feel comfortable touching the relevant texturize code, the impetus has been placed on integrators to implement workarounds, as we've done here.
- None of this matters, as the pull request has been approved on the pragmatic consideration that this be solved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. Thank you for your patience and for humoring me :)
I wanted to use this package elsewhere and found I could not because of the details of its implementation. In attempting to understand why, I found this ticket. My comment was meant to be a springboard into that discussion elsewhere, but I instead derailed the conversation. My apologies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mdawaffe Not at all derailed, certainly not. I think I'd stressed far too much the idea of being "generic" in my original comment, and it was a good prompt to reevaluate what I'd meant in terms of the stated purpose of a module / function.
d675a2b
to
3d339ca
Compare
Thanks for the nudge @nosolosw ! If the core ticket gets any movement then that may ultimately be the preferred route. If not then based on Andrew's feedback it seems like a more comprehensive solution is preferred in the long run. However, if this PR makes sense for the immediate future then I've rebased and fixed it up so it's working again. I can also create another ticket with the desired plans. |
packages/escape-html/src/index.js
Outdated
@@ -64,7 +75,7 @@ export function escapeLessThan( value ) { | |||
* @return {string} Escaped attribute value. | |||
*/ | |||
export function escapeAttribute( value ) { | |||
return escapeQuotationMark( escapeAmpersand( value ) ); | |||
return escapeGreaterThan( escapeQuotationMark( escapeAmpersand( value ) ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use a comment explaining why it diverges from HTML spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I think it got lost in the rebase. Added!
packages/escape-html/src/index.js
Outdated
* | ||
* @return {string} Escaped string. | ||
*/ | ||
export function escapeGreaterThan( value ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be best if we avoided exposing this on the public interface of the module, to avoid committing ourselves to future maintenance if it's arguably a workaround for a WordPress quirk.
Noting that it's exported for use by testing, alternatives could include one of either:
- Moving it to a separate file from which it is the default export, but is imported and then not subsequently exported by
index.js
- Naming it as an unstable API
__unstableEscapeGreaterThan
packages/escape-html/src/index.js
Outdated
* | ||
* @return {string} Escaped string. | ||
*/ | ||
export function __unstableEscapeGreaterThan( value ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we even exposing it? If you're exposing it for tests, I'd recommend making a separate file and not export it in index.js
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related: #9963 (comment)
Punting, as we seemingly haven't reached consensus yet. |
What are next steps? Should we land it or close it? What about the issues that are opened and this PR tries to resolve? |
I have some changes to address the points about exposing unstable code. Once I've got my e2e tests working properly I'll add the changes and if it's still useful then hopefully it will push the PR forward. |
350e10a
to
06f9027
Compare
Ok, I’ve updated this so the unstable code is also in a separate file. I’ve left the tests in the same place as they seem to make more sense there. I’ve also included changes for the package version and history (pending a real date) |
Has the code here been reverted in any way? When adding <!-- wp:paragraph -->
<p><abbr title="1 > 2">False</abbr> and <abbr title="1 < 2">True</abbr></p>
<!-- /wp:paragraph --> This gets rendered by WordPress as: <p><abbr title="1 > 2″>False</abbr> and <abbr title="1 < 2">True</abbr></p> The “False and” text is erroneously consumed be the attribute: In this case, it seems to be a problem with But if the block serializer output |
@westonruter Nothing more recent has occurred here to my knowledge. Could you create an issue which includes version information and sample code? I might wonder how your |
Although
>
is a valid attribute character, it is used bywptexturize
in WordPress to split HTML tokens. Without escapingwptexturize
will incorrectly tokenize a string, causing problems.For example, add
test > thing
as thealt
tag for an image in Gutenberg, and then view the post:Results in this HTML rendered in WordPress:
Encoding in Gutenberg prevents this problem while being transparent to the Gutenberg visual UI. It does have the effect that if you add a
>
into a block's attribute in HTML mode it will be replaced by>
, and this matches the same behaviour of&
being replaced with&
.The problem likely exists for other block types where an attribute is output directly based on user input. For example, custom class names.
Fixes #9915
Fixes #8779
Note this won't fix any existing posts that have a
>
in an attribute. Re-saving these in Gutenberg should resolve the problem.How has this been tested?
Additional unit test added for the
>
symbol, and existing unit tests modified.It can also be manually verified that adding
test > thing
to an image'salt
tag is displayed as-is in the Gutenberg UI, and results intest > thing
appearing in the HTML.Checklist: