-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gutenberg for non-admins and HTML entities #50050
Comments
I was making a new issue but then found this, so I'm dumping some info that may be relevant here: Possible culprit functions: $input = '&';
var_dump([
'input' => $input,
'normalized' => wp_kses_normalize_entities($input),
'serialized' => serialize_block_attributes($input),
'mangled' => serialize_block_attributes(wp_kses_normalize_entities($input)),
]); Outputs:
Here's my reproduction instructions:
The
Filter to modify add_filter('user_has_cap', static function($allcaps, $caps, $args, $user) {
if(in_array('editor', $user->roles, true)) {
$allcaps['unfiltered_html'] = false;
}
return $allcaps;
}, 10, 4); |
Related to #20490. Continues to be an issue and has been now for several years. |
For anyone who stumbles onto this thread, here's how I ended up solving this for our client. The meat of it is this function, which aims to decode characters & and < if that's what the author geniunely meant to write while still stripping out script HTML ( meaning they still don't have the /**
* This function is a recursive function that's part of our effort to allow some entities through, while preventing
* others.
*
* We really want ACF to be able to save This & That as "This & That" into the DB, not "This & That".
*
* Similarly, we really want ACF to be able to save Two < Three as "Two < Three", not "Two < Three".
*
* But we don't want "<script>console.log('foo');</script>" to be saved as "<script>console.log('foo')</script>"
* because that opens up script injection.
*
* So, it's not enough to blindly decode all entities.
*
* This function is really only concerned with isolated & < and >, but not their numeric equivalents. We
* don't care about the numeric equivalents of & < and > because if someone typed those in, that's them
* really trying to get an encoded entity in there. Everything else seems to be handled okay by wp_kses.
*
* Think of this function as being something to look for & or < or > that the author actually meant to include in
* the text they are saving. We want those to get into the DB as they are, not as encoded entities.
*
* @param mixed $props
*
* @return mixed
*/
function decode_isolated_entities_deep( $props ) {
if ( is_array( $props ) ) {
$props = array_map( 'decode_isolated_entities_deep', $props );
} elseif ( is_object( $props ) ) {
$props = (object) array_map( 'decode_isolated_entities_deep', (array) $props );
} elseif ( is_string( $props ) ) {
// First look for any occurrence of & that is not following immediately by non-spaces and a semi-colon
$props = preg_replace( '/&(?!([#0-9a-zA-Z]+);)/', '&', $props );
// Decode any < that is not immediately followed by script or /script
$props = preg_replace( '#<(?!/?script)#', '<', $props );
// Decode any > that is not immediately preceded by script
$props = preg_replace( '/(?!script)>/', '>', $props );
}
return $props;
} I ended up using this function for two use cases. The first use case was wanting to write those intentional characters to the database so if ( function_exists( 'current_user_can' ) && ! current_user_can( 'unfiltered_html' ) ) {
// We're making a decision to NOT write encoded entities to the database for ACF field values. I think this is best
// decision because it means that if we need to find a post that has "This & That" for a particular field, then we
// can just search for that, and not have to think about encoding the search parameter. Beyond that though, this
// ensures that editors don't see & in an ACF text field if they save "This & That".
//
// This is only relevant for non-admins ( those without the unfiltered_html capability ).
add_filter( 'acf/update_value', 'decode_isolated_entities_deep', 100 );
add_filter( 'title_save_pre', 'decode_isolated_entities_deep', 100 );
add_filter( 'content_save_pre', 'decode_isolated_entities_deep', 100 );
add_filter( 'excerpt_save_pre', 'decode_isolated_entities_deep', 100 );
} The second use case was when encoding the block attributes that get written as an HTML comment to be persist. This is important in case any sort of front-end hydration happens based on those attributes. To solve this, I introduced the following function: /**
* In tracking down the & issue within WordPress, this is where I landed. This filter runs after all of the
* pre_kses filters. We're keeping those intact because we want to keep the job of stripping out invalid HTML out of
* attributes. Primary case is we don't want editors able to put <script> tags into a dangerouslyAllowHtml block, though
* things like <a> and <strong> are still fine.
*
* The issue really just lives in the block attributes. By the time we get to here, wp_kses has converted a single ampersand
* into \u0026amp;, which decodes to &, which will display exactly like that when passed into a prop. A couple of other
* characters ( namely < and > ) are similarly encoded. This traces back to the serialize_block_attributes() function, which
* special-cases these characters.
*
* So, how this filter works is we pass block attributes through decode_isolated_entities_deep() before passing then through
* serialize_block_attributes() again to encode how WP wants them. This will convert \u0026amp; to simply \u0026, which is what
* we want.
*
* @param string $content Content to filter through KSES.
* @param array[]|string $allowed_html An array of allowed HTML elements and attributes,
* or a context name such as 'post'. See wp_kses_allowed_html()
* for the list of accepted context names.
* @param string[] $allowed_protocols Array of allowed URL protocols.
*
* @return string
*/
public function pre_kses_block_attributes( $content, $allowed_html, $allowed_protocols ) {
if ( 'post' === $allowed_html ) {
$content = preg_replace_callback( '/(<!-- wp:[^ ]+ )(.+?)( -->)/', function( $matches ) {
$attributes = decode_isolated_entities_deep( json_decode( $matches[2], true ) );
$encoded = serialize_block_attributes( $attributes );
return "{$matches[1]}{$encoded}{$matches[3]}";
}, $content );
}
return $content;
}
if ( function_exists( 'current_user_can' ) && ! current_user_can( 'unfiltered_html' ) ) {
add_filter( 'pre_kses', 'pre_kses_block_attributes', 100, 3 );
} |
Description
As a non-administrator ( someone without
unfiltered_html
capability ), when I save a prop to a React based component, some characters are getting converted to their encoded versions -&
becomes&
,>
becomes>
.I've verified this on a vanilla WordPress installation - no other plugins and the default twentytwentythree theme.
Consider the following simple component, which eventually becomes a block:
I work in a system that allows such a component to be turned into block automatically, and the block looks like this:
If a user without
unfiltered_html
capability saves such a post, what gets written to the DB is the following:In the attributes for the block, the ampersand ends up getting encoded as
\u0026amp;
. That's a raw value that gets passed along asattributes
, so when the page refreshes in the back end, that ends up showing up as&
. The immediate fallout from that is that the "Attempt Block Recovery" shows up because of the discrepancy between the&
that's written to the database and the&amp;
that the block now wants to render:And once you recover the block, that
&
shows up in the prop for the block:I consider this a bug because it's changing the prop that gets passed to the React component from a
&
to&
, causing the Attempt Block Recovery, and looking ugly for the non-admin users. React components that accept props and render out markup shouldn't have to worry about HTML decoding the props it receives.Further, I wonder about the reasoning behind writing HTML entities to the database at all, given that recommended practice is to put all text from the DB through escaping filters before displaying them. This happens for Advanced Custom Fields when I try to save
&
in text fields. I'm sure that's a can of worms, and I am not asking for any change there.Step-by-step reproduction instructions
unfiltered_html
capability ), enter "This & That" into that text field and save the page.This & That
Screenshots, screen recording, code snippet
No response
Environment info
Please confirm that you have searched existing issues in the repo.
Yes
Please confirm that you have tested with all plugins deactivated except Gutenberg.
Yes
The text was updated successfully, but these errors were encountered: