Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate manual construction of script tags in WP_Scripts and pass other scripts through wp_print_inline_script_tag() #4773

Closed
wants to merge 28 commits into from

Conversation

westonruter
Copy link
Member

@westonruter westonruter commented Jul 1, 2023

The scope here is now limited to the frontend (including Customizer preview) as well as the login screen (wp-login.php). Admin screens are not included since this would increase the scope significantly. Additionally, the core themes have not been updated since wp_print_inline_script_tag() was introduced in WP 5.7, and the last theme to include any custom scripts (Twenty Twenty-One) supports WordPress 5.3 and above. So to take advantage of the example Strict CSP plugin with core themes, you have to use one of these themes which don't manually construct scripts:

  • Twenty Eleven (only uses manual script tag in IE conditional comment)
  • Twenty Twelve (only uses manual script tag in IE conditional comment)
  • Twenty Thirteen
  • Twenty Fourteen (only uses manual script tag in IE conditional comment)
  • Twenty Sixteen
  • Twenty Nineteen
  • Twenty Twenty-Two (a block theme, so no custom JS)
  • Twenty Twenty-Three (a block theme, so no custom JS)

Previously: 10up#58

See WordPress/gutenberg#54637 for syncing block change to Gutenberg.

Testing Instructions

  1. Install and activate the Strict CSP plugin
  2. Activate Twenty Twenty-Four
  3. Navigate the frontend, the wp-login screen, and the Customizer preview (ideally so all scripts touched end up being printed) with the console open, making sure that there are no CSP warnings like:

Refused to load the script 'http://localhost:8889/wp-includes/blocks/navigation/view.min.js?ver=5687b52bb8c84fb4ae68' because it violates the following Content Security Policy directive: "script-src 'nonce-6aa1ba7b65' 'unsafe-inline' 'strict-dynamic' https: http:". Note that 'strict-dynamic' is present, so host-based allowlisting is disabled. Note that 'script-src-elem' was not explicitly set, so 'script-src' is used as a fallback.

Trac ticket: https://core.trac.wordpress.org/ticket/58664

Commit Message

Script Loader: Use wp_get_script_tag() and wp_get_inline_script_tag()/wp_print_inline_script_tag() helper functions to output scripts on frontend and login screen.

Using script tag helper functions allows plugins to employ the wp_script_attributes and wp_inline_script_attributes filters to inject the nonce attribute to apply Content Security Policy (e.g. Strict CSP). Use of helper functions also simplifies logic in WP_Scripts.

  • Update wp_get_inline_script_tag() to wrap inline script in CDATA blocks for XHTML-compatibility.
  • Ensure the type attribute is printed first in wp_get_inline_script_tag() for back-compat.
  • Wrap existing <script> tags in output buffering to retain IDE supports.
  • In wp_get_inline_script_tag(), append the newline to $javascript before it is passed into the wp_inline_script_attributes filter so that the CSP hash can be computed properly.
  • In the_block_template_skip_link(), opt to enqueue the inline script rather than print it.
  • Add ext-php to composer.json under suggest as previously it was an undeclared dependency for running PHPUnit tests.
  • Update tests to rely on DOMDocument to compare script markup, normalizing unsemantic differences.

Props westonruter, spacedmonkey, flixos90, 10upsimon, dmsnell, mukesh27, joemcgill, swissspidy, azaozz.
Fixes #58664.
See #39941.


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.


// Ensure markup is XHTML compatible if not HTML5.
if ( ! $is_html5 ) {
$javascript = sprintf( "/* <![CDATA[ */\n%s\n/* ]]> */", $javascript );
Copy link
Member Author

@westonruter westonruter Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To ensure XML-compatibility, the $javascript string should have any instances of ]]> escaped. It's ugly, but apparently this is what has to be done:

Suggested change
$javascript = sprintf( "/* <![CDATA[ */\n%s\n/* ]]> */", $javascript );
$javascript = str_replace( ']]>', ']]]]><![CDATA[>', $javascript );
$javascript = sprintf( "/* <![CDATA[ */\n%s\n/* ]]> */", $javascript );

Nevertheless, it is likely exceedingly rare that a WordPress site is actually being served with Content-Type: application/xhtml+xml (true confessions, I used to do it, but the draconian XML parse error handling was painful). Still, it is unlikely for ]]> to occur in a script. I say this because if a site is being served as Content-Type: text/html then the parser HTML parser will ignore these CDATA sections, and it could end up causing a parse error when the script is passed to the JS interpreter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any concern from adding this now in case the $javascript passed to this function is already wrapped with that CDATA markup? Very unlikely I assume, but we may want to add a check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good call. I've done this in ecc29a9. This shows that it is indeed necessary to do the escaping. Without the escaping, doing the following:

wp_print_inline_script_tag( "/* <![CDATA[ */ console.log( 'Hello World!' ); /* ]]> */" );

Results in the following output:

<script type="text/javascript">
/* <![CDATA[ */
/* <![CDATA[ */ console.log( 'Hello World!' ); /* ]]> */
/* ]]> */
</script>

Pasting this into the W3C Validator as a fragment under the XHTML Strict doctype:

image

Results in an XML parse error:

image

However, when the escaping is present, the PHP outputs:

<script type="text/javascript">
/* <![CDATA[ */
/* <![CDATA[ */ console.log( 'Hello World!' ); /* ]]]]><![CDATA[> */
/* ]]> */
</script>

And this is valid:

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something as bizarre as this could definitely use a link to an XML spec or the source of the reason why these have to be escaped. someone is going to look at that and "fix" it and "improve quality" by removing the escaping 🙃

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though there is a comment about the escaping?

Copy link
Member

@dmsnell dmsnell Sep 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah because it mentions that it should be compatible but not how or why.
like, how does this fix ensure compatibility?

when I saw it I immediately wondered what rule necessitates this or what breaks without it. I could imagine something like this, to see if I'm properly understanding the code.

/*
 * XHTML extracts the contents of the SCRIPT element and then the XML parser
 * decodes character references and other syntax elements. This can lead to
 * misinterpretation of the script contents or invalid XHTML documents.
 *
 * Wrapping the contents in a CDATA section instructs the XML parser not to
 * transform the contents of the SCRIPT element before passing them to the
 * JavaScript engine.
 *
 * Example
 *
 *     <script>console.log('&hellip;');</script>
 * 
 *     In an HTML document this would print "&hellip;" to the console,
 *     but in an XHTML document it would print "…" to the console.
 * 
 *
 *     <script>console.log('An image is <img> in HTML');</script>
 * 
 *     In an HTML document this would print "An image is <img> in HTML",
 *     but it's an invalid XHTML document because it interprets the `<img>`
 *     as an empty tag missing its closing `/`.
 *
 * @see https://www.w3.org/TR/xhtml1/#h-4.8
 */
if ( ! $is_html5 ) {
	/*
	 * If the string `]]>` exists within the JavaScript it would break
	 * out of any wrapping CDATA section added here, so to start, it's
	 * necessary to escape that sequence which requires splitting the
	 * content into two CDATA sections wherever it's found.
	 *
	 * Note: it's only necessary to escape the closing `]]>` because
	 * an additional `<![CDATA[` leaves the contents unchanged.
	 */
	$javascript = str_replace( ']]>', ']]]]><![CDATA[>', $javascript );

	// Wrap the entire escaped script inside a CDATA section.
	$javascript = sprintf( "/* <![CDATA[ */\n%s\n/* ]]> */", $javascript );
}

yeah I know it's wordy and lengthy, but it was really confusing to me, and this code is now going to be responsible for general code generation, and may get a lot of eyes on it.

by the way it seems like this will not trigger if the HTTP Content-type: text/html is what serves the document. I could not uncover these failures without serving as Content-type: application/xhtml+xml or by directly opening the file with a .xml extension. The exact same file contents stored with .html as its extension leads to HTML semantics, thus this doesn't effect it.

Finally, and thankfully, it doesn't appear to be a security issue because Safari, Firefox, and Chrome all prevent escaping from the script using tricks like &lt;/script&gt;. It will change those things into </script>, but that leads to a JavaScript syntax error, or data corruption if it's found within a JS string.

Including <img> was fun because that broke the entire page render. It became invalid XML. Again, if served as .html or with Content-type: text/html or anything but explicit out-of-band information that it's XML, none of these problem arise.

@@ -2845,8 +2862,6 @@ function wp_get_inline_script_tag( $javascript, $attributes = array() ) {
*/
$attributes = apply_filters( 'wp_inline_script_attributes', $attributes, $javascript );

$javascript = "\n" . trim( $javascript, "\n\r " ) . "\n";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This needs to move up above the applying of the wp_inline_script_attributes filters so that the final $javascript is available for computing a CSP hash.

@westonruter westonruter marked this pull request as ready for review September 1, 2023 00:28
Copy link
Member

@mukeshpanchal27 mukeshpanchal27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @westonruter for the PR.

Do we need to update wp_print_community_events_templates() ?

src/wp-admin/includes/class-wp-list-table.php Outdated Show resolved Hide resolved
Co-authored-by: Mukesh Panchal <mukeshpanchal27@users.noreply.github.com>
@westonruter
Copy link
Member Author

Do we need to update wp_print_community_events_templates() ?

No. Since these are not script tags containing JavaScript, they do not have to be updated to use wp_print_inline_script_tag() in order to apply CSP. The nonce attribute is only needed for JS script tags.

Comment on lines 550 to 558
ob_start();
?>
<script type="text/javascript">
<script>
addLoadEvent = function(func){if(typeof jQuery!=='undefined')jQuery(function(){func();});else if(typeof wpOnload!=='function'){wpOnload=func;}else{var oldonload=wpOnload;wpOnload=function(){oldonload();func();}}};
var ajaxurl = '<?php echo esc_js( admin_url( 'admin-ajax.php', 'relative' ) ); ?>', pagenow = 'media-upload-popup', adminpage = 'media-upload-popup',
isRtl = <?php echo (int) is_rtl(); ?>;
</script>
<?php
wp_print_inline_script_tag( trim( str_replace( array( '<script>', '</script>' ), '', ob_get_clean() ) ) );
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really happy with the boilerplate ob_start() followed by an inline script followed by:

wp_print_inline_script_tag( trim( str_replace( array( '<script>', '</script>' ), '', ob_get_clean() ) ) );

I think ideally there would be a helper to do this automatically. Consider, for example, if wp_print_inline_script_tag() actually allowed a closure to be provided for the $javascript parameter in addition to a string. It could handle the output buffering automatically, for instance:

wp_print_inline_script_tag( static function () {
	?>
	<script>
	addLoadEvent = function(func){if(typeof jQuery!=='undefined')jQuery(function(){func();});else if(typeof wpOnload!=='function'){wpOnload=func;}else{var oldonload=wpOnload;wpOnload=function(){oldonload();func();}}};
	var ajaxurl = '<?php echo esc_js( admin_url( 'admin-ajax.php', 'relative' ) ); ?>', pagenow = 'media-upload-popup', adminpage = 'media-upload-popup',
		isRtl = <?php echo (int) is_rtl(); ?>;
	</script>
	<?php
} );

When a closure is passed, it could automatically start and end output buffering, strip the script start and end tags, and trim whitespace. If the output buffer lacks a <script> it could issue a _doing_it_wrong().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only purpose for having the script tags in the PHP code is to enable IDEs to do syntax-highlighting, syntax checking, autocompletion, etc. This is valuable so I think we should facilitate it somehow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This modification to wp_get_inline_script_tag() has since been reverted in this PR.

@westonruter
Copy link
Member Author

westonruter commented Sep 1, 2023

Update: The scope has been reduced to not include wp-admin.

There are still quite a few places where scripts are being manually printed, primarily in the admin:

ack output
wp-includes/class-wp-customize-widgets.php
1315:           <script type="text/javascript">

wp-includes/class-wp-embed.php
91:<script type="text/javascript">

wp-includes/theme-previews.php
72:     <script type="text/javascript">

wp-admin/includes/class-bulk-upgrader-skin.php
112:            echo '<script type="text/javascript">jQuery(\'.waiting-' . esc_js( $this->upgrader->update_current ) . '\').hide();</script>';
133:            echo '<script type="text/javascript">jQuery(\'.waiting-' . esc_js( $this->upgrader->update_current ) . '\').css("display", "inline-block");</script>';
151:                    echo '<script type="text/javascript">jQuery(\'#progress-' . esc_js( $this->upgrader->update_current ) . '\').show();</script>';
161:                    echo '<script type="text/javascript">jQuery(\'.waiting-' . esc_js( $this->upgrader->update_current ) . '\').hide();</script>';

wp-admin/includes/class-custom-image-header.php
377:<script type="text/javascript">
432:<script type="text/javascript">

wp-admin/includes/class-wp-internal-pointers.php
121:            <script type="text/javascript">

wp-admin/includes/class-wp-upgrader-skin.php
241:                    echo '<script type="text/javascript">
247:                    echo '<script type="text/javascript">

wp-admin/includes/meta-boxes.php
918:                    <script type="text/javascript">jQuery(function(){commentsBox.get(<?php echo $total; ?>, 10);});</script>

wp-admin/includes/update-core.php
1727:<script type="text/javascript">

wp-admin/includes/deprecated.php
1514:   <script type="text/javascript">

wp-admin/includes/ms.php
839:<script type="text/javascript">
1000:<script type="text/javascript">

wp-admin/includes/media.php
275:    <script type="text/javascript">
825:            <script type="text/javascript">
2073:   echo '<script type="text/javascript">post_id = ' . $post_id . ';</script>';
2212:   <script type="text/javascript">
2354:   <script type="text/javascript">
2420:   <script type="text/javascript">
2561:   <script type="text/javascript">
2890:   <script type="text/javascript">

wp-admin/network/upgrade.php
126:            <script type="text/javascript">

wp-admin/network/site-users.php
219:<script type="text/javascript">

wp-admin/edit-form-advanced.php
739:<script type="text/javascript">

wp-admin/media-new.php
80:     <script type="text/javascript">

wp-admin/customize.php
154:<script type="text/javascript">

wp-admin/install.php
457:<script type="text/javascript">var t = document.getElementById('weblog_title'); if (t){ t.focus(); }</script>
463:<script type="text/javascript">

wp-admin/update-core.php
214:            <script type="text/javascript">
922:    <script type="text/javascript">

<?php
wp_print_inline_script_tag(
// language=JavaScript
sprintf( 'var _wpWidgetCustomizerPreviewSettings = %s;', wp_json_encode( $settings ) )
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PhpStorm flags the sprintf placeholder as a syntax error:

Expression expected

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

presumably because of the // language=JavaScript comment, which shouldn't be there since this is still PHP?

@spacedmonkey
Copy link
Member

@westonruter There seem to be number of places where we should use this function.

function wp_block_theme_activate_nonce() {
$nonce_handle = 'switch-theme_' . wp_get_theme_preview_path();
?>
<script type="text/javascript">
window.WP_BLOCK_THEME_ACTIVATE_NONCE = <?php echo wp_json_encode( wp_create_nonce( $nonce_handle ) ); ?>;
</script>
<?php
}

<script type="text/javascript">
jQuery( function($) {
$.get("<?php echo esc_url( admin_url( 'admin-ajax.php', 'relative' ) ) . '?action=oembed-cache&post=' . $post->ID; ?>");
} );
</script>

<script type="text/javascript">

echo "<script type='text/javascript'>\n" . self::wp_mce_translation() . "</script>\n";

<script type="text/javascript">

There are also 50 other examples in /wp-admin

Copy link
Member

@spacedmonkey spacedmonkey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting close. There are 5 places in wp-includes that should be updated. We should also consider using wp_add_inline_script in some places.

@westonruter
Copy link
Member Author

@westonruter There seem to be number of places where we should use this function.

There are also 50 other examples in /wp-admin

@spacedmonkey I've intentionally excluded instances that are specifically used in the wp-admin. This is to reduce scope. The changes in this PR are intended to only relate to the frontend and to the login screen. Some instances in wp-includes are only used in wp-admin (generally) so that's why they aren't included. I'll double-check the ones you identified.

I think the wp-admin will need to be a separate effort. For one, there are many many more inline script tags, and secondly, the block/site editor screens have manual script construction in JS which breaks Strict CSP. So that will need to be addressed in Gutenberg.

@westonruter westonruter mentioned this pull request Sep 19, 2023
@westonruter
Copy link
Member Author

@westonruter There seem to be number of places where we should use this function.

@spacedmonkey:

function wp_block_theme_activate_nonce() {
$nonce_handle = 'switch-theme_' . wp_get_theme_preview_path();
?>
<script type="text/javascript">
window.WP_BLOCK_THEME_ACTIVATE_NONCE = <?php echo wp_json_encode( wp_create_nonce( $nonce_handle ) ); ?>;
</script>
<?php
}

This is used exclusively at the admin_head action, so it's out of scope.

<script type="text/javascript">
jQuery( function($) {
$.get("<?php echo esc_url( admin_url( 'admin-ajax.php', 'relative' ) ) . '?action=oembed-cache&post=' . $post->ID; ?>");
} );
</script>

This is used exclusively at the edit_form_advanced and edit_page_form actions on the classic post edit screen in the admin, so it's out of scope.

<script type="text/javascript">

echo "<script type='text/javascript'>\n" . self::wp_mce_translation() . "</script>\n";

<script type="text/javascript">

Since these are for the classic editor which is (usually) only used in the admin, I think it is out of scope.

There are also 50 other examples in /wp-admin

See #4773 (comment)

@spacedmonkey
Copy link
Member

I've intentionally excluded instances that are specifically used in the wp-admin. This is to reduce scope.

I agree that would make this PR too big. But I would add a todo or other code comment on the ones wp-includes. It is clear now why you have done what you done, but it may stop a future developer trying to "fix" the issue, it is not using this function is by design.

Side note, wp_block_theme_activate_nonce should use wp_add_inline_script.

@westonruter
Copy link
Member Author

I agree that would make this PR too big. But I would add a todo or other code comment on the ones wp-includes. It is clear now why you have done what you done, but it may stop a future developer trying to "fix" the issue, it is not using this function is by design.

I'm not sure this is necessary. If someone wants to fix up other instances, more power to them. Using the function won't hurt. It's just that it won't get all the benefits since there are some blockers for complete coverage. Adding comments seems like it would just create noise. As long it is clear in the ticket that the scope is limited to the frontend and wp-login, I think this is sufficient.

Side note, wp_block_theme_activate_nonce should use wp_add_inline_script.

Yes, but since it's only used in wp-admin then we can defer it to fix later.

Copy link
Member

@spacedmonkey spacedmonkey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. I would like the follow on ticket for changing the admin scripts to be created before this is committed, just so we don't lose track of it.

Copy link
Member

@felixarntz felixarntz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter This basically looks good to me, though I have a few non-blocking concerns that would be great to get your thoughts on.

@@ -1559,7 +1559,7 @@ public function export_preview_data() {
$exports = array(
'navMenuInstanceArgs' => $this->preview_nav_menu_instance_args,
);
printf( '<script>var _wpCustomizePreviewNavMenusExports = %s;</script>', wp_json_encode( $exports ) );
wp_print_inline_script_tag( sprintf( /** @lang JavaScript */ 'var _wpCustomizePreviewNavMenusExports = %s;', wp_json_encode( $exports ) ) );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that why you added the /** @lang JavaScript */ here? Is that standardized somehow? Asking since I haven't seen that before.

@@ -1366,7 +1366,7 @@ function wp_comment_form_unfiltered_html_nonce() {

if ( current_user_can( 'unfiltered_html' ) ) {
wp_nonce_field( 'unfiltered-html-comment_' . $post_id, '_wp_unfiltered_html_comment_disabled', false );
echo "<script>(function(){if(window===window.parent){document.getElementById('_wp_unfiltered_html_comment_disabled').name='_wp_unfiltered_html_comment';}})();</script>\n";
wp_print_inline_script_tag( /** @lang JavaScript */ "(function(){if(window===window.parent){document.getElementById('_wp_unfiltered_html_comment_disabled').name='_wp_unfiltered_html_comment';}})();" );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are language injection comments. Supported by PhpStorm, at least: https://www.jetbrains.com/help/phpstorm/using-language-injections.html

Comment on lines -2790 to +2794
$attributes['type'] = 'text/javascript';
// Keep the type attribute as the first for legacy reasons (it has always been this way in core).
$attributes = array_merge(
array( 'type' => 'text/javascript' ),
$attributes
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


// Ensure markup is XHTML compatible if not HTML5.
if ( ! $is_html5 ) {
$javascript = sprintf( "/* <![CDATA[ */\n%s\n/* ]]> */", $javascript );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any concern from adding this now in case the $javascript passed to this function is already wrapped with that CDATA markup? Very unlikely I assume, but we may want to add a check?

Comment on lines 3011 to 3023
$expected = str_replace( " type='text/javascript'", '', $expected );
$expected = str_replace( ' type="text/javascript"', '', $expected );
$expected = str_replace( "/* <![CDATA[ */\n", '', $expected );
$expected = str_replace( "\n/* ]]> */", '', $expected );
$expected = str_replace( ' defer="defer"', ' defer', $expected );
$expected = str_replace( ' async="async"', ' async', $expected );

$actual = str_replace( " type='text/javascript'", '', $actual );
$actual = str_replace( ' type="text/javascript"', '', $actual );
$actual = str_replace( "/* <![CDATA[ */\n", '', $actual );
$actual = str_replace( "\n/* ]]> */", '', $actual );
$actual = str_replace( ' defer="defer"', ' defer', $actual );
$actual = str_replace( ' async="async"', ' async', $actual );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I'm a fan of having all these exceptions here. Can we somehow ensure the tests pass the relevant values instead? The CDATA one feels okay to me, but otherwise, I think starting to have exceptions here sets a bad precedent that could eventually move this assertion further away from asserting "equal markup".

Maybe instead create a more specific assertEqualScriptTag or something like that with these exceptions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about 3ba5135? I improved the normalization process to use the DOM instead, so it will be much more robust and safe. I also made it clear from the method description that it compares with normalizations applied. Lastly, since this method is contained in the Tests_Dependencies_Scripts class I think this also sufficiently indicates that it is specific to checking script tag markup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My feedback wasn't really about how this was implemented, only about that there are now these "exceptions" where it's not actually equal.

I don't feel strongly about that, so happy to keep it as long as it's only used in this scripts.php test file. The name is mostly what throws me off. I'd be on board if this was called something more specific to script markup. Maybe we can just rename the assertion since it's (as far as I can tell) only used in this class anyway?

Regarding DOMDocument, I personally don't trust it. It has so many weird quirks, so IMO the str_replace from before was more accessible to other developers (FWIW I don't understand half the code you added for the DOMDocument approach) and less error-prone.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benefit of DOMDocument is PHPUnit supports comparing two instances in assertEquals, and it normalizes non-semantic differences (e.g. attribute order). See #4773 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@westonruter westonruter Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, assertEquals is intentionally not strict in how it compares equality. It checks semantic equality as opposed to assertSame. So == instead of ===. So I think it makes sense.

@@ -12,6 +12,9 @@
"require": {
"php": ">=7.0"
},
"suggest": {
"ext-dom": "*"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could have put this in require since the extension is required when running tests (and not just the tests being introduced in this PR). But since it is not required to actually run WordPress, I went with suggest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PhpStorm, at least, rightly identifies this as having been missing:

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another reason to probably not use this approach and stick to the simple str_replace that doesn't rely on new dependencies :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it was already using DOMDocument. It turns out that assertEquals in PHPUnit supports comparing two DOMElement instances, and this then accounts for differences in attribute order or whether single vs double quotes are used automatically. So I'm just extending that a bit further in assertEqualMarkup to normalize a bit more for HTML5 vs XHTML.

And DOMDocument is being used in other tests as well, so it's not a new dependency. It's just a dependency that wasn't declared before.

@westonruter
Copy link
Member Author

Committed in r56687.

Copy link
Member

@dmsnell dmsnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some post-hoc comments. I think there are opportunities to follow-up on this, and particularly do something about str_replace( '<script>', '' ) because that seems dangerous and needlessly inefficient.

overall it looks better than it was though.

@@ -2106,6 +2109,7 @@ public function remove_frameless_preview_messenger_channel() {
} )();
</script>
<?php
wp_print_inline_script_tag( str_replace( array( '<script>', '</script>' ), '', ob_get_clean() ) );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, we don't want to arbitrarily replace these tags. we know they exist at the front and back of the string, so by manually removing them we can avoid unintentional HTML syntax poisoning.

$script_html = ob_get_clean();
$script_html = substr( $script_html, strlen( '<script>' ), -strlen( '</script>' ) );

the strlen calls shouldn't add any overhead because that's stored in the string object, which here is a string literal, which we've already created in this patch inside the array.

this change is both safer, deterministic, and more efficient, particularly in the worse-case, though given that we expect very short args here I would be surprised if it's a measurable impact. still, doing less is faster than doing more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch @dmsnell!

@westonruter IMO this is worth a quick follow up commit applying this throughout.

If we feel strongly about introducing a helper at this point, I wouldn't be opposed to that either. Though I think that should remain an @access private function if we add it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Private helper function added in #5301

@@ -2106,6 +2109,7 @@ public function remove_frameless_preview_messenger_channel() {
} )();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is out of scope for this PR, but looks like we could prevent some other corruption here by using URLSearchParams instead of string-searching the query of the URL. no need for DOM either.

url = new URL( location.href );
queryParams = url.searchParams;

if ( queryParams.has( 'customize_messenger_channel' ) ) {
	queryParams.delete( 'customize_messenger_channel' );
	url.search = queryParams;
	location.replace( url );
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, this code dates back to a time before we could rely on URL or URLSearchParams. I made a similar change recently to wp-embed (r56383). Probably should be put into a new defect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed this in Core-59480

@@ -5012,6 +5019,7 @@ public function customize_pane_settings() {
?>
</script>
<?php
wp_print_inline_script_tag( str_replace( array( '<script>', '</script>' ), '', ob_get_clean() ) );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it help to create a helper function for this specific task?

/**
 * Removes leading and trailing _empty_ script tags.
 *
 * This is a helper meant to be used for literal script tag construction
 * within wp_print_inline_script_tag(). It removes the literal values of
 * "<script>" and "</script>" from around an inline script.
 *
 * @since 6.4.0
 *
 * @param string $contents Script body with manually created SCRIPT tag literals.
 * @return string Script body without surrounding script tag literals, or
 *                original contents if both exact literals aren't present.
 */
function wp_remove_surrounding_empty_script_tags( $contents ) {
	$opener = '<script>';
	$closer = '</script>';

	$has_both_empty_tags = (
		strlen( $opener ) + strlen( $closer ) > strlen( $contents ) ||
		substr( $contents, 0, strlen( $opener ) ) !== $opener ) ||
		substr( $contents, -strlen( $closer ) ) !== $closer
	);

	return $has_both_empty_tags
		? substr( $contents, strlen( $opener ), -strlen( $closer ) )
		: $contents;
}

then these repetitive calls could be marginally nicer. even with the checks for the leading and trailing SCRIPT tags, this should still be faster than str_replace(), but more importantly, more secure and resilient against breaking syntax.

$script_body = wp_remove_surrounding_empty_script_tags( ob_get_clean() );
wp_print_inline_script_tag( $script_body );

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a syntax error in the $has_both_empty_tags condition. Could you fix and add comments explaining the conditions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure; also just a note - all my examples are almost always wrong and are only meant to convey ideas. if you ever get the impulse, please never copy code of mine from a comment and paste it into a project. don't trust me 😄 I don't test these examples or vet them for WPCS' prefences.

/**
 * Removes leading and trailing _empty_ script tags.
 *
 * This is a helper meant to be used for literal script tag construction
 * within wp_print_inline_script_tag(). It removes the literal values of
 * "<script>" and "</script>" from around an inline script.
 *
 * @since 6.4.0
 *
 * @param string $contents Script body with manually created SCRIPT tag literals.
 * @return string Script body without surrounding script tag literals, or
 *                original contents if both exact literals aren't present.
 */
function wp_remove_surrounding_empty_script_tags( $contents ) {
	$opener = '<script>';
	$closer = '</script>';

	$has_both_empty_tags = (
		strlen( $opener ) + strlen( $closer ) > strlen( $contents ) ||
		substr( $contents, 0, strlen( $opener ) ) !== $opener ||
		substr( $contents, -strlen( $closer ) ) !== $closer
	);

	/*
	 * What should happen if the given contents are not surrounded by
	 * the exact literal script tags? This question opens up a can of
	 * worms with no obvious or clear answer. Removing one of the tags
	 * could lead to just as much trouble as leaving one in place.
	 * This code leaves the string untouched if it can't find both the
	 * exact tags. Maybe someone added an attribute to the opening tag
	 * or maybe someone added a space; it's impossible to know from
	 * here.
	 *
	 * It would be possible to return `null` or `false` here to indicate
	 * the failure, but people probably wouldn't be checking the result
	 * and that could introduce corruption. An empty string could be
	 * another viable return and that would quickly signal that something
	 * is wrong and needs fixing.
	 */
	return $has_both_empty_tags
		? substr( $contents, strlen( $opener ), -strlen( $closer ) )
		: $contents;
}

Maybe it's best to return an empty string here if the syntax isn't a perfect match. Is this what you were asking for in "add comments explaining the conditions"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I meant more the conditions for $has_both_empty_tags as I wasn't understanding exactly they were doing. But it seems that's because the logic wasn't quite right. Also seems worthwhile to normalize the prefix and suffix to upper case. So I believe the condition should actually be:

	$has_both_empty_tags = (
		strlen( $contents ) > strlen( $opener ) + strlen( $closer ) &&
		strtolower( substr( $contents, 0, strlen( $opener ) ) ) === $opener &&
		strtolower( substr( $contents, -strlen( $closer ) ) ) === $closer
	);

It has both empty tags if:

  1. The entire string is longer than the opener and closer, and
  2. The string starts with $opener, and
  3. The string ends with $closer

When those conditions are satisfied, then substr( $contents, strlen( $opener ), -strlen( $closer ) ) should be done.

In testing, I see that the first line of the function should also be:

$contents = trim( $contents );

In regards to the comment comment before:

	return $has_both_empty_tags
		? substr( $contents, strlen( $opener ), -strlen( $closer ) )
		: $contents;

What do you think about actually just doing:

if ( ! $has_both_empty_tags ) {
	_doing_it_wrong( __FUNCTION__, '6.4', __( 'Expected string to begin with an empty script tag and close with a script tag.' ) );
	return '';
}

I'll note that this is actually the reverse of what wp_add_inline_script() does:

if ( false !== stripos( $data, '</script>' ) ) {
_doing_it_wrong(
__FUNCTION__,
sprintf(
/* translators: 1: <script>, 2: wp_add_inline_script() */
__( 'Do not pass %1$s tags to %2$s.' ),
'<code>&lt;script&gt;</code>',
'<code>wp_add_inline_script()</code>'
),
'4.5.0'
);
$data = trim( preg_replace( '#<script[^>]*>(.*)</script>#is', '$1', $data ) );
}

Note also how it does the string replacement, so somewhat the naive approach which I originally committed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented this in #5301

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about actually just doing:

Seems great. There's never a case to call this without the SCRIPT tags, so that makes sense to me. Make it visible as early as possible.

seems worthwhile to normalize the prefix and suffix to upper case

I have no strong opinion on this. Seems fine. The one thing is that <sCRiPt> is not exactly a string literal match, but that's fine. Whatever happens here I think it will work out, and if something goes awry, with the help of the _doing_it_wrong() someone will figure it out quickly enough.

<?php
wp_print_inline_script_tag(
// language=JavaScript
sprintf( 'var _wpWidgetCustomizerPreviewSettings = %s;', wp_json_encode( $settings ) )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

presumably because of the // language=JavaScript comment, which shouldn't be there since this is still PHP?


// Ensure markup is XHTML compatible if not HTML5.
if ( ! $is_html5 ) {
$javascript = sprintf( "/* <![CDATA[ */\n%s\n/* ]]> */", $javascript );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something as bizarre as this could definitely use a link to an XML spec or the source of the reason why these have to be escaped. someone is going to look at that and "fix" it and "improve quality" by removing the escaping 🙃

@@ -3136,7 +3136,7 @@ public function test_remove_frameless_preview_messenger_channel() {
ob_start();
$manager->remove_frameless_preview_messenger_channel();
$output = ob_get_clean();
$this->assertStringContainsString( '<script>', $output );
$this->assertStringContainsString( '<script', $output );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to know what this assertion is supposed to be testing. what's the point here? confirm that we have a SCRIPT element in the output?

$processor = new WP_HTML_Tag_Processor( $output );
$this->assertTrue( $processor->next_tag( 'script' ), 'Failed to find expected SCRIPT element in output.' );

😉

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 2b764e6 as part of #5301

array(
'id' => 'ms-isa-1-js-after',
)
);
$this->assertSame( $expected, $output, 'Inline scripts in the "after" position, that are attached to a deferred main script, are failing to print/execute.' );
$this->assertEqualMarkup( $expected, $output, 'Inline scripts in the "after" position, that are attached to a deferred main script, are failing to print/execute.' );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ all of these. great.

also, did you know that we're finally going to get a spec-compliant HTML5/DOM parser in PHP?

doesn't remove the need for the HTML API (mainly because of memory and performance and interface needs) but it will go a long way in our tests to eliminating false errors.

@@ -257,7 +260,7 @@ public function test_blocking_dependent_with_delayed_dependency( $strategy ) {
wp_enqueue_script( 'main-script-a3', '/main-script-a3.js', array(), null, compact( 'strategy' ) );
wp_enqueue_script( 'dependent-script-a3', '/dependent-script-a3.js', array( 'main-script-a3' ), null );
$output = get_echo( 'wp_print_scripts' );
$expected = "<script type='text/javascript' src='/main-script-a3.js' id='main-script-a3-js' data-wp-strategy='{$strategy}'></script>";
$expected = str_replace( "'", '"', "<script type='text/javascript' src='/main-script-a3.js' id='main-script-a3-js' data-wp-strategy='{$strategy}'></script>" );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be a assertEqualMarkup() scenario?
if not, the HEREDOC might make it less quirky

	$expected = <<<HTML
<script type="text/javascript" src="/main-script-a3.js" id="main-script-a3-js" data-wp-strategy="{$strategy}"></script>
HTML;

the initial and terminal newlines are stripped automatically, so this is a trimmed string.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, well, it's actually checking for only one of the script tags of the two. But I can just add the other script so it is checking for both.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 9e0c1f5 as part of #5301

array(
"/* <![CDATA[ */\n",
"\n/* ]]> */",
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is worrisome because it conflates syntax and content. I guess it's probably low-risk in our tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants