Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for amp-bind #895

Merged
merged 7 commits into from
Jan 25, 2018
Merged

Add support for amp-bind #895

merged 7 commits into from
Jan 25, 2018

Conversation

westonruter
Copy link
Member

@westonruter westonruter commented Jan 24, 2018

  • Rename [x] binding attribute syntax with data-amp-binding-RANDOMSTRING---x so PHP can parse it.
  • When iterating over attributes in the sanitizer, treat attribute names prefixed by data-amp-binding...--- as being bracketed instead.
  • Make sure that the whitelist sanitizer adds the amp-bind component script when it comes across an AMP bound attribute.
  • When serializing back-out, replace data-amp-binding-... with the original syntax.

Fixes #836.

@westonruter westonruter added this to the v0.7 milestone Jan 24, 2018
@westonruter westonruter changed the title [WIP] Fix amp-bind Add support for amp-bind Jan 24, 2018
@amedina amedina requested a review from pbakaus January 24, 2018 22:01
@pbakaus
Copy link

pbakaus commented Jan 24, 2018

Nice! Generally seems reasonable, but want to see if @choumx has any additional thoughts.

@dreamofabear
Copy link

ampproject/amphtml#11115 would help but haven't had the time. 😓

@westonruter
Copy link
Member Author

I'm glad that an XHTML-friendly option is being considered, but we'll need to support the existing bracketed syntax as well.

Copy link
Collaborator

@ThierryA ThierryA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very exciting ❤️ I left my CR below.

}

// Save all alternative names in lookup to improve performance.
if ( isset( $attr_spec[ AMP_Rule_Spec::ALTERNATIVE_NAMES ] ) ) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to cache this in the conditional statement above to avoid looping through the $attr_spec[ AMP_Rule_Spec::ALTERNATIVE_NAMES ]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but there are other alternative names than just the bind ones. For example, srcset is an alt for src, and xlink:href is an alternate for href. These need to be accounted for as well. So that is why there is a separate conditional.

// Prepare whitelists.
$this->allowed_tags = $this->args['amp_allowed_tags'];
foreach ( AMP_Rule_Spec::$additional_allowed_tags as $tag_name => $tag_rule_spec ) {
$this->allowed_tags[ $tag_name ][] = $tag_rule_spec;
Copy link
Collaborator

@ThierryA ThierryA Jan 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is $this->allowed_tags[ $tag_name ] always an array, amp-share-tracking doesn't seem to be declared as an array?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic was just moved from the sanitize method to the constructor. Yes, it should always an array. The $additional_allowed_tags is an array of arrays:

https://github.com/Automattic/amp-wp/blob/216ec376ab5bc698e03a149e6e68fa771cffd2d0/includes/sanitizers/class-amp-rule-spec.php#L88-L95

Copy link
Collaborator

@ThierryA ThierryA Jan 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, though it is still looking for $this->allowed_tags[ 'amp-share-tracking' ] which doesn't exists isn't it? To make sure the array is conventionally declared, then we should have an:

if ( ! isset( $this->allowed_tags[ $tag_name ] ) ) {
	$this->allowed_tags[ $tag_name ] = array();
}

Like you have done here.

Copy link
Member Author

@westonruter westonruter Jan 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially but it's just moved from existing code and it didn't have any problem. And actually, my use of isset() is actually overkill it seems. It's actually not required in PHP at all. For example:

<?php
error_reporting(E_ALL);
$array = array();
$array['foo']['bar']['baz'] = 'Hello!';
print_r( $array );

Results in:

Array
(
    [foo] => Array
        (
            [bar] => Array
                (
                    [baz] => Hello!
                )

        )

)

There are no notices or warnings. It works all the way back to PHP 4.3.0 (and beyond potentially): https://3v4l.org/b8ZaG

/**
* Replace AMP binding attributes with something that libxml can parse (as HTML5 data-* attributes).
*
* This is necessary necessary because attributes in square brackets are not understood in PHP and
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo mistake necessary necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is very necessary 😉

@@ -56,6 +56,8 @@ public static function get_dom( $document ) {

$dom = new DOMDocument();

$document = self::convert_amp_bind_attributes( $document );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at this stage it would be beneficial to look at extending the DOMDocument. For example here is a quick prototype which handles the amp binding conversion:

/**
 * Custom AMP DOM Document.
 */
class AMP_Dom_Document extends DOMDocument {

	/**
	 * Class constructor.
	 *
	 * @param string $version  The version number of the document as part of the XML declaration.
	 * @param string $encoding The encoding of the document as part of the XML declaration.
	 */
	public function __construct( $version = null, $encoding = null ) {
		parent::__construct( $version, $encoding );
	}

	/**
	 * Load HTML.
	 *
	 * @param string $source  The HTML code
	 * @param int    $options Additional Libxml parameters
	 * @return boolean True on success or False on failure
	 */
	public function loadHTML( $source, $options = 0 ) {
		$source = AMP_DOM_Utils::convert_amp_bind_attributes( $source );
		parent::loadHTML( $source, $options );
	}

	/**
	 * Save HTML.
	 *
	 * @param DOMNode $node Optional parameter to output a subset of the document.
	 * @return string The document (or node) HTML code as string
	 */
	public function saveHTML( DOMNode $node = null ) {
		$html = parent::saveHTML( $node );
		return AMP_DOM_Utils::restore_amp_bind_attributes( $html );
	}
}

Then new AMP_Dom_Document would be called instead of new DOMDocument.

Eventually, AMP_Dom_Document could call the entire sanitization logic making the processing workflow easier to understand and more bullet proof.

Also, instead of removing attributes causing Warning: DOMDocument::loadHTML(): error parsing attribute name and restoring them after saveHTML(), we could look at potentially using DOMDocument::registerNodeClass() in our AMP_DOM_Document::__construct to allow AMP attributes and values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good idea, but I want to wait to do that until we do the sanitization of the entire doc including the head.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, it is a good enhancement but it's not critical for this first merge.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is perfectly fine, good to have it as a @todo.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ThierryA And this is now done! 😄 There is a shiny new \Amp\AmpWP\Dom\Document courtesy of @schlessera via #3758.

*/
private function process_alternate_names( $attr_spec_list ) {
foreach ( $attr_spec_list as $attr_name => &$attr_spec ) {
if ( '[' === $attr_name[0] ) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is $this->args['amp_bind_placeholder_prefix'] prefix used for spec attributes like [src]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, since such attributes aren't understood by DOMDocument we add an safe placeholder version of that same attribute (e.g. amp-binding-1234…-src) and add it to the alternative names so it can be matched instead when the nodes are processed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could use another term than binding in the future if some of the attributes are not specific to the binding component.

* @param array $attr_spec_list Attribute spec list.
* @return array Modified attribute spec list.
*/
private function process_alternate_names( $attr_spec_list ) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is called for all tags and then for all attributes of each tag which is quite resource heavy. My understanding is that it is used to add the $this->args['amp_bind_placeholder_prefix'] to the AMP_Rule_Spec::ATTR_SPEC_LIST which is later on used for the allowed attrs and flagging scripts.

To avoid having to do all these loops, we could perhaps do the conditional check at the is_amp_allowed_attribute() and process_node() level since we have access to the single node attributes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it should just be called once for each tag. It loops over all attributes itself.

By doing it once at the constructor we avoid double-nested foreach loops inside the sanitize_disallowed_attributes_in_node method below. Since the double foreach O(n^2) was being done at the node level, it could be extremely poor for performance. So now is_amp_allowed_attribute is able to use the rev_alternate_attr_name_lookup to do fast lookups.

@westonruter westonruter dismissed ThierryA’s stale review January 25, 2018 20:12

Code review feedback applied.

@westonruter westonruter merged commit 4744f2d into develop Jan 25, 2018
@westonruter westonruter deleted the fix/amp-bind branch January 25, 2018 20:12
@ThierryA
Copy link
Collaborator

ThierryA commented Jan 6, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AMP_Content_Sanitizer::sanitize() strips out amp-bind Bindings
4 participants