Improve validating sanitizer with context for why element/attribute is invalid #3780
I haven't looked at everything in detail yet, but added a few general comments and questions about the approach.
@@ -537,13 +551,15 @@ public function prepare_validation_error( array $error = [], array $data = [] )

 if ( $node instanceof DOMElement ) {
     if ( ! isset( $error['code'] ) ) {
-        $error['code'] = AMP_Validation_Error_Taxonomy::INVALID_ELEMENT_CODE;
+        $error['code'] = AMP_Tag_And_Attribute_Sanitizer::DISALLOWED_TAG;
The base class retrieves internals from one of its extending classes.
Can we extract all of the validation error constants into a separate class? I tend to create an interface with only constants in such a case. An interface with constants better communicates what is happening: two different pieces of code have a contract about how to refer to a specific problem. Contracts should be encoded in an interface rather than relying on one another's internal logic.
namespace Amp\AmpWP;

interface ValidationError {
    const DISALLOWED_TAG             = 'DISALLOWED_TAG';
    const DISALLOWED_CHILD_TAG       = 'DISALLOWED_CHILD_TAG';
    const DISALLOWED_FIRST_CHILD_TAG = 'DISALLOWED_FIRST_CHILD_TAG';
    // [...]
}
This is separate from any inheritance chain and provides one central place to retrieve the constants from. The interface name also makes the syntax nicer to read:
$error['code'] = ValidationError::DISALLOWED_TAG;
If we need to have more grouping, we can have multiple interfaces, like `CssValidationError`, `HtmlValidationError`, ...?
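To illustrate the suggestion, here is a minimal sketch of a constants-only interface shared by a base class and one of its subclasses. The class names ending in `_Sketch` are hypothetical and the namespace is omitted to keep the example self-contained; this is not the plugin's actual class layout.

```php
<?php
// Constants-only interface: the shared contract for error codes.
interface ValidationError {
    const DISALLOWED_TAG = 'DISALLOWED_TAG';
    const UNKNOWN_ERROR  = 'UNKNOWN_ERROR';
}

// Hypothetical base class: it no longer reaches into a subclass for constants.
abstract class Base_Sanitizer_Sketch {
    /**
     * Fill in a default error code when the caller did not supply one.
     */
    public function prepare_validation_error( array $error = [] ) {
        if ( ! isset( $error['code'] ) ) {
            $error['code'] = ValidationError::DISALLOWED_TAG;
        }
        return $error;
    }
}

// Hypothetical subclass: it refers to the same contract, not to its parent's internals.
class Tag_Sanitizer_Sketch extends Base_Sanitizer_Sketch {
    public function report_unknown() {
        return $this->prepare_validation_error( [ 'code' => ValidationError::UNKNOWN_ERROR ] );
    }
}

$sanitizer = new Tag_Sanitizer_Sketch();
$default   = $sanitizer->prepare_validation_error();
$explicit  = $sanitizer->report_unknown();
```

Because both classes depend only on the interface, either one could be moved or renamed without touching the other.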
Good point. On one hand, I think this should just be removed entirely, and let the code be `UNKNOWN_ERROR`. Each sanitizer should be responsible for indicating the code, specifically.
Otherwise, I think this should be scoped with what you discussed in #3780 (comment). It's bound up with generating PHP objects from the spec.
So I think we should do it, but probably not in the scope of this PR.
        $error_code = self::INVALID_BLACKLISTED_VALUE_REGEX;
    } elseif ( isset( $attr_spec_rule[ AMP_Rule_Spec::VALUE_PROPERTIES ] ) &&
        AMP_Rule_Spec::FAIL === $this->check_attr_spec_rule_value_properties( $node, $attr_name, $attr_spec_rule ) ) {
        // @todo Should there be a separate validation error for each invalid property?
If you have 100 properties, you'd love to know exactly which ones are invalid and which ones are valid.
Yes. Either the invalid properties could be communicated as a property of the one validation error, or there could be multiple validation errors each for a single property. That's the question here. I think the latter is better.
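The second option (one validation error per invalid property) could look roughly like the sketch below. The helper name, the `property_name` key, and the comma-separated `name=value` format are assumptions made for illustration; only the `DISALLOWED_PROPERTY_IN_ATTR_VALUE` code appears in this PR's description.

```php
<?php
/**
 * Hypothetical helper: returns one validation error per disallowed property
 * found in a comma-separated "name=value" attribute value.
 */
function collect_property_errors_sketch( $attr_value, array $allowed_properties ) {
    $errors = [];
    foreach ( explode( ',', $attr_value ) as $pair ) {
        $pair = trim( $pair );
        if ( '' === $pair ) {
            continue; // Ignore empty segments such as a trailing comma.
        }
        $parts = explode( '=', $pair, 2 );
        $name  = strtolower( trim( $parts[0] ) );
        if ( ! in_array( $name, $allowed_properties, true ) ) {
            $errors[] = [
                'code'          => 'DISALLOWED_PROPERTY_IN_ATTR_VALUE',
                'property_name' => $name, // Assumed key name.
            ];
        }
    }
    return $errors;
}

$errors = collect_property_errors_sketch(
    'width=device-width, user-scalable=no, bogus=1',
    [ 'width', 'initial-scale' ]
);
```

With separate errors, each invalid property can be accepted or rejected on its own, which a single aggregate error would not allow.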
Rebasing after large PR merged into |
@schlessera There are 3 error codes left which do not have test assertions:
…id-markup-reason
* 'develop' of github.com:ampproject/amp-wp:
  Pull the built `block-libray` package from Gutenberg SVN if it does not exists (#3847)
  Update dependency @babel/plugin-transform-react-jsx to v7.7.4 (#3688)
  Update dependency @babel/plugin-proposal-class-properties to v7… (#3687)
  Update dependency @babel/cli to v7.7.4 (#3685)
  Update dependency browserslist to v4.7.3 (#3792)
  Update dependency postcss to v7.0.23 (#3791)
  Update dependency autoprefixer to v9.7.2 (#3679)
  For the Gallery block, use the recommended amp-lightbox-gallery
@@ -82,7 +82,7 @@ abstract class AMP_Base_Sanitizer {
      *
      * @var array
      */
-    private $should_not_removed_nodes = [];
+    private $nodes_to_keep = [];
😜
Oh, maybe I should mention I searched for usage first:
@@ -1466,6 +1518,10 @@ private function check_attr_spec_rule_disallowed_relative( DOMElement $node, $at
 if ( isset( $attr_spec_rule[ AMP_Rule_Spec::VALUE_URL ][ AMP_Rule_Spec::ALLOW_RELATIVE ] ) && ! $attr_spec_rule[ AMP_Rule_Spec::VALUE_URL ][ AMP_Rule_Spec::ALLOW_RELATIVE ] ) {
     if ( $node->hasAttribute( $attr_name ) ) {
         foreach ( $this->extract_attribute_urls( $node->getAttributeNode( $attr_name ) ) as $url ) {
+            if ( '__amp_source_origin' === $url ) {
Maybe use a constant for this as well?
@@ -1486,6 +1542,10 @@ private function check_attr_spec_rule_disallowed_relative( DOMElement $node, $at
 foreach ( $attr_spec_rule[ AMP_Rule_Spec::ALTERNATIVE_NAMES ] as $alternative_name ) {
     if ( $node->hasAttribute( $alternative_name ) ) {
         foreach ( $this->extract_attribute_urls( $node->getAttributeNode( $alternative_name ), $attr_name ) as $url ) {
+            if ( '__amp_source_origin' === $url ) {
Same here, use of a constant makes sense.
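A rough sketch of what the constant might look like, assuming the placeholder is simply skipped by the relative-URL check. The class name, constant name, and the simplified host test are all hypothetical; only the `'__amp_source_origin'` string comes from the diff.

```php
<?php
class Relative_Url_Check_Sketch {
    // Hypothetical constant name; the placeholder string itself comes from the diff.
    const SOURCE_ORIGIN_PLACEHOLDER = '__amp_source_origin';

    /**
     * Returns true when the URL should be flagged as a disallowed relative URL.
     */
    public function is_disallowed_relative( $url ) {
        if ( self::SOURCE_ORIGIN_PLACEHOLDER === $url ) {
            return false; // The placeholder is not a real relative URL.
        }
        $parsed = parse_url( $url );
        // Treat an unparsable URL or one without a host as relative (simplified for this sketch).
        return false === $parsed || empty( $parsed['host'] );
    }
}
```

Both call sites in `check_attr_spec_rule_disallowed_relative()` could then share the one constant instead of repeating the string literal.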
Did a quick search, and no one seems to be using the constants in |
…id-markup-reason
* 'develop' of github.com:ampproject/amp-wp: (143 commits)
  Update dependency @wordpress/block-editor to v3.3.0 (#3691)
  Update dependency @wordpress/editor to v9.8.0 (#3693)
  Update dependency @wordpress/compose to v3.8.0 (#3736)
  Use comment as array key for data set to show when failure happens
  Remove unused AMP_YouTube_Embed_Handler::sanitize_v_arg method after 7a97571
  Bump stylesheet cache group after #3866 (#3880)
  Delete AMP_YouTube_Embed_Handler::shortcode() and oembed()
  Delete AMP_Twitter_Embed_Handler::oembed()
  Prevent wrapping plugin names in code tags
  Update dependency @wordpress/blocks to v6.8.0 (#3734)
  Update dependency @wordpress/core-data to v2.8.0 (#3737)
  Update dependency @wordpress/edit-post to v3.9.0 (#3692)
  Update dependency @wordpress/components to v8.4.0 (#3735)
  Update dependency @wordpress/element to v2.9.0 (#3741)
  Align @param descriptions in test_video_override
  Replace a call to ->shortcode() with the logic from shortcode()
  Refactor AMP_Vimeo_Embed_Handler::shortcode() into video_override()
  Deprecate AMP_YouTube_Embed_Handler::shortcode()
  Restore AMP_YouTube_Embed_Handler::video_override()
  Improve theme inline CSS checks
  ...
Co-Authored-By: Alain Schlesser <alain.schlesser@gmail.com>
Remove STYLESHEET_INVALID_FILE_PATH code and always fall-back to HTTP request when file not on file system
…s as strings Co-Authored-By: Alain Schlesser <alain.schlesser@gmail.com>
Not really. It's an error that was returned (sometimes) when a file looked like it was on the filesystem but it could not be read. However, I've now eliminated this in 827659a by always falling back to an HTTP request when the file can't seem to be found on the filesystem.
If it's alright, tomorrow might be the earliest I could review this.
@westonruter You've moved this back into Ready for Review, but looking at it, it seems to be merged and there was no mention of this needing additional changes. Is this Done?
I hadn't moved it to Done yet because #1420 isn't done yet and I wanted to make sure the other follow-ups had been filed as issues. They have now, so I'll move this to done.
Summary
- […] `invalid_element` when a required attribute is missing, the more specific error `ATTR_REQUIRED_BUT_MISSING` is raised which explains why it is invalid. Similarly, when an attribute value is incorrect, instead of `invalid_attribute` occurring when an attribute value violates a regex pattern, a more specific `INVALID_ATTR_VALUE_REGEX` pattern is raised. Methods which check for whether a node is valid now return an error code on failure rather than just `false`. These fine-grained error codes will be used in a subsequent PR to add detailed messages, though at the moment just codes are used.
- […] `AMP_Tag_And_Attribute_Sanitizer::validate_attr_spec_list_for_node()` which prevent subsequent ability to raise specific error codes while sanitizing attributes.
- […] `spec_name` is included in the validation error, allowing in a future PR to link a validation error directly to the spec which defines specifically why it is invalid.
- […] `error_message` for `blacklisted_cdata_regex` is as expected (since we map to an error code).
- […] `ATTR_REQUIRED_BUT_MISSING` validation errors when `src` is missing.
- […] `AMP_Tag_And_Attribute_Sanitizer::sanitize_disallowed_attribute_values_in_node` read only, deferring removal or emptying of invalid attribute to `AMP_Base_Sanitizer::remove_invalid_attribute`.
- […] `AMP_Tag_And_Attribute_Sanitizer::validate_attr_spec_list_for_node()` (which should be read-only) to a later point.
- […] `get_disallowed_attributes_in_node` into `AMP_Tag_And_Attribute_Sanitizer::sanitize_disallowed_attribute_values_in_node`.
- […] `missing_body_element` validation error in Style sanitizer in favor of just adding the `body`.
- […] `AMP_Tag_And_Attribute_Sanitizer::validate_attr_spec_list_for_node()` always return integer scores, rather than use a float for the special case.
- […] `AMP_Tag_And_Attribute_Sanitizer` by eliminating duplicated assertions and adding error code checking to each test.
- […] `__amp_source_origin`, preventing it from being interpreted as a relative URL.

See #1420.
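As a rough illustration of check methods returning an error code instead of `false`, the sketch below validates a single attribute against a simplified spec. The function name, the `CHECK_PASS` constant, and the `mandatory` / `value_regex` spec keys are assumptions for this example; only the two error code names are taken from the summary.

```php
<?php
// Hypothetical stand-ins for the sanitizer's pass result and error codes.
const CHECK_PASS                = 'pass';
const ATTR_REQUIRED_BUT_MISSING = 'ATTR_REQUIRED_BUT_MISSING';
const INVALID_ATTR_VALUE_REGEX  = 'INVALID_ATTR_VALUE_REGEX';

/**
 * Validates one attribute against a simplified spec and reports why it failed.
 */
function check_attribute_sketch( array $attributes, $name, array $spec ) {
    if ( ! array_key_exists( $name, $attributes ) ) {
        // A missing attribute only fails when the spec marks it as mandatory.
        return empty( $spec['mandatory'] ) ? CHECK_PASS : ATTR_REQUIRED_BUT_MISSING;
    }
    if ( isset( $spec['value_regex'] ) && 1 !== preg_match( $spec['value_regex'], $attributes[ $name ] ) ) {
        return INVALID_ATTR_VALUE_REGEX;
    }
    return CHECK_PASS;
}

$spec    = [ 'mandatory' => true, 'value_regex' => '/^https:/' ];
$missing = check_attribute_sketch( [], 'src', $spec );
$invalid = check_attribute_sketch( [ 'src' => 'ftp://example.com/a.png' ], 'src', $spec );
$valid   = check_attribute_sketch( [ 'src' => 'https://example.com/a.png' ], 'src', $spec );
```

A caller can pass the returned code straight into the validation error, where a bare `false` would have forced it to fall back to a generic code.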
For Subsequent PRs
- […] `class-amp-allowed-tags-generated.php` to be more efficient to look up a tag spec by `spec_name`. See Refactor generated spec data to facilitate looking up by spec name #3817.
- […] `\AMP_Validation_Error_Taxonomy::get_error_title_from_code()` and into the respective sanitizer that originated the validation error. Improve structure of validation error to make it easier to construct messages. For example, harmonize `node_attributes` and `element_attributes` to just `element_attributes`. See Improve architecture of composing error messages for each validation error code #4071.
- […] `amp_validity` REST API field to use in the block warnings, as opposed to duplicating the error message logic. See Improve displaying validation errors' invalid markup in Gutenberg block warning notice #3664.
- […] `DISALLOWED_PROPERTY_IN_ATTR_VALUE` validation errors for each invalid property, and update the value when sanitization is accepted. See Add AMP_DOM_Document & meta tag sanitizer #3758 (comment) and Add validation of individual properties in meta content attributes #4070.

Checklist