-
Notifications
You must be signed in to change notification settings - Fork 35
issue #88: Prevent infinite looping on empty/short HTML comment #89
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TotalWipeOut Thanks for your contribution.
After your changes the following example is not working correctly:
A <!--My favorite operators are > and <!--> B
After filtering we should have just A B
, but with your changes we are getting A
only.
This is from: https://www.w3.org/TR/html52/syntax.html#comments
Thanks for taking a look - Great, leave with me, I will add that to the unit tests and get it working |
@webimpress Had a look at this, I would suggest this a different bug to this one. The way the code is working is that it strips comments starting from the end of the string, so it finds UPDATE: @webimpress OK, so I re-wrote the loop that strips HTML comments. Now passing all unit tests including your suggestion. and now no need for a regex... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TotalWipeOut It looks much better now!
I like that we don't need to use regexp anymore and everything is just simple string operations.
public function badCommentProvider() | ||
{ | ||
return [ | ||
['A <!--> B', 'A '], // Should be treated as just an open |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love the tests. But just curious, would tests about these scenarios make sense as well:
- multiline comments
- stacked comments (i know this seems odd, but still it should work
A <!-- <!-- --> --> B
should returnA B
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TotalWipeOut would you be able to add scenarios mentioned above?
I would suggest also: A <!--> B <!--> C
and I believe it should return A C
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case with nested comment is invalid (it is not possible to have nested comments).
I agree that we should decide somehow to proceed them, and I would go with the same as modern browser. Just checked the following example on Chrome:
A <!-- B <!-- C --> D --> E
and as the result I've got:
A D --> E
so not A E
as you would expect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will add these extra tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@webimpress and @icanhazstring We have an issue with this test:
A <!-- B <!-- C --> D --> E
-> A D --> E
After the loop for stripping comments, $value
is correctly A D --> E
The following loop strips the lone >
so the result is A D -- E
which doesn't seem right. But this is existing behaviour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TotalWipeOut I've checked it, and it looks like desired behaviour. We have tests to cover these scenarios, that >
is removed, for example:
testFilterGt
: Ensures that any greater-than symbols ‘>’ are removed from text having no tags
:
zend-filter/test/StripTagsTest.php
Lines 200 to 211 in 922fe11
/** | |
* Ensures that any greater-than symbols '>' are removed from text having no tags | |
* | |
* @return void | |
*/ | |
public function testFilterGt() | |
{ | |
$filter = $this->_filter; | |
$input = '2 > 1 === true ==> $object->property'; | |
$expected = '2 1 === true == $object-property'; | |
$this->assertEquals($expected, $filter($input)); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@webimpress totally fine with me to match the filter with modern browser behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@webimpress I agree, it was designed to strip orphan >
characters. But I would say that this isn't correct behaviour, considering that strip_tags()
and Chrome leave that character untouched.
@icanhazstring If we do that, a lot of tests will fail and need to be re-written.
How do I proceed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TotalWipeOut I would keep it for now, as changing it now will be BC Break. We should log another issue and change the behaviour in next major version. So for now - in your test - expected value should be A D -- E
as you said.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks. I have added the requested tests, 11 in total now 🙂
$value = ''; | ||
$open = '<!--'; | ||
$openLen = strlen($open); | ||
$close = '-->'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why, but before we expect closing tag to match reg exp --\s*>
.
Now, we explicitly expecting no any spaces between --
and >
.
Do you think expecting some spaces there was a bug? I can't see any other test which fail because of that change, and also I haven't seen anything about these spaces in the specs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't see why it was doing that either. Running some additional tests this loops now handles comments in the same way strip_tags
does. So i can only think this was a hidden feature/bug?
With the string test<!---- -- > -- > -->
the previous version returned test -- --
where as now it returns test
Should the previous behaviour be restored?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I think the current behaviour is correct and comply with the html comment spec. Also, as you said, the same way strip_tags
behaves, so I think it's right. You can add this test case as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, cool. Test added
test/StripTagsTest.php
Outdated
['A <!-- --> B', 'A B'], | ||
['A <!--> B <!--> C', 'A C'], | ||
['A <!-- -- > -- > -->', 'A '], | ||
["A <!-- B\n C\n D -->", 'A '], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this two above tests we should have something after -->
so we know that not the whole content after opening is removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@TotalWipeOut It looks good to me, thanks! 👍 @icanhazstring would you mind to have another look, please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TotalWipeOut well done. Thank you 👍
issue #88: Prevent infinite looping on empty/short HTML comment
Thanks, @TotalWipeOut! |
This fixes issue detailed in #88