This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

issue #88: Prevent infinite looping on empty/short HTML comment #89

Merged
merged 6 commits into from
Aug 18, 2019


TotalWipeOut
Contributor

@TotalWipeOut TotalWipeOut commented Aug 15, 2019

This fixes the issue detailed in #88:

  • Prevents infinite loop in short/empty HTML comments
  • PHPUnit tests

Member

@michalbundyra michalbundyra left a comment


@TotalWipeOut Thanks for your contribution.

After your changes the following example is not working correctly:

A <!--My favorite operators are > and <!--> B

After filtering we should have just A B, but with your changes we are getting A only.

This is from: https://www.w3.org/TR/html52/syntax.html#comments

@TotalWipeOut
Contributor Author

Thanks for taking a look. Great, leave it with me: I will add that to the unit tests and get it working.

@TotalWipeOut
Contributor Author

TotalWipeOut commented Aug 15, 2019

> @TotalWipeOut Thanks for your contribution.
>
> After your changes the following example is not working correctly:
>
> A <!--My favorite operators are > and <!--> B
>
> After filtering we should have just A B, but with your changes we are getting A only.
>
> This is from: https://www.w3.org/TR/html52/syntax.html#comments

@webimpress I had a look at this, and I would suggest it is a different bug from this one. The code strips comments starting from the end of the string, so it finds <!--> first and strips it. The remainder then contains only <!-- without a matching -->, which the code is designed to strip to the end of the string, and it does. Fixing this is almost a refactor.
Also interesting: running this HTML snippet on the master branch causes an infinite loop, so this particular case has never actually been run!
I would prefer that this change only fix the infinite loop issue.

UPDATE: @webimpress OK, so I rewrote the loop that strips HTML comments. It now passes all unit tests, including your suggestion, and no longer needs a regex.
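
For readers following along, a forward-scanning loop of the kind described in this update could look roughly like the sketch below. It is an illustration only, not necessarily the merged code: the function name `stripHtmlComments` is hypothetical, and the rule for an unterminated comment (strip to the end of the string) is taken from the description above.

```php
<?php

/**
 * Hypothetical helper (not part of the library): removes HTML comments
 * by scanning from the start of the string with plain string functions.
 */
function stripHtmlComments(string $value): string
{
    $open     = '<!--';
    $openLen  = strlen($open);
    $close    = '-->';
    $closeLen = strlen($close);

    // Every pass removes at least the opener it found, so the loop terminates.
    while (($start = strpos($value, $open)) !== false) {
        // Search for the closer only after the opener, so that the
        // characters of '<!-->' cannot act as their own terminator.
        $end = strpos($value, $close, $start + $openLen);

        if ($end === false) {
            // Unterminated comment: drop everything from the opener onwards.
            $value = substr($value, 0, $start);
        } else {
            // Keep the text before the opener and after the closer.
            $value = substr($value, 0, $start) . substr($value, $end + $closeLen);
        }
    }

    return $value;
}

var_dump(stripHtmlComments('A <!--> B'));
// string(2) "A "
var_dump(stripHtmlComments('A <!--My favorite operators are > and <!--> B'));
// string(4) "A  B"
```

In this sketch the spaces on both sides of a removed comment survive, which is why the second result contains two spaces between A and B.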

Member

@michalbundyra michalbundyra left a comment


@TotalWipeOut It looks much better now!
I like that we don't need to use regexp anymore and everything is just simple string operations.

@michalbundyra michalbundyra added this to the 2.9.2 milestone Aug 15, 2019
public function badCommentProvider()
{
    return [
        ['A <!--> B', 'A '], // Should be treated as just an open


I love the tests. Just curious: would tests for these scenarios make sense as well?

  • multiline comments
  • stacked comments (I know this seems odd, but it should still work): A <!-- <!-- --> --> B should return A B

Member

@TotalWipeOut would you be able to add the scenarios mentioned above?

I would suggest also: A <!--> B <!--> C and I believe it should return A C.

Member

@icanhazstring

The case with a nested comment is invalid (it is not possible to have nested comments).
I agree that we should decide how to process them, and I would go with the same behaviour as modern browsers. I just checked the following example in Chrome:

A <!-- B <!-- C --> D --> E

and as the result I've got:

A D --> E

so not A E as you would expect.

Contributor Author

Yes, I will add these extra tests

Contributor Author

@TotalWipeOut TotalWipeOut Aug 16, 2019

@webimpress and @icanhazstring We have an issue with this test:
A <!-- B <!-- C --> D --> E should give A D --> E
After the loop that strips comments, $value is correctly A D --> E
The following loop then strips the lone >, so the result is A D -- E, which doesn't seem right. But this is existing behaviour.
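
The two stages can be made concrete with a deliberately simplified sketch. It assumes the second stage can be modelled as removing every remaining `>` from text that contains no tags, which is only a stand-in for the filter's real tag-handling loop. Both function names are hypothetical.

```php
<?php

// Hypothetical stage 1: strip comments by scanning forward from the start.
function removeComments(string $value): string
{
    $open  = '<!--';
    $close = '-->';

    while (($start = strpos($value, $open)) !== false) {
        $end = strpos($value, $close, $start + strlen($open));

        $value = $end === false
            ? substr($value, 0, $start)                                        // unterminated: cut to end
            : substr($value, 0, $start) . substr($value, $end + strlen($close)); // cut the comment out
    }

    return $value;
}

// Hypothetical stage 2 (simplification): drop orphan '>' characters
// from text that is assumed to contain no tags.
function removeOrphanGt(string $text): string
{
    return str_replace('>', '', $text);
}

$afterComments = removeComments('A <!-- B <!-- C --> D --> E');
var_dump($afterComments);
// string(10) "A  D --> E"

var_dump(removeOrphanGt($afterComments));
// string(9) "A  D -- E"
```

The first closer ends the comment, so the second `-->` is left behind as plain text, and only its `>` is removed by the second stage. As in the sketch's output, the spaces on both sides of the removed comment are kept.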

Member

@TotalWipeOut I've checked it, and it looks like the desired behaviour. We have tests covering the scenario where > is removed, for example testFilterGt:

/**
 * Ensures that any greater-than symbols '>' are removed from text having no tags
 *
 * @return void
 */
public function testFilterGt()
{
    $filter = $this->_filter;
    $input = '2 > 1 === true ==> $object->property';
    $expected = '2 1 === true == $object-property';
    $this->assertEquals($expected, $filter($input));
}


@webimpress totally fine with me to match the filter with modern browser behavior.

Contributor Author

@webimpress I agree, it was designed to strip orphan > characters. But I would say that this isn't correct behaviour, considering that strip_tags() and Chrome leave that character untouched.
@icanhazstring If we do that, a lot of tests will fail and need to be re-written.

How do I proceed?

Member

@TotalWipeOut I would keep it for now, as changing it now would be a BC break. We should log another issue and change the behaviour in the next major version. So for now, in your test, the expected value should be A D -- E, as you said.

Contributor Author

OK, thanks. I have added the requested tests, 11 in total now 🙂

$value = '';
$open = '<!--';
$openLen = strlen($open);
$close = '-->';
Member

Not sure why, but previously we expected the closing tag to match the regexp --\s*>.
Now we explicitly expect no spaces between -- and >.

Do you think allowing spaces there was a bug? I can't see any other test that fails because of that change, and I haven't seen anything about these spaces in the spec.

Contributor Author

@TotalWipeOut TotalWipeOut Aug 16, 2019

I couldn't see why it was doing that either. Running some additional tests, this loop now handles comments the same way strip_tags does. So I can only think this was a hidden feature/bug?
With the string test<!---- -- > -- > --> the previous version returned test -- --, whereas now it returns test

Should the previous behaviour be restored?
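
The difference between the two terminators can be seen by locating the first closer after the opening `<!--` in that string. The sketch below is illustrative only: the pattern `/--\s*>/` is the one quoted earlier in this thread, while the helper names are hypothetical and the rest of the filter logic is omitted.

```php
<?php

// Hypothetical helper: first position matching the old, lenient closer '--\s*>'.
function findRegexCloser(string $value, int $offset): ?int
{
    if (preg_match('/--\s*>/', $value, $matches, PREG_OFFSET_CAPTURE, $offset) === 1) {
        return $matches[0][1]; // byte offset of the match within $value
    }

    return null;
}

// Hypothetical helper: first position of the literal closer '-->'.
function findLiteralCloser(string $value, int $offset): ?int
{
    $position = strpos($value, '-->', $offset);

    return $position === false ? null : $position;
}

$input     = 'test<!---- -- > -- > -->';
$afterOpen = strpos($input, '<!--') + strlen('<!--'); // 8

var_dump(findRegexCloser($input, $afterOpen));
// int(11): the lenient pattern accepts the first '-- >'
var_dump(findLiteralCloser($input, $afterOpen));
// int(21): only the final '-->' qualifies
```

With the lenient pattern the comment ends early at `-- >`, leaving trailing text behind. With the literal closer the whole remainder is treated as one comment.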

Member

Honestly, I think the current behaviour is correct and complies with the HTML comment spec. Also, as you said, it matches how strip_tags behaves, so I think it's right. You can add this test case as well.

Contributor Author

OK, cool. Test added

['A <!-- --> B', 'A B'],
['A <!--> B <!--> C', 'A C'],
['A <!-- -- > -- > -->', 'A '],
["A <!-- B\n C\n D -->", 'A '],
Member

In the two tests above we should have something after -->, so we know that not everything after the opening is removed.

Contributor Author

Done.

@michalbundyra
Member

@TotalWipeOut It looks good to me, thanks! 👍

@icanhazstring would you mind having another look, please?

@icanhazstring icanhazstring left a comment

@TotalWipeOut well done. Thank you 👍

michalbundyra added a commit that referenced this pull request Aug 18, 2019
issue #88: Prevent infinite looping on empty/short HTML comment
michalbundyra added a commit that referenced this pull request Aug 18, 2019
@michalbundyra michalbundyra merged commit 0ce8431 into zendframework:master Aug 18, 2019
michalbundyra added a commit that referenced this pull request Aug 18, 2019
@michalbundyra
Member

Thanks, @TotalWipeOut!

@TotalWipeOut TotalWipeOut deleted the hotfix/88 branch August 20, 2019 08:25
3 participants