Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trailing text is not preserved #506

Closed
IvanPizhenko opened this issue Oct 15, 2021 · 7 comments
Closed

Trailing text is not preserved #506

IvanPizhenko opened this issue Oct 15, 2021 · 7 comments
Labels

Comments

@IvanPizhenko
Copy link

To Reproduce

  1. Get sample project that demonstrates bug (2 files) from here: https://gist.github.com/IvanPizhenko/100c92f523a994fd425ff8f16420f12d
  2. Run npm install
  3. Run node index.js

Expected behavior

I expect trailing text Above is preserved, i.e. sanitized output should be:

Below
<span>[image: image.png]</span>
Above

Describe the bug

Trailing text Above is lost, i.e. I am getting:

Below
<span>[image: image.png]</span>

Details

Version of Node.js: v12.22.6

Server Operating System: Linux

Additional context:
I am trying to upgrade sanitize-html from version 1.21.1 to latest 2.5.2 (at the moment of submitting this issue) and getting the above described issue. According to my research (I have tried multiple versions) bug has appeared in version 1.27.3.

@tomholub
Copy link
Contributor

Seems to be a regression. Maintainers: could you point us in the right direction as far as any guesses on where to look? We may attempt a fix.

@boutell
Copy link
Member

boutell commented Oct 26, 2021

It sounds like it might be an upstream issue with the htmlparser2 module. Might be — haven't delved into it.

@alex-rantos
Copy link

Stumble upon the issue myself when upgraded to latest from 1.22.

It's a bug on transformTags on text property which replaces content. (fyi: @tomholub)

@IvanPizhenko
Copy link
Author

@alex-rantos Do you know how to fix this?

@alex-rantos
Copy link

Not really I just narrowed down the repro steps a bit further and shared it here.
However, I will try to give it a go once I find some free time but I cannot promise anything - not familiar with transformTags flow.

@alex-rantos
Copy link

alex-rantos commented Dec 4, 2021

It's a regression from this commit alex-rantos@b218a72

@abea the following condition fails to append the trailing text as addedText = true in this case.

else if (!addedText) {
  result += escaped;
}

In case you take a look this is the test case that fails:

  it('should preserve trailing text when replacing the tagName and adding new text via transforming function', function () {
    assert.equal(sanitizeHtml('<p>text before <br> text after</p>', {
      transformTags: {
        br: function (_tagName, _attribs) {
          return {
            tagName: 'span',
            text: ' '
          };
        }
      }
    }), '<p>text before <span> </span> text after</p>');
  });

@IvanPizhenko
Copy link
Author

Works correctly in the version 2.6.1. Thank you for the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants