Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Latest version (2.5.1) of sanitize-html cuts extra word #542

Closed
IvanPizhenko opened this issue Sep 24, 2021 · 13 comments · Fixed by #1280
Closed

Latest version (2.5.1) of sanitize-html cuts extra word #542

IvanPizhenko opened this issue Sep 24, 2021 · 13 comments · Fixed by #1280
Assignees
Labels
actionable thirdparty Issue is caused by a third party bug or depends on a thirdparty update

Comments

@IvanPizhenko
Copy link
Contributor

IvanPizhenko commented Sep 24, 2021

UPDATE so far there is already version 2.5.2, but the problem still happen.

Latest version (2.5.1) of sanitize-html cuts extra word:

AssertionError {
    actual: {
      replyType: 'plain',
      subject: 'tiny inline img plain',
      text: `Below␊
      [image: image.png]`,
    },
    expected: {
      replyType: 'plain',
      subject: 'tiny inline img plain',
      text: `Below␊
      [image: image.png]␊
      Above`,
    },
    message: `expected { text: 'Below\\n[image: image.png]',␊
      replyType: 'plain',␊
      subject: 'tiny inline img plain' } to deeply equal { text: 'Below\\n[image: image.png]\\nAbove',␊
      replyType: 'plain',␊
      subject: 'tiny inline img plain' }`,
    operator: 'deepStrictEqual',
    showDiff: true,
  }

Above is cut away
So I am going to revert and restore the fix.
It can be separate issue to upgrade it and fix this.

Originally posted by @IvanPizhenko in #541 (comment)

@tomholub
Copy link
Collaborator

Maybe, for the new version, we have to use it slightly differently to achieve the same result. Or maybe it's just a bug in their lib.

@tomholub
Copy link
Collaborator

It's probably a representation of a larger issue that we were lucky to catch in tests. Good to have a look so that recipients don't see truncated messages.

@IvanPizhenko
Copy link
Contributor Author

@tomholub I will look into it, but please note for now we continue to use some previous version, so the problem does not show up now.

IvanPizhenko pushed a commit that referenced this issue Oct 15, 2021
@tomholub
Copy link
Collaborator

Understood - good to know. It will be good to know why this is happening (and find a way to fix it, either in the lib or ourselves) because this is the kind of library that we should keep up to date, because newer versions should have more patches against exploits.

@IvanPizhenko
Copy link
Contributor Author

IvanPizhenko commented Oct 15, 2021

Interesting - it is missing only in the plain text extracted from HTML.

    const { contentBlock, text } = fmtContentBlock(msgContentBlocks);
    console.log(`>>>> contentBlock:\n${JSON.stringify(contentBlock)}\n\ntext:\n${text}`);

prints:

>>>> msgContentBlocks[1]:
{"type":"plainAtt","content":"","complete":true,"attMeta":{"name":"image.png","type":"image/png","length":266,"data":"iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAABHNCSVQICAgIfAhkiAAAAMFJREFUOE+lU9sRg0AIZDNpym9rSAumJm0hNfidsgic5w1wGJ1kZ3zgwvI4AQtIAHrq4zKY5uJ715sGP7C44BdPnZj1gaRVERBPpYJfUSpoGLeyir2Glg64mxMQg9f6xQbU94zrBDBWgVCBBmecbyGWbcrLgpX+OkR+L4ShPw3bdtdCnMmZfSig2a+gtcD1R0LyA1mh6OdmsJNnmW0Sfwp75LYevQ5AsUI3g0aKI+llEe3KQbcx28SsnZi9LNO/6/wBmhVJ7HDmOd4AAAAASUVORK5CYII=","inline":false,"cid":"<ii_jz5exwmh0>"}}
>>>> contentBlock:
{"type":"plainHtml","content":"\n  <!DOCTYPE html><html>\n    <head>\n      <meta name=\"viewport\" content=\"width=device-width\" />\n      <style>\n        body { word-wrap: break-word; word-break: break-word; hyphens: auto; margin-left: 0px; padding-left: 0px; }\n        body img { display: inline !important; height: auto !important; max-width: 95% !important; }\n        body pre { white-space: pre-wrap !important; }\n        body > div.MsgBlock > table { zoom: 75% } /* table layouts tend to overflow - eg emails from fb */\n      </style>\n    </head>\n    <body><div class=\"MsgBlock plain\" style=\"background: white;padding-left: 8px;min-height: 50px;padding-top: 4px;padding-bottom: 4px;width: 100%;border: none;\"><div><div>Below</div><div><div><img src=\"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAABHNCSVQICAgIfAhkiAAAAMFJREFUOE+lU9sRg0AIZDNpym9rSAumJm0hNfidsgic5w1wGJ1kZ3zgwvI4AQtIAHrq4zKY5uJ715sGP7C44BdPnZj1gaRVERBPpYJfUSpoGLeyir2Glg64mxMQg9f6xQbU94zrBDBWgVCBBmecbyGWbcrLgpX+OkR+L4ShPw3bdtdCnMmZfSig2a+gtcD1R0LyA1mh6OdmsJNnmW0Sfwp75LYevQ5AsUI3g0aKI+llEe3KQbcx28SsnZi9LNO/6/wBmhVJ7HDmOd4AAAAASUVORK5CYII=\" alt=\"image.png\" /><br /></div></div><div>Above<br /></div></div></div><!-- next MsgBlock -->\n</body>\n  </html>","complete":true}

text:
Below
[image: image.png]

HTML part above contains:

<div>Above<br /></div>

@IvanPizhenko
Copy link
Contributor Author

This one:

    text = dereq_html_sanitize(text, {
      allowedTags: ['img', 'span'],
      allowedAttributes: { img: ['src'] },
      allowedSchemes: Xss.ALLOWED_SCHEMES,
      transformTags: {
        'img': (tagName, attribs) => {
          return { tagName: 'span', attribs: {}, text: `[image: ${attribs.alt || attribs.title || 'no name'}]` };
        },
      }
    });

removes "Above" from the:

Below
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAABHNCSVQICAgIfAhkiAAAAMFJREFUOE+lU9sRg0AIZDNpym9rSAumJm0hNfidsgic5w1wGJ1kZ3zgwvI4AQtIAHrq4zKY5uJ715sGP7C44BdPnZj1gaRVERBPpYJfUSpoGLeyir2Glg64mxMQg9f6xQbU94zrBDBWgVCBBmecbyGWbcrLgpX+OkR+L4ShPw3bdtdCnMmZfSig2a+gtcD1R0LyA1mh6OdmsJNnmW0Sfwp75LYevQ5AsUI3g0aKI+llEe3KQbcx28SsnZi9LNO/6/wBmhVJ7HDmOd4AAAAASUVORK5CYII=" alt="image.png" />
Above

@IvanPizhenko
Copy link
Contributor Author

@tomholub I think it's bug in the sanitize-html and we should report it.

@tomholub
Copy link
Collaborator

sounds like - please give them a code sample they could run

@IvanPizhenko
Copy link
Contributor Author

Posted apostrophecms/sanitize-html#506

@IvanPizhenko IvanPizhenko added postpone Postpone this issue thirdparty Issue is caused by a third party bug or depends on a thirdparty update labels Oct 15, 2021
@IvanPizhenko
Copy link
Contributor Author

@tomholub Should I try to understand their sanitizer logic and try to fix that (will take time), or we will wait for their response?

@tomholub
Copy link
Collaborator

Let's wait a few weeks first. It may really be difficult.

@tomholub
Copy link
Collaborator

Un-assigning while we're waiting.

@IvanPizhenko
Copy link
Contributor Author

There is new version 2.6.1 that solves the issue. I will try to integrate it.

@IvanPizhenko IvanPizhenko added in progress Work on the issue is in progress and removed postpone Postpone this issue labels Jan 7, 2022
sosnovsky added a commit that referenced this issue Jan 8, 2022
sosnovsky added a commit that referenced this issue Jan 8, 2022
* issue #542 Upgrade to sanitize-html 2.5.2

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* recreate package-lock.json

* Recreate package-lock.json

* Upgrade to santize-html 2.6.1

* remove .only

* comment out debug stuff

* Update prod bundle

* remove debug stuff

* package order

* remove extra empty line

* remove debugPrintArray

* remove extra newline

* Update prod bundle

* #542 update fix-bundles.js

* update semaphore node version

* update fix-bundles.js

Co-authored-by: Ivan Pizhenko <Ivan.Pizhenko@users.noreply.github.com>
Co-authored-by: Roma Sosnovsky <roma.sosnovsky@gmail.com>
@tomholub tomholub removed the in progress Work on the issue is in progress label Jan 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
actionable thirdparty Issue is caused by a third party bug or depends on a thirdparty update
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants