Fix multibyte decoding in content_disposition.js #9773

Merged

Member

@Rob--W Rob--W commented Jun 3, 2018

I made some mistakes when trying to make content_disposition.js compatible with non-modern browsers (IE/Edge). As a result, text decoding was mostly broken (the original at https://github.com/Rob--W/open-in-browser/blob/55c71eb44e0ad71a3bb443457666fd48421612ac/extension/content-disposition.js works as expected).

I fixed it and verified that it now works as expected (see the commit description).
The tests in test-content-disposition.js contain multiple UTF-8 decoding tests, but just to make sure that #9616 is really fixed, I inserted the following too:

console.log('Test from https://github.com/mozilla/pdf.js/issues/9616');
check("attachment; filename=\"test-a-test.pdf\"; filename*=UTF-8''test-%C3%A4-%D1%82%D0%B5%D1%81%D1%82.pdf",
      "test-ä-тест.pdf");

Fixes #9616
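
The expected value in the check above can be reproduced by hand. The following is a minimal sketch, not the code from content_disposition.js: `decodeExtValue` is a hypothetical helper that splits an RFC 5987 `filename*` value into its charset and percent-encoded parts, turns the escapes into bytes, and decodes them with `TextDecoder` (assumed to be available, as in current browsers and Node.js).

```javascript
// Hypothetical helper for illustration; not the pdf.js implementation.
// Decodes an RFC 5987 ext-value of the form charset'language'pct-encoded.
function decodeExtValue(extValue) {
  // Capture the charset and the encoded value; the language tag is ignored.
  const match = /^([^']*)'[^']*'(.*)$/.exec(extValue);
  if (!match) {
    return extValue;
  }
  const [, charset, encoded] = match;
  const bytes = [];
  for (let i = 0; i < encoded.length; i++) {
    const hex = encoded.slice(i + 1, i + 3);
    if (encoded[i] === "%" && /^[0-9a-f]{2}$/i.test(hex)) {
      // A %XX escape stands for exactly one byte.
      bytes.push(parseInt(hex, 16));
      i += 2;
    } else {
      // Any other character is taken as a single byte.
      bytes.push(encoded.charCodeAt(i) & 0xff);
    }
  }
  // Interpret the bytes in the declared charset; fatal: true makes
  // invalid byte sequences throw instead of producing U+FFFD.
  return new TextDecoder(charset, { fatal: true }).decode(new Uint8Array(bytes));
}

console.log(decodeExtValue("UTF-8''test-%C3%A4-%D1%82%D0%B5%D1%81%D1%82.pdf"));
// prints "test-ä-тест.pdf"
```

Each non-ASCII character in the result comes from two bytes (for example `%C3%A4` becomes `ä`), which is why a decoder that treats every byte as its own character produces garbled names.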

I made some mistakes when trying to make content_disposition.js
compatible with non-modern browsers (IE/Edge).
Notably, text decoding was usually skipped because of the inverted
logical check at the top of `textdecode`.
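
To illustrate how one inverted check can disable decoding almost everywhere, here is a hypothetical reconstruction. The actual condition in `textdecode` is not quoted in this thread; the byte-string test and the names `isByteString`, `textdecodeInverted`, `textdecodeFixed` and `decodeBytes` are assumptions made for this sketch.

```javascript
// Hypothetical reconstruction; the real condition is not shown in this thread.
function isByteString(value) {
  // True when every character fits in one byte (code units 0x00-0xFF).
  return /^[\x00-\xFF]*$/.test(value);
}

function decodeBytes(encoding, value) {
  // Treat each character of the byte string as one raw byte.
  const bytes = Uint8Array.from(value, (ch) => ch.charCodeAt(0));
  return new TextDecoder(encoding, { fatal: true }).decode(bytes);
}

function textdecodeInverted(encoding, value) {
  // Bug pattern: returns early exactly when the value IS decodable.
  if (isByteString(value)) {
    return value;
  }
  return decodeBytes(encoding, value);
}

function textdecodeFixed(encoding, value) {
  // Intended check: skip decoding only when the value is NOT a byte string.
  if (!isByteString(value)) {
    return value;
  }
  return decodeBytes(encoding, value);
}

const raw = "\xC3\xA4"; // the two UTF-8 bytes of "ä", one character per byte
console.log(textdecodeInverted("utf-8", raw) === raw); // prints true (decoding skipped)
console.log(textdecodeFixed("utf-8", raw)); // prints "ä"
```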

I verified that this new version works as expected, as follows:

1. Visit https://github.com/Rob--W/open-in-browser/tree/55c71eb44e0ad71a3bb443457666fd48421612ac/test/
   and get `test-content-disposition.js`. Also get
   `test-content-disposition.node.js` if using Node.js, or
   `test-content-disposition.html` if using a browser.
2. Modify `test-content-disposition.node.js` (or the HTML file) and
   change `../extension/content-disposition.js` to `PDFJS-content_disposition.js`
3. Copy the `getFilenameFromContentDispositionHeader` function from
   `content_disposition.js` (i.e. the file without the trailing exports)
   and save it as `PDFJS-content_disposition.js`.
4. Run the tests (`node test-content-disposition.node.js` or by opening
   `test-content-disposition.html` in a browser).
5. Confirm that there are no failures: "Finished all tests (0 failures)"

The code has a best-effort fallback for Microsoft Edge, which lacks the
TextDecoder API. The fallback only supports the common UTF-8 encoding.
To simulate this in a test, modify `PDFJS-content_disposition.js` and
deliberately throw an error before `new TextDecoder`. There will be two
failures, because we don't want to include too much code to support text
decoding for non-UTF-8 encodings in Edge:

```
test-content-disposition.js:265 Assertion failed: Input: attachment; filename*=ISO-8859-1''%c3%a4
Expected: "Ã¤"
Actual  : "ä"
test-content-disposition.js:268 Assertion failed: Input: attachment; filename*=ISO-8859-1''%e2%82%ac
Expected: "â‚¬"
Actual  : "€"
```
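
The fallback behaviour described above, and why the two ISO-8859-1 cases fail under it, can be sketched as follows. This is a simplified illustration rather than the code in this PR: `decodeWithFallback` and its `forceFallback` flag are hypothetical, and the sketch always tries UTF-8 in the fallback path regardless of the declared charset.

```javascript
// Simplified sketch; decodeWithFallback and forceFallback are hypothetical.
// byteString holds one character per raw byte (code units 0x00-0xFF).
function decodeWithFallback(encoding, byteString, forceFallback = false) {
  try {
    if (forceFallback || typeof TextDecoder === "undefined") {
      // Simulates an environment without the TextDecoder API.
      throw new Error("TextDecoder unavailable");
    }
    const bytes = Uint8Array.from(byteString, (ch) => ch.charCodeAt(0) & 0xff);
    return new TextDecoder(encoding, { fatal: true }).decode(bytes);
  } catch (e) {
    try {
      // escape() turns each byte-valued character into %XX, and
      // decodeURIComponent() reads those escapes as UTF-8. Other
      // charsets cannot be decoded this way.
      return decodeURIComponent(escape(byteString));
    } catch (err) {
      // Not valid UTF-8 either: leave the value unchanged.
      return byteString;
    }
  }
}

// With TextDecoder, the declared charset is honoured.
console.log(decodeWithFallback("utf-8", "\xC3\xA4")); // prints "ä"
// In the fallback path the same bytes are read as UTF-8 even though
// ISO-8859-1 was declared, which matches the "Actual" values in the log.
console.log(decodeWithFallback("ISO-8859-1", "\xC3\xA4", true)); // prints "ä"
console.log(decodeWithFallback("ISO-8859-1", "\xE2\x82\xAC", true)); // prints "€"
```

The trade-off is the one stated in the commit message: supporting only UTF-8 keeps the fallback to a couple of lines, at the cost of wrong results for other declared charsets in browsers without `TextDecoder`.
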
@timvandermeij
Contributor

/botio test

@pdfjsbot

pdfjsbot commented Jun 3, 2018

From: Bot.io (Linux m4)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/d743fc2234bfc68/output.txt

@pdfjsbot

pdfjsbot commented Jun 3, 2018

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/666ffa940f71225/output.txt

@pdfjsbot

pdfjsbot commented Jun 3, 2018

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/d743fc2234bfc68/output.txt

Total script time: 18.87 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot

pdfjsbot commented Jun 3, 2018

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/666ffa940f71225/output.txt

Total script time: 24.34 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/666ffa940f71225/reftest-analyzer.html#web=eq.log

@timvandermeij
Contributor

/botio-linux preview

@pdfjsbot

pdfjsbot commented Jun 3, 2018

From: Bot.io (Linux m4)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/318b452eb7bc2b9/output.txt

@pdfjsbot

pdfjsbot commented Jun 3, 2018

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/318b452eb7bc2b9/output.txt

Total script time: 2.98 mins

Published

@timvandermeij timvandermeij merged commit 3859c63 into mozilla:master Jun 3, 2018
@timvandermeij
Contributor

Looks good to me. Thank you for fixing this!

movsb pushed a commit to movsb/pdf.js that referenced this pull request Jul 14, 2018
…ecode-utf8

Fix multibyte decoding in content_disposition.js
Successfully merging this pull request may close these issues.

Chrome extension fails to render properly attachment file name in encoded utf-8