Remove null chars, i.e. `\u0000`, when getting all text (PR 16286 follow-up) #16297

Snuffleupagus · 2023-04-16T06:44:03Z

I was playing with the new "copy all text" feature, and stumbled upon one document where the copied text was truncated; see http://mirrors.ctan.org/info/lshort/english/lshort.pdf

The problem turns out to be that on [page 83](https://ftp.acc.umu.se/mirror/CTAN/info/lshort/english/lshort.pdf#page=83) the textLayer contains `\u0000`, and apparently copying just stops when a null char is encountered. To fix this we can simply use an existing helper function, and with this patch we're able to successfully copy all the text in that document.
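For readers following along, here is a minimal sketch of the idea described above, not the actual patch. The thread does not name the existing helper, so `stripNullChars` and `joinPageTexts` are hypothetical stand-ins, and the sample page strings are made up for illustration.

```javascript
// Hypothetical stand-in for the existing helper mentioned in the PR
// description (its real name is not given in this thread).
function stripNullChars(str) {
  // Remove every U+0000 character; all other characters are kept as-is.
  return str.replace(/\u0000/g, "");
}

// Hypothetical "get all text" step: join the per-page text, then strip
// null chars so that copying does not stop at the first one.
function joinPageTexts(pageTexts) {
  return stripNullChars(pageTexts.join("\n"));
}

// Made-up sample data: the second "page" contains a null char.
const pages = ["Page one", "before\u0000after", "Page three"];
const allText = joinPageTexts(pages);

console.log(allText.includes("\u0000")); // false
console.log(allText.split("\n")[1]); // beforeafter
```

Stripping once on the joined string, rather than per text item, keeps the sketch short; where the real helper is applied is not stated in this thread.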

Snuffleupagus · 2023-04-16T06:44:54Z

/botio integrationtest

pdfjsbot · 2023-04-16T06:44:56Z

From: Bot.io (Linux m4)

Received

Command cmd_integrationtest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/7d99ee6bcff35e6/output.txt

pdfjsbot · 2023-04-16T06:44:56Z

From: Bot.io (Windows)

Received

Command cmd_integrationtest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/1cbee12ab7d0805/output.txt

pdfjsbot · 2023-04-16T06:48:59Z

From: Bot.io (Linux m4)

Success

Full output at http://54.241.84.105:8877/7d99ee6bcff35e6/output.txt

Total script time: 4.02 mins

Integration Tests: Passed

pdfjsbot · 2023-04-16T06:59:07Z

From: Bot.io (Windows)

Failed

Full output at http://54.193.163.58:8877/1cbee12ab7d0805/output.txt

Total script time: 14.15 mins

Integration Tests: FAILED

calixteman

LGTM.

Snuffleupagus added the text-selection label Apr 16, 2023

Snuffleupagus requested a review from calixteman April 16, 2023 06:44

calixteman approved these changes Apr 16, 2023

Snuffleupagus merged commit 5f7e43a into mozilla:master Apr 16, 2023

Snuffleupagus deleted the copy-all-null-chars branch April 16, 2023 10:45
