Ignore null-chars when using structTree-data in the viewer #16891

Merged

Snuffleupagus
Collaborator

Testing the `tagged_stamp.pdf` document locally in the viewer, I noticed that e.g. the `/Alt` entry for the StampAnnotation contains `"Secondary text for stamp\u0000"`.
Elsewhere in the viewer we're skipping null-chars, and it's easy enough to do that in the `StructTreeLayerBuilder` class as well. (Note that we generally let the API itself return the data as-is.)
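
For illustration, a minimal sketch of the kind of stripping described above. Both `stripNullChars` and the `node` object are hypothetical stand-ins: this is not the viewer's actual `removeNullCharacters` helper, and the node shape is only an assumption about what the struct-tree data might look like.

```javascript
// Hypothetical stand-in for the viewer's null-char removal helper.
function stripNullChars(str) {
  if (typeof str !== "string") {
    return str; // Leave non-string values untouched.
  }
  return str.replaceAll("\u0000", "");
}

// Assumed shape of a single struct-tree node carrying an /Alt entry.
const node = { role: "Figure", alt: "Secondary text for stamp\u0000" };

// Strip the trailing null-char before using the text in the DOM.
const label = stripNullChars(node.alt);
console.log(JSON.stringify(label)); // "Secondary text for stamp"
```

Doing this at the point where the viewer consumes the data leaves the API output unchanged, which matches the note about returning data as-is.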

@calixteman
Contributor

I wonder if we should strip out null chars in the worker instead of doing it in the main thread.
I'd tend to think it's better not to "block" the main thread (I use quotes here because I'm not sure it takes that much time in general).

@Snuffleupagus
Collaborator Author

> I wonder if we should strip out null chars in the worker instead of doing it in the main thread.

In that case we really ought to change "everything" all at once, in my opinion, rather than doing it piecemeal.

@calixteman
Contributor

What do you mean by "everything"?

@Snuffleupagus
Collaborator Author

> What do you mean by "everything"?

Looking at the `removeNullCharacters` call-sites in the viewer, it seems to me that there are other API methods that can also return strings with null-chars. Hence I think that we should try to keep the API consistent.
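
To make the "all at once" alternative more concrete, here is a rough sketch of sanitizing every string in a returned data structure in one place, rather than at each call-site. The function name `stripNullCharsDeep` and the `tree` shape are assumptions for illustration only; neither is part of the pdf.js API.

```javascript
// Hypothetical: recursively remove null-chars from all string values.
function stripNullCharsDeep(value) {
  if (typeof value === "string") {
    return value.replaceAll("\u0000", "");
  }
  if (Array.isArray(value)) {
    return value.map(item => stripNullCharsDeep(item));
  }
  if (value !== null && typeof value === "object") {
    // Build a new object so the input data is not mutated.
    const out = {};
    for (const [key, val] of Object.entries(value)) {
      out[key] = stripNullCharsDeep(val);
    }
    return out;
  }
  return value; // Numbers, booleans, null, undefined pass through.
}

// Assumed, simplified struct-tree-like data.
const tree = {
  role: "Root",
  children: [
    { role: "Figure", alt: "Secondary text for stamp\u0000", children: [] },
  ],
};

const clean = stripNullCharsDeep(tree);
console.log(JSON.stringify(clean.children[0].alt)); // "Secondary text for stamp"
```

A single pass like this would apply the same rule to every string field, at the cost of changing what the API returns.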

Contributor

@calixteman calixteman left a comment

LGTM. Thank you.

@Snuffleupagus Snuffleupagus force-pushed the structElement-removeNullCharacters branch from 677e304 to 83f5ca9 Compare August 31, 2023 14:15
@Snuffleupagus Snuffleupagus force-pushed the structElement-removeNullCharacters branch from 83f5ca9 to 284f32f Compare August 31, 2023 14:29
@Snuffleupagus
Collaborator Author

Now with an integration-test added, to prevent this from regressing.

/botio integrationtest

@moz-tools-bot
Collaborator

From: Bot.io (Windows)


Received

Command cmd_integrationtest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.193.163.58:8877/3c67bfa4414faad/output.txt

@moz-tools-bot
Collaborator

From: Bot.io (Linux m4)


Received

Command cmd_integrationtest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.241.84.105:8877/e1aaf33b22cf0fb/output.txt

@moz-tools-bot
Collaborator

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/e1aaf33b22cf0fb/output.txt

Total script time: 5.19 mins

  • Integration Tests: FAILED

@moz-tools-bot
Collaborator

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/3c67bfa4414faad/output.txt

Total script time: 16.11 mins

  • Integration Tests: FAILED

@Snuffleupagus Snuffleupagus merged commit 9190445 into mozilla:master Aug 31, 2023
3 checks passed
@Snuffleupagus Snuffleupagus deleted the structElement-removeNullCharacters branch August 31, 2023 16:09
3 participants