Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial support for "Whole words" searching in the viewer #10028

Merged
merged 2 commits into from
Sep 10, 2018

Conversation

Snuffleupagus
Copy link
Collaborator

@Snuffleupagus Snuffleupagus commented Sep 1, 2018

As outlined in https://bugzilla.mozilla.org/show_bug.cgi?id=1282759 the internal Firefox name for the feature is entireWord, hence that name is used here as well for consistency (with "Whole words" being limited to the UI).

Given existing limitations of the PDF.js search functionality, e.g. the existing problems of searching across "new lines", there's some edge-cases where "Whole words" searching will ignore (valid) results.
However, considering that this is a pre-existing issue related to the way that the find controller joins text-content together, that shouldn't have to block this new feature in my opionion.

Please note: In order to enable this feature in the MOZCENTRAL version, a small follow-up patch for PdfjsChromeUtils.jsm will be required once this has landed in mozilla-central.

Edit: See also https://bugzilla.mozilla.org/show_bug.cgi?id=1282759

Copy link
Contributor

@timvandermeij timvandermeij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good with the comment addressed and a rebase. If you would like to add it, the Dutch translation for the new string is:

find_entire_word_label=Hele woorden

',': CharacterType.PUNCT,
'.': CharacterType.PUNCT,
';': CharacterType.PUNCT,
':': CharacterType.PUNCT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to add a few more tests here for the missing character types, namely:

  • HAN_LETTER
  • KATAKANA_LETTER
  • HIRAGANA_LETTER
  • HALFWIDTH_KATAKANA_LETTER
  • THAI_LETTER

I think we can simply find them with Wikipedia or perhaps the Mozilla test suite. It would be good to have test coverage for those too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to add a few more tests here for the missing character types [...]

Of course; this was just me being lazy and not wanting to spend to much time on it until there was feedback on the overall approach :-)


If you would like to add it, the Dutch translation for the new string is:
find_entire_word_label=Hele woorden

Thanks, it's been added to the patch.

As outlined in https://bugzilla.mozilla.org/show_bug.cgi?id=1282759 the internal Firefox name for the feature is `entireWord`, hence that name is used here as well for consistency (with "Whole words" being limited to the UI).

Given existing limitations of the PDF.js search functionality, e.g. the existing problems of searching across "new lines", there's some edge-cases where "Whole words" searching will ignore (valid) results.
However, considering that this is a pre-existing issue related to the way that the find controller joins text-content together, that shouldn't have to block this new feature in my opionion.

*Please note:* In order to enable this feature in the `MOZCENTRAL` version, a small follow-up patch for [PdfjsChromeUtils.jsm](https://hg.mozilla.org/mozilla-central/file/tip/browser/extensions/pdfjs/content/PdfjsChromeUtils.jsm) will be required once this has landed in `mozilla-central`.
The property is intended to contain a reference to a DOM element, which not only is nowhere to be found *now* but appears to never have existed in the first place.
@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_unittest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/0f5be8cc5759682/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_unittest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/e8b0c5914d19193/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/0f5be8cc5759682/output.txt

Total script time: 5.46 mins

  • Unit Tests: Passed

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/e8b0c5914d19193/output.txt

Total script time: 8.31 mins

  • Unit Tests: Passed

@mozilla mozilla deleted a comment from pdfjsbot Sep 10, 2018
@mozilla mozilla deleted a comment from pdfjsbot Sep 10, 2018
@mozilla mozilla deleted a comment from pdfjsbot Sep 10, 2018
@mozilla mozilla deleted a comment from pdfjsbot Sep 10, 2018
@mozilla mozilla deleted a comment from pdfjsbot Sep 10, 2018
@mozilla mozilla deleted a comment from pdfjsbot Sep 10, 2018
@mozilla mozilla deleted a comment from pdfjsbot Sep 10, 2018
@mozilla mozilla deleted a comment from pdfjsbot Sep 10, 2018
@timvandermeij
Copy link
Contributor

/botio-windows preview

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/18ac274adac108d/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/18ac274adac108d/output.txt

Total script time: 4.67 mins

Published

@timvandermeij timvandermeij merged commit bc5111d into mozilla:master Sep 10, 2018
@timvandermeij
Copy link
Contributor

Looks good now. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants