Add new option: search_word_boundary #2898

Mikk3lRo · 2017-10-02T23:14:16Z

Summary

New attempt at #2896 in a single commit.

The current default word-boundary regex, ^|\\b|\\s, is remarkably bad at finding word boundaries in languages that use non-ASCII characters. Because \b in JavaScript only treats ASCII letters, digits and the underscore as word characters, a word boundary is detected after essentially any non-ASCII letter (e.g. ü, å, ø and æ, to mention just a few - but there are MANY).
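To illustrate the problem, here is a minimal standalone sketch. It does not use Chosen's actual code: `escapeRegExp` and `matchesAtBoundary` are hypothetical helpers, and the way the boundary pattern is combined with the search term is an assumption made for the example.

```javascript
// Default boundary pattern quoted above, written as a JS string literal.
const defaultBoundary = "^|\\b|\\s";

// Hypothetical helper: escape regex metacharacters in the search term.
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true if `term` occurs in `text` directly after
// something the boundary pattern accepts as a word boundary.
function matchesAtBoundary(text, term, boundary) {
  const pattern = "(" + boundary + ")" + escapeRegExp(term);
  return new RegExp(pattern, "i").test(text);
}

// "burg" sits in the middle of an all-ASCII word: no boundary, no match.
const asciiMidWord = matchesAtBoundary("Hamburg", "burg", defaultBoundary);

// "rnberg" also sits mid-word, but the preceding "ü" is not an ASCII word
// character, so \b reports a boundary and the term matches.
const nonAsciiMidWord = matchesAtBoundary("Nürnberg", "rnberg", defaultBoundary);

console.log(asciiMidWord, nonAsciiMidWord);
```

The second result is the "weird" match: the search term is found in the middle of a word only because the letter before it is outside the ASCII range.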

I've looked into the possibilities, and unfortunately there doesn't seem to be any way to get decent word-boundary detection for anything except ASCII in JavaScript's RegExp implementation without either using a third-party library or including some 4k+ characters in the pattern string.

Therefore, I don't see any way to reliably detect word boundaries with any pre-set, hardcoded regex.

Turning it into an option means that people who care about word boundaries being detected in "weird" places can at least set something appropriate for their individual language and/or use case.
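As a rough sketch of what a language-specific setting could look like, the pattern below treats German letters as word characters. The pattern itself, the helper names, and the way the pattern is combined with the search term are all assumptions for illustration; how Chosen consumes the search_word_boundary value is defined by the commit, not by this example.

```javascript
// Hypothetical boundary pattern for German text: start of string, or any
// character that is not an ASCII letter or one of the listed umlauts / ß.
const germanBoundary = "^|[^A-Za-zÄÖÜäöüß]";

// Hypothetical helper: escape regex metacharacters in the search term.
function escapeTerm(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true if `term` occurs in `text` directly after
// something the boundary pattern accepts.
function matchesWithBoundary(text, term, boundary) {
  const pattern = "(" + boundary + ")" + escapeTerm(term);
  return new RegExp(pattern, "i").test(text);
}

// "ü" now counts as part of the word, so the mid-word match goes away.
const midWord = matchesWithBoundary("Nürnberg", "rnberg", germanBoundary);

// Matches at the start of the string and after a space still work.
const atStart = matchesWithBoundary("Nürnberg", "nür", germanBoundary);
const afterSpace = matchesWithBoundary("Neu Ulm", "ulm", germanBoundary);

console.log(midWord, atStart, afterSpace);
```

A pattern like this only covers the letters it lists, which is the trade-off described above: each site picks the character set that matters for its language instead of relying on one hardcoded regex.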

Please double-check that:

All changes were made in CoffeeScript files, not JavaScript files.
You used Grunt to build the JavaScript files and tested them locally.
You've updated both the jQuery and Prototype versions.
You haven't manually updated the version number in package.json.
If necessary, you've updated the documentation.

References

First partial PR from: #2894

Should solve this issue: #2862

Add new option: search_word_boundary

5aec6e9

tjschuck requested a review from satchmorun October 10, 2017 19:22
