Add support for ASCII searching of non-ASCII data #1144

Closed
phiser678 opened this issue Jun 18, 2019 · 2 comments

@phiser678

Hi,
We have names with accents in our publications and texts, but if a user wants to find such a person or text through the search box, they have to type the exact name with accents, which isn't always obvious.
Is it possible to sanitize the accents in the search index?
Hugo has something interesting here: setting removePathAccents = true in config/_default/config.toml makes the urlize function sanitize accents, so the taxonomy URLs are generated without them.

@phiser678 phiser678 changed the title Is is possible to sanitize accents in the search? Is it possible to sanitize accents in the search? Jun 18, 2019
@gcushen
Collaborator

gcushen commented Jun 18, 2019

Hugo's removePathAccents applies only to URLs, not to text, and it doesn't work very well (e.g. it can remove non-ASCII characters rather than replacing them, and it can replace non-ASCII characters with HTML codes). Hence, non-ASCII text (accents etc.) would end up very broken. There are a lot of issues about this in the Hugo GitHub repository that unfortunately get closed; for example, see gohugoio/hugo#3476 (comment).

Back to your question, Academic search is based on Fuse which has a related issue open for this here: krisk/Fuse#133

In the meantime, one option is to override assets/js/academic-search.js and set fuseOptions (see the Fuse JS docs at https://fusejs.io) to be less strict on the matching criteria.
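As a rough illustration of what such an override might look like, here is a minimal sketch of a looser fuseOptions object. The option names (threshold, location, distance, minMatchCharLength, keys) are standard Fuse options, but the values and the key names listed under keys are assumptions chosen for illustration, not Academic's actual defaults.

```javascript
// Hypothetical looser settings for Fuse; values are illustrative only.
const fuseOptions = {
  shouldSort: true,        // rank results by match score
  includeMatches: true,    // report which parts of a field matched
  threshold: 0.6,          // 0.0 = exact match only, 1.0 = match anything
  location: 0,             // expected position of the match in the text
  distance: 1000,          // how far from `location` a match may be found
  minMatchCharLength: 2,   // ignore single-character matches
  // Assumed field names; use whatever fields the site's index.json contains.
  keys: ["title", "summary", "authors"],
};
```

Raising threshold makes matching fuzzier overall, so it may also admit unrelated results rather than specifically treating é and e as equivalent.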

Alternatively, using a different search provider such as Algolia or Google would likely offer improved results, especially in this case - see Academic docs on Search: https://sourcethemes.com/academic/docs/search/ .

@gcushen gcushen closed this as completed Jun 18, 2019
@gcushen gcushen changed the title Is it possible to sanitize accents in the search? Add support for ASCII searching of non-ASCII data Jun 18, 2019
@phiser678
Author

Thanks for your reply, but I'm afraid you closed it too quickly!
I tried changing fuseOptions in assets/js/academic-search.js and even set threshold to 0.9. It only finds unrelated content, not the words with an accent! You still have to type the accent to find the word. In this case the word contains an é, which should be found by typing an e.
The other solution, Algolia, is kind of awkward when your content changes a lot. We also use the search for the publication list, and it is quite handy to see results fly by while typing.
Is there a way to include a sanitized version of the words in the index.json file? That should really solve the problem.
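One way to picture the sanitized-index idea is sketched below, assuming the stripping is done client-side after index.json is loaded rather than at build time. The helper names (stripAccents, addSanitizedFields, search), the `_ascii` field suffix, and the entry shape with title and summary are all hypothetical; the naive substring search merely stands in for Fuse so the example is self-contained.

```javascript
// Decompose characters (NFD) and drop the combining diacritical marks,
// so "é" becomes "e". Letters without a canonical decomposition
// (e.g. "ø", "ß", "ł") are left unchanged by this approach.
function stripAccents(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Return copies of the index entries with an extra accent-free variant
// of each listed field (hypothetical "<field>_ascii" naming).
function addSanitizedFields(entries, fields) {
  return entries.map((entry) => {
    const copy = { ...entry };
    for (const field of fields) {
      if (typeof entry[field] === "string") {
        copy[field + "_ascii"] = stripAccents(entry[field]);
      }
    }
    return copy;
  });
}

// Assumed entry shape; a real index.json may use different fields.
const index = addSanitizedFields(
  [{ title: "Publications by José Pérez", summary: "Résumé of results" }],
  ["title", "summary"]
);

// Stand-in for the real search: sanitize the query the same way, then
// compare against the accent-free title.
function search(entries, query) {
  const q = stripAccents(query).toLowerCase();
  return entries.filter((e) =>
    (e.title_ascii || "").toLowerCase().includes(q)
  );
}

console.log(search(index, "perez").length);
console.log(search(index, "Pérez").length);
```

With Fuse, the same idea would presumably mean adding the `_ascii` fields to keys and passing the query through stripAccents before searching, so both accented and unaccented input can match.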
