How well does uFuzzy support CJK, stopwords, stemmers? #68

Closed
Jieiku opened this issue Jun 28, 2024 · 2 comments


Jieiku commented Jun 28, 2024

Recently, the Zola static site generator gained the ability to output its search index in a JSON format that is compatible with Fuse.js. Because it is a JSON format, I was thinking it would likely also be compatible with uFuzzy.

getzola/zola#2507

https://www.getzola.org/documentation/content/search/#fuse
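To make the idea concrete, here is a minimal sketch of turning a JSON index into something uFuzzy can search. uFuzzy takes its haystack as an array of plain strings, so each record has to be flattened first. The record shape below (`url`, `title`, `body`) and the helper name `toHaystack` are assumptions for illustration only, not a confirmed description of what Zola emits.

```javascript
// Hypothetical sample index; the field names (url, title, body) are assumed
// for illustration and may not match the actual generated file.
const index = [
  { url: "/blog/first/", title: "First post", body: "Hello from Zola" },
  { url: "/blog/second/", title: "Second post", body: "Searching with regular expressions" },
];

// Flatten each record into one searchable string, skipping missing fields.
// Positions are preserved, so haystack[i] always corresponds to index[i].
function toHaystack(records) {
  return records.map((r) => [r.title, r.body].filter(Boolean).join(" "));
}

const haystack = toHaystack(index);

// With uFuzzy loaded, a lookup might then look roughly like this:
//   const uf = new uFuzzy();
//   const idxs = uf.filter(haystack, "regular");
//   const urls = (idxs ?? []).map((i) => index[i].url);
```

Because the haystack keeps the same ordering as the index, any matched position can be mapped straight back to the original record and its URL.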

There is a discussion about adding additional searches to Zola here:
getzola/zola#1849

I am planning to try out any promising search libraries that accept a JSON-based index as input, and ones that support CJK, stopwords, and stemmers are a plus!

Currently, in the Abridge theme for Zola, I support elasticlunr as the default. It handles other languages by loading additional JS files for those languages as needed. You can see them all in this directory; their names start with lunr.languagecode:

https://github.com/Jieiku/abridge/tree/master/static/js

leeoniya (Owner) commented

uFuzzy is a clever regexp compiler, not a full-text search engine. It does not do any kind of processing of the haystack or needle, so any stopword removal and stemming have to be done outside of uFuzzy.

uFuzzy supports CJK by using Unicode regexps, and supports diacritics by providing a utility function to strip them (uFuzzy.latinize()).
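As a rough illustration of what "done outside of uFuzzy" could mean in practice, the sketch below normalizes text before it ever reaches the matcher. Everything in it is hypothetical: the `STOPWORDS` list is a tiny English sample, and `naiveStem` is a deliberately crude suffix stripper standing in for a real stemmer. The same `preprocess` function would need to be applied to both the haystack entries and the needle so that they stay comparable.

```javascript
// Tiny illustrative stopword list; a real one is per-language and much longer.
const STOPWORDS = new Set(["the", "a", "an", "of", "and", "in", "to"]);

// Crude stand-in for a real stemmer: strips a few English suffixes from
// words longer than four characters. It will get many words wrong.
function naiveStem(word) {
  return word.length > 4 ? word.replace(/(ing|ed|s)$/u, "") : word;
}

// Lowercase, split on anything that is not a Unicode letter or number,
// drop stopwords and empty tokens, then stem what remains.
function preprocess(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((w) => w && !STOPWORDS.has(w))
    .map(naiveStem)
    .join(" ");
}

const needle = preprocess("Searching the indexed posts");

// With uFuzzy loaded, the preprocessed strings would be passed in roughly so:
//   const uf = new uFuzzy();
//   const idxs = uf.filter(haystack.map(preprocess), needle);
```

The `\p{L}` class in the splitter treats CJK characters as letters, so unspaced CJK text passes through as a single token here. Real CJK word segmentation is a separate problem that this sketch does not attempt.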


Jieiku commented Jun 28, 2024

Thank you, that very clearly explains what I wanted to know! Nevertheless, your benchmarks and README are still very useful, and I may well find a use for uFuzzy somewhere else in the future. Thanks!

Jieiku closed this as completed Jun 28, 2024