Can the tokenizer be configured not to tokenize numbers? #150

Open

Jun90925 opened this issue May 20, 2024 · 1 comment

Jun90925 commented May 20, 2024

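I need to store a business-data id in the FTS table. If this id column is marked UNINDEXED, then when I delete the business data and remove its FTS entry along with it, the lookup is slow because the id is not indexed, and that may block writes to the database.

Deletion is a scenario that cannot be ignored; in practice it is used fairly often.

If the business-data id is purely numeric (the tentative plan is to use the business row's rowid), I could drop UNINDEXED when creating the table so that the id is indexed as well. Lookups would then be fast, although certainly not as fast as an index on a regular table.

I looked at the tokenizer's parameters and there does not seem to be an option to leave numbers untokenized. Could you consider adding one? Or is such an option even necessary?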

Owner

wangfenjin commented May 20, 2024

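The problem you describe has nothing to do with the tokenizer. Whether a given column is indexed is decided when the table is created; the tokenizer only tokenizes the content you ask it to tokenize.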
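To illustrate the distinction, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes the SQLite build has FTS5 enabled. The table name `docs` and its columns are hypothetical, and the built-in `unicode61` tokenizer stands in for the simple tokenizer, which would require loading the extension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Indexing is a per-column decision in the table definition:
# "body" goes into the full-text index, "biz_id" is stored but not indexed.
# The tokenizer named here only affects how indexed text is split into tokens.
conn.execute(
    "CREATE VIRTUAL TABLE docs USING fts5(body, biz_id UNINDEXED, tokenize='unicode61')"
)
conn.execute("INSERT INTO docs(body, biz_id) VALUES ('hello world', 42)")

# A full-text query finds tokens from the indexed column ...
body_hits = conn.execute(
    "SELECT count(*) FROM docs WHERE docs MATCH 'hello'"
).fetchone()[0]

# ... but never sees the UNINDEXED column, whatever tokenizer is configured.
id_hits = conn.execute(
    "SELECT count(*) FROM docs WHERE docs MATCH '\"42\"'"
).fetchone()[0]
```

A tokenizer option for numbers would therefore not change whether `biz_id` can be looked up quickly; that depends on the table layout.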

https://www.wangfenjin.com/posts/simple-jieba-tokenizer/
Take a look at this article; it explains how to organize the data structures.
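One possible layout, sketched here as an assumption rather than as a summary of the article: keep the business data in a regular table and reuse its integer primary key as the `rowid` of the FTS5 row. Deleting by `rowid` is then a direct lookup, so no separate id column (indexed or UNINDEXED) is needed in the FTS table. The names `biz`, `biz_fts`, `add`, and `remove` are hypothetical, and the default tokenizer is used in place of the simple tokenizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE biz(id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE VIRTUAL TABLE biz_fts USING fts5(title)")


def add(title):
    """Insert a business row and an FTS row that shares its rowid."""
    cur = conn.execute("INSERT INTO biz(title) VALUES (?)", (title,))
    biz_id = cur.lastrowid
    conn.execute(
        "INSERT INTO biz_fts(rowid, title) VALUES (?, ?)", (biz_id, title)
    )
    return biz_id


def remove(biz_id):
    """Delete the business row and its FTS row by the shared rowid."""
    conn.execute("DELETE FROM biz WHERE id = ?", (biz_id,))
    conn.execute("DELETE FROM biz_fts WHERE rowid = ?", (biz_id,))


first = add("sqlite full text search")
second = add("tokenizer notes")
remove(first)

remaining = [row[0] for row in conn.execute("SELECT rowid FROM biz_fts")]
stale_hits = conn.execute(
    "SELECT count(*) FROM biz_fts WHERE biz_fts MATCH 'sqlite'"
).fetchone()[0]
```

In a real application the two statements in `add` and in `remove` would normally run inside one transaction so the tables cannot drift apart.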
