I Can`t search for Chinese #11

ghost · 2022-09-14T06:58:21Z

I hope to be able to search for Chinese

emersonbottero · 2022-09-14T07:01:41Z

Do you have a repo with an example?

ghost · 2022-09-14T07:45:15Z

I can't search Chinese, for example "的".

emersonbottero · 2022-09-16T00:43:27Z

It works if you place a space after the character, which is not great. I'm going to take a look.
I should also handle the frontmatter portion of the docs.

emersonbottero · 2022-09-17T00:06:20Z

@yyrc can you share your repo with me?
There seem to be other problems, but I can't reproduce them.

jonsam-ng · 2022-09-29T01:41:09Z

Will this be fixed? I notice that when I search for English words in Chinese articles, the search results are not correct, and Chinese words are not searched at all.

emersonbottero · 2022-09-29T01:49:17Z

I tried to clone your repo but then couldn't find it.
Can you share the repo again?

The more data, the better.

Charles7c · 2022-10-02T05:34:31Z

You can clone my repo: https://github.com/Charles7c/charles7c.github.io.git

However, you need to enable vitepress-plugin-search in docs/vite.config.ts.

emersonbottero · 2022-10-05T01:50:25Z

I need to add language support and make it available in the plugin options: https://github.com/MihaiValentin/lunr-languages

Charles7c · 2022-10-05T05:45:57Z

var lunr = require('./lib/lunr.js');
require('./lunr.stemmer.support.js')(lunr);
require('./lunr.ru.js')(lunr);
require('./lunr.multi.js')(lunr);

var idx = lunr(function () {
  // the reason "en" does not appear above is that "en" is built in into lunr js
  this.use(lunr.multiLanguage('en', 'ru'));
  // then, the normal lunr index initialization
  // ...
});

@emersonbottero How would this configuration be applied in the plugin?
Can you provide an example? Thanks. :)

import { defineConfig } from 'vite'
import { SearchPlugin } from 'vitepress-plugin-search'

export default defineConfig({
  plugins: [
    SearchPlugin({
      //Add a wildcard at the end of the search
      wildcard: false,
      //The length of the result search preview item
      previewLength: 62,
    })
  ]
})

emersonbottero · 2022-10-05T12:10:57Z

I have to change my plugin based on my last comment.

Charles7c · 2022-10-05T16:28:43Z

Thanks a lot. 👍

ForeverSun · 2022-10-09T01:52:10Z

Looking forward to it.

emersonbottero · 2022-10-16T19:48:52Z

just for reference vitejs/vite#10486

emersonbottero · 2022-10-29T03:57:13Z

Just an update.
The initial idea of using the link above does not work, since I used lunr and those language files are for elasticLunr.

Due to the lack of maintenance in the lunr project, I decided to switch the index library to flexsearch.
I managed to integrate the library and it works great, but it fails on vitepress build due to a problem in the library itself (see my comment there).

Once this is fixed, it should be possible to pass all of the library's index options to the plugin.
The simplest way to set it up for Chinese is to specify it here, or just add the CJK default language.

But we can and should improve that with an actual Chinese language pack!
You guys could help by adding the Chinese language to the flexsearch library.
For now it should be:

- save the default as Chinese and add the stop words
- if it makes sense, add a stemmer (a stemmer treats drive, driving, and driven as aliases of one another in the search)
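
To illustrate the stemmer idea only, here is a minimal sketch that reduces a few inflected English forms to one stem before comparing them. The suffix list and the `stem` helper are hypothetical and are not flexsearch's stemmer; a real stemmer has many more rules, and this kind of suffix stripping may not apply to Chinese at all.

```typescript
// Hypothetical, deliberately tiny suffix list (checked in this order).
const suffixes: string[] = ["ing", "en", "e"];

// Strip the first matching suffix, keeping at least three leading characters.
function stem(word: string): string {
  const lower = word.toLowerCase();
  for (const suffix of suffixes) {
    if (lower.length > suffix.length + 2 && lower.endsWith(suffix)) {
      return lower.slice(0, lower.length - suffix.length);
    }
  }
  return lower;
}

console.log(["drive", "driving", "driven"].map(stem));
// → [ 'driv', 'driv', 'driv' ]
```

Because all three forms map to the same stem, a query for any one of them would match documents containing the others.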

li-zheng-hao · 2022-11-02T02:31:15Z

Is there an easy solution for now? To be honest, I'm not familiar with any of the libraries mentioned above.

emersonbottero · 2022-11-02T23:24:10Z

I just noticed that we can download the flexsearch files.
I'll try to bundle it all together with my plugin.
If it works, we will be able to configure it as mentioned above.

emersonbottero · 2022-11-03T03:32:13Z

I did it.. 😁
Please try adding the flexsearch options as suggested above.

emersonbottero · 2022-11-03T21:38:31Z

Could someone tell me if it works?
@yyrc @Charles7c @li-zheng-hao @jonsam-ng

Charles7c · 2022-11-04T02:47:38Z

import { defineConfig } from 'vite'
import { SearchPlugin } from 'vitepress-plugin-search'

export default defineConfig({
  plugins: [
    SearchPlugin({
      lang: 'zh',
      encode: str => str.replace(/[\x00-\x7F]/g, "").split("")
    })
  ]
})

I upgraded to version 1.0.4-alpha.15 and then looked at the link below, but I didn't quite understand how to configure it, and in the end it didn't work.

li-zheng-hao · 2022-11-04T03:08:50Z

I tried the same config as @Charles7c above. It only works when searching for a single character.

The links mentioned in that comment:

https://github.com/nextapps-de/flexsearch#cjk-word-break-chinese-japanese-korean

Chinese and English at the same time? nextapps-de/flexsearch#207

Does anyone know how to make the search work with Chinese? alex-shpak/hugo-book#327

emersonbottero · 2022-11-04T08:11:58Z

It should be

{
  encode: str => str.replace(/[\x00-\x7F]/g, "").split("")
}

And you will only find whole words.
"Sto", for example, will return 0 results, but "stop" should work. Please try that.

emersonbottero · 2022-11-04T08:13:01Z

To search for partial words, there is another option that should be set:
tokenize: "full"

li-zheng-hao · 2022-11-04T08:15:53Z

No, it doesn't work for me.

my config:

import { SearchPlugin } from "vitepress-plugin-search";
import { defineConfig } from "vite";

export default defineConfig({
  plugins: [SearchPlugin({
    encode: str => str.replace(/[\x00-\x7F]/g, "").split("")
  })],
});

emersonbottero · 2022-11-04T08:20:27Z

Try both settings together.

import { SearchPlugin } from "vitepress-plugin-search";
import { defineConfig } from "vite";

export default defineConfig({
  plugins: [SearchPlugin({
    encode: str => str.replace(/[\x00-\x7F]/g, "").split(""),
    tokenize: "full"
  })],
});

li-zheng-hao · 2022-11-04T08:22:13Z

It still doesn't work.

export default defineConfig({
  plugins: [SearchPlugin({
    encode: str => str.replace(/[\x00-\x7F]/g, "").split(""),
    tokenize: "full"
  })],
});

emersonbottero · 2022-11-04T20:46:48Z

I'll take a look.
If I can't manage it, I'll ask one of the vitepress devs who knows Chinese to help me.

emersonbottero · 2022-11-04T23:06:33Z

Please try:

 SearchPlugin({
      encode: false,
      tokenize: function (str) {
        return str.replace(/[\x00-\x7F]/g, "").split("");
      },
      filter:
        "的 一 不 在 人 有 是 为 以 于 上 他 而 后 之 来 及 了 因 下 可 到 由 这 与 也 此 但 并 个 其 已 无 小 我 们 起 最 再 今 去 好 只 又 或 很 亦 某 把 那 你 乃 它 吧 被 比 别 趁 当 从 到 得 打 凡 儿 尔 该 各 给 跟 和 何 还 即 几 既 看 据 距 靠 啦 了 另 么 每 们 嘛 拿 哪 那 您 凭 且 却 让 仍 啥 如 若 使 谁 虽 随 同 所 她 哇 嗡 往 哪 些 向 沿 哟 用 于 咱 则 怎 曾 至 致 着 诸 自".split(
          " "
        ),
    }),

If the filter does not make sense, you can remove it.

li-zheng-hao · 2022-11-05T00:48:58Z

export default defineConfig({
  plugins: [SearchPlugin({
    encode: false,
    tokenize: function (str) {
      return str.replace(/[\x00-\x7F]/g, "").split("");
    },
    // filter:
    //   "的 一 不 在 人 有 是 为 以 于 上 他 而 后 之 来 及 了 因 下 可 到 由 这 与 也 此 但 并 个 其 已 无 小 我 们 起 最 再 今 去 好 只 又 或 很 亦 某 把 那 你 乃 它 吧 被 比 别 趁 当 从 到 得 打 凡 儿 尔 该 各 给 跟 和 何 还 即 几 既 看 据 距 靠 啦 了 另 么 每 们 嘛 拿 哪 那 您 凭 且 却 让 仍 啥 如 若 使 谁 虽 随 同 所 她 哇 嗡 往 哪 些 向 沿 哟 用 于 咱 则 怎 曾 至 致 着 诸 自".split(
    //     " "
    //   ),
  })],
});

I tried this. The filter does not work, and I can only search for whole words; searching for a single character does not work.

Charles7c · 2022-11-05T02:42:48Z

If you add 模 to it, it should be fine. [doge]

li-zheng-hao · 2022-11-05T02:44:06Z

Surely I can't be expected to add every character by hand whenever I write something, hahaha.

emersonbottero · 2022-11-05T20:25:47Z

All that this does
return str.replace(/[\x00-\x7F]/g, "").split("");
is remove the ASCII (non-Chinese) characters and split what remains into single characters.
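
To make that concrete, here is a small sketch of what the expression does on a mixed string. The name `encodeCjk` is only illustrative; the body is the same expression used in the configs above.

```typescript
// Same expression as the `encode` option discussed in this thread.
// [\x00-\x7F] matches every ASCII character (Latin letters, digits,
// spaces, punctuation), so only non-ASCII characters survive.
const encodeCjk = (str: string): string[] =>
  str.replace(/[\x00-\x7F]/g, "").split("");

console.log(encodeCjk("unit test 单元测试"));
// → [ '单', '元', '测', '试' ]
```

Note that the ASCII text is dropped entirely, so English terms would not be indexed with this encoder.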

emersonbottero · 2022-11-05T20:29:36Z

tokenize: "full" should return a lot of results.
@li-zheng-hao could you try with only that setting?

It's really hard for me to debug because I don't know Chinese.
@Charles7c
could you list the terms you are searching for and what results you expect?
Please don't paste only images.
I need to be able to copy the words to test. 😁

li-zheng-hao · 2022-11-06T02:18:39Z

wow! it works!!!!

export default defineConfig({
  plugins: [SearchPlugin({
    encode: false,
    tokenize: "full"
  })],
});

Entering 单, 单元, or 单元测试 lists 单元测试 (which means "unit test" 😁).
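
Why this works can be sketched as follows, assuming that tokenize: "full" indexes every contiguous substring of a term, which is how the flexsearch documentation describes it. `fullTokens` is an illustrative helper, not flexsearch's code.

```typescript
// Illustrative helper: every contiguous substring of a term.
function fullTokens(term: string): string[] {
  const chars = Array.from(term); // split by code point
  const seen = new Set<string>();
  for (let start = 0; start < chars.length; start++) {
    for (let end = start + 1; end <= chars.length; end++) {
      seen.add(chars.slice(start, end).join(""));
    }
  }
  return Array.from(seen);
}

const tokens = fullTokens("单元测试");
console.log(tokens.length); // → 10
console.log(["单", "单元", "单元测试"].every((q) => tokens.indexOf(q) !== -1)); // → true
```

A term of n characters yields up to n(n+1)/2 entries, so the index can grow quickly on long, unsegmented text.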

ForeverSun · 2022-11-06T02:28:30Z

nb


Charles7c · 2022-11-06T11:30:08Z

Thanks a lot, @emersonbottero. With the configuration that @li-zheng-hao tested, it worked. 😁

// vite.config.ts
import { defineConfig } from 'vite'
import { SearchPlugin } from 'vitepress-plugin-search'

export default defineConfig({
  plugins: [
    SearchPlugin({
      encode: false,
      tokenize: 'full'
    })
  ]
})

emersonbottero · 2022-11-06T14:14:09Z

Uhuuuu 🎉

arcqiufeng · 2022-11-19T04:20:45Z

It works for me now. Thank you @emersonbottero @Charles7c @li-zheng-hao

Strange, I don't know why I couldn't find this thread at first. Thanks.

arcqiufeng · 2022-11-21T13:44:10Z

With tokenize: 'full', the index file is really huge. My index file is now 80 MB+ (82.2 MB virtual_search-data.d06d4ff8.js).

emersonbottero · 2022-11-24T08:58:52Z

You can try "forward".
It should reduce the size a lot if you read Chinese from left to right.

arcqiufeng · 2022-11-24T09:30:00Z

Thanks for your reply.

If I change tokenize to "forward", the number of results drops. I can only find results where the search word is located at the start of the whole sentence.

I think this is because words in CJK languages are not separated by spaces but by semantics.
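
That observation can be sketched like this, assuming that tokenize: "forward" indexes only the prefixes of each term and that an unsegmented Chinese sentence is treated as a single term. `forwardTokens` is an illustrative helper, not flexsearch's implementation, and the sample sentence is made up.

```typescript
// Illustrative helper: only the prefixes of a term.
function forwardTokens(term: string): string[] {
  const chars = Array.from(term);
  return chars.map((_, i) => chars.slice(0, i + 1).join(""));
}

const sentence = "单元测试很重要"; // no spaces, so it is one term
const prefixes = forwardTokens(sentence);

console.log(prefixes.indexOf("单元") !== -1); // → true  (query at the start)
console.log(prefixes.indexOf("测试") !== -1); // → false (query in the middle)
console.log(prefixes.length); // → 7
```

Only n entries are stored for a term of n characters, which fits the smaller index, but a word in the middle of the sentence never becomes a key.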

arcqiufeng · 2022-11-27T01:51:36Z

Finally, I think I've got the solution.

I found a word splitter for Chinese text: https://github.com/leizongmin/node-segment

I installed it:
yarn add segment -D

However, I have to separate the keywords with spaces manually in the search box in the nav bar. Otherwise I get no results when two words in the search box are not separated by a space. (Can this be automatic?)

The index file is now reduced to 1,662 KB.

83 MB+ -> 1.6 MB. Really great progress.

If I change the tokenizer to "full", it is about 2,581 KB.

// docs/vite.config.ts

import { SearchPlugin } from "vitepress-plugin-search";
import { defineConfig } from "vite";

// Word segmenter sources:
// https://wenjiangs.com/article/segment.html
// https://github.com/leizongmin/node-segment
// Install:
// yarn add segment -D
// Example below

// Load the module
var Segment = require('segment');
// Create an instance
var segment = new Segment();
// Use the default recognition modules and dictionaries. Loading the dictionary files takes about 1 second and only needs to run once at initialization.
segment.useDefault();
// Start segmenting
// console.log(segment.doSegment('这是一个基于Node.js的中文分词模块。'));

var options = {

  // Optimize with the word segmenter
  encode: function (str) {
    return segment.doSegment(str, {simple: true});
  },
  tokenize: "forward", // Fixes searching for Chinese characters. Source: https://github.com/emersonbottero/vitepress-plugin-search/issues/11

  // The settings below return perfect results but use a huge amount of memory and disk space; the index file reaches 80 MB+
  // encode: false,
  // tokenize: "full",

};

export default defineConfig({
  plugins: [SearchPlugin(options)],
});
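
On the "can it be automatic?" question, one possible direction is to segment the query before it reaches the index. The sketch below uses the standard `Intl.Segmenter` API (available in recent Node.js versions and current browsers) rather than `segment`. Its word boundaries may differ from the ones `segment` produced at index time, and this thread does not establish whether the plugin exposes a hook for preprocessing the search box input. `splitQuery` is a hypothetical helper name.

```typescript
// The cast avoids requiring the "es2022.intl" lib in tsconfig.
const SegmenterCtor = (Intl as any).Segmenter;

// Hypothetical helper: insert spaces between the words of a CJK query.
function splitQuery(query: string): string {
  const segmenter = new SegmenterCtor("zh", { granularity: "word" });
  const parts: any[] = Array.from(segmenter.segment(query));
  return parts
    .filter((part) => part.isWordLike) // drop whitespace and punctuation
    .map((part) => part.segment as string)
    .join(" ");
}

// Prints the query with spaces inserted at the detected word boundaries.
console.log(splitQuery("单元测试"));
```

The exact boundaries depend on the dictionary shipped with the runtime, so the output is best checked against real search terms.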

beierzhijin · 2023-03-08T03:33:02Z

When vitepress has a base setting, in this case /developer-guide/ (https://beierzhijin.github.io/developer-guide/), and the site is deployed to GitHub Pages, pressing Enter after searching drops the base and results in a 404. This does not happen when running locally.

zkrisj · 2023-04-25T07:01:31Z

Is it only possible to search for the headings contained in an article, and not for the article's name?

emersonbottero self-assigned this Sep 14, 2022

emersonbottero added the enhancement New feature or request label Sep 14, 2022

emersonbottero added the bug Something isn't working label Sep 16, 2022

emersonbottero closed this as completed Nov 6, 2022

emersonbottero reopened this Nov 6, 2022

emersonbottero closed this as completed Nov 6, 2022

Whbbit1999 mentioned this issue Nov 18, 2022

Search CJK text #24

Closed

arcqiufeng mentioned this issue Nov 19, 2022

Please demo the usage of options #25

Closed

emersonbottero mentioned this issue Dec 6, 2022

Will the plugin support The lang attribute for the search box ? #27

Closed

emersonbottero mentioned this issue Mar 13, 2023

How can I apply the word split function to the CJK text in search box? #65

Open

jiazengp mentioned this issue Apr 14, 2023

[Bug report] Search unavailable kongying-tavern/docs#165

Closed
