forked from ikawaha/kagome
chore: revamp sample dir as _example
Refactor the directory structure of the examples. Related issues: ikawaha#200, ikawaha#299 (comment)
Showing 21 changed files with 196 additions and 55 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
# Full-text search with Kagome and SQLite3 | ||
|
||
This example shows how to work with Japanese text data and **perform efficient [full-text search](https://en.wikipedia.org/wiki/Full-text_search) using Kagome and SQLite3**. | ||
|
||
- Target text data is as follows: | ||
|
||
```text | ||
人魚は、南の方の海にばかり棲んでいるのではありません。 | ||
北の海にも棲んでいたのであります。 | ||
北方の海の色は、青うございました。 | ||
ある時、岩の上に、女の人魚があがって、 | ||
あたりの景色を眺めながら休んでいました。 | ||
小川未明 『赤い蝋燭と人魚』 | ||
``` | ||
|
||
- Example output: | ||
|
||
```shellsession | ||
$ cd /path/to/kagome/_examples/db_search | ||
$ go run . | ||
Searching for: 人魚 | ||
Found content: 人魚は、南の方の海にばかり棲んでいるのではありません。 at line: 1 | ||
Found content: ある時、岩の上に、女の人魚があがって、 at line: 4 | ||
Found content: 小川未明 『赤い蝋燭と人魚』 at line: 6 | ||
Searching for: 人 | ||
No results found | ||
Searching for: 北方 | ||
Found content: 北方の海の色は、青うございました。 at line: 3 | ||
Searching for: 北 | ||
Found content: 北の海にも棲んでいたのであります。 at line: 2 | ||
``` | ||
|
||
- [View main.go](main.go) | ||
|
||
## Details | ||
|
||
In this example, each line of text is inserted as a row into an SQLite3 database, and the database is then searched for the words "人魚", "人", "北方", and "北". | ||
|
||
Note that each line is also tokenized by Kagome into space-separated words (a.k.a. "Wakati") and recorded in a separate [FTS4](https://www.sqlite.org/fts3.html) (full-text search) table at the same time as the original text. | ||
|
||
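This allows Unicode text that is not separated by spaces, such as Japanese, to be searched by FTS. Because the query is matched against whole tokens, searching for "人" finds nothing even though "人魚" appears in the text, and "北" does not match "北方". | ||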
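
The token-level matching described above can be sketched without Kagome or SQLite. The following is a minimal stand-in, not the code in `main.go`: it uses a plain in-memory inverted index instead of FTS4, and the segmented lines are hand-written for illustration (real Kagome output may differ). The names `index`, `buildIndex`, and `search` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// index maps each token to the 1-based numbers of the lines that contain it.
type index map[string][]int

// buildIndex splits each space-separated ("wakati") line into tokens and
// records, once per line, which lines every token appears in.
func buildIndex(wakatiLines []string) index {
	idx := index{}
	for i, line := range wakatiLines {
		seen := map[string]bool{}
		for _, tok := range strings.Fields(line) {
			if !seen[tok] {
				seen[tok] = true
				idx[tok] = append(idx[tok], i+1)
			}
		}
	}
	return idx
}

// search returns the line numbers that contain the query as a whole token.
// A query that is only part of a token (for example one character of a
// two-character word) matches nothing.
func (idx index) search(query string) []int {
	return idx[query]
}

func main() {
	// Assumption: hand-written segmentation, not actual Kagome output.
	lines := []string{
		"人魚 は 、 南 の 方 の 海 に ばかり 棲ん で いる の で は あり ませ ん 。",
		"北 の 海 に も 棲ん で い た の で あり ます 。",
		"北方 の 海 の 色 は 、 青う ござい まし た 。",
	}
	idx := buildIndex(lines)
	for _, q := range []string{"人魚", "人", "北方", "北"} {
		fmt.Println("Searching for:", q, "lines:", idx.search(q))
	}
}
```

Under this sketch, a substring search would have matched "人" against every line containing "人魚"; indexing by token is what makes the result depend on how the tokenizer segments the text.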
|
||
### Aim of this example | ||
|
||
This example can be useful in scenarios where you need to perform full-text searches on Japanese text. | ||
|
||
It demonstrates how to tokenize Japanese text using Kagome, which is a common requirement when working with text data in the Japanese language. | ||
|
||
Combined with SQLite's FTS4, this approach can manage and search large amounts of text data efficiently, which makes it a possible starting point for applications such as: | ||
|
||
1. **Search Engines:** You can use this code as a basis for building a search engine that indexes and searches Japanese text content. | ||
2. **Document Management Systems:** This code can be integrated into a document management system to enable full-text search capabilities for Japanese documents. | ||
3. **Content Recommendation Systems:** When you have a large collection of Japanese content, you can use this code to implement content recommendation systems based on user queries. | ||
4. **Chatbots and NLP:** If you're building chatbots or natural language processing (NLP) systems for the Japanese language, this code can assist in text analysis and search within the chatbot's knowledge base. | ||
|
||
## Acknowledgements | ||
|
||
This example is based in part on the following book. | ||
|
||
- p.204, 9.2 "データーベース登録プログラム", "Go言語プログラミングエッセンス エンジニア選書" | ||
- Written by: [Mattn](https://github.com/mattn) | ||
- Published: 2023/3/9 (技術評論社) | ||
- ISBN: 4297134195 / 978-4297134198 | ||
- ASIN: B0BVZCJQ4F / [https://amazon.co.jp/dp/4297134195](https://amazon.co.jp/dp/4297134195) | ||
- Original sample code: [https://github.com/mattn/aozora-search](https://github.com/mattn/aozora-search) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
-module kagome/examples | ||
+module kagome/examples/db_search | ||
|
||
go 1.19 | ||
|
||
|
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
module kagome/examples/tokenize | ||
|
||
go 1.19 | ||
|
||
require ( | ||
github.com/ikawaha/kagome-dict/ipa v1.0.10 | ||
github.com/ikawaha/kagome/v2 v2.9.3 | ||
) | ||
|
||
require github.com/ikawaha/kagome-dict v1.0.9 // indirect | ||
|
||
replace github.com/ikawaha/kagome/v2 => ../../ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
github.com/ikawaha/kagome-dict v1.0.9 h1:1Gg735LbBYsdFu13fdTvW6eVt0qIf5+S2qXGJtlG8C0= | ||
github.com/ikawaha/kagome-dict v1.0.9/go.mod h1:mn9itZLkFb6Ixko7q8eZmUabHbg3i9EYewnhOtvd2RM= | ||
github.com/ikawaha/kagome-dict/ipa v1.0.10 h1:wk9I21yg+fKdL6HJB9WgGiyXIiu1VttumJwmIRwn0g8= | ||
github.com/ikawaha/kagome-dict/ipa v1.0.10/go.mod h1:rbaOKrF58zhtpV2+2sVZBj0sUSp9dVKPjr660MehJbs= |
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
module kagome/examples/user_dict | ||
|
||
go 1.19 | ||
|
||
require ( | ||
github.com/ikawaha/kagome-dict v1.0.9 | ||
github.com/ikawaha/kagome-dict/ipa v1.0.10 | ||
github.com/ikawaha/kagome/v2 v2.9.3 | ||
) | ||
|
||
replace github.com/ikawaha/kagome/v2 => ../../ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
github.com/ikawaha/kagome-dict v1.0.9 h1:1Gg735LbBYsdFu13fdTvW6eVt0qIf5+S2qXGJtlG8C0= | ||
github.com/ikawaha/kagome-dict v1.0.9/go.mod h1:mn9itZLkFb6Ixko7q8eZmUabHbg3i9EYewnhOtvd2RM= | ||
github.com/ikawaha/kagome-dict/ipa v1.0.10 h1:wk9I21yg+fKdL6HJB9WgGiyXIiu1VttumJwmIRwn0g8= | ||
github.com/ikawaha/kagome-dict/ipa v1.0.10/go.mod h1:rbaOKrF58zhtpV2+2sVZBj0sUSp9dVKPjr660MehJbs= |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
package main | ||
|
||
import ( | ||
"fmt" | ||
|
||
"github.com/ikawaha/kagome-dict/dict" | ||
"github.com/ikawaha/kagome-dict/ipa" | ||
"github.com/ikawaha/kagome/v2/tokenizer" | ||
) | ||
|
||
func main() { | ||
// Use IPA dictionary as a system dictionary. | ||
sysDic := ipa.Dict() | ||
|
||
// Build a user dictionary from a file. | ||
userDic, err := dict.NewUserDict("userdict.txt") | ||
if err != nil { | ||
panic(err) | ||
} | ||
|
||
// Specify the user dictionary as an option. | ||
t, err := tokenizer.New(sysDic, tokenizer.UserDict(userDic), tokenizer.OmitBosEos()) | ||
if err != nil { | ||
panic(err) | ||
} | ||
|
||
tokens := t.Analyze("関西国際空港限定トートバッグ", tokenizer.Search) | ||
for _, token := range tokens { | ||
fmt.Printf("%s\t%v\n", token.Surface, token.Features()) | ||
} | ||
|
||
// Output: | ||
// 関西国際空港 [テスト名詞 関西/国際/空港 カンサイ/コクサイ/クウコウ] | ||
// 限定 [名詞 サ変接続 * * * * 限定 ゲンテイ ゲンテイ] | ||
// トートバッグ [名詞 一般 * * * * *] | ||
} |
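
The contents of `userdict.txt` are not shown in this diff. As a rough illustration of how one entry could produce the first output line above, the sketch below assumes a four-field, comma-separated record (surface form, space-separated segmentation, space-separated readings, part of speech). That layout, the `userEntry` type, and `parseUserEntry` are assumptions made for this example; it does not use or reproduce the `dict` package's own parser.

```go
package main

import (
	"fmt"
	"strings"
)

// userEntry is a hypothetical in-memory form of one user dictionary record.
type userEntry struct {
	Surface  string   // the full word as it appears in text
	Segments []string // how the word splits into smaller units
	Readings []string // one reading per segment
	POS      string   // part-of-speech label
}

// parseUserEntry parses a line of the assumed form
// "surface,seg1 seg2,yomi1 yomi2,pos".
func parseUserEntry(line string) (userEntry, error) {
	fields := strings.Split(line, ",")
	if len(fields) != 4 {
		return userEntry{}, fmt.Errorf("want 4 comma-separated fields, got %d", len(fields))
	}
	segs := strings.Fields(fields[1])
	yomi := strings.Fields(fields[2])
	if len(segs) != len(yomi) {
		return userEntry{}, fmt.Errorf("%d segments but %d readings", len(segs), len(yomi))
	}
	return userEntry{
		Surface:  fields[0],
		Segments: segs,
		Readings: yomi,
		POS:      fields[3],
	}, nil
}

func main() {
	// Assumption: an entry shaped to match the features printed above.
	e, err := parseUserEntry("関西国際空港,関西 国際 空港,カンサイ コクサイ クウコウ,テスト名詞")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\t[%s %s %s]\n", e.Surface, e.POS,
		strings.Join(e.Segments, "/"), strings.Join(e.Readings, "/"))
}
```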
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
module kagome/examples/wakati | ||
|
||
go 1.19 | ||
|
||
require ( | ||
github.com/ikawaha/kagome-dict/ipa v1.0.10 | ||
github.com/ikawaha/kagome/v2 v2.9.3 | ||
) | ||
|
||
require github.com/ikawaha/kagome-dict v1.0.9 // indirect | ||
|
||
replace github.com/ikawaha/kagome/v2 => ../../ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
github.com/ikawaha/kagome-dict v1.0.9 h1:1Gg735LbBYsdFu13fdTvW6eVt0qIf5+S2qXGJtlG8C0= | ||
github.com/ikawaha/kagome-dict v1.0.9/go.mod h1:mn9itZLkFb6Ixko7q8eZmUabHbg3i9EYewnhOtvd2RM= | ||
github.com/ikawaha/kagome-dict/ipa v1.0.10 h1:wk9I21yg+fKdL6HJB9WgGiyXIiu1VttumJwmIRwn0g8= | ||
github.com/ikawaha/kagome-dict/ipa v1.0.10/go.mod h1:rbaOKrF58zhtpV2+2sVZBj0sUSp9dVKPjr660MehJbs= |
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
# WebAssembly Example of Kagome | ||
|
||
- Build | ||
|
||
```sh | ||
GOOS=js GOARCH=wasm go build -o kagome.wasm main.go | ||
``` | ||
|
||
```shellsession | ||
├── docs ... gh-pages | ||
│ ├── index.html | ||
│ ├── kagome.wasm | ||
│ └── wasm_exec.js | ||
├── _examples | ||
│ └── wasm | ||
│ ├── README.md ... this document | ||
│ ├── kagome.html ... html sample | ||
│ ├── main.go ... source code | ||
│ ├── go.mod | ||
│ └── go.sum | ||
``` | ||
|
||
- Online demo: [https://ikawaha.github.io/kagome/](https://ikawaha.github.io/kagome/) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
module kagome/examples/wasm | ||
|
||
go 1.19 | ||
|
||
require ( | ||
github.com/ikawaha/kagome-dict/ipa v1.0.10 | ||
github.com/ikawaha/kagome/v2 v2.9.3 | ||
) | ||
|
||
require github.com/ikawaha/kagome-dict v1.0.9 // indirect | ||
|
||
replace github.com/ikawaha/kagome/v2 => ../../ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
github.com/ikawaha/kagome-dict v1.0.9 h1:1Gg735LbBYsdFu13fdTvW6eVt0qIf5+S2qXGJtlG8C0= | ||
github.com/ikawaha/kagome-dict v1.0.9/go.mod h1:mn9itZLkFb6Ixko7q8eZmUabHbg3i9EYewnhOtvd2RM= | ||
github.com/ikawaha/kagome-dict/ipa v1.0.10 h1:wk9I21yg+fKdL6HJB9WgGiyXIiu1VttumJwmIRwn0g8= | ||
github.com/ikawaha/kagome-dict/ipa v1.0.10/go.mod h1:rbaOKrF58zhtpV2+2sVZBj0sUSp9dVKPjr660MehJbs= |
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
-//go:build ignore | ||
-// +build ignore | ||
+//go:build js && wasm | ||
+// +build js,wasm | ||
|
||
package main | ||
|
||
|
This file was deleted.
This file was deleted.