chore: revamp sample dir as _example
Refactor directory structure of examples.

- Related issues: ikawaha#200, ikawaha#299 (comment)
KEINOS committed Apr 13, 2024
1 parent 4aaed1d commit 7685a8f
Showing 21 changed files with 196 additions and 55 deletions.
65 changes: 65 additions & 0 deletions _examples/db_search/README.md
@@ -0,0 +1,65 @@
# Full-text search with Kagome and SQLite3

This example shows how to work with Japanese text data and **perform efficient [full-text search](https://en.wikipedia.org/wiki/Full-text_search) using Kagome and SQLite3**.

- Target text data is as follows:

```text
人魚は、南の方の海にばかり棲んでいるのではありません。
北の海にも棲んでいたのであります。
北方の海の色は、青うございました。
ある時、岩の上に、女の人魚があがって、
あたりの景色を眺めながら休んでいました。
小川未明 『赤い蝋燭と人魚』
```

- Example output:

```shellsession
$ cd /path/to/kagome/_examples/db_search
$ go run .
Searching for: 人魚
Found content: 人魚は、南の方の海にばかり棲んでいるのではありません。 at line: 1
Found content: ある時、岩の上に、女の人魚があがって、 at line: 4
Found content: 小川未明 『赤い蝋燭と人魚』 at line: 6
Searching for: 人
No results found
Searching for: 北方
Found content: 北方の海の色は、青うございました。 at line: 3
Searching for: 北
Found content: 北の海にも棲んでいたのであります。 at line: 2
```

- [View main.go](main.go)

## Details

In this example, each line of the text is inserted as a row into an SQLite3 database, and the database is then searched for the words "人魚" (mermaid), "人" (person), "北方" (the north) and "北" (north).

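Note that the text tokenized by Kagome into space-separated words (a.k.a. "Wakati", from 分かち書き *wakachi-gaki*) is stored in a separate [FTS4](https://www.sqlite.org/fts3.html) (Full-Text Search) table at the same time as the original text.

This allows text in languages that do not separate words with spaces, such as Japanese, to be searched with FTS. SQLite's default `simple` tokenizer splits only on whitespace and ASCII punctuation, so an unsegmented Japanese line would otherwise be indexed as one long term. Because FTS matches whole terms, a search for "人" returns no results: Kagome keeps "人魚" as a single token, so "人" never appears as a term on its own. Likewise, "北" matches line 2 but not line 3, where "北方" is a single token.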
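The two-table idea can be sketched without SQLite. The Go program below is a minimal, hypothetical stand-in, not the code in `main.go`: it keeps the original lines in one map and an inverted index of space-separated tokens in another, mimicking how an FTS table matches whole terms rather than substrings. The `index` type and its methods are illustrative names, and the segmentations are written by hand as assumptions; they are not verified Kagome output (the real example calls Kagome's `Wakati` and stores the result in an FTS4 table).

```go
package main

import (
	"fmt"
	"strings"
)

// index is a toy stand-in (hypothetical) for the two SQLite tables:
// the original text, and a full-text index over "wakati" tokens.
type index struct {
	contents map[int]string   // rowID -> original line
	postings map[string][]int // token -> rowIDs containing that token
}

func newIndex() *index {
	return &index{contents: map[int]string{}, postings: map[string][]int{}}
}

// insert stores the original line and indexes each distinct token once.
func (ix *index) insert(rowID int, content, wakati string) {
	ix.contents[rowID] = content
	seen := map[string]bool{}
	for _, tok := range strings.Fields(wakati) {
		if seen[tok] {
			continue
		}
		seen[tok] = true
		ix.postings[tok] = append(ix.postings[tok], rowID)
	}
}

// search returns the rowIDs whose token list contains term as a whole token.
func (ix *index) search(term string) []int {
	return ix.postings[term]
}

func main() {
	ix := newIndex()
	// Hand-segmented lines: assumed segmentation, for illustration only.
	ix.insert(1, "人魚は、南の方の海にばかり棲んでいるのではありません。",
		"人魚 は 、 南 の 方 の 海 に ばかり 棲ん で いる の で は あり ませ ん 。")
	ix.insert(2, "北の海にも棲んでいたのであります。",
		"北 の 海 に も 棲ん で い た の で あり ます 。")
	ix.insert(3, "北方の海の色は、青うございました。",
		"北方 の 海 の 色 は 、 青う ござい まし た 。")

	for _, term := range []string{"人魚", "人", "北方", "北"} {
		rows := ix.search(term)
		if len(rows) == 0 {
			fmt.Printf("%s: no results\n", term)
			continue
		}
		for _, id := range rows {
			fmt.Printf("%s: line %d: %s\n", term, id, ix.contents[id])
		}
	}
}
```

Run as-is, the sketch prints one matching line each for "人魚", "北方" and "北", and no results for "人", because whole-token lookup never matches a token that is only part of "人魚".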

### Aim of this example

This example can be useful in scenarios where you need to perform full-text searches on Japanese text.

It demonstrates how to tokenize Japanese text with Kagome, a common requirement when working with Japanese-language text data.

Because SQLite with FTS4 can index and search large amounts of text efficiently, this approach is suitable for applications like:

1. **Search Engines:** You can use this code as a basis for building a search engine that indexes and searches Japanese text content.
2. **Document Management Systems:** This code can be integrated into a document management system to enable full-text search capabilities for Japanese documents.
3. **Content Recommendation Systems:** When you have a large collection of Japanese content, you can use this code to implement content recommendation systems based on user queries.
4. **Chatbots and NLP:** If you're building chatbots or natural language processing (NLP) systems for Japanese, this code can assist with text analysis and search within the chatbot's knowledge base.

## Acknowledgements

This example is partly based on the following book.

- p.204, 9.2 "データーベース登録プログラム" (Database Registration Program), "Go言語プログラミングエッセンス エンジニア選書" (Go Programming Essence, Engineer's Selection series)
- Written by: [Mattn](https://github.com/mattn)
- Published: 2023/3/9 (技術評論社, Gijutsu-Hyoron Sha)
- ISBN: 4297134195 / 978-4297134198
- ASIN: B0BVZCJQ4F / [https://amazon.co.jp/dp/4297134195](https://amazon.co.jp/dp/4297134195)
- Original sample code: [https://github.com/mattn/aozora-search](https://github.com/mattn/aozora-search)
2 changes: 1 addition & 1 deletion sample/_example/go.mod → _examples/db_search/go.mod
@@ -1,4 +1,4 @@
module kagome/examples
module kagome/examples/db_search

go 1.19

File renamed without changes.
34 changes: 6 additions & 28 deletions sample/_example/db_search/main.go → _examples/db_search/main.go
@@ -1,35 +1,9 @@
/*
# TL; DR
# Full-text search with Kagome and SQLite3
This example provides a practical example of how to work with Japanese text data and perform efficient full-text search using Kagome and SQLite3.
# TS; WM
In this example, each line of text is inserted into a row of the SQLite3 database, and then the database is searched for the word "人魚" and "人".
Note that the string tokenized by Kagome, a.k.a. "Wakati", is recorded in a separate table for FTS (Full-Text-Search) at the same time as the original text.
This allows Unicode text data that is not separated by spaces, such as Japanese, to be searched by FTS.
Aim of this example:
This example can be useful in scenarios where you need to perform full-text searches on Japanese text. It demonstrates how to tokenize Japanese text using Kagome, which is a common requirement when working with text data in the Japanese language. By using SQLite with FTS4, it efficiently manages and searches through a large amount of text data, making it suitable for applications like:
1. **Search Engines:** You can use this code as a basis for building a search engine that indexes and searches Japanese text content.
2. **Document Management Systems:** This code can be integrated into a document management system to enable full-text search capabilities for Japanese documents.
3. **Content Recommendation Systems:** When you have a large collection of Japanese content, you can use this code to implement content recommendation systems based on user queries.
4. **Chatbots and NLP:** If you're building chatbots or natural language processing (NLP) systems for Japanese language, this code can assist in text analysis and search within the chatbot's knowledge base.
Acknowledgements:
This example is taken in part from the following book for reference.
- p.204, 9.2 "データーベース登録プログラム", "Go言語プログラミングエッセンス エンジニア選書"
- Written by: Mattn
- Published: 2023/3/9 (技術評論社)
- ISBN: 4297134195 / 978-4297134198
- ASIN: B0BVZCJQ4F / https://amazon.co.jp/dp/4297134195
- Original sample code: https://github.com/mattn/aozora-search
For details and acknowledgements, see the README.md file in the same directory.
*/
package main

@@ -39,6 +13,7 @@ import (
	"fmt"
	"log"
	"os"
	"slices"
	"strings"

	"github.com/ikawaha/kagome-dict/ipa"
@@ -165,6 +140,9 @@ func insertSearchToken(db *sql.DB, rowID int64, content string) error {
	}

	seg := tknzr.Wakati(content)

	seg = slices.Compact(seg) // remove duplicate segment tokens

	tokenizedContent := strings.Join(seg, " ")

	_, err = db.Exec(
12 changes: 12 additions & 0 deletions _examples/tokenize/go.mod
@@ -0,0 +1,12 @@
module kagome/examples/tokenize

go 1.19

require (
	github.com/ikawaha/kagome-dict/ipa v1.0.10
	github.com/ikawaha/kagome/v2 v2.9.3
)

require github.com/ikawaha/kagome-dict v1.0.9 // indirect

replace github.com/ikawaha/kagome/v2 => ../../
4 changes: 4 additions & 0 deletions _examples/tokenize/go.sum
@@ -0,0 +1,4 @@
github.com/ikawaha/kagome-dict v1.0.9 h1:1Gg735LbBYsdFu13fdTvW6eVt0qIf5+S2qXGJtlG8C0=
github.com/ikawaha/kagome-dict v1.0.9/go.mod h1:mn9itZLkFb6Ixko7q8eZmUabHbg3i9EYewnhOtvd2RM=
github.com/ikawaha/kagome-dict/ipa v1.0.10 h1:wk9I21yg+fKdL6HJB9WgGiyXIiu1VttumJwmIRwn0g8=
github.com/ikawaha/kagome-dict/ipa v1.0.10/go.mod h1:rbaOKrF58zhtpV2+2sVZBj0sUSp9dVKPjr660MehJbs=
File renamed without changes.
11 changes: 11 additions & 0 deletions _examples/user_dict/go.mod
@@ -0,0 +1,11 @@
module kagome/examples/user_dict

go 1.19

require (
	github.com/ikawaha/kagome-dict v1.0.9
	github.com/ikawaha/kagome-dict/ipa v1.0.10
	github.com/ikawaha/kagome/v2 v2.9.3
)

replace github.com/ikawaha/kagome/v2 => ../../
4 changes: 4 additions & 0 deletions _examples/user_dict/go.sum
@@ -0,0 +1,4 @@
github.com/ikawaha/kagome-dict v1.0.9 h1:1Gg735LbBYsdFu13fdTvW6eVt0qIf5+S2qXGJtlG8C0=
github.com/ikawaha/kagome-dict v1.0.9/go.mod h1:mn9itZLkFb6Ixko7q8eZmUabHbg3i9EYewnhOtvd2RM=
github.com/ikawaha/kagome-dict/ipa v1.0.10 h1:wk9I21yg+fKdL6HJB9WgGiyXIiu1VttumJwmIRwn0g8=
github.com/ikawaha/kagome-dict/ipa v1.0.10/go.mod h1:rbaOKrF58zhtpV2+2sVZBj0sUSp9dVKPjr660MehJbs=
36 changes: 36 additions & 0 deletions _examples/user_dict/main.go
@@ -0,0 +1,36 @@
package main

import (
	"fmt"

	"github.com/ikawaha/kagome-dict/dict"
	"github.com/ikawaha/kagome-dict/ipa"
	"github.com/ikawaha/kagome/v2/tokenizer"
)

func main() {
	// Use IPA dictionary as a system dictionary.
	sysDic := ipa.Dict()

	// Build a user dictionary from a file.
	userDic, err := dict.NewUserDict("userdict.txt")
	if err != nil {
		panic(err)
	}

	// Specify the user dictionary as an option.
	t, err := tokenizer.New(sysDic, tokenizer.UserDict(userDic), tokenizer.OmitBosEos())
	if err != nil {
		panic(err)
	}

	tokens := t.Analyze("関西国際空港限定トートバッグ", tokenizer.Search)
	for _, token := range tokens {
		fmt.Printf("%s\t%v\n", token.Surface, token.Features())
	}

	// Output:
	// 関西国際空港 [テスト名詞 関西/国際/空港 カンサイ/コクサイ/クウコウ]
	// 限定 [名詞 サ変接続 * * * * 限定 ゲンテイ ゲンテイ]
	// トートバッグ [名詞 一般 * * * * *]
}
File renamed without changes.
12 changes: 12 additions & 0 deletions _examples/wakati/go.mod
@@ -0,0 +1,12 @@
module kagome/examples/wakati

go 1.19

require (
	github.com/ikawaha/kagome-dict/ipa v1.0.10
	github.com/ikawaha/kagome/v2 v2.9.3
)

require github.com/ikawaha/kagome-dict v1.0.9 // indirect

replace github.com/ikawaha/kagome/v2 => ../../
4 changes: 4 additions & 0 deletions _examples/wakati/go.sum
@@ -0,0 +1,4 @@
github.com/ikawaha/kagome-dict v1.0.9 h1:1Gg735LbBYsdFu13fdTvW6eVt0qIf5+S2qXGJtlG8C0=
github.com/ikawaha/kagome-dict v1.0.9/go.mod h1:mn9itZLkFb6Ixko7q8eZmUabHbg3i9EYewnhOtvd2RM=
github.com/ikawaha/kagome-dict/ipa v1.0.10 h1:wk9I21yg+fKdL6HJB9WgGiyXIiu1VttumJwmIRwn0g8=
github.com/ikawaha/kagome-dict/ipa v1.0.10/go.mod h1:rbaOKrF58zhtpV2+2sVZBj0sUSp9dVKPjr660MehJbs=
File renamed without changes.
23 changes: 23 additions & 0 deletions _examples/wasm/README.md
@@ -0,0 +1,23 @@
# WebAssembly Example of Kagome

- Build

```sh
GOOS=js GOARCH=wasm go build -o kagome.wasm main.go
```

```text
├── docs ... gh-pages
│   ├── index.html
│   ├── kagome.wasm
│   └── wasm_exec.js
├── _examples
│   └── wasm
│   ├── README.md ... this document
│   ├── kagome.html ... html sample
│   ├── main.go ... source code
│   ├── go.mod
│   └── go.sum
```

- Online demo: [https://ikawaha.github.io/kagome/](https://ikawaha.github.io/kagome/)
12 changes: 12 additions & 0 deletions _examples/wasm/go.mod
@@ -0,0 +1,12 @@
module kagome/examples/wasm

go 1.19

require (
	github.com/ikawaha/kagome-dict/ipa v1.0.10
	github.com/ikawaha/kagome/v2 v2.9.3
)

require github.com/ikawaha/kagome-dict v1.0.9 // indirect

replace github.com/ikawaha/kagome/v2 => ../../
4 changes: 4 additions & 0 deletions _examples/wasm/go.sum
@@ -0,0 +1,4 @@
github.com/ikawaha/kagome-dict v1.0.9 h1:1Gg735LbBYsdFu13fdTvW6eVt0qIf5+S2qXGJtlG8C0=
github.com/ikawaha/kagome-dict v1.0.9/go.mod h1:mn9itZLkFb6Ixko7q8eZmUabHbg3i9EYewnhOtvd2RM=
github.com/ikawaha/kagome-dict/ipa v1.0.10 h1:wk9I21yg+fKdL6HJB9WgGiyXIiu1VttumJwmIRwn0g8=
github.com/ikawaha/kagome-dict/ipa v1.0.10/go.mod h1:rbaOKrF58zhtpV2+2sVZBj0sUSp9dVKPjr660MehJbs=
File renamed without changes.
4 changes: 2 additions & 2 deletions sample/wasm/main.go → _examples/wasm/main.go
@@ -1,5 +1,5 @@
//go:build ignore
// +build ignore
//go:build js && wasm
// +build js,wasm

package main

21 changes: 0 additions & 21 deletions sample/wasm/README.md

This file was deleted.

3 changes: 0 additions & 3 deletions sample/wasm/go.mod

This file was deleted.
