forked from meilisearch/charabia
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
170: Use cargo workspace r=ManyTheFish a=choznerol

# Pull Request

## Related issue
Fixes meilisearch/product#582 (part 1)

## What does this PR do?
Convert the codebase structure to a [cargo workspace](https://doc.rust-lang.org/cargo/reference/workspaces.html), following the conclusion in meilisearch/product#582 (reply in thread). I found that changes related to the cargo workspace accumulate merge conflicts pretty easily, so I'm opening this as a standalone PR. The actual work to bring in https://github.com/choznerol/kvariants will be a follow-up PR, meilisearch#171, based on this branch.

## PR checklist
Please check if your PR fulfills the following requirements:
- [x] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [x] Have you read the contributing guidelines?
- [x] Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

Co-authored-by: Lawrence Chou <choznerol@protonmail.com>
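As a rough illustration of what the conversion described above enables, the root manifest of a cargo workspace only lists member crates, and each member keeps its own `[package]` manifest in its subdirectory. The sketch below assumes a hypothetical second member directory named `kvariants`; that name is borrowed from the follow-up mentioned in the description and is not part of this commit, which only registers `charabia`.

```toml
# Root Cargo.toml (sketch, not the manifest from this commit).
[workspace]
resolver = "2"
# "charabia" is the member added by this commit.
# "kvariants" is a hypothetical second member, shown only for illustration.
members = ["charabia", "kvariants"]
# A plain `cargo build` or `cargo test` at the root targets only these crates.
default-members = ["charabia"]
```

With a layout like this, `cargo build -p charabia` builds a single member and `cargo test --workspace` runs the tests of every member.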
Showing 30 changed files with 72 additions and 66 deletions.
@@ -1,5 +1,5 @@
 /target
-meilisearch-core/target
+charabia/target
 **/*.csv
 **/*.json_lines
 **/*.rs.bk
@@ -1,57 +1,5 @@
-[package]
-name = "charabia"
-version = "0.7.0"
-license = "MIT"
-authors = ["Many <many@meilisearch.com>"]
-edition = "2021"
-description = "A simple library to detect the language, tokenize the text and normalize the tokens"
-documentation = "https://docs.rs/charabia"
-repository = "https://github.com/meilisearch/charabia"
-keywords = ["segmenter", "tokenizer", "normalize", "language"]
-categories = ["text-processing"]
-exclude = ["/dictionaries/txt/thai/words.txt"]
+[workspace]
+resolver = "2"
+members = ["charabia"]
+default-members = ["charabia"]
 
-[dependencies]
-cow-utils = "0.1"
-csv = "1.1"
-deunicode = "1.1.1"
-fst = "0.4"
-jieba-rs = { version = "0.6", optional = true }
-once_cell = "1.5.2"
-serde = "1.0"
-slice-group-by = "0.3.0"
-unicode-segmentation = "1.6.0"
-whatlang = "0.16.1"
-lindera = { version = "=0.17.0", default-features = false, optional = true }
-pinyin = { version = "0.9", default-features = false, features = [
-  "with_tone",
-], optional = true }
-wana_kana = { version = "2.1.0", optional = true }
-unicode-normalization = "0.1.22"
-
-[features]
-default = ["chinese", "hebrew", "japanese", "thai", "korean"]
-
-# allow chinese specialized tokenization
-chinese = ["dep:pinyin", "dep:jieba-rs"]
-
-# allow hebrew specialized tokenization
-hebrew = []
-
-# allow japanese specialized tokenization
-japanese = ["lindera/ipadic"]
-japanese-transliteration = ["dep:wana_kana"]
-
-# allow korean specialized tokenization
-korean = ["lindera/ko-dic"]
-
-# allow thai specialized tokenization
-thai = []
-
-[dev-dependencies]
-criterion = "0.3"
-jemallocator = "0.3.0"
-
-[[bench]]
-name = "bench"
-harness = false
@@ -0,0 +1,57 @@
+[package]
+name = "charabia"
+version = "0.7.0"
+license = "MIT"
+authors = ["Many <many@meilisearch.com>"]
+edition = "2021"
+description = "A simple library to detect the language, tokenize the text and normalize the tokens"
+documentation = "https://docs.rs/charabia"
+repository = "https://github.com/meilisearch/charabia"
+keywords = ["segmenter", "tokenizer", "normalize", "language"]
+categories = ["text-processing"]
+exclude = ["../dictionaries/txt/thai/words.txt"]
+
+[dependencies]
+cow-utils = "0.1"
+csv = "1.1"
+deunicode = "1.1.1"
+fst = "0.4"
+jieba-rs = { version = "0.6", optional = true }
+once_cell = "1.5.2"
+serde = "1.0"
+slice-group-by = "0.3.0"
+unicode-segmentation = "1.6.0"
+whatlang = "0.16.1"
+lindera = { version = "=0.17.0", default-features = false, optional = true }
+pinyin = { version = "0.9", default-features = false, features = [
+  "with_tone",
+], optional = true }
+wana_kana = { version = "2.1.0", optional = true }
+unicode-normalization = "0.1.22"
+
+[features]
+default = ["chinese", "hebrew", "japanese", "thai", "korean"]
+
+# allow chinese specialized tokenization
+chinese = ["dep:pinyin", "dep:jieba-rs"]
+
+# allow hebrew specialized tokenization
+hebrew = []
+
+# allow japanese specialized tokenization
+japanese = ["lindera/ipadic"]
+japanese-transliteration = ["dep:wana_kana"]
+
+# allow korean specialized tokenization
+korean = ["lindera/ko-dic"]
+
+# allow thai specialized tokenization
+thai = []
+
+[dev-dependencies]
+criterion = "0.3"
+jemallocator = "0.3.0"
+
+[[bench]]
+name = "bench"
+harness = false
File renamed without changes. (This notice appears for 21 files; their paths are not shown.)