````diff
@@ -2,67 +2,89 @@
 
 [](https://badge.fury.io/py/fast-langdetect)
 [](https://pepy.tech/project/fast-langdetect)
-[](https://pepy.tech/project/fast-langdetect/month)
+[](https://pepy.tech/project/fast-langdetect/)
 
-Python 3.9-3.12 support only. 🐍
+## Overview
 
-80x faster and 95% accurate language identification with Fasttext 🏎️
+**fast-langdetect** provides ultra-fast and highly accurate language detection based on FastText, a library developed by
+Facebook. This package is 80x faster than traditional methods and offers 95% accuracy.
 
-This library is a wrapper for the language detection model trained on fasttext by Facebook. For more information, please
-visit: https://fasttext.cc/docs/en/language-identification.html 📘
+It supports Python versions 3.9 to 3.12.
 
-This repository is patched
-from [zafercavdar/fasttext-langdetect](https://github.com/zafercavdar/fasttext-langdetect#benchmark), adding
-multi-language segmentation and better packaging
-support. 🌐
+This project builds upon [zafercavdar/fasttext-langdetect](https://github.com/zafercavdar/fasttext-langdetect#benchmark)
+with enhancements in packaging.
 
-Facilitates more accurate TTS implementation. 🗣️
+For more information on the underlying FastText model, refer to the official
+documentation: [FastText Language Identification](https://fasttext.cc/docs/en/language-identification.html).
 
-**Need 200M+ memory to use low_memory mode** 💾
+> [!NOTE]
+> This library requires over 200MB of memory to use in low memory mode.
 
 ## Installation 💻
 
+To install fast-langdetect, you can use either `pip` or `pdm`:
+
+### Using pip
+
 ```bash
 pip install fast-langdetect
 ```
 
-## Usage 🖥️
+### Using pdm
+
+```bash
+pdm add fast-langdetect
+```
 
-**For more accurate language detection, please use `detect(text,low_memory=False)` to load the big model.**
+## Usage 🖥️
 
-**Model will be downloaded in `/tmp/fasttext-langdetect` directory when you first use it.**
+For optimal performance and accuracy in language detection, use `detect(text, low_memory=False)` to load the larger
+model.
 
-```python
-from fast_langdetect import detect_langs
+> The model will be downloaded to the `/tmp/fasttext-langdetect` directory upon first use.
 
-print(detect_langs("Hello, world!"))
-# EN
+### Native API (Recommended)
 
-print(detect_langs("Привет, мир!"))
-# RU
+```python
+from fast_langdetect import detect, detect_multilingual
 
+# Single language detection
+print(detect("Hello, world!"))
+# Output: {'lang': 'en', 'score': 0.1520957201719284}
 
-print(detect_langs("你好,世界!"))
-# ZH
+print(detect("Привет, мир!")["lang"])
+# Output: ru
 
+# Multi-language detection
+print(detect_multilingual("Hello, world!你好世界!Привет, мир!"))
+# Output: [
+# {'lang': 'ru', 'score': 0.39008623361587524},
+# {'lang': 'zh', 'score': 0.18235979974269867},
+# ]
 ```
 
-## Advanced usage 🚀
+### Convenient `detect_language` Function
 
 ```python
-from fast_langdetect import detect, detect_multilingual
+from fast_langdetect import detect_language
 
-print(detect("Hello, world!"))
-# {'lang': 'en', 'score': 0.1520957201719284}
+# Single language detection
+print(detect_language("Hello, world!"))
+# Output: EN
 
-print(detect_multilingual("Hello, world!你好世界!Привет, мир!"))
-# [{'lang': 'ru', 'score': 0.39008623361587524}, {'lang': 'zh', 'score': 0.18235979974269867}, {'lang': 'ja', 'score': 0.08473210036754608}, {'lang': 'sr', 'score': 0.057975586503744125}, {'lang': 'en', 'score': 0.05422825738787651}]
+print(detect_language("Привет, мир!"))
+# Output: RU
+
+print(detect_language("你好,世界!"))
+# Output: ZH
 ```
 
-### Splitting text by language 🌐
+### Splitting Text by Language 🌐
 
-check out the [split-lang](https://github.com/DoodleBears/split-lang).
+For text splitting based on language, please refer to the [split-lang](https://github.com/DoodleBears/split-lang)
+repository.
 
 ## Accuracy 🎯
 
-References to the [benchmark](https://github.com/zafercavdar/fasttext-langdetect#benchmark)
+For detailed benchmark results, refer
+to [zafercavdar/fasttext-langdetect#benchmark](https://github.com/zafercavdar/fasttext-langdetect#benchmark).
````
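The README in this diff documents two result styles. `detect` and `detect_multilingual` return dicts with a lowercase `lang` code and a `score`, while `detect_language` prints an uppercase code such as `EN`. The sketch below shows one way to post-process a `detect_multilingual`-style result list without calling the library: it ranks entries by score, drops those below a threshold, and uppercases the codes. `filter_languages`, `LangResult`, and the 0.1 default threshold are hypothetical. Treating the uppercase form as plain `lang.upper()` is an assumption drawn from the examples above, not from the library's source. The sample data is the five-entry output quoted in the removed `Advanced usage` lines.

```python
from typing import List, TypedDict


class LangResult(TypedDict):
    """One entry in the shape shown in the README: {'lang': ..., 'score': ...}."""
    lang: str
    score: float


def filter_languages(results: List[LangResult], min_score: float = 0.1) -> List[str]:
    """Return uppercase codes (e.g. 'RU') for entries scoring at least min_score.

    Hypothetical helper: min_score=0.1 is an arbitrary illustrative threshold,
    and uppercasing mirrors the detect_language examples (an assumption).
    Entries are sorted by descending score before filtering.
    """
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [r["lang"].upper() for r in ranked if r["score"] >= min_score]


# Sample data copied from the detect_multilingual output quoted in the diff.
sample: List[LangResult] = [
    {"lang": "ru", "score": 0.39008623361587524},
    {"lang": "zh", "score": 0.18235979974269867},
    {"lang": "ja", "score": 0.08473210036754608},
    {"lang": "sr", "score": 0.057975586503744125},
    {"lang": "en", "score": 0.05422825738787651},
]

print(filter_languages(sample))                   # ['RU', 'ZH']
print(filter_languages(sample, min_score=0.05))   # ['RU', 'ZH', 'JA', 'SR', 'EN']
```

Adjusting `min_score` trades recall for precision. With the default, only RU and ZH remain, which happen to be the two entries kept in the new README's shortened example.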