PyThaiNLP v5.0.0 Released!
We are excited to announce the latest release of PyThaiNLP, version 5.0! PyThaiNLP is a Python library for Thai natural language processing (NLP).
With PyThaiNLP 5.0, you can expect improved performance and accuracy for NLP tasks in Thai. We have also added new functions to make your NLP tasks even easier and more efficient.
Install: pip install pythainlp
Upgrade: pip install -U pythainlp
- Documentation: https://pythainlp.github.io/docs/5.0
- Report bug: https://github.com/PyThaiNLP/pythainlp/issues
See PyThaiNLP 5.0 Change Log: #788.
What is new?
License information
- Use SPDX license identifier at the header of source code #876
Deprecation and other API changes
- Change default NER to thainer-v2 5e97e7c
- Move pythainlp.util.is_native_thai to pythainlp.morpheme.is_native_thai 524759a
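For code that has to run on both sides of this move, a compatibility lookup along the following lines may help. This is only a sketch: the helper name load_is_native_thai is hypothetical, and it assumes nothing beyond the two module paths named above. It returns None when neither location provides the function, for example when PyThaiNLP is not installed.

```python
import importlib


def load_is_native_thai():
    """Return is_native_thai from the first module that provides it, else None.

    Hypothetical helper: tries the 5.0 location first, then the pre-5.0 one.
    """
    for module_name in ("pythainlp.morpheme", "pythainlp.util"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module (or PyThaiNLP itself) is not available; try the next one.
            continue
        func = getattr(module, "is_native_thai", None)
        if func is not None:
            return func
    return None


is_native_thai = load_is_native_thai()
if is_native_thai is not None:
    # The result depends on the installed version and its word lists.
    print(is_native_thai("เลข"))
```

Code that only targets 5.0 or later can simply import is_native_thai from pythainlp.morpheme directly.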
Dependency
- Add tzdata as a dependency on Windows by @BLKSerene in #841
New API
- Add pythainlp.coref for Thai coreference resolution #802
- Add wtpsplit to sentence segmentation & paragraph segmentation #804 and add paragraph_threshold into paragraph_tokenize() function #806
- Add word approximation to pythainlp.soundex.sound #809 by @wannaphong
- Add pythainlp.wsd for Thai word sense disambiguation #818 by @wannaphong
- Add pythainlp.chat and WangChanGLM to pythainlp.generate #819 by @wannaphong
- Add pythainlp.cls, a param-free classification model #821 by @c4n
- Add pythainlp.el for entity linking #822 by @wannaphong
- Add pythainlp.ancient by @wannaphong in #833
- Add pythainlp.util.rhyme by @wannaphong in #849
- Add remove_trailing_repeat_consonants by @konbraphat51 in #862
- Add pythainlp.util.to_idn by @wannaphong in #875
- Add pythainlp.corpus.find_synonyms by @wannaphong in #890
- Add pythainlp.util.morse by @wannaphong in #891
- Add pythainlp.morpheme by @wannaphong in #896
Improve
- Update code comments and clean up codes by @BLKSerene in #845
- Improve the documentation by fixing typos and adding necessary details and explanations of the code, including missing details about models and examples, by @Saharshjain78 in #850
- Fix tests of khavee functions by @BLKSerene in #854
- Update Git Actions versions by @bact in #878
- Fix ruff args in workflow by @bact in #880
- Revise ruff args in workflow by @bact in #881
- Fix coref return type and add fallback by @bact in #883
- Fix wrong/incompatible types, code readability by @bact in #884
- Bump protobuf from 3.20 to 3.20.2 in #885
- Add license info to /tests and README_TH.md by @bact in #886
- phayathaibert, khavee, parse: Code clean up by @bact in #889
- ruff: docstring-code-format = true by @bact in #892
Tokenizer
- Add wtpsplit engine to sentence_tokenize #804
- New paragraph_tokenize function to split Thai text into paragraphs #804
- Add paragraph_threshold into paragraph_tokenize() function #806 by @pavaris-pm
- Add 🪿 Han-solo by @wannaphong in #830
- Fix newmm to better handle non-Thai characters in tokens #856 by @konbraphat51
- Fix incorrect passing of flags to re.split by @hauntsaninja in #832
- Add syllable_tokenize by @wannaphong in #834
- Add wanchanberta_thai_grammarly by @wannaphong in #836
- Add extra segmentation style for paragraph_tokenize function by @pavaris-pm in #844
- Improve: [newmm tokenizer] Change regular expression of "non-thai-characters" by @konbraphat51 in #856
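To see how the newmm changes affect your own data, you could tokenize a mixed Thai and Latin string and inspect the tokens. The snippet below is a minimal sketch: the wrapper tokenize_mixed and the sample sentence are illustrative assumptions, and the exact token boundaries depend on the installed version and dictionary, so no expected output is shown.

```python
try:
    from pythainlp.tokenize import word_tokenize
except ImportError:  # PyThaiNLP is not installed
    word_tokenize = None


def tokenize_mixed(text):
    """Tokenize text with the newmm engine; return None if PyThaiNLP is unavailable.

    Hypothetical wrapper, used here only to keep the example self-contained.
    """
    if word_tokenize is None:
        return None
    return word_tokenize(text, engine="newmm")


# Sample sentence mixing Thai words, a Latin-script name, and a version number.
tokens = tokenize_mixed("ผมใช้ PyThaiNLP 5.0 ตัดคำ")
print(tokens)
```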
Tag
- Add function for pos tag with transformers by @MpolaarbearM in #857
- Update pos_tag_transformers function by @pavaris-pm in #865
- Add PhayaThaiBERT engine with new features by @pavaris-pm in #873
Chat
- Fixed bug #828
Translate
- Add small100 to pythainlp.translate #815 by @wannaphong
Transliterate
- Fix duplicate keys in ISO 11940 and IPA-RTGS phoneme mapping #851 #852 by @BLKSerene and @bact
- Fix duplicate key in IPA to RTGS phoneme mapping by @BLKSerene in #852
Corpus
- Add pythainlp.corpus.thai_orst_words() Thai word list from Royal Society of Thailand (ORST) #810 by @wannaphong
- Add pythainlp.corpus.thai_wikipedia_titles() Thai word list (noun and noun phrases) from Thai Wikipedia titles #869 by @konbraphat51
- Add pythainlp.corpus.thai_volubilis_words() Thai word list from Volubilis dictionary #870 by @konbraphat51
- Add pythainlp.corpus.thai_icu_words() Thai word list from ICU BreakIterator dictionary #879 by @pavaris-pm
- Rename Volubilis/Wikipedia corpus function names for consistency / Fix types by @bact in #882
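The new word lists can be used for simple membership checks. The following is a rough sketch that assumes thai_orst_words() returns a set-like collection of strings, as the other PyThaiNLP word-list functions do; the helper in_orst_word_list is a hypothetical name introduced for illustration.

```python
try:
    from pythainlp.corpus import thai_orst_words
except ImportError:  # PyThaiNLP is missing or older than 5.0
    thai_orst_words = None


def in_orst_word_list(word):
    """Return True/False if word is in the ORST list, or None if it cannot be loaded.

    Hypothetical helper; assumes thai_orst_words() supports the `in` operator.
    """
    if thai_orst_words is None:
        return None
    return word in thai_orst_words()


result = in_orst_word_list("ภาษา")
print(result)
```

The other word-list functions listed above could presumably be swapped in the same way.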
Util
- Add pythainlp.util.encoding #813 by @wannaphong
- Add pythainlp.util.spell_words #817 by @wannaphong
- Add pythainlp.util.remove_trailing_repeat_consonants() #862 by @konbraphat51
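As an illustration of the last item, remove_trailing_repeat_consonants() is intended for informal text where a final consonant is repeated for emphasis. This is a hedged sketch: it assumes the function can be called with a single string argument, and the wrapper normalize_elongated is a hypothetical name.

```python
try:
    from pythainlp.util import remove_trailing_repeat_consonants
except ImportError:  # PyThaiNLP is missing or older than 5.0
    remove_trailing_repeat_consonants = None


def normalize_elongated(text):
    """Collapse repeated trailing consonants; return None if the function is unavailable.

    Hypothetical wrapper around pythainlp.util.remove_trailing_repeat_consonants.
    """
    if remove_trailing_repeat_consonants is None:
        return None
    return remove_trailing_repeat_consonants(text)


# A word written with an exaggerated run of its final consonant.
cleaned = normalize_elongated("เริ่ดดดดดดดด")
print(cleaned)
```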
New Contributors
- @pavaris-pm made their first contribution in #806
- @hauntsaninja made their first contribution in #832
- @Saharshjain78 made their first contribution in #850
- @konbraphat51 made their first contribution in #856
- @MpolaarbearM made their first contribution in #857
Full Changelog: v4.0.2...v5.0.0
Contributors
Thanks to all the contributors. (Image made with contributors-img)