pythainlp.util.collate() returns a wrong ordering because the current implementation ignores tone marks and symbols when ordering.
Try this code:
from pythainlp.util import collate
collate(["ก้วย", "ก๋วย", "ก่วย", "กวย", "ก้วย", "ก่วย", "ก๊วย"])
Expected results (ordering according to the Thai dictionary):
['กวย', 'ก่วย', 'ก่วย', 'ก้วย', 'ก้วย', 'ก๊วย', 'ก๋วย']
Current results:
['ก้วย', 'ก๋วย', 'ก่วย', 'ก้วย', 'ก่วย', 'ก๊วย', 'กวย']
Files: pythainlp/util/collate.py
Proposed test case:
import unittest

from pythainlp.util import collate


class TestUtilPackage(unittest.TestCase):

    # ### pythainlp.util.collate

    def test_collate(self):
        # Two permutations of the same words must collate identically
        self.assertEqual(
            collate(["ก้วย", "ก๋วย", "กวย", "ก่วย", "ก๊วย"]),
            collate(["ก๋วย", "ก่วย", "ก้วย", "ก๊วย", "กวย"]),
        )  # should guarantee same order
        # Tone marks must follow Thai dictionary order
        self.assertEqual(
            collate(["ก้วย", "ก๋วย", "ก่วย", "กวย", "ก้วย", "ก่วย", "ก๊วย"]),
            ["กวย", "ก่วย", "ก่วย", "ก้วย", "ก้วย", "ก๊วย", "ก๋วย"],
        )
Added notes on this to collate()'s docstring bc8223a
We may try implementing libthai's thcoll: https://github.com/tlwg/libthai/tree/master/src/thcoll
See character weight table at https://github.com/tlwg/libthai/blob/master/src/thcoll/cweight.c
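The weight-based idea can be roughly approximated in pure Python with a two-part sort key. The sketch below is not a port of libthai's thcoll or its cweight.c table, and it is not necessarily how collate() is or will be implemented. The names thai_sort_key and collate_sketch are hypothetical. It assumes that only the four tone marks (U+0E48 to U+0E4B) act as tie-breakers and that a leading vowel sorts after the consonant that follows it; other symbols are left in plain code point order.

```python
import re

# Assumption: only the four tone marks (mai ek, mai tho, mai tri,
# mai chattawa) are treated as secondary, tie-breaking characters.
_TONE_RE = re.compile("[\u0e48-\u0e4b]")

# Leading vowels (U+0E40..U+0E44) are written before the consonant
# (U+0E01..U+0E2E) but are swapped behind it for comparison.
_LEADING_VOWEL_RE = re.compile("([\u0e40-\u0e44])([\u0e01-\u0e2e])")


def thai_sort_key(word):
    """Return a (primary, secondary) key for a Thai word (hypothetical helper)."""
    # Primary level: consonants and vowels, with tone marks removed
    swapped = _LEADING_VOWEL_RE.sub(r"\2\1", word)
    primary = _TONE_RE.sub("", swapped)
    # Secondary level: the tone marks, in order of appearance
    secondary = "".join(_TONE_RE.findall(word))
    return (primary, secondary)


def collate_sketch(words, reverse=False):
    """Sort words with the two-level key above (hypothetical helper)."""
    return sorted(words, key=thai_sort_key, reverse=reverse)


words = ["ก้วย", "ก๋วย", "ก่วย", "กวย", "ก้วย", "ก่วย", "ก๊วย"]
print(collate_sketch(words))
```

Because the primary part is compared first, words that differ only in tone marks fall back to the secondary part, where an unmarked syllable sorts before a marked one and the marks follow their code point order.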
Can I assign myself to this task? If yes, are there any rules I have to follow before opening a pull request, e.g. code styling?
Thank you. Here is the list of requirements for a pull request:
pycodestyle
If you have a question, you can contact me directly on Facebook: https://www.facebook.com/tontanwannaphong/