Wrong ordering from collate() #558

Closed
bact opened this issue May 16, 2021 · 4 comments · Fixed by #926
Labels
bug bugs in the library Hacktoberfest for Hacktoberfest event help wanted no contributor yet

Comments

@bact
Member

bact commented May 16, 2021

Description

pythainlp.util.collate() returns a wrong ordering,
because the current implementation ignores tone marks and symbols when sorting.

Try this code:

from pythainlp.util import collate

collate(["ก้วย", "ก๋วย", "ก่วย", "กวย", "ก้วย", "ก่วย", "ก๊วย"])

Expected results

Ordering according to Thai dictionary

['กวย', 'ก่วย', 'ก่วย', 'ก้วย', 'ก้วย', 'ก๊วย', 'ก๋วย']

Current results

['ก้วย', 'ก๋วย', 'ก่วย', 'ก้วย', 'ก่วย', 'ก๊วย', 'กวย']
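
For illustration only, here is a minimal sketch of one way a tone-aware sort key could produce the expected order above. It is not PyThaiNLP's implementation and not necessarily how the eventual fix works. The names `tone_aware_key` and `collate_sketch` are hypothetical. It assumes that comparing the word with tone marks removed first, and using the tone marks only as a tie-breaker, is close enough to dictionary order for this example. It deliberately ignores other rules, such as the handling of leading vowels.

```python
import re

# Thai tone marks: mai ek, mai tho, mai tri, mai chattawa (U+0E48..U+0E4B).
# Their code point order matches the tone order used in the expected results.
_TONE_MARKS = re.compile("[\u0e48-\u0e4b]")


def tone_aware_key(word):
    """Hypothetical sort key: base letters first, tone marks as tie-breaker."""
    base = _TONE_MARKS.sub("", word)             # word with tone marks removed
    tones = "".join(_TONE_MARKS.findall(word))   # tone marks in order of appearance
    return (base, tones)


def collate_sketch(words):
    """Sort words with the hypothetical key above (not pythainlp.util.collate)."""
    return sorted(words, key=tone_aware_key)


words = ["ก้วย", "ก๋วย", "ก่วย", "กวย", "ก้วย", "ก่วย", "ก๊วย"]
print(collate_sketch(words))
```

With this key, the unmarked word sorts first and the remaining words follow in tone order, which matches the expected results for this input.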

Your environment

  • PyThaiNLP version: 2.3.1

Files

pythainlp/util/collate.py

Proposed test case

import unittest

from pythainlp.util import collate


class TestUtilPackage(unittest.TestCase):

    # ### pythainlp.util.collate

    def test_collate(self):
        # Different input orders of the same words must collate identically.
        self.assertEqual(
            collate(["ก้วย", "ก๋วย", "กวย", "ก่วย", "ก๊วย"]),
            collate(["ก๋วย", "ก่วย", "ก้วย", "ก๊วย", "กวย"]),
        )  # should guarantee same order
        # Tone marks must be ordered as in a Thai dictionary.
        self.assertEqual(
            collate(["ก้วย", "ก๋วย", "ก่วย", "กวย", "ก้วย", "ก่วย", "ก๊วย"]),
            ["กวย", "ก่วย", "ก่วย", "ก้วย", "ก้วย", "ก๊วย", "ก๋วย"],
        )
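
As a side note on why the first assertion matters, the sketch below shows how a key that ignores tone marks makes the result depend on input order. It uses a hypothetical `tone_blind_key`, which is an assumption for illustration and not the library's actual code. Python's `sorted()` is stable, so words that compare equal keep their input order.

```python
import re

# Thai tone marks U+0E48..U+0E4B (mai ek, mai tho, mai tri, mai chattawa).
TONE_MARKS = re.compile("[\u0e48-\u0e4b]")


def tone_blind_key(word):
    # Hypothetical key that drops tone marks entirely, so words that
    # differ only by tone compare as equal.
    return TONE_MARKS.sub("", word)


# The same two words, given in different input orders.
first = sorted(["ก้วย", "ก่วย"], key=tone_blind_key)
second = sorted(["ก่วย", "ก้วย"], key=tone_blind_key)

# Ties are left in input order, so the two results differ.
print(first == second)  # False
```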
@bact
Member Author

bact commented May 16, 2021

Added notes on this to collate()'s docstring bc8223a

@wannaphong wannaphong added the bug bugs in the library label May 16, 2021
@bact bact added this to the Future milestone May 16, 2021
@bact
Member Author

bact commented May 16, 2021

@wannaphong wannaphong added the Hacktoberfest for Hacktoberfest event label Sep 29, 2021
@sahussawud

Can I assign myself to this task? If yes, are there any rules I have to follow before opening a pull request, e.g. code styling?

@wannaphong
Member

wannaphong commented Dec 1, 2021

Can I assign myself to this task? If yes, are there any rules I have to follow before opening a pull request, e.g. code styling?

Thank you. Here are the rules for pull requests:

  • Write code following the PEP8 code style. We run a PEP8 checker on every pull request. You can use pycodestyle to check locally.
  • Make sure the unit tests for your function pass.
  • If you create a new function, please add documentation and unit tests for it.

If you have a question, you can contact me directly on Facebook: https://www.facebook.com/tontanwannaphong/

@bact bact added the help wanted no contributor yet label Oct 17, 2023
@wannaphong wannaphong linked a pull request Oct 10, 2024 that will close this issue