Fixed issue with Arabic where only a single expansion character was being removed during stemming #84

Merged: 2 commits into MihaiValentin:master on Sep 28, 2022

Conversation

brendon1982
Contributor

  • Describe in detail what you did and why you did it
    • What you did: I fixed an issue in Arabic stemming where only a single expansion character was removed from a word. All expansion characters are now removed.
    • Why you did it: The expansion character is purely aesthetic and does not change the meaning of the word. A word can contain any number of expansion characters, so removing only one of them leaves many versions of the same word in the inverted index, each with a different number of expansion characters.
  • Add tests. The more the better
    • with at least 1 scenario in case of a bug-fix and 4 scenarios in case of a new language
  • Don't open a pull request that covers multiple unrelated topics (e.g. a bug fix, a new language, and a few changes to stemming for existing languages in a single pull request).

…eing removed during stemming. All expansion characters are now removed.
…+ regex for replacing all occurrences of expansion character as it is compatible with more JS environments.
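
The difference between removing one expansion character and removing all of them can be sketched as follows. This is a minimal illustration, not the library's actual code: the helper names `removeFirstTatweel`, `removeAllTatweel`, and `distinctTerms` are hypothetical, and the sketch assumes that the expansion character is U+0640 (ARABIC TATWEEL) and that the single removal behaves like `String.prototype.replace` called with a string pattern.

```javascript
// Assumption: the "expansion character" is U+0640 ARABIC TATWEEL (kashida).
const TATWEEL = "\u0640";

// Hypothetical helper: String.prototype.replace with a string pattern
// replaces only the first occurrence, so later tatweels survive.
function removeFirstTatweel(word) {
  return word.replace(TATWEEL, "");
}

// Hypothetical helper: a regex with the global flag removes every occurrence.
// It avoids String.prototype.replaceAll (ES2021), which older JS runtimes lack.
function removeAllTatweel(word) {
  return word.replace(/\u0640/g, "");
}

// The same four-letter word written with 0, 1, 2 and 3 expansion characters.
const base = "\u0643\u062A\u0627\u0628";
const variants = [
  base,
  "\u0643\u0640\u062A\u0627\u0628",
  "\u0643\u0640\u0640\u062A\u0627\u0628",
  "\u0643\u0640\u0640\u0640\u062A\u0627\u0628",
];

// Counts how many distinct index terms the variants collapse to
// under a given normalization function.
function distinctTerms(normalize) {
  return new Set(variants.map(normalize)).size;
}

console.log(distinctTerms(removeFirstTatweel)); // 3
console.log(distinctTerms(removeAllTatweel));   // 1
```

With single removal, the four spellings still produce three different terms in the index. With global removal they collapse to one.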
MihaiValentin merged commit 98d0553 into MihaiValentin:master on Sep 28, 2022
@MihaiValentin
Owner

Thanks for the contribution, @brendon1982!
