Improved the singularize method in inflect.py #220

Open
wants to merge 1 commit into
base: master

Commits on Mar 11, 2018

  1. Improved the singularize method in inflect

    Accuracy measured against the CELEX English morphology word forms was
    previously 95%. The following changes raise it to 99%:
    
    1. Added more words to the set singular_uninflected
    
    2. In the singularize method, changed the condition for the set
       singular_uninflected from
          if x.endswith(w): return word
       to
          if x == w or w == x + "s": return x
       because the former treated the words in the set as word endings, so
       it also affected words formed by adding a prefix to a word in the set.
       The new condition checks whether the word passed as the argument
       appears in the set either as is or followed by an "s", and returns
       the singular form stored in the set rather than the argument, which
       may have been passed in its plural form.
    
    3. Added more words to the list singular_uncountable, grouped by
       comments into categories such as abstract ideas and expressions,
       natural phenomena, general, etc., to make the list easier to read
    
    4. Added more words to the list singular_ie and the dict singular_irregular
    
    5. Words that could be covered by a shared pattern were written as
       regular expressions (regex) in singular_rules instead of being added
       one by one to the lists and dictionaries mentioned above.
    
    6. In the singularize method, changed the condition for the dictionary
       singular_irregular from
          if w.endswith(x):
       to
          if x == w:
       because the former treated each key x in the dict as an ending of
       the word passed to the singularize method.
       The latter checks whether the word w passed as the argument is a key
       of the dict by comparing it with x. If so, the method returns the
       singular form of w, that is, singular_irregular[x].
    
    7. Added more regular expressions to the list singular_rules to cover
       further singularization rules and improve the accuracy of the
       singularize method
    
    8. Hence, this commit resolves the following currently open issues:
       Issue      - input     - previous output - current output
       141, 175   - flour     - flmy            - flour
       141        - colour    - colmy           - colour
       141        - your      - ymy             - your
       141        - olives    - olife           - olive
       176        - hummus    - hummu           - hummus
    
       [141](clips#141)
       [175](clips#175)
       [176](clips#176)
    
    9. The words added to sets singular_uninflected and singular_uncountable
       were also added to the lists in dict plural_categories["uninflected"]
       and plural_categories["uncountable"] for consistency.
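
The difference between the suffix checks and the exact-match checks described in items 2 and 6 can be illustrated with a minimal sketch. It is not the code from inflect.py: the two dict entries and the two set entries are toy assumptions chosen to reproduce the outputs in the table of item 8, and the function names are hypothetical.

```python
import re

# Toy data (assumption): the real containers in inflect.py are much larger.
singular_irregular = {"our": "my", "lives": "life"}
singular_uninflected = {"hummus", "bison"}


def irregular_by_suffix(word):
    """Old check: every dict key is treated as a possible word ending."""
    w = word.lower()
    for x in singular_irregular:
        if w.endswith(x):
            return re.sub("(?i)" + x + "$", singular_irregular[x], word)
    return word


def irregular_by_exact_match(word):
    """New check: a dict key only applies when it equals the whole word."""
    w = word.lower()
    for x in singular_irregular:
        if x == w:
            return singular_irregular[x]
    return word  # the real method would go on to try singular_rules


def uninflected_match(word):
    """New check for uninflected words: the entry itself or entry + "s"."""
    w = word.lower()
    for x in singular_uninflected:
        if x == w or w == x + "s":
            return x
    return None  # no match; the real method would continue with other checks


for word in ("flour", "colour", "your", "olives"):
    print(word, irregular_by_suffix(word), irregular_by_exact_match(word))
```

In this sketch the suffix check turns "flour" into "flmy" and "olives" into "olife", while the exact-match check leaves both untouched. Turning "olives" into "olive" is then left to the regular rules, which are not modelled here.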
    
    Note that the 99% accuracy was measured with corpora/test_en.py and
    applies only to the dataset of CELEX English morphology word forms.
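
For readers unfamiliar with this kind of figure, the sketch below shows one plausible way to compute it: the share of (plural, singular) pairs for which the singularizer returns the expected singular. It is an assumption, not the code in corpora/test_en.py; the word pairs and the naive singularizer are hypothetical stand-ins for the CELEX data and the real singularize method.

```python
def accuracy(singularize, pairs):
    """Fraction of (plural, singular) pairs that are singularized correctly."""
    correct = sum(1 for plural, singular in pairs if singularize(plural) == singular)
    return correct / len(pairs)


def naive_singularize(word):
    """Hypothetical stand-in: strip one trailing "s" if there is one."""
    return word[:-1] if word.endswith("s") else word


# Hypothetical sample; the reported figures use the CELEX word forms instead.
pairs = [("cats", "cat"), ("dogs", "dog"), ("olives", "olive"), ("hummus", "hummus")]

print(accuracy(naive_singularize, pairs))
```

The naive stand-in gets "hummus" wrong (it returns "hummu"), which is the kind of error the expanded uninflected set is meant to remove.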
    TanyaaCJain committed Mar 11, 2018
    Commit 52f360d