Improved the singularize method in inflect.py #220

Accuracy was previously 95% when measured against the CELEX English morphology word forms. The following changes raise it to 99%:

1. Added more words to the set `singular_uninflected`.
2. In the `singularize` method, changed the condition for the set `singular_uninflected` from `if x.endswith(w): return word` to `if x == w or w == x + "s": return x`. The former condition treated the words in the set as word endings, so it also affected words that consist of a prefix plus a word in the set. The new condition checks whether the word passed as the argument is present in the set as is or with a trailing "s". It then returns the singular form from the set rather than the argument, which may have been passed in plural form.
3. Added more words to the list `singular_uncountable`, grouped by comments (abstract ideas and expressions, natural phenomena, general, etc.) so that the list is easier to read and understand.
4. Added more words to the list `singular_ie` and the dict `singular_irregular`.
5. Words that could be grouped by a pattern were written as regular expressions in `singular_rules` instead of being added to the lists and dictionaries mentioned above.
6. In the `singularize` method, changed the condition for the dictionary `singular_irregular` from `if w.endswith(x):` to `if x == w:`. The former condition treated the key `x` in the dict as an ending of the word passed to `singularize`. The latter checks whether the word `w` passed as the argument is itself a key of the dict. If so, it returns the singular form of `w`, that is, `singular_irregular[x]`.
7. Added more regular expressions to the list `singular_rules` to cover further singularization rules and improve the accuracy of the `singularize` method.
8. As a result, this commit resolves the following currently open issues:

   | Issue | Singularized word | Earlier result | Current result |
   | --- | --- | --- | --- |
   | 141, 175 | flour | flmy | flour |
   | 141 | colour | colmy | colour |
   | 141 | your | ymy | your |
   | 141 | olives | olife | olive |
   | 176 | hummus | hummu | hummus |

   [141](clips#141) [175](clips#175) [176](clips#176)
9. The words added to the sets `singular_uninflected` and `singular_uncountable` were also added to the lists in `plural_categories["uninflected"]` and `plural_categories["uncountable"]` for consistency.

Note that the 99% accuracy figure comes from running corpora/test_en.py and applies only to the CELEX English morphology word forms dataset.
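To illustrate why the suffix checks in items 2 and 6 misfired, here is a minimal, self-contained sketch. It is not the code from inflect.py: the function names, the two dict entries, and the set contents are hypothetical, chosen only because they are consistent with the earlier results listed in the table (for example `flour` becoming `flmy`). The fallback to `singular_rules` is left out.

```python
import re

# Hypothetical excerpts; the real collections in inflect.py are much larger,
# and these particular entries are assumptions inferred from the issue table.
SINGULAR_IRREGULAR = {"our": "my", "lives": "life"}
SINGULAR_UNINFLECTED = {"bison", "hummus"}


def irregular_suffix_match(word):
    """Old check (item 6): every dict key is treated as a word ending."""
    w = word.lower()
    for x, singular in SINGULAR_IRREGULAR.items():
        if w.endswith(x):
            # Replaces the matched ending, even inside an unrelated word.
            return re.sub("(?i)" + x + "$", singular, word)
    return word


def irregular_exact_match(word):
    """New check (item 6): only a whole-word match uses the dict entry."""
    w = word.lower()
    for x, singular in SINGULAR_IRREGULAR.items():
        if x == w:
            return singular
    return word  # the real method would continue with singular_rules here


def uninflected_exact_match(word):
    """New check (item 2): match the set entry as is or with a trailing 's'."""
    w = word.lower()
    for x in SINGULAR_UNINFLECTED:
        if x == w or w == x + "s":
            return x  # return the entry from the set, not the argument
    return word


if __name__ == "__main__":
    for sample in ("flour", "colour", "your", "olives", "our"):
        print(sample, irregular_suffix_match(sample), irregular_exact_match(sample))
    print(uninflected_exact_match("hummus"), uninflected_exact_match("bisons"))
```

With the suffix check, any word that merely ends in a dict key is rewritten. With the exact check, only the key itself is mapped, and every other word is left for the later rules to handle.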

Commits on Mar 11, 2018
