Disable support for simple singularization #868

Closed
kendallb opened this issue Oct 25, 2019 · 0 comments


We are currently using this library to help with product search on our web site. We take all the keywords we index against and add both the singular and plural versions of each word to the index. Unfortunately, the singular algorithm has the simplest rule at the start: anything ending in an 's' simply has the 's' removed. While this normally makes sense, in our case it just introduces a lot of noise, because our search indexing will always happily match on a partial word. If we have 'tires' in our search index, then 'tire' and 'tires' will both match when someone enters them, so adding the singular 'tire' to the index is not necessary.

Things go sideways with this simple algorithm because it also converts brand names to singular: 'Traxxas' becomes 'Traxxa', which is not a real word.
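To make the failure mode concrete, here is a minimal sketch in Python. It is not this library's code. The rule list, its contents, and the `singularize` name are all hypothetical, and it assumes later rules are more specific and are tried before the catch-all first rule.

```python
import re

# Hypothetical ordered rule list of (pattern, replacement) pairs.
# The first entry is the catch-all "strip a trailing s" rule.
SINGULAR_RULES = [
    (re.compile(r"s$", re.IGNORECASE), ""),
    (re.compile(r"(vert|ind)ices$", re.IGNORECASE), r"\1ex"),
]


def singularize(word: str) -> str:
    """Apply the first matching rule, trying the most specific rules first."""
    # Assumption: rules later in the list are more specific, so walk backwards
    # and fall through to the catch-all rule last.
    for pattern, replacement in reversed(SINGULAR_RULES):
        if pattern.search(word):
            return pattern.sub(replacement, word)
    return word


print(singularize("tires"))    # tire
print(singularize("indices"))  # index
print(singularize("Traxxas"))  # Traxxa (brand name mangled by the catch-all rule)
```

The catch-all rule cannot distinguish a plural from a name that merely ends in 's', which is where the noise in the index comes from.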

The simple solution is to just remove the first element in the rule list, but there is no way to do that using the stock library as I cannot modify the internal rule list, nor can I replace the vocabulary with my own (I can't create a Vocabulary class as it's internal).

For now I plan to simply fork the library and hack it out so I can do what I need, but what I would prefer to do is modify the library so I can adjust the way it works for my needs and get that accepted upstream so I don't need to maintain my own library.

I see there are a couple of ways to do this:

  1. Add a function to be able to remove a rule from the default vocabulary
  2. Add a parameter on the Singular function to tell it to ignore the simple first rule
  3. Allow me to create my own vocabulary (make the class constructor not internal)

I am not sure that 1 or 3 are really good long-term solutions, because then I am changing the default vocabulary, and if someone else on our team uses it for some other purpose in the same context, they might get unexpected results. So I am leaning towards option 2.
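As a rough illustration of option 2, the sketch below (Python, not this library's API) adds an opt-out flag to a singularize function. The `skip_simple_rule` parameter, the rule tables, and the function name are assumptions made up for this example. The point is that the shared rule set stays untouched and each caller decides per call.

```python
import re

# Hypothetical catch-all rule: strip a trailing "s".
SIMPLE_RULE = (re.compile(r"s$", re.IGNORECASE), "")

# Hypothetical more specific rules, tried before the catch-all.
SPECIFIC_RULES = [
    (re.compile(r"(vert|ind)ices$", re.IGNORECASE), r"\1ex"),
    (re.compile(r"(alias|status)es$", re.IGNORECASE), r"\1"),
    (re.compile(r"([^aeiouy])ies$", re.IGNORECASE), r"\1y"),
]


def singularize(word: str, skip_simple_rule: bool = False) -> str:
    """Singularize a word, optionally ignoring the catch-all rule."""
    for pattern, replacement in SPECIFIC_RULES:
        if pattern.search(word):
            return pattern.sub(replacement, word)
    if not skip_simple_rule:
        # Default behaviour is unchanged for existing callers.
        pattern, replacement = SIMPLE_RULE
        return pattern.sub(replacement, word)
    return word


print(singularize("Traxxas"))                            # Traxxa
print(singularize("Traxxas", skip_simple_rule=True))     # Traxxas
print(singularize("batteries", skip_simple_rule=True))   # battery
```

Because the flag defaults to the current behaviour, other code in the same process that relies on the default rules would be unaffected.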

Thoughts?
