We are currently using this library to help with product search on our website: we take all the keywords we index against and generate singular and plural versions of them to add to the index. Unfortunately, the singularization algorithm has the simplest rule at the start, which is that anything ending in an 's' should simply have the 's' removed. While this normally makes sense, in our case it just introduces a lot of noise, because our search indexing will always happily match on a partial word. So if we have 'tires' in our search index, a search for either 'tire' or 'tires' will match, and adding the singular 'tire' to the index is unnecessary.
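To illustrate why the singular form adds nothing here, the following is a minimal sketch of partial-word matching. It assumes the search treats a query as matching any indexed term that contains it; `matches` is a hypothetical helper, not the API of any particular search engine.

```python
# Hypothetical model of the site's search: a query matches any indexed term
# that contains it (a "partial word" match), compared case-insensitively.
def matches(query: str, indexed_terms: list[str]) -> bool:
    """Return True if the query appears inside any indexed term."""
    q = query.lower()
    return any(q in term.lower() for term in indexed_terms)


index = ["tires"]  # only the plural form is indexed
print(matches("tire", index))   # True: 'tire' is a substring of 'tires'
print(matches("tires", index))  # True: exact match
```

Under this assumption, indexing 'tire' alongside 'tires' only duplicates hits the plural already produces.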
Things also go sideways with this simple rule because it starts converting things like brand names to singular: 'Traxxas' becomes 'Traxxa', which is not a real word.
The simple solution is to remove the first element in the rule list, but there is no way to do that with the stock library: I cannot modify the internal rule list, nor can I replace the vocabulary with my own (I can't create a Vocabulary instance because its constructor is internal).
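The effect of dropping that first rule can be sketched with a toy rule list. Everything here is an illustrative assumption rather than the library's real rule set: the rules are hypothetical `(pattern, replacement)` pairs with the generic "strip a trailing s" rule at index 0, and rules are assumed to be tried from the end of the list so the more specific ones win and the generic one acts as a fallback.

```python
import re

# Hypothetical singular rules; entry 0 is the catch-all "strip a trailing s"
# rule described above. Not the library's actual rules.
SINGULAR_RULES = [
    (r"s$", ""),                # generic: tires -> tire, Traxxas -> Traxxa
    (r"(ss)$", r"\1"),          # glass -> glass
    (r"ies$", "y"),             # batteries -> battery
    (r"(x|ch|sh)es$", r"\1"),   # boxes -> box
]


def singularize(word: str, rules: list[tuple[str, str]]) -> str:
    # Assumption: rules are tried from the end of the list, so specific rules
    # take precedence and the generic rule at index 0 is the fallback.
    for pattern, replacement in reversed(rules):
        if re.search(pattern, word, flags=re.IGNORECASE):
            return re.sub(pattern, replacement, word, flags=re.IGNORECASE)
    return word  # no rule matched: leave the word unchanged


print(singularize("Traxxas", SINGULAR_RULES))      # Traxxa
print(singularize("Traxxas", SINGULAR_RULES[1:]))  # Traxxas (catch-all removed)
```

`SINGULAR_RULES[1:]` is a copy, so this "remove the first element" happens without touching the shared list; that is exactly what the internal rule list does not allow from outside the library.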
For now I plan to simply fork the library and hack the rule out so I can do what I need, but I would prefer to modify the library so its behavior can be adjusted for cases like mine, and get that change accepted upstream so I don't need to maintain my own fork.
I see a few possible ways to do this:
1. Add a function to remove a rule from the default vocabulary.
2. Add a parameter to the Singular function telling it to ignore the simple first rule.
3. Allow me to create my own vocabulary (make the class constructor public instead of internal).
I am not sure that options 1 or 3 are really good long-term solutions, because then I am changing the default vocabulary, and if someone else on our team uses it for some other purpose in the same context, they might get unexpected results. So I am leaning towards option 2.
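A rough sketch of what option 2 could look like follows. The `skip_simple_rule` flag and the two-rule table are hypothetical, and as before the rules are assumed to be tried from the end of the list. The point it shows is that a per-call flag selects a slice of the shared rules instead of mutating them, so other callers in the same process keep the default behavior.

```python
import re

# Hypothetical shared default rules; index 0 is the generic
# "strip a trailing s" rule. Illustrative only, not the library's rules.
DEFAULT_SINGULAR_RULES = (
    (r"s$", ""),
    (r"ies$", "y"),
)


def singularize(word: str, skip_simple_rule: bool = False) -> str:
    """Singularize `word`; `skip_simple_rule` is a hypothetical option-2 flag.

    The shared rule tuple is never modified, so other callers still get
    the default behavior.
    """
    rules = DEFAULT_SINGULAR_RULES[1:] if skip_simple_rule else DEFAULT_SINGULAR_RULES
    # Assumption: later (more specific) rules take precedence over earlier ones.
    for pattern, replacement in reversed(rules):
        if re.search(pattern, word, flags=re.IGNORECASE):
            return re.sub(pattern, replacement, word, flags=re.IGNORECASE)
    return word


print(singularize("Traxxas"))                         # Traxxa
print(singularize("Traxxas", skip_simple_rule=True))  # Traxxas
```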
Thoughts?