Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: performance improvements #39

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

wooorm
Copy link

@wooorm wooorm commented Nov 5, 2014

By adding stemmer and metaphone, performance on the benchmark increases significantly.

The benchmarks were done on a MacBook Air (late 2011), while listening to music and such, but the difference is certainly there.

Results w/ stemmer and metaphone:

             139 op/s » tiny
              29 op/s » small
              10 op/s » medium
               0 op/s » large

Results w/ natural.Metaphone.process and natural.PorterStemmer.stem:

              96 op/s » tiny
              20 op/s » small
               5 op/s » medium
               0 op/s » large

P.S. As my result are, both with and without this feature, slower than the benchmark in Readme.md, I’ve not updated it.

This adds `stemmer` and `metaphone` as dependencies for their
higher performance. This improves the benchmark result by 31% on
`tiny` and `small`, and 50% on `medium`.
@stockholmux
Copy link
Collaborator

This would be a breaking change to those who already have an existing index, no?

@wooorm
Copy link
Author

wooorm commented Nov 5, 2014

Maybe, as stemmer and metaphone do sometimes return slightly different results. However, these changes have to do with bugs [1, 2] present in Natural’s implementation that do not exist in stemmer and metaphone

@stockholmux
Copy link
Collaborator

Yep. Reds relies on the search query to generate the same metaphone and stem as is stored in Redis. If the stemmer and metaphone output different results than stored, the search would not return the expected results.

Since it could be breaking for users, perhaps the wise thing to do is have stemmer and metaphone as options. Since natural is loaded anyway, use the new modules if specified in a second argument configuration object on createSearch. That way greenfield installs could take advantage of the faster routines in the new modules.

@wooorm
Copy link
Author

wooorm commented Nov 6, 2014

Right, but I’d suggest pushing version 0.3.0 over making this optional.

Also, to avoid loading the complete natural, it’s also possible to swap it with something like huned/node-stopwords, but that would also be breaking.

Who has merge-access? @tj only?

@plus-
Copy link
Contributor

plus- commented Nov 6, 2014

Since he moved to Go, @tj no longer maintain his node projects.

Someone has to step up and ask him contribution rights for this module if you want to see anything change.

source: https://medium.com/code-adventures/farewell-node-js-4ba9e7f3e52b

@wooorm
Copy link
Author

wooorm commented Nov 6, 2014

Anyone interested?

@stockholmux
Copy link
Collaborator

@wooorm I still think making it an option is the best idea here. Impossible to know, but I'm sure there are lots of installs of this module running around with a package.json set to "reds" : "*" and a silent, breaking issue is not great. Granted, star-versions in package.json is dumb, but it unfortunately too common.

As far as who is the maintainer, that would be me. TJ gave me merge access after he left for go.

@wooorm
Copy link
Author

wooorm commented Nov 6, 2014

Okay, so it’s your call!

Also, I just checked the first 15 pages of GitHub results for "reds" in JSON, and I found just three packages [1, 2, 3] with broad version ranges. IMHO, that’s their problem, but I understand you protecting their functioning!

@stockholmux
Copy link
Collaborator

I worry about people who have non-open projects that rely on reds, not just other modules on github. For example, we run reds in production over about 60k documents. So, reindexing is non-trivial. It would be upsetting to have a breaking change introduced. I'm sure I'm not the only one with this situation. Hopefully the package.json is versioned properly, but you never know since star-versioning is the lazy option.

I'd be glad to write the options object createSearch but I won't get to it in the near term. I'd entertain a PR for it though.

@plus-
Copy link
Contributor

plus- commented Nov 6, 2014

I don't see the problem with a major version upgrade breaking the index.

@wooorm
Copy link
Author

wooorm commented Nov 6, 2014

@plus- 👍

@thelinuxlich
Copy link

👍

@jonathanong
Copy link
Contributor

if you guys want to maintain this, please contact @tj.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants