-
-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
please reconsider the addition of ndarray and hashbrown as dependencies #34
Comments
* [performance] Improve the performance of damerau_levenshtein My tests show the new version is approximately 2.5 times faster than the old ones. Most of the gains come from using hashbrown for the hashmap. * Switch back to rust 2015 to fix CI * Increase minimum required rust version to 1.31 * Simplify the levenshtein function
Hi, thanks for bringing this up. I did consider the usual trade offs with taking on dependencies. The performance gain seemed worth it to me, but I didn't know that hashbrown was going to be merged into the standard library until I saw it on Reddit recently. Based on that, I agree it doesn't need to be a dependency here. For ndarray, that was laziness on my part. I should have separated it out to see if it alone would improve performance significantly. Based on the first comment in #32, it probably won't. I will try it for myself this weekend to verify. Apologies for not doing so before merging. After that, I will pull hashbrown out and ndarray as well, barring any surprising results from my testing. @lovasoa, please let me know if you have any objections or concerns. @BurntSushi, I'll try to be more diligent in the future, as I agree that lower level libraries should require a correspondingly higher bar for taking on dependencies. |
Thank you so much! :-) |
From my testing, it appears that using ndarray does result in a speedup of about 17%. 0.9.0:
with ndarray:
17% is a bigger value than I expected, but it still doesn't seem large enough to justify all the sub-dependencies, especially considering only one of the metrics is affected. I'm going to sleep on it. |
Can you point me in the right direction to re-run those benchmarks? I can take a look. It is surprising to me that ndarray is responsible for that speedup. |
A compromise between adding a dependency to ndarray and using a vector of vectors may be to use a single fixed size array, and make the necessary computations at array access time, replacing distances[i][j] by distances[i*width + j] |
Instead of representing a 2x2 grid with a vector of vectors, just use a single vector to improve performance. We can do this since the dimensions are fixed. This method was suggested by @lovasoa as an alternative to adding a dependency on the ndarray crate. In my benchmark testing, the new approach is about as fast using ndarray. On my machine, the original approach takes about 22,000 ns/iter, whereas the new approach takes about 17,000 ns/iter. See #34 for more context.
@BurntSushi, you can run @lovasoa, nice suggestion. I just tried using a single, flat vector (see the flatten-vectors branch), and it's as fast as using I suppose |
We can drop a dependency while retaining the performance boost. Related to #34
Ok, I've published v0.9.2, which removes hashbrown and switches out ndarray for the one vector approach. Thanks everyone! |
Thank you! :-) |
I left a comment here on the commit in which these deps were introduced, but I'm not sure if it was seen: d6717db#r33186236
So I'd like to open an issue to increase its visibility. In particular, it would be great if
strsim
could remain fairly lightweight since it is depended on by a bunch of crates viaclap
. I'm hoping that we, as an ecosystem, can be a bit more judicious in adding dependencies to crates.For hashbrown, which is awesome, it is going to be included in the standard library very soon. Folks upgrading their Rust version will automatically get this optimization soon enough. I don't think it's worth adding it as an explicit dependency for a small gain for users on older versions of Rust, because it impacts compilation times for everyone.
For ndarray, it looks like it was just used for some small convenience. I don't think it's worth bringing ndarray in (as excellent as it is) along with all of its dependencies just for that.
The text was updated successfully, but these errors were encountered: