please reconsider the addition of ndarray and hashbrown as dependencies #34

BurntSushi · 2019-04-26T12:22:33Z

I left a comment here on the commit in which these deps were introduced, but I'm not sure if it was seen: d6717db#r33186236

So I'd like to open an issue to increase its visibility. In particular, it would be great if strsim could remain fairly lightweight since it is depended on by a bunch of crates via clap. I'm hoping that we, as an ecosystem, can be a bit more judicious in adding dependencies to crates.

For hashbrown, which is awesome, it is going to be included in the standard library very soon. Folks upgrading their Rust version will automatically get this optimization soon enough. I don't think it's worth adding it as an explicit dependency for a small gain for users on older versions of Rust, because it impacts compilation times for everyone.

For ndarray, it looks like it was just used for some small convenience. I don't think it's worth bringing ndarray in (as excellent as it is) along with all of its dependencies just for that.

The text was updated successfully, but these errors were encountered:

* [performance] Improve the performance of damerau_levenshtein My tests show the new version is approximately 2.5 times faster than the old ones. Most of the gains come from using hashbrown for the hashmap. * Switch back to rust 2015 to fix CI * Increase minimum required rust version to 1.31 * Simplify the levenshtein function

dguo · 2019-04-26T12:51:31Z

Hi, thanks for bringing this up. I did consider the usual trade offs with taking on dependencies. The performance gain seemed worth it to me, but I didn't know that hashbrown was going to be merged into the standard library until I saw it on Reddit recently. Based on that, I agree it doesn't need to be a dependency here.

For ndarray, that was laziness on my part. I should have separated it out to see if it alone would improve performance significantly. Based on the first comment in #32, it probably won't. I will try it for myself this weekend to verify. Apologies for not doing so before merging. After that, I will pull hashbrown out and ndarray as well, barring any surprising results from my testing.

@lovasoa, please let me know if you have any objections or concerns.

@BurntSushi, I'll try to be more diligent in the future, as I agree that lower level libraries should require a correspondingly higher bar for taking on dependencies.

BurntSushi · 2019-04-26T12:52:39Z

Thank you so much! :-)

dguo · 2019-04-30T03:19:14Z

From my testing, it appears that using ndarray does result in a speedup of about 17%.

0.9.0:

test benches::bench_damerau_levenshtein            ... bench:      20,601 ns/iter (+/- 474)

with ndarray:

test benches::bench_damerau_levenshtein            ... bench:      17,160 ns/iter (+/- 576)

17% is a bigger value than I expected, but it still doesn't seem large enough to justify all the sub-dependencies, especially considering only one of the metrics is affected. I'm going to sleep on it.

BurntSushi · 2019-04-30T10:37:31Z

Can you point me in the right direction to re-run those benchmarks? I can take a look. It is surprising to me that ndarray is responsible for that speedup.

lovasoa · 2019-04-30T13:33:44Z

A compromise between adding a dependency to ndarray and using a vector of vectors may be to use a single fixed size array, and make the necessary computations at array access time, replacing

distances[i][j]

by

distances[i*width + j]

@lovasoa

Instead of representing a 2x2 grid with a vector of vectors, just use a single vector to improve performance. We can do this since the dimensions are fixed. This method was suggested by @lovasoa as an alternative to adding a dependency on the ndarray crate. In my benchmark testing, the new approach is about as fast using ndarray. On my machine, the original approach takes about 22,000 ns/iter, whereas the new approach takes about 17,000 ns/iter. See #34 for more context.

dguo · 2019-05-05T21:17:52Z

@BurntSushi, you can run cargo +nightly bench in the ndarray branch that I just pushed up and compare it to 0.9.0.

@lovasoa, nice suggestion. I just tried using a single, flat vector (see the flatten-vectors branch), and it's as fast as using ndarray. Looks like that's the way to go, and we can drop ndarray while retaining the performance boost.

I suppose ndarray is doing something similar under the hood, rather than the naive, vector of vectors approach that I did.

Related to #34

We can drop a dependency while retaining the performance boost. Related to #34

dguo · 2019-05-09T05:31:30Z

Ok, I've published v0.9.2, which removes hashbrown and switches out ndarray for the one vector approach. Thanks everyone!

BurntSushi · 2019-05-09T10:40:02Z

Thank you! :-)

dguo added a commit that referenced this issue May 9, 2019

Revert back to the standard library hashmap

687dc6d

Related to #34

dguo added a commit that referenced this issue May 9, 2019

Use a single vector instread of ndarray in Damerau-Levenshtein

dfba131

We can drop a dependency while retaining the performance boost. Related to #34

dguo closed this as completed May 9, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

please reconsider the addition of ndarray and hashbrown as dependencies #34

please reconsider the addition of ndarray and hashbrown as dependencies #34

BurntSushi commented Apr 26, 2019

dguo commented Apr 26, 2019

BurntSushi commented Apr 26, 2019

dguo commented Apr 30, 2019

BurntSushi commented Apr 30, 2019

lovasoa commented Apr 30, 2019

dguo commented May 5, 2019 •

edited

Loading

dguo commented May 9, 2019

BurntSushi commented May 9, 2019

please reconsider the addition of ndarray and hashbrown as dependencies #34

please reconsider the addition of ndarray and hashbrown as dependencies #34

Comments

BurntSushi commented Apr 26, 2019

dguo commented Apr 26, 2019

BurntSushi commented Apr 26, 2019

dguo commented Apr 30, 2019

BurntSushi commented Apr 30, 2019

lovasoa commented Apr 30, 2019

dguo commented May 5, 2019 • edited Loading

dguo commented May 9, 2019

BurntSushi commented May 9, 2019

dguo commented May 5, 2019 •

edited

Loading