Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an iterator version of wrap #79

Merged
merged 3 commits into from
Aug 30, 2017
Merged

Add an iterator version of wrap #79

merged 3 commits into from
Aug 30, 2017

Conversation

hcpl
Copy link
Contributor

@hcpl hcpl commented Aug 24, 2017

  • Add #[derive(Clone)] for word splitter types defined in this crate (hyphenation::Corpus already has a Clone impl);
  • Add a Clone requirement (which should be considered a breaking change but I think there are no custom word splitters to worry about);
  • Adapt docs.

@hcpl
Copy link
Contributor Author

hcpl commented Aug 24, 2017

@mgeisler Sorry for the wait, I was messing with docs at the time.

@hcpl
Copy link
Contributor Author

hcpl commented Aug 24, 2017

Some questions:

  • Does WrapIter need to behave like a fused iterator? Safety and all, but this implementation comes with a cost of a finished: bool field and Fuse iterator wrapper which also consumes additional memory, though I guess they are negligible enough to ignore.
  • Should I add new unit-tests (new doctests are already there)?
  • Are there any other places where wrap_iter should be mentioned?

@hcpl
Copy link
Contributor Author

hcpl commented Aug 24, 2017

So WrapIter can be fused without Fuse!

src/lib.rs Outdated

// String to wrap.
source: &'a str,
// Fused CharIndices iterator over self.source.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Remove "Fused"

@mgeisler
Copy link
Owner

Hey @hcpl! Thanks very much for this -- the first outside contribution :-) I think it looks very nice and would like to merge it.

However, I just tried to run the benchmarks and compared them with master. Something is off on the benchmarks that do hyphenation:

name                  before ns/iter  after ns/iter  diff ns/iter    diff %  speedup 
 fill_100              462             472                      10     2.16%   x 0.98 
 fill_200              964             1,031                    67     6.95%   x 0.94 
 fill_400              1,951           2,037                    86     4.41%   x 0.96 
 fill_800              3,942           4,129                   187     4.74%   x 0.95 
 hyphenation_fill_100  488             41,512               41,024  8406.56%   x 0.01 
 hyphenation_fill_200  1,426           43,209               41,783  2930.08%   x 0.03 
 hyphenation_fill_400  3,121           45,414               42,293  1355.11%   x 0.07 
 hyphenation_fill_800  7,049           50,408               43,359   615.11%   x 0.14 
 wrap_100              432             443                      11     2.55%   x 0.98 
 wrap_200              913             959                      46     5.04%   x 0.95 
 wrap_400              1,883           1,956                    73     3.88%   x 0.96 
 wrap_800              3,824           3,952                   128     3.35%   x 0.97 

The comparison was simply:

$ git checkout master && cargo bench --all-features > before
$ git checkout - && cargo bench --all-features > after
$ cargo benchcmp before after

I haven't looked into why this can be yet -- maybe you have an idea?

@hcpl
Copy link
Contributor Author

hcpl commented Aug 29, 2017

@mgeisler I believe the slowdown is caused by cloning, because both hyphenation::Patterns and hyphenation::Exceptions contain a HashMap field, so cloning is not cheap. But deducing from benchmarks, it looks like the cloning stage itself consumes only 40 microseconds, so it's passable for me.

In case if cloning with such a slowdown is not desirable, we can internally utilize a borrowing variant of WrapIter instead of owning for {Wrapper::}wrap and {Wrapper::}fill. Alternatively, we can have both {Wrapper::}wrap_iter (as borrowing) and {Wrapper::}wrap_iter_owned as publicly available variants. Personally I prefer the second solution, because it gives flexibility.

And there can be consuming Wrapper::into_wrap_iter which is owned and doesn't require additional cloning too (useful to implement wrap_iter function to eliminate unnecessary cloning).

@hcpl
Copy link
Contributor Author

hcpl commented Aug 29, 2017

Now iterator-based methods of Wrapper are split into 3 variants:

  • wrap_iter() - owns a copy of Wrapper by cloning internal data;
  • wrap_iter_borrow() - borrows a Wrapper (as a result, it cannot outlive the Wrapper);
  • into_wrap_iter() - owns a Wrapper by consuming it.

Wrapper::fill() and Wrapper::wrap() now both use Wrapper::wrap_iter_borrow().

Latest benchmark results:

 name                  before ns/iter  after ns/iter  diff ns/iter  diff %  speedup 
 fill_100              2,142           2,232                    90   4.20%   x 0.96 
 fill_200              4,429           4,765                   336   7.59%   x 0.93 
 fill_400              8,983           9,473                   490   5.45%   x 0.95 
 fill_800              18,283          19,518                1,235   6.75%   x 0.94 
 hyphenation_fill_100  2,162           2,185                    23   1.06%   x 0.99 
 hyphenation_fill_200  6,633           6,783                   150   2.26%   x 0.98 
 hyphenation_fill_400  15,092          15,481                  389   2.58%   x 0.97 
 hyphenation_fill_800  36,268          37,451                1,183   3.26%   x 0.97 
 wrap_100              2,035           2,206                   171   8.40%   x 0.92 
 wrap_200              4,168           4,674                   506  12.14%   x 0.89 
 wrap_400              8,344           9,327                   983  11.78%   x 0.89 
 wrap_800              17,889          19,347                1,458   8.15%   x 0.92

@mgeisler
Copy link
Owner

Awesome that you found a good solution so fast! I just ran the benchmarks on my machine and somehow the hyphenation benchmarks are faster now:

 name                  before ns/iter  after ns/iter  diff ns/iter  diff %  speedup 
 fill_100              473             485                      12   2.54%   x 0.98 
 fill_200              989             1,060                    71   7.18%   x 0.93 
 fill_400              1,982           2,141                   159   8.02%   x 0.93 
 fill_800              3,940           4,346                   406  10.30%   x 0.91 
 hyphenation_fill_100  500             460                     -40  -8.00%   x 1.09 
 hyphenation_fill_200  1,479           1,432                   -47  -3.18%   x 1.03 
 hyphenation_fill_400  3,231           3,127                  -104  -3.22%   x 1.03 
 hyphenation_fill_800  7,258           7,152                  -106  -1.46%   x 1.01 
 wrap_100              441             480                      39   8.84%   x 0.92 
 wrap_200              939             1,054                   115  12.25%   x 0.89 
 wrap_400              1,933           2,108                   175   9.05%   x 0.92 
 wrap_800              3,915           4,287                   372   9.50%   x 0.91 

That's pretty cool and the other slowdowns are fine too :-)

@mgeisler mgeisler merged commit cd80725 into mgeisler:master Aug 30, 2017
@mgeisler
Copy link
Owner

Thanks for working on this! I was wondering if all three methods are really needed:

  • into_wrap_iter
  • wrap_iter
  • wrap_iter_borrow

Would it not be enough to just offer the wrap_iter_borrow method (which could then be called wrap_iter)? It does not cloning, so it is fast, and callers can probably hold onto the Wrapper while they iterate over the lines. I've seen that other collections have a bunch of methods like you added here -- so I'm sure it's nice and idiomatic Rust.

However, it seems like a lot of infrastructure for something that should be simple :-) As I said somewhere, I'm still new to Rust (coming from Python), so I'm still trying to get used to the style of the language.

@hcpl
Copy link
Contributor Author

hcpl commented Aug 31, 2017

somehow the hyphenation benchmarks are faster now:

For evidence I just ran benchmarks again:

 name                  before ns/iter  after ns/iter  diff ns/iter  diff %  speedup 
 fill_100              2,319           2,346                    27   1.16%   x 0.99 
 fill_200              4,838           4,857                    19   0.39%   x 1.00 
 fill_400              9,580           9,422                  -158  -1.65%   x 1.02 
 fill_800              20,538          19,459               -1,079  -5.25%   x 1.06 
 hyphenation_fill_100  2,224           2,193                   -31  -1.39%   x 1.01 
 hyphenation_fill_200  6,844           7,049                   205   3.00%   x 0.97 
 hyphenation_fill_400  15,731          15,972                  241   1.53%   x 0.98 
 hyphenation_fill_800  37,560          38,715                1,155   3.08%   x 0.97 
 wrap_100              2,246           2,224                   -22  -0.98%   x 1.01 
 wrap_200              4,780           4,759                   -21  -0.44%   x 1.00 
 wrap_400              9,415           9,654                   239   2.54%   x 0.98 
 wrap_800              19,828          19,355                 -473  -2.39%   x 1.02

So for something countable in nano-/microseconds, I wouldn't pay much attention to speedups/slowdowns with less than 25% of difference. They just fall within the margin of error, IMO.

Would it not be enough to just offer the wrap_iter_borrow method (which could then be called wrap_iter)? It does not cloning, so it is fast, and callers can probably hold onto the Wrapper while they iterate over the lines. I've seen that other collections have a bunch of methods like you added here -- so I'm sure it's nice and idiomatic Rust.

At least, we need the into_wrap_iter method if we want the wrap_iter function around because of ownership rules. And because of that I thought a variant that outlives but does not consume the Wrapper would be a good idea - thus a cloning wrap_iter appeared.

However, it seems like a lot of infrastructure for something that should be simple :-)

That's what I particularly like about Rust, because "a lot of infrastructure" in a way that Rust API tends to have is an example of "Explicit is better than implicit." done well. Users are presented with different options, so that they are able to pull maximum performance out of every possible use case. But that doesn't necessarily mean limiting oneselves only to "fastest" solutions -- users are free to do however they want if that helps to keep their code simple, even if doing so needs sacrificing a bit (or a lot) of performance.

Last but not least, Rust (and accompanying rustdoc documentation) has a type system expressive enough to assist in wrapping one's head around such an API. And even without looking up types you can guess with 95% of confidence what does something mean just by looking at its name if it is a usage of some idiom in Rust thanks to community-supported API guidelines (that's true at least for popular libraries).

As I said somewhere, I'm still new to Rust (coming from Python), so I'm still trying to get used to the style of the language.

You're welcome! :D

@mgeisler
Copy link
Owner

mgeisler commented Sep 2, 2017

Thanks for the explanations, much appreciated!

I think I'll try renaming wrap_iter_borrow to just wrap_iter (and remove the cloning wrap_iter in the process). My thinking is that the caller can clone the Wrapper if needed and use into_wrap_iter if the iterator and the wrapper must have independent lifetimes:

let lines = wrapper.clone().into_wrap_iter(text);

I think that will be rare and that a simple borrowing iterator is enough in most cases.

I'll put up a PR and add you as a reviewer.

@mgeisler
Copy link
Owner

mgeisler commented Sep 2, 2017

Fixes #59. Thanks @hcpl!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants