
Optimize IO's encoding #90

Merged (11 commits into typelevel:master, Nov 30, 2017)

Conversation

@alexandru (Member) commented Nov 28, 2017

Fixes #89 and improves performance.

This implementation is inspired by the internal encoding of Monix's Task. It is not an exact port, because Monix's Task is cancelable and has to do more work because of that. Some minor optimizations were also left out for now, pending further benchmarks (e.g. a separate state for IO.apply), but the gist is here.
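
For context, the general technique behind such an encoding is an algebraic data type of IO states plus a trampolined run-loop that keeps pending flatMap continuations in an explicit stack. Below is a heavily simplified, hedged sketch of that idea; all names (SketchIO, Pure, Delay, Bind, run) are invented for illustration and are not the actual cats-effect internals:

object RunLoopSketch {
  sealed trait SketchIO[+A] {
    def flatMap[B](f: A => SketchIO[B]): SketchIO[B] = Bind(this, f)
  }
  final case class Pure[+A](value: A) extends SketchIO[A]
  final case class Delay[+A](thunk: () => A) extends SketchIO[A]
  final case class Bind[A, +B](source: SketchIO[A], f: A => SketchIO[B])
      extends SketchIO[B]

  // Interprets the structure in a flat while-loop; continuations live
  // in a heap-allocated stack instead of nested runtime calls.
  def run[A](io: SketchIO[A]): A = {
    var current: SketchIO[Any] = io
    var stack: List[Any => SketchIO[Any]] = Nil
    var result: Option[Any] = None

    while (result.isEmpty) {
      current match {
        case Bind(source, f) =>
          // Push the continuation, keep unwinding the source.
          stack = f.asInstanceOf[Any => SketchIO[Any]] :: stack
          current = source
        case Delay(thunk) =>
          current = Pure(thunk())
        case Pure(value) =>
          stack match {
            case f :: rest => stack = rest; current = f(value)
            case Nil       => result = Some(value)
          }
      }
    }
    result.get.asInstanceOf[A]
  }

  // Example: run(Pure(1).flatMap(i => Delay(() => i + 1))) == 2
}

Materializing the continuations on the heap is what avoids both stack overflows and the overhead of nested runtime calls, which is exactly what the benchmarks below exercise.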

Benchmarking (last update: 2017-11-29)

The PR includes a JMH setup with benchmarks/vPrev and benchmarks/vNext as sub-projects for measuring the performance impact of changes, compared with whatever previous version we want.

To run the benchmarks, execute the script:

./benchmarks/run-benchmarks-all

The results will be dumped in benchmarks/results.
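
For a single benchmark, a direct sbt-jmh invocation should also work. A hedged example, assuming the standard sbt-jmh plugin on the vNext sub-project (project name taken from the setup above, iteration counts illustrative):

sbt "vNext/jmh:run -i 20 -wi 10 -f 1 .*DeepBindBenchmark.*"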

ShallowBindBenchmark

This measures a plain tail-recursive flatMap loop. Previous version:

Benchmark                   (size)   Mode  Cnt     Score    Error  Units
ShallowBindBenchmark.async   10000  thrpt   20    49.247 ±  4.749  ops/s
ShallowBindBenchmark.delay   10000  thrpt   20  1589.975 ± 32.753  ops/s
ShallowBindBenchmark.pure    10000  thrpt   20  2556.298 ± 35.352  ops/s

PR changes:

Benchmark                   (size)   Mode  Cnt     Score    Error  Units
ShallowBindBenchmark.async   10000  thrpt   20    82.844 ± 14.171  ops/s
ShallowBindBenchmark.delay   10000  thrpt   20  3619.257 ± 49.135  ops/s
ShallowBindBenchmark.pure    10000  thrpt   20  6291.561 ± 52.802  ops/s

Over twice the throughput.
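
For reference, the shape of loop this benchmark measures looks roughly like the following hedged sketch (names are illustrative, not the benchmark's actual code):

import cats.effect.IO

def shallowLoop(i: Int, size: Int): IO[Int] =
  IO.pure(i).flatMap { j =>
    // The recursive call is the final result of the bind, so the
    // run-loop's continuation stack stays constant-sized.
    if (j < size) shallowLoop(j + 1, size) else IO.pure(j)
  }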

DeepBindBenchmark

This one measures a non-tail-recursive flatMap loop (like in issue #89). Previous version:

Benchmark                (size)   Mode  Cnt     Score     Error  Units
DeepBindBenchmark.async    3000  thrpt   20     2.704 ±   0.062  ops/s
DeepBindBenchmark.delay    3000  thrpt   20     4.354 ±   0.076  ops/s
DeepBindBenchmark.pure     3000  thrpt   20  2883.429 ± 103.256  ops/s

After PR changes:

Benchmark                (size)   Mode  Cnt     Score     Error  Units
DeepBindBenchmark.async    3000  thrpt   20   375.433 ±  19.089  ops/s
DeepBindBenchmark.delay    3000  thrpt   20  5485.506 ±  71.269  ops/s
DeepBindBenchmark.pure     3000  thrpt   20  7089.853 ± 111.027  ops/s

The differences are dramatic, mostly due to the reduced memory usage.
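
For contrast with the shallow case, a hedged sketch of a loop of this shape (illustrative names, not the benchmark's actual code):

import cats.effect.IO

def deepLoop(i: Int, size: Int): IO[Int] =
  if (i >= size) IO.pure(i)
  else
    // The continuation wraps the recursive call, so `size` pending
    // binds accumulate before any of them can execute.
    deepLoop(i + 1, size).flatMap(j => IO.pure(j + 1))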

AttemptBenchmark

This one measures the performance of attempt, both for the happy path and for handling errors.

Previous version:

Benchmark                     (size)   Mode  Cnt     Score    Error  Units
AttemptBenchmark.errorRaised   10000  thrpt   20   349.944 ±  3.795  ops/s
AttemptBenchmark.happyPath     10000  thrpt   20  2282.834 ± 23.092  ops/s

After the PR changes:

Benchmark                     (size)   Mode  Cnt     Score    Error  Units
AttemptBenchmark.errorRaised   10000  thrpt   20  2150.224 ± 25.714  ops/s
AttemptBenchmark.happyPath     10000  thrpt   20  2284.496 ± 39.242  ops/s

The difference is dramatic when errors get handled.
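
As a reminder of what the two measured paths exercise, a hedged sketch (not the benchmark's actual code):

import cats.effect.IO

// Happy path: attempt wraps the value in Right, never touching
// the error channel.
val happyPath: IO[Either[Throwable, Int]] =
  IO.pure(1).attempt

// Error path: attempt catches the raised error as a Left.
val errorRaised: IO[Either[Throwable, Int]] =
  IO.raiseError[Int](new RuntimeException("boom")).attempt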

HandleErrorBenchmark

This one measures the performance of handleErrorWith, which is optimized in the new version.

Previous version:

Benchmark                         (size)   Mode  Cnt    Score     Error  Units
HandleErrorBenchmark.errorRaised   10000  thrpt   20  282.673 ±   3.525  ops/s
HandleErrorBenchmark.happyPath     10000  thrpt   20  979.575 ± 130.618  ops/s

After PR changes:

Benchmark                         (size)   Mode  Cnt     Score    Error  Units
HandleErrorBenchmark.errorRaised   10000  thrpt   20  1939.384 ± 14.190  ops/s
HandleErrorBenchmark.happyPath     10000  thrpt   20  3068.821 ± 37.069  ops/s
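
Again a hedged sketch of the two measured paths (not the benchmark's actual code); the handler syntax assumes cats' ApplicativeError syntax is in scope:

import cats.effect.IO
import cats.implicits._

// Happy path: the error handler is never invoked; the optimized
// encoding can skip over it cheaply in the run-loop.
val happyPath: IO[Int] =
  IO.pure(1).handleErrorWith(_ => IO.pure(0))

// Error path: the handler recovers the raised error.
val errorRaised: IO[Int] =
  IO.raiseError[Int](new RuntimeException("boom"))
    .handleErrorWith(_ => IO.pure(0))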

Moving Forward

More optimizations are possible, but this already provides a good baseline; other micro-optimizations can come in separate PRs, along with proof that they work.

@alexandru alexandru changed the title Optimize IO's encoding WIP: Optimize IO's encoding Nov 28, 2017
@alexandru alexandru changed the title WIP: Optimize IO's encoding Optimize IO's encoding Nov 28, 2017
@alexandru alexandru mentioned this pull request Nov 28, 2017
@codecov-io commented Nov 28, 2017

Codecov Report

Merging #90 into master will increase coverage by 0.83%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #90      +/-   ##
==========================================
+ Coverage   85.93%   86.77%   +0.83%     
==========================================
  Files          19       20       +1     
  Lines         384      378       -6     
  Branches       21       27       +6     
==========================================
- Hits          330      328       -2     
+ Misses         54       50       -4

}

/** Pops the next bind function from the stack, but filters out
* `Mapping.OnError` references, because we know they won't do
Review comment (Member) on the diff above:

s/Mapping.OnError/IOFrame.ErrorHandler/g

@mpilquist mpilquist self-requested a review November 29, 2017 00:25
@mpilquist (Member) left a comment:

This looks great. Interested to see some updated benchmarks.

@pchlupacek

@alexandru this is excellent news. Btw, does the new IO encoding support flatMap-ing over the result, i.e. flatMap { result: Either[Throwable, A] => ??? }?

@alexandru (Member Author)

@pchlupacek the new internal encoding is optimized for flatMap-ing over both successful values and errors; however, we are not exposing it in the API.

Monix's Task has a transformWith[B](f: A => Task[B], g: Throwable => Task[B]): Task[B] operator. I would like that in IO as well for efficiency reasons, but this PR is already too big, so I'd propose it in a separate PR.
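
For illustration, the semantics such an operator would provide can be expressed today via attempt; this is only a hedged sketch of the behavior (with an extra Either allocation per step), not the proposed optimized implementation:

import cats.effect.IO

def transformWith[A, B](io: IO[A])(f: A => IO[B], g: Throwable => IO[B]): IO[B] =
  io.attempt.flatMap {
    case Right(a) => f(a)
    case Left(e)  => g(e)
  }

// Example: recover a failure into a pure value.
// transformWith(IO.raiseError[Int](new RuntimeException("boom")))(IO.pure, _ => IO.pure(0))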

@pchlupacek commented Nov 29, 2017

@alexandru yeah, it would be excellent to have that flatMap-ish design if possible, with the same performance as a normal flatMap.

@alexandru (Member Author) commented Nov 29, 2017

@pchlupacek note that these changes make handleErrorWith much, much lighter on the happy path.

A transformWith would be better still, but even without it you should see big improvements with this PR for defensive code.

See the HandleErrorBenchmark, which is relevant here.

@alexandru alexandru changed the title WIP: Optimize IO's encoding Optimize IO's encoding Nov 29, 2017
@alexandru alexandru requested a review from djspiewak November 29, 2017 17:46
@djspiewak (Member)

@alexandru This looks very impressive at first glance. Thanks for taking the time to do this! I potentially won't have a chance to review it for a few days. @mpilquist has already given his stamp of approval, so if you guys want to move forward, feel free to do so. :-) Otherwise, I'll get to it asap.

@non commented Nov 29, 2017

👍 This looks great, thanks @alexandru!

@mpilquist (Member)

@alexandru Could you take a look at the compilation failure on the 2.10 build? Once that's fixed, I think we can go ahead and merge this given Daniel's response from yesterday.

@mpilquist mpilquist merged commit 964e8d0 into typelevel:master Nov 30, 2017
@LukaJCB (Member) commented Nov 30, 2017

Amazing work!

@alexandru (Member Author)

@mpilquist thanks for the review and the merge.

@djspiewak when you have the time, please publish a hash version for testing purposes.

This might not be the last PR for performance optimizations. I'm tormenting myself with some profiling tools from Intel whose UI was made in 1994, and I'm running experiments, but as I said, it would be better to introduce further optimizations piecemeal, with some proof that they work.
