Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A few more chain optimizations #4170

Merged
merged 11 commits into from
Apr 11, 2022
Merged

A few more chain optimizations #4170

merged 11 commits into from
Apr 11, 2022

Conversation

johnynek
Copy link
Contributor

@johnynek johnynek commented Apr 9, 2022

cc @bplommer

  1. reorganize some matches in order I guess they are more likely to match
  2. tighten the type on some loops so we don't have to check against Empty
  3. implement foldLeft the same way as length (this was not a benchmark win).
  4. override foldMap to use the iterator (to leverage Monoid.combineAll).

while (iter.hasNext) { result = f(result, iter.next()) }
result
@annotation.tailrec
def loop(h: Chain.NonEmpty[A], tail: List[Chain.NonEmpty[A]], acc: B): B =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by passing the destructured list, we avoid an allocation in the Append case (which is common since Append is the way we combine two chains).

// of the same code as foldLeft
@annotation.tailrec
def loop(head: Chain.NonEmpty[A], tail: List[Chain.NonEmpty[A]], acc: Long): Int =
if (acc < 0L) 1 // head is nonempty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am not missing something, then if (acc <= 0L) 1 should work as well.
Because we know that the head is not empty, so seems we can skip looking further if acc == 0 is given.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep. Good call. Updated.

@bplommer
Copy link
Contributor

Would there be any benefit to leaving these abstract in Chain and implementing them separately in Chain.Empty and Chain.NonEmpty?

@johnynek
Copy link
Contributor Author

I ran the benchmarks (and added two). Here are the relevant numbers:

PR:

[info] Benchmark                                  Mode  Cnt          Score           Error  Units
[info] ChainBench.foldLeftLargeChain             thrpt    5         54.823 ±        14.480  ops/s
[info] ChainBench.foldLeftLargeList              thrpt    5        152.206 ±         5.792  ops/s
[info] ChainBench.foldLeftSmallChain             thrpt    5   74133841.930 ±    400335.145  ops/s
[info] ChainBench.foldLeftSmallList              thrpt    5   81573852.962 ±    281124.077  ops/s
[info] ChainBench.lengthLargeChain               thrpt    5     208482.471 ±      1696.443  ops/s
[info] ChainBench.lengthLargeList                thrpt    5        529.225 ±         0.394  ops/s
[info] ChainBench.reverseLargeChain              thrpt    5      12154.755 ±      3341.891  ops/s
[info] ChainBench.reverseLargeList               thrpt    5        131.259 ±       169.766  ops/s

main:
[info] Benchmark                                  Mode  Cnt          Score          Error  Units
[info] ChainBench.foldLeftLargeChain             thrpt    5        132.305 ±        3.237  ops/s
[info] ChainBench.foldLeftLargeList              thrpt    5        152.748 ±        3.895  ops/s
[info] ChainBench.foldLeftSmallChain             thrpt    5   76658960.380 ±   237704.833  ops/s
[info] ChainBench.foldLeftSmallList              thrpt    5   81733659.606 ±    45817.907  ops/s
[info] ChainBench.lengthLargeChain               thrpt    5      12564.835 ±    21329.022  ops/s
[info] ChainBench.lengthLargeList                thrpt    5        536.412 ±       10.714  ops/s
[info] ChainBench.reverseLargeChain              thrpt    5        175.292 ±       49.151  ops/s
[info] ChainBench.reverseLargeList               thrpt    5        174.277 ±       21.204  ops/s

so it looks like the reverse and length changes are big wins, but the foldLeft change is not an improvement. I'll revert that part.

Copy link
Contributor

@satorg satorg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really cool, thanks!

@satorg
Copy link
Contributor

satorg commented Apr 11, 2022

Would there be any benefit to leaving these abstract in Chain and implementing them separately in Chain.Empty and Chain.NonEmpty?

AFAIU, implementing all the logic in an abstract class rather than spreading it among descendants may benefit from code locality i.e. it should be less likely that an instruction cache will be reloading while executing a method of the abstract class. Perhaps @johnynek may correct me if I'm wrong about that.

@johnynek
Copy link
Contributor Author

I really don't know. Of course experiments are the best way to find out but even our current benchmarks might steer us wrong depending on the frequency of Empty, frequency or wrapping and structure of concatenation.

@johnynek johnynek merged commit ad318c3 into main Apr 11, 2022
@armanbilge armanbilge deleted the oscar/20220409_chain_opts branch April 11, 2022 01:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants