Override and short-circuit Chunk.flatten #2990

Merged
merged 1 commit into from
Sep 22, 2022


Contributor

@CremboC CremboC commented Sep 22, 2022

Override and short-circuit the Chunk.flatten method. This should help http4s when parsing JSON bodies via circe.

Currently JSON is parsed via the following method:
https://github.com/http4s/http4s/blob/fbcbcf3e23031a5ebba3c8f24b3486f0f5d956a8/circe/src/main/scala/org/http4s/circe/CirceInstances.scala#L55
Which calls
https://github.com/http4s/http4s/blob/fbcbcf3e23031a5ebba3c8f24b3486f0f5d956a8/core/shared/src/main/scala/org/http4s/EntityDecoder.scala#L208

The latter method calls Chunk.flatten.

From what I can tell (this one is for the http4s people), most requests (depending on size) will contain a single chunk, so the short-circuit in this PR should reduce both allocations and CPU usage quite a lot.

A brief JMH benchmark shows this has a big impact: in certain cases flatten is now O(1) in both time and space. Note that I've only tested two sizes: a chunk containing 1 chunk, and a chunk containing 5 chunks.
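To make the short-circuit idea concrete, here is a minimal sketch that uses a plain Vector of Vectors as a stand-in for a chunk of chunks. The names FlattenSketch and flattenShortCircuit are hypothetical, and this is an illustration of the technique under those assumptions, not the code changed in this PR.

```scala
// Hypothetical stand-in: Vector[Vector[A]] plays the role of a chunk of chunks.
// This is NOT the fs2 implementation, only an illustration of the idea.
object FlattenSketch {
  def flattenShortCircuit[A](outer: Vector[Vector[A]]): Vector[A] =
    if (outer.isEmpty) Vector.empty       // nothing to flatten
    else if (outer.size == 1) outer.head  // sole inner value is returned as-is: no copy, no allocation
    else {
      // General case: copy every element into a fresh collection.
      val builder = Vector.newBuilder[A]
      outer.foreach(inner => builder ++= inner)
      builder.result()
    }
}
```

In the single-element case the result is the very same instance as the inner value, which is why that path does constant work regardless of how many elements the inner value holds.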

before

[info] Benchmark                                                 (chunkCount)  (chunkSize)   Mode  Cnt         Score          Error   Units
[info] ChunksBenchmark.flatten                                              1         4096  thrpt    3  98347244.843 ± 51950933.224   ops/s
[info] ChunksBenchmark.flatten:·gc.alloc.rate                               1         4096  thrpt    3      7145.822 ±     3767.669  MB/sec
[info] ChunksBenchmark.flatten:·gc.alloc.rate.norm                          1         4096  thrpt    3        80.011 ±        0.001    B/op
[info] ChunksBenchmark.flatten:·gc.churn.G1_Eden_Space                      1         4096  thrpt    3      7158.089 ±     3839.141  MB/sec
[info] ChunksBenchmark.flatten:·gc.churn.G1_Eden_Space.norm                 1         4096  thrpt    3        80.148 ±        1.800    B/op
[info] ChunksBenchmark.flatten:·gc.churn.G1_Survivor_Space                  1         4096  thrpt    3         0.029 ±        0.011  MB/sec
[info] ChunksBenchmark.flatten:·gc.churn.G1_Survivor_Space.norm             1         4096  thrpt    3        ≈ 10⁻³                   B/op
[info] ChunksBenchmark.flatten:·gc.count                                    1         4096  thrpt    3      1484.000                 counts
[info] ChunksBenchmark.flatten:·gc.time                                     1         4096  thrpt    3      1173.000                     ms
[info] ChunksBenchmark.flatten                                              5         4096  thrpt    3  16869148.061 ±  2009678.811   ops/s
[info] ChunksBenchmark.flatten:·gc.alloc.rate                               5         4096  thrpt    3      6741.347 ±      822.955  MB/sec
[info] ChunksBenchmark.flatten:·gc.alloc.rate.norm                          5         4096  thrpt    3       440.063 ±        0.003    B/op
[info] ChunksBenchmark.flatten:·gc.churn.G1_Eden_Space                      5         4096  thrpt    3      6752.349 ±      848.276  MB/sec
[info] ChunksBenchmark.flatten:·gc.churn.G1_Eden_Space.norm                 5         4096  thrpt    3       440.780 ±        5.449    B/op
[info] ChunksBenchmark.flatten:·gc.churn.G1_Survivor_Space                  5         4096  thrpt    3         0.037 ±        0.018  MB/sec
[info] ChunksBenchmark.flatten:·gc.churn.G1_Survivor_Space.norm             5         4096  thrpt    3         0.002 ±        0.001    B/op
[info] ChunksBenchmark.flatten:·gc.count                                    5         4096  thrpt    3      1400.000                 counts
[info] ChunksBenchmark.flatten:·gc.time                                     5         4096  thrpt    3      1114.000                     ms

after

[info] Benchmark                                                 (chunkCount)  (chunkSize)   Mode  Cnt          Score            Error   Units
[info] ChunksBenchmark.flatten                                              1         4096  thrpt    3  964210584.460 ± 1880426013.685   ops/s
[info] ChunksBenchmark.flatten:·gc.alloc.rate                               1         4096  thrpt    3         ≈ 10⁻⁴                   MB/sec
[info] ChunksBenchmark.flatten:·gc.alloc.rate.norm                          1         4096  thrpt    3         ≈ 10⁻⁷                     B/op
[info] ChunksBenchmark.flatten:·gc.count                                    1         4096  thrpt    3            ≈ 0                   counts
[info] ChunksBenchmark.flatten                                              5         4096  thrpt    3   17118101.722 ±    6470972.580   ops/s
[info] ChunksBenchmark.flatten:·gc.alloc.rate                               5         4096  thrpt    3       6714.921 ±       2548.729  MB/sec
[info] ChunksBenchmark.flatten:·gc.alloc.rate.norm                          5         4096  thrpt    3        432.061 ±          0.004    B/op
[info] ChunksBenchmark.flatten:·gc.churn.G1_Eden_Space                      5         4096  thrpt    3       6726.793 ±       2437.002  MB/sec
[info] ChunksBenchmark.flatten:·gc.churn.G1_Eden_Space.norm                 5         4096  thrpt    3        432.831 ±         11.463    B/op
[info] ChunksBenchmark.flatten:·gc.churn.G1_Survivor_Space                  5         4096  thrpt    3          0.034 ±          0.025  MB/sec
[info] ChunksBenchmark.flatten:·gc.churn.G1_Survivor_Space.norm             5         4096  thrpt    3          0.002 ±          0.001    B/op
[info] ChunksBenchmark.flatten:·gc.count                                    5         4096  thrpt    3       1395.000                   counts
[info] ChunksBenchmark.flatten:·gc.time                                     5         4096  thrpt    3       1103.000                       ms

@diesalbla
Contributor

diesalbla commented Sep 22, 2022

There are other methods that would be worth optimising for a singleton Chunk as well.

@mpilquist
Member

Awesome!
