Modulate response output writes to 16KiB blocks #1790

schlosna · 2022-09-22T14:14:01Z

Before this PR

When Dialogue executed RequestBody#writeTo, a RequestBody implementation could write a large buffer to the resulting output stream in a single call, causing suboptimal cipher throughput because of the size of the write.

See palantir/hadoop-crypto#586 & palantir/hadoop-crypto#587

After this PR

==COMMIT_MSG==
Modulate response output writes to 16KiB blocks

A block size of 16 KiB is small enough to allow cipher implementations to become hot and optimize properly when given large inputs. Otherwise, large array writes into a javax.crypto.CipherOutputStream fail to use intrinsified implementations. If 16 KiB blocks aren't enough to produce hot methods, the I/O is small and infrequent enough that performance isn't relevant. For more information, see the details around com.sun.crypto.provider.GHASH::processBlocks in palantir/hadoop-crypto#586 (comment)
==COMMIT_MSG==

Possible downsides?

A block size of 16 KiB is small enough to allow cipher implementations to become hot and optimize properly when given large inputs. Otherwise, large array writes into a javax.crypto.CipherOutputStream fail to use intrinsified implementations. If 16 KiB blocks aren't enough to produce hot methods, the I/O is small and infrequent enough that performance isn't relevant. For more information, see the details around com.sun.crypto.provider.GHASH::processBlocks in palantir/hadoop-crypto#586 (comment)

changelog-app · 2022-09-22T14:14:05Z

Generate changelog in `changelog/@unreleased`

Type

Description

Modulate response output writes to 16KiB blocks

A block size of 16 KiB is small enough to allow cipher implementations to become hot and optimize properly when given large inputs. Otherwise, large array writes into a javax.crypto.CipherOutputStream fail to use intrinsified implementations. If 16 KiB blocks aren't enough to produce hot methods, the I/O is small and infrequent enough that performance isn't relevant. For more information, see the details around com.sun.crypto.provider.GHASH::processBlocks in palantir/hadoop-crypto#586 (comment)

Check the box to generate changelog(s)

Generate changelog entry

schlosna · 2022-09-22T14:16:51Z

...ache-hc5-client/src/main/java/com/palantir/dialogue/hc5/ApacheHttpClientBlockingChannel.java

+     * in order to prevent degraded performance on large buffers as described in
+     * <a href="https://github.com/palantir/hadoop-crypto/pull/586">hadoop-crypto#586</a>.
+     */
+    static final class ModulatingOutputStream extends FilterOutputStream {


Adjusted naming from ChunkingOutputStream to ModulatingOutputStream to avoid confusion with multipart chunks, since this just wraps a single OutputStream and writes to it in 16 KiB blocks.
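
For readers without the full diff, the idea can be sketched roughly as below. This is a minimal illustration, not the code merged in this PR: the class name `BlockLimitingOutputStream` and the constant `BLOCK_SIZE` are hypothetical, and only the 16 KiB figure and the use of `FilterOutputStream` come from the discussion above.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Objects;

/** Hypothetical sketch: splits large array writes into blocks of at most 16 KiB. */
final class BlockLimitingOutputStream extends FilterOutputStream {
    // 16 KiB, the block size named in the PR description.
    static final int BLOCK_SIZE = 16 * 1024;

    BlockLimitingOutputStream(OutputStream delegate) {
        super(delegate);
    }

    @Override
    public void write(byte[] buffer, int offset, int length) throws IOException {
        Objects.checkFromIndexSize(offset, length, buffer.length);
        int position = offset;
        int remaining = length;
        while (remaining > 0) {
            int toWrite = Math.min(remaining, BLOCK_SIZE);
            // Call the delegate directly: the default
            // FilterOutputStream#write(byte[], int, int) forwards one byte at a time.
            out.write(buffer, position, toWrite);
            position += toWrite;
            remaining -= toWrite;
        }
    }
}
```

With this sketch, a single 40,000-byte write reaches the delegate as two 16,384-byte writes followed by one 7,232-byte write. `write(byte[])` needs no override because `FilterOutputStream` already routes it through `write(byte[], int, int)`.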

carterkozak · 2022-09-22T14:23:36Z

...ache-hc5-client/src/main/java/com/palantir/dialogue/hc5/ApacheHttpClientBlockingChannel.java

+                remaining -= toWrite;
+            }
+        }
+    }


Might be helpful to override write(int) as a simple passthrough to out and avoid array allocation.

Done, though I don't think it is actually necessary, as FilterOutputStream#write(int) just does out.write(b).

Ah right, sorry for the runaround!

carterkozak

Thanks!

svc-autorelease · 2022-09-22T17:54:43Z

Released 3.67.0

…eam (#1983)

BinaryRequestBody and ContentBody use InputStream.transferTo(OutputStream)

Allow for optimization when the underlying input stream (such as `ByteArrayInputStream`, `ChannelInputStream`) overrides `transferTo(OutputStream)` to avoid extra array allocations and copy larger chunks at a time (e.g. allowing 16KiB chunks via `ApacheHttpClientBlockingChannel.ModulatingOutputStream` from #1790). When running on JDK 21+, this also enables 16KiB chunk copies via `InputStream.transferTo(OutputStream)` per JDK-8299336, whereas on JDK < 21, and when using Guava `ByteStreams.copy`, 8KiB chunk copies are used.

References:
* palantir/hadoop-crypto#586
* https://bugs.openjdk.org/browse/JDK-8299336
* https://bugs.openjdk.org/browse/JDK-8067661
* https://bugs.openjdk.org/browse/JDK-8265891
* https://bugs.openjdk.org/browse/JDK-8273038
* https://bugs.openjdk.org/browse/JDK-8279283
* https://bugs.openjdk.org/browse/JDK-8296431
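
To make the difference described in that commit message concrete, here is a small sketch contrasting the two copy styles. The class `CopySketch` and both method names are hypothetical, and the fixed-buffer loop is only a stand-in for a generic 8 KiB copy, not Guava's actual implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper contrasting two ways to copy a request body. */
final class CopySketch {
    private CopySketch() {}

    /** Delegates to the stream, so subclasses that override transferTo can pick their own strategy. */
    static long copyWithTransferTo(InputStream in, OutputStream out) throws IOException {
        return in.transferTo(out);
    }

    /** Manual loop with a fixed 8 KiB buffer; every write is at most 8 KiB. */
    static long copyWithFixedBuffer(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8 * 1024];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

Both methods copy the same bytes; what differs is who decides the size of each write. With `transferTo`, the write sizes depend on the input stream implementation and the JDK version, which is presumably why an output stream that caps writes at 16 KiB remains useful downstream.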

schlosna requested a review from carterkozak September 22, 2022 14:14

probot-autolabeler bot added the autorelease label Sep 22, 2022

Add generated changelog entries

035ef15

schlosna commented Sep 22, 2022

carterkozak reviewed Sep 22, 2022

Add write(int)

db3eb2e

schlosna marked this pull request as ready for review September 22, 2022 15:24

schlosna added the merge when ready label Sep 22, 2022

schlosna requested a review from carterkozak September 22, 2022 17:49

carterkozak approved these changes Sep 22, 2022

bulldozer-bot bot merged commit 4d0cc04 into develop Sep 22, 2022

bulldozer-bot bot deleted the ds/chunks branch September 22, 2022 17:54

schlosna mentioned this pull request Jul 28, 2023

BinaryRequestBody and ContentBody use InputStream.transferTo(OutputStream) #1983

Merged