
Add additional transport compression options #74587

Merged: 33 commits merged into elastic:master on Jun 29, 2021

Conversation

@Tim-Brooks (Contributor) commented on Jun 25, 2021

This commit is related to #73497. It adds two new settings. The first setting is transport.compression_scheme, which allows the user to configure LZ4 or DEFLATE as the transport compression. Additionally, it modifies transport.compress to support the value indexing_data. When this setting is set to indexing_data, only messages which are primarily composed of raw source data will be compressed. These are the bulk, operations recovery, and shard changes messages.
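For illustration, a minimal sketch of setting both options programmatically through the Settings builder; the keys and values follow the description above, and the class name is just an example:

```java
import org.elasticsearch.common.settings.Settings;

public class TransportCompressionSettingsExample {
    public static void main(String[] args) {
        // Compress only raw-source-heavy messages (bulk, operations recovery,
        // shard changes) and use LZ4 rather than DEFLATE.
        Settings settings = Settings.builder()
            .put("transport.compress", "indexing_data")
            .put("transport.compression_scheme", "lz4")
            .build();

        System.out.println(settings.get("transport.compress"));
        System.out.println(settings.get("transport.compression_scheme"));
    }
}
```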

@Tim-Brooks added the >enhancement, :Distributed Coordination/Network, v8.0.0, and v7.14.0 labels on Jun 25, 2021
@elasticmachine added the Team:Distributed label on Jun 25, 2021
@elasticmachine (Collaborator) commented:
Pinging @elastic/es-distributed (Team:Distributed)

@henningandersen (Contributor) left a comment

I added a number of detailed comments. In addition, I would like the documentation added in this PR, marking the new options as experimental. Finally, I wonder if there is additional testing we can do (I added only one such comment and did not investigate more deeply which tests to add).

return bytesConsumed;
}

private int decodeBlock(BytesReference reference) throws IOException {
@henningandersen (Contributor) commented:

I would think we need a test or assertion to verify at least one of:

  1. Randomized compression and decompression works (see the sketch below).
  2. The Java library is the expected version.

Ideally we would split this method into a class that can be contributed upstream to the lz4 library (can do in a follow-up).
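For reference, a minimal sketch of such a randomized round-trip check against the lz4-java block API; it does not reproduce the framing handled by the PR's decodeBlock, and the class structure here is illustrative only:

```java
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;
import net.jpountz.lz4.LZ4FastDecompressor;

import java.util.Arrays;
import java.util.Random;

public class Lz4RoundTripCheck {
    public static void main(String[] args) {
        Random random = new Random();
        LZ4Factory factory = LZ4Factory.safeInstance();
        LZ4Compressor compressor = factory.fastCompressor();
        LZ4FastDecompressor decompressor = factory.fastDecompressor();

        for (int i = 0; i < 100; i++) {
            // Random payload of random length.
            byte[] original = new byte[1 + random.nextInt(1 << 16)];
            random.nextBytes(original);

            byte[] compressed = compressor.compress(original);
            byte[] restored = decompressor.decompress(compressed, original.length);

            if (Arrays.equals(original, restored) == false) {
                throw new AssertionError("lz4 round trip failed for length " + original.length);
            }
        }
        System.out.println("lz4 round trip OK");
    }
}
```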

@Tim-Brooks (Contributor, Author) commented:

> The java library is the expected version

I'm not sure what you meant here.

@henningandersen (Contributor) commented:

The testing added (plus the tests I suggested) should suffice here. I was only worried that we would not catch discrepancies quickly if the library were updated to a newer version that changes the block format. One way to catch that would be to assert something about the library version, but I prefer adding randomized testing, so all good here.

@@ -440,6 +441,12 @@ private static Settings getRandomNodeSettings(long seed) {
Random random = new Random(seed);
Builder builder = Settings.builder();
builder.put(TransportSettings.TRANSPORT_COMPRESS.getKey(), rarely(random));
builder.put(TransportSettings.TRANSPORT_COMPRESS_RAW_DATA.getKey(), random.nextBoolean());
@henningandersen (Contributor) commented:

I imagine we use rarely above to avoid the overhead of compression. Can you run the full test suite with raw data compression on vs compression off to verify the penalty on builds? That would also give some indication of the overhead of lz4/raw_data.

I wonder if we only want LZ4/raw_data to run "every second run" and the other combinations (like deflate/raw_data and full compression) to only run rarely.
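A standalone sketch of the frequency split suggested here; the rarely() helper below is a stand-in at an assumed ~10% rate rather than the test framework's own, and the setting keys and values follow the PR description:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class CompressionSettingsRandomization {

    // Stand-in for the test framework's rarely(): true roughly 10% of the time (assumed rate).
    static boolean rarely(Random random) {
        return random.nextInt(10) == 0;
    }

    static Map<String, String> randomCompressionSettings(Random random) {
        Map<String, String> settings = new HashMap<>();
        if (random.nextBoolean()) {
            // Roughly every second run: cheap lz4 compression of indexing data only.
            settings.put("transport.compress", "indexing_data");
            settings.put("transport.compression_scheme", "lz4");
        } else if (rarely(random)) {
            // Rarely: full compression with a randomly chosen scheme.
            settings.put("transport.compress", "true");
            settings.put("transport.compression_scheme", random.nextBoolean() ? "deflate" : "lz4");
        } else {
            settings.put("transport.compress", "false");
        }
        return settings;
    }

    public static void main(String[] args) {
        System.out.println(randomCompressionSettings(new Random()));
    }
}
```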

@Tim-Brooks (Contributor, Author) commented:

I put all compression back to rarely. I will test this out a bit soon. Hopefully this is not a blocker, as it is easy to fix if build times are impacted?

@henningandersen (Contributor) commented:

OK, let us see how it goes; it would be good to do a bit of checking after this goes in.

@original-brownbear (Member) left a comment

Some quick comments; unfortunately I didn't get through all of it today.

*/
static final int COMPRESSION_LEVEL_BASE = 10;

static final int MIN_BLOCK_SIZE = 64;
@original-brownbear (Member) commented:

This seems unused?


static final int MIN_BLOCK_SIZE = 64;
static final int MAX_BLOCK_SIZE = 1 << COMPRESSION_LEVEL_BASE + 0x0F; // 32 M
static final int DEFAULT_BLOCK_SIZE = 1 << 16; // 64 KB
@original-brownbear (Member) commented:

Unused as well?
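Side note on the MAX_BLOCK_SIZE constant quoted above: since + binds tighter than << in Java, the expression evaluates as 1 << (10 + 0x0F) = 1 << 25, i.e. 32 MB, matching the comment. A quick standalone check:

```java
public class BlockSizeCheck {
    public static void main(String[] args) {
        int compressionLevelBase = 10;                            // COMPRESSION_LEVEL_BASE
        int maxBlockSize = 1 << compressionLevelBase + 0x0F;      // + binds tighter than <<
        System.out.println(maxBlockSize == (1 << 25));            // true
        System.out.println(maxBlockSize / (1024 * 1024) + " MB"); // 32 MB
    }
}
```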

}
byte firstByte = bytes.get(0);
byte[] header;
if (firstByte == CompressionScheme.DEFLATE_HEADER[0]) {
@original-brownbear (Member) commented:
I wonder if we should just interpret the headers as an int; then we don't need a mutable public static byte[] and can just switch on that int here (faster and less code)? A sketch of the idea follows below.
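A minimal sketch of that idea with placeholder header bytes; the actual header values and class names used by the PR are not reproduced here:

```java
import java.nio.ByteBuffer;

// Represent each 4-byte compression-scheme header as an int constant and
// switch on it, instead of keeping mutable public static byte[] headers and
// comparing arrays. Header byte values below are placeholders.
public class SchemeHeaders {

    // Compile-time int constants so they can be used as switch labels.
    static final int DEFLATE_HEADER = ('D' << 24) | ('F' << 16) | ('L' << 8);
    static final int LZ4_HEADER = ('L' << 24) | ('Z' << 16) | ('4' << 8);

    enum Scheme { DEFLATE, LZ4, UNKNOWN }

    static Scheme schemeFor(ByteBuffer message) {
        if (message.remaining() < Integer.BYTES) {
            return Scheme.UNKNOWN;
        }
        switch (message.getInt(message.position())) { // absolute read: peek without consuming
            case DEFLATE_HEADER:
                return Scheme.DEFLATE;
            case LZ4_HEADER:
                return Scheme.LZ4;
            default:
                return Scheme.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        ByteBuffer deflate = ByteBuffer.allocate(8);
        deflate.putInt(DEFLATE_HEADER);
        deflate.flip();
        System.out.println(schemeFor(deflate)); // DEFLATE
    }
}
```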

@henningandersen (Contributor) left a comment

LGTM. I left a number of smaller things to address.

Review comments on docs/reference/modules/transport.asciidoc were resolved.
@Tim-Brooks merged commit 293d490 into elastic:master on Jun 29, 2021
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request on Jun 29, 2021

This commit is related to elastic#73497. It adds two new settings. The first setting is transport.compression_scheme, which allows the user to configure LZ4 or DEFLATE as the transport compression. Additionally, it modifies transport.compress to support the value indexing_data. When this setting is set to indexing_data, only messages which are primarily composed of raw source data will be compressed. These are the bulk, operations recovery, and shard changes messages.
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request on Jun 29, 2021

PR elastic#74587 added lz4-java as a dependency. This broke multiple third party audits which had ignored those missing classes. This commit fixes the issue.
Tim-Brooks added a commit that referenced this pull request on Jun 29, 2021

This commit re-enables the BWC tests after #74587 was backported.
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request on Sep 1, 2021

Compression using indexing_data or lz4, as well as recovery from snapshot, is primarily intended for ESS and is therefore marked ESS-only in the docs.

Relates elastic#76237 and elastic#74587
henningandersen added a commit that referenced this pull request on Sep 1, 2021

Compression using indexing_data or lz4, as well as recovery from snapshot, is primarily intended for ESS and is therefore marked ESS-only in the docs.

Relates #76237 and #74587
Labels: >enhancement, :Distributed Coordination/Network (Http and internode communication implementations), Team:Distributed (Meta label for distributed team, obsolete), v7.14.0, v8.0.0-alpha1

5 participants