
Add additional transport compression options #74587

Merged: 33 commits merged into elastic:master on Jun 29, 2021

Conversation

@Tim-Brooks (Contributor) commented on Jun 25, 2021

This commit is related to #73497. It adds two new settings. The first setting is transport.compression_scheme, which allows the user to configure LZ4 or DEFLATE as the transport compression. Additionally, it modifies transport.compress to support the value indexing_data. When this setting is set to indexing_data, only messages which are primarily composed of raw source data will be compressed. These are the bulk, operations recovery, and shard changes messages.
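For illustration, a minimal sketch of setting both options programmatically through the Settings builder; the keys and values follow the description above, and the class name is just an example:

```java
import org.elasticsearch.common.settings.Settings;

public class TransportCompressionSettingsExample {
    public static void main(String[] args) {
        // Compress only raw-source-heavy messages (bulk, operations recovery,
        // shard changes) and use LZ4 rather than DEFLATE.
        Settings settings = Settings.builder()
            .put("transport.compress", "indexing_data")
            .put("transport.compression_scheme", "lz4")
            .build();

        System.out.println(settings.get("transport.compress"));
        System.out.println(settings.get("transport.compression_scheme"));
    }
}
```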

@Tim-Brooks added the >enhancement, :Distributed Coordination/Network, v8.0.0, and v7.14.0 labels on Jun 25, 2021
@elasticmachine added the Team:Distributed label on Jun 25, 2021
@elasticmachine (Collaborator) commented:
Pinging @elastic/es-distributed (Team:Distributed)

@henningandersen (Contributor) left a comment

I added a number of detailed comments. In addition, I would like the documentation added in this PR, marking the new options as experimental. Finally, I wonder if there is additional testing we can do (I added only one such comment and did not investigate more deeply which tests to add).

return bytesConsumed;
}

private int decodeBlock(BytesReference reference) throws IOException {
@henningandersen (Contributor) commented:

I would think we need a test or assertion to verify at least one of:

  1. Randomized compression and decompression works (see the sketch below).
  2. The Java library is the expected version.

Ideally we would split this method into a class that can be contributed upstream to the lz4 library (can do in a follow-up).
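For reference, a minimal sketch of such a randomized round-trip check against the lz4-java block API; it does not reproduce the framing handled by the PR's decodeBlock, and the class structure here is illustrative only:

```java
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;
import net.jpountz.lz4.LZ4FastDecompressor;

import java.util.Arrays;
import java.util.Random;

public class Lz4RoundTripCheck {
    public static void main(String[] args) {
        Random random = new Random();
        LZ4Factory factory = LZ4Factory.safeInstance();
        LZ4Compressor compressor = factory.fastCompressor();
        LZ4FastDecompressor decompressor = factory.fastDecompressor();

        for (int i = 0; i < 100; i++) {
            // Random payload of random length.
            byte[] original = new byte[1 + random.nextInt(1 << 16)];
            random.nextBytes(original);

            byte[] compressed = compressor.compress(original);
            byte[] restored = decompressor.decompress(compressed, original.length);

            if (Arrays.equals(original, restored) == false) {
                throw new AssertionError("lz4 round trip failed for length " + original.length);
            }
        }
        System.out.println("lz4 round trip OK");
    }
}
```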

@Tim-Brooks (Contributor, Author) commented:

> The java library is the expected version

I'm not sure what you meant here.

@henningandersen (Contributor) commented:

The testing added (plus the tests I suggested) should suffice here. I was only worried that we would not catch discrepancies quickly if the library were updated to a newer version that changes the block format. One way to catch that would be to assert something about the library version, but I prefer adding randomized testing, so all good here.

@@ -440,6 +441,12 @@ private static Settings getRandomNodeSettings(long seed) {
Random random = new Random(seed);
Builder builder = Settings.builder();
builder.put(TransportSettings.TRANSPORT_COMPRESS.getKey(), rarely(random));
builder.put(TransportSettings.TRANSPORT_COMPRESS_RAW_DATA.getKey(), random.nextBoolean());
@henningandersen (Contributor) commented:

I imagine we use rarely above to avoid the overhead of compression. Can you run the full test suite with raw data compression on vs compression off to verify the penalty on builds? That would also give some indication of the overhead of lz4/raw_data.

I wonder if we only want LZ4/raw_data to run "every second run" and the other combinations (like deflate/raw_data and full compression) to only run rarely.
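A standalone sketch of the frequency split suggested here; the rarely() helper below is a stand-in at an assumed ~10% rate rather than the test framework's own, and the setting keys and values follow the PR description:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class CompressionSettingsRandomization {

    // Stand-in for the test framework's rarely(): true roughly 10% of the time (assumed rate).
    static boolean rarely(Random random) {
        return random.nextInt(10) == 0;
    }

    static Map<String, String> randomCompressionSettings(Random random) {
        Map<String, String> settings = new HashMap<>();
        if (random.nextBoolean()) {
            // Roughly every second run: cheap lz4 compression of indexing data only.
            settings.put("transport.compress", "indexing_data");
            settings.put("transport.compression_scheme", "lz4");
        } else if (rarely(random)) {
            // Rarely: full compression with a randomly chosen scheme.
            settings.put("transport.compress", "true");
            settings.put("transport.compression_scheme", random.nextBoolean() ? "deflate" : "lz4");
        } else {
            settings.put("transport.compress", "false");
        }
        return settings;
    }

    public static void main(String[] args) {
        System.out.println(randomCompressionSettings(new Random()));
    }
}
```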

@Tim-Brooks (Contributor, Author) commented:

I put all compression back to rarely. I will test this out a bit soon. Hopefully this is not a blocker, as it is easy to fix if build times are impacted?

@henningandersen (Contributor) commented:

OK, let us see how it goes; it would be good to do a bit of checking after this goes in.

@original-brownbear (Member) left a comment

Some quick comments; unfortunately I didn't get through all of it today.

*/
static final int COMPRESSION_LEVEL_BASE = 10;

static final int MIN_BLOCK_SIZE = 64;
@original-brownbear (Member) commented:

This seems unused?


static final int MIN_BLOCK_SIZE = 64;
static final int MAX_BLOCK_SIZE = 1 << COMPRESSION_LEVEL_BASE + 0x0F; // 32 M
static final int DEFAULT_BLOCK_SIZE = 1 << 16; // 64 KB
@original-brownbear (Member) commented:

Unused as well?
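Side note on the MAX_BLOCK_SIZE constant quoted above: since + binds tighter than << in Java, the expression evaluates as 1 << (10 + 0x0F) = 1 << 25, i.e. 32 MB, matching the comment. A quick standalone check:

```java
public class BlockSizeCheck {
    public static void main(String[] args) {
        int compressionLevelBase = 10;                            // COMPRESSION_LEVEL_BASE
        int maxBlockSize = 1 << compressionLevelBase + 0x0F;      // + binds tighter than <<
        System.out.println(maxBlockSize == (1 << 25));            // true
        System.out.println(maxBlockSize / (1024 * 1024) + " MB"); // 32 MB
    }
}
```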

}
byte firstByte = bytes.get(0);
byte[] header;
if (firstByte == CompressionScheme.DEFLATE_HEADER[0]) {
@original-brownbear (Member) commented:
I wonder if we should just interpret the headers as an int; then we don't need a mutable public static byte[] and can just switch on that int here (faster and less code)? A sketch of the idea follows below.
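A minimal sketch of that idea with placeholder header bytes; the actual header values and class names used by the PR are not reproduced here:

```java
import java.nio.ByteBuffer;

// Represent each 4-byte compression-scheme header as an int constant and
// switch on it, instead of keeping mutable public static byte[] headers and
// comparing arrays. Header byte values below are placeholders.
public class SchemeHeaders {

    // Compile-time int constants so they can be used as switch labels.
    static final int DEFLATE_HEADER = ('D' << 24) | ('F' << 16) | ('L' << 8);
    static final int LZ4_HEADER = ('L' << 24) | ('Z' << 16) | ('4' << 8);

    enum Scheme { DEFLATE, LZ4, UNKNOWN }

    static Scheme schemeFor(ByteBuffer message) {
        if (message.remaining() < Integer.BYTES) {
            return Scheme.UNKNOWN;
        }
        switch (message.getInt(message.position())) { // absolute read: peek without consuming
            case DEFLATE_HEADER:
                return Scheme.DEFLATE;
            case LZ4_HEADER:
                return Scheme.LZ4;
            default:
                return Scheme.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        ByteBuffer deflate = ByteBuffer.allocate(8);
        deflate.putInt(DEFLATE_HEADER);
        deflate.flip();
        System.out.println(schemeFor(deflate)); // DEFLATE
    }
}
```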

@henningandersen (Contributor) left a comment

LGTM. I left a number of smaller things to address.

Review comments on docs/reference/modules/transport.asciidoc were resolved.
@Tim-Brooks merged commit 293d490 into elastic:master on Jun 29, 2021
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request on Jun 29, 2021

This commit is related to elastic#73497. It adds two new settings. The first setting is transport.compression_scheme, which allows the user to configure LZ4 or DEFLATE as the transport compression. Additionally, it modifies transport.compress to support the value indexing_data. When this setting is set to indexing_data, only messages which are primarily composed of raw source data will be compressed. These are the bulk, operations recovery, and shard changes messages.
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request on Jun 29, 2021

PR elastic#74587 added lz4-java as a dependency. This broke multiple third party audits which had ignored those missing classes. This commit fixes the issue.
Tim-Brooks added a commit that referenced this pull request on Jun 29, 2021

This commit re-enables the BWC tests after #74587 was backported.
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request on Sep 1, 2021

Compression using indexing_data or lz4, as well as recovery from snapshot, is primarily intended for ESS and is therefore marked ESS-only in the docs.

Relates elastic#76237 and elastic#74587
henningandersen added a commit that referenced this pull request on Sep 1, 2021

Compression using indexing_data or lz4, as well as recovery from snapshot, is primarily intended for ESS and is therefore marked ESS-only in the docs.

Relates #76237 and #74587
Labels: >enhancement, :Distributed Coordination/Network (Http and internode communication implementations), Team:Distributed (Meta label for distributed team, obsolete), v7.14.0, v8.0.0-alpha1

5 participants