Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improve][meta] Log a warning when ZK batch fails with connectionloss #22566

Conversation

lhotari
Copy link
Member

@lhotari lhotari commented Apr 23, 2024

Motivation

A batch operation might cause a connection to break which is a severe issue. It's better to log a warning whenever this happens instead of silently ignoring it.

Modifications

  • Log a warning when a batch operation fails with connectionloss.
  • Add summary information to the log message.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@lhotari lhotari added this to the 3.3.0 milestone Apr 23, 2024
@lhotari lhotari requested a review from merlimat April 23, 2024 11:03
@lhotari lhotari self-assigned this Apr 23, 2024
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Apr 23, 2024
@lhotari
Copy link
Member Author

lhotari commented Apr 23, 2024

I might have encountered this issue recently when working on the /metrics endpoint issue.
A few times I got errors such as java.io.IOException: Packet len 15589885 is out of range! without any other proper explanation. More details of test setup in #22477 and #22494 comments. It seems that https://github.com/lhotari/pulsar-playground/blob/master/src/main/java/com/github/lhotari/pulsar/playground/TestScenarioCreateLongNamedTopics.java and https://github.com/lhotari/pulsar-playground/blob/master/src/main/java/com/github/lhotari/pulsar/playground/TestScenarioLoadAll.java would reproduce the problem I faced.

@lhotari
Copy link
Member Author

lhotari commented Apr 23, 2024

The test

@Test(dataProvider = "impl")
public void testBigBatchSize(String provider, Supplier<String> urlSupplier) throws Exception {
@Cleanup
MetadataStore store = MetadataStoreFactory.create(urlSupplier.get(), MetadataStoreConfig.builder()
.batchingEnabled(true)
.batchingMaxDelayMillis(1_000)
.build());
String key1 = newKey();
// Create 20 MB of data and try to read it out
int dataSize = 500 * 1024;
byte[] payload = new byte[dataSize];
int N = 40;
List<CompletableFuture<Stat>> putFutures = new ArrayList<>();
for (int i = 0; i < N; i++) {
putFutures.add(store.put(key1 + "/" + i, payload, Optional.empty()));
}
FutureUtil.waitForAll(putFutures).join();
List<CompletableFuture<Optional<GetResult>>> getFutures = new ArrayList<>();
for (int i = 0; i < N; i++) {
getFutures.add(store.get(key1 + "/" + i));
}
FutureUtil.waitForAll(getFutures).join();
}
will reproduce the problem.
it will log "2024-04-23T17:03:51,416 - WARN - [main-EventThread:ZKMetadataStore@204] - Connection loss while executing batch operation of 40 GET entries of total data size of 910. Retrying individual operations one-by-one." with the changes in this PR.

@lhotari
Copy link
Member Author

lhotari commented Apr 23, 2024

It seems that batching reads could cause more harm than benefit when the returned data exceeds jute.maxbuffer size.
This impacts stability a lot so there should be a way to disable batch reads completely.

@lhotari lhotari merged commit 69839c7 into apache:master Apr 26, 2024
50 of 52 checks passed
lhotari added a commit that referenced this pull request Apr 26, 2024
lhotari added a commit that referenced this pull request Apr 26, 2024
lhotari added a commit that referenced this pull request Apr 26, 2024
lhotari added a commit that referenced this pull request Apr 26, 2024
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request May 13, 2024
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants