HDDS-10890. [hsync] Increase default value for hdds.container.ratis.log.appender.queue.num-elements. #6711

Merged: 1 commit, May 22, 2024

jojochuang

What changes were proposed in this pull request?

HDDS-10890. [hsync] Increase default value for hdds.container.ratis.log.appender.queue.num-elements.

Please describe your PR in detail:
Using the Freon DN Echo tool, I found that increasing the value of hdds.container.ratis.log.appender.queue.num-elements drastically improves DN Echo throughput and latency.

Set it to 1024 to be consistent with the equivalent OM and SCM settings:
ozone.om.ratis.log.appender.queue.num-elements
ozone.scm.ha.ratis.log.appender.queue.num-elements
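
For anyone who wants to try the new value on an existing cluster before picking up this change, a minimal sketch of the override follows. It assumes the usual `ozone-site.xml` override mechanism on the datanodes; the value 1024 is the one proposed in this PR, and a datanode restart is assumed to be needed for it to take effect.

```xml
<!-- ozone-site.xml on each datanode (sketch; value taken from this PR) -->
<property>
  <name>hdds.container.ratis.log.appender.queue.num-elements</name>
  <value>1024</value>
</property>
```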

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-10890

How was this patch tested?

Benchmarked on a real cluster

…og.appender.queue.num-elements.

Change-Id: Ie3d85e7cd89fae7fae1fea19c6d770d53d5bfb85

jojochuang commented May 21, 2024

With this configuration,

1 client thread:

sudo -u hdfs ozone freon dne --clients=32 --container-id=4 -t 1 -n 1000000 --ratis --sleep-time-ms=0

     mean rate = 1681.18 calls/second
 1-minute rate = 1362.11 calls/second
 5-minute rate = 1157.42 calls/second
15-minute rate = 1115.03 calls/second
           min = 0.35 milliseconds
           max = 1.94 milliseconds
          mean = 0.54 milliseconds
        stddev = 0.12 milliseconds
        median = 0.51 milliseconds
          75% <= 0.57 milliseconds
          95% <= 0.70 milliseconds
          98% <= 0.87 milliseconds
          99% <= 0.98 milliseconds
        99.9% <= 1.49 milliseconds

32 client threads:

sudo -u hdfs ozone freon dne --clients=32 --container-id=4 -t 32 -n 1000000 --ratis --sleep-time-ms=0

     mean rate = 16372.24 calls/second
 1-minute rate = 15259.27 calls/second
 5-minute rate = 13352.70 calls/second
15-minute rate = 12882.74 calls/second
           min = 0.81 milliseconds
           max = 21.83 milliseconds
          mean = 1.75 milliseconds
        stddev = 1.61 milliseconds
        median = 1.51 milliseconds
          75% <= 1.80 milliseconds
          95% <= 2.53 milliseconds
          98% <= 3.83 milliseconds
          99% <= 6.42 milliseconds
        99.9% <= 21.83 milliseconds

About 16k calls per second is roughly the OM echo throughput on the same cluster.
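
As a rough sanity check on the two runs above, the reported mean rates can be compared with what a closed-loop benchmark would be expected to deliver, namely thread count divided by mean latency. The sketch below is illustrative only: it assumes the `-t` value is the effective concurrency and that the reported latency covers the whole call, neither of which is confirmed here. The function name is made up for this example; the numbers are copied from the output above.

```python
def closed_loop_throughput(threads: int, mean_latency_ms: float) -> float:
    """Expected calls/second if each thread issues calls back to back."""
    return threads / (mean_latency_ms / 1000.0)


# threads -> (mean latency in ms, measured mean rate in calls/second),
# copied from the two freon dne runs in this comment.
runs = {
    1: (0.54, 1681.18),
    32: (1.75, 16372.24),
}

for threads, (latency_ms, measured) in runs.items():
    predicted = closed_loop_throughput(threads, latency_ms)
    ratio = measured / predicted
    print(f"threads={threads:2d} predicted={predicted:9.2f} "
          f"measured={measured:9.2f} ratio={ratio:.2f}")
```

Under those assumptions the estimate comes out at roughly 1,850 and 18,300 calls per second, so both measured rates sit at about 90% of it, which suggests the reported latency accounts for most of the time per call at both thread counts.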

@jojochuang jojochuang requested a review from szetszwo May 21, 2024 20:58

@szetszwo szetszwo left a comment

+1 the change looks good.

@ChenSammi

The change looks good to me. It's interesting that the default was 1 previously.

@adoroszlai adoroszlai merged commit 0176264 into apache:master May 22, 2024
39 checks passed
@adoroszlai

Thanks @jojochuang for the patch, @ChenSammi, @szetszwo for the review.

jojochuang added a commit to jojochuang/ozone that referenced this pull request May 23, 2024
errose28 added a commit to errose28/ozone that referenced this pull request May 28, 2024
…concile-cli

* HDDS-10239-container-reconciliation: (296 commits)
  HDDS-10897. Refactor OzoneQuota (apache#6714)
  HDDS-10422. Fix some warnings about exposing internal representation in hdds-common (apache#6351)
  HDDS-10899. Refactor Lease callbacks (apache#6715)
  HDDS-10890. Increase default value for hdds.container.ratis.log.appender.queue.num-elements (apache#6711)
  HDDS-10832. Client should switch to streaming based on OpenKeySession replication (apache#6683)
  HDDS-10435. Support S3 object tags for existing requests (apache#6607)
  HDDS-10883. Improve logging in Recon for finalising DN logic. (apache#6704)
  HDDS-8752. Enable TestOzoneRpcClientAbstract#testOverWriteKeyWithAndWithOutVersioning (apache#6702)
  HDDS-10875. XceiverRatisServer#getRaftPeersInPipeline should be called before XceiverRatisServer#removeGroup (apache#6696)
  HDDS-10514. Recon - Provide DN decommissioning detailed status and info inline with current CLI command output. (apache#6376)
  HDDS-10878. Bump zstd-jni to 1.5.6-3 (apache#6701)
  HDDS-10877. Bump Dropwizard metrics to 3.2.6 (apache#6699)
  HDDS-10876. Bump jackson to 2.16.2 (apache#6697)
  HDDS-6116. Remove flaky tag from TestSCMInstallSnapshot (apache#6695)
  HDDS-2643. TestOzoneDelegationTokenSecretManager#testRenewTokenFailureRenewalTime fails intermittently.
  HDDS-10699. Refactor ContainerBalancerTask and TestContainerBalancerTask (apache#6537)
  HDDS-10861. Ozone cli supports default ozone.om.service.id (apache#6680)
  HDDS-10859. Improve error messages when decommission and maintenance fail-early (apache#6678)
  HDDS-9031. Upgrade acceptance tests to Docker Compose v2 (apache#6667)
  HDDS-10559. Add a warning or a check to run repair tool as System user (apache#6574)
  ...

Conflicts:
    hadoop-ozone/dist/src/main/smoketest/admincli/container.robot
4 participants