HDDS-11316. Improve Create Key and Chunk IO Dashboards #7075

Merged
merged 4 commits into from
Aug 21, 2024


@kerneltime (Contributor) commented Aug 14, 2024

Change-Id: I187a3f0b61ebd3e6ce7c36464052d3dd2e2a2d8b

Add metrics to the Create Key dashboard to track the total data committed and the number of key commits executed.
Minor fixes to the legends on the Chunk IO dashboard. Works around grafana/grafana#44251.

Also, default to the configured Prometheus data source instead of parameterizing the source, which gives errors on import.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-11316

How was this patch tested?

Deployed locally.

image

Change-Id: I187a3f0b61ebd3e6ce7c36464052d3dd2e2a2d8b
Change-Id: I352b08bd43a79416a38d8070fcbd206608ec8ef6
@duongkame (Contributor) left a comment

Thanks for the patch @kerneltime . LGTM

@@ -15,7 +15,7 @@
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "10.4.2"
Contributor

Naive question: would other Grafana servers be able to import a dashboard generated by a newer Grafana version?

Contributor Author

Good question. Even with the local setup, the Docker image pulled down is the latest. I used the older version 10.4.2 of the Docker image and the dashboards rendered without errors. This will be hard for us to keep track of across versions.

Contributor Author

I tried an older version, 7.3.6 (commit: NA, branch: master), and a lot of the dashboards don't really work, but I think it is OK for us to continue with the latest version.
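
The compatibility question in this thread could be checked mechanically before import. A minimal sketch, assuming the dashboard JSON was exported with Grafana's external-sharing option so that it carries a `__requires` list like the entry in the diff hunk above. The helper names and the sample JSON are hypothetical and not part of this patch, and the version parser only handles plain `major.minor.patch` strings.

```python
import json

# Hypothetical sample: a trimmed dashboard export containing only the
# "__requires" entry quoted in the diff hunk above.
SAMPLE_DASHBOARD = """
{
  "__requires": [
    {"type": "grafana", "id": "grafana", "name": "Grafana", "version": "10.4.2"}
  ],
  "title": "Create Key (sample)"
}
"""


def required_grafana_version(dashboard_json):
    """Return the Grafana version recorded in the export, or None if absent."""
    dashboard = json.loads(dashboard_json)
    for requirement in dashboard.get("__requires", []):
        if requirement.get("type") == "grafana" and requirement.get("id") == "grafana":
            return requirement.get("version")
    return None


def parse_version(version):
    """Turn 'major.minor.patch' into a tuple of ints (assumes no suffixes)."""
    return tuple(int(part) for part in version.split(".")[:3])


def server_is_older(server_version, dashboard_json):
    """True when the server predates the Grafana version that exported the dashboard."""
    required = required_grafana_version(dashboard_json)
    if required is None:
        return False  # nothing recorded, so nothing to compare against
    return parse_version(server_version) < parse_version(required)


if __name__ == "__main__":
    for server in ("7.3.6", "10.4.2"):
        print(server, "older than export:", server_is_older(server, SAMPLE_DASHBOARD))
```

A check like this only flags a version gap. It does not prove that a dashboard renders correctly, so the manual test against an older Docker image described above is still the stronger signal.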

@tanvipenumudy (Contributor) left a comment

Thank you for the patch @kerneltime, tested the patch locally ✅.

Change-Id: Icdd524ac0479890b50bdf0e256089ae9768deaed
@kerneltime
Contributor Author

@tanvipenumudy I pushed one more change, can you take a look?

@tanvipenumudy (Contributor) left a comment

Redeployed the changes.

  • The Chunk IO Dashboard LGTM.
  • Please find a very small nit on the Create Keys Dashboard.

Thanks!

@kerneltime kerneltime merged commit ebdf767 into apache:master Aug 21, 2024
13 checks passed
errose28 added a commit that referenced this pull request Aug 23, 2024
* master:
  HDDS-11081. Use thread-local instance of FileSystem in Freon tests (#7091)
  HDDS-11333. Avoid hard-coded current version in upgrade/xcompat tests (#7089)
  Mark TestPipelineManagerMXBean#testPipelineInfo as flaky
  Mark TestAddRemoveOzoneManager#testForceBootstrap as flaky
  HDDS-11352. HDDS-11353. Mark TestOzoneManagerHAWithStoppedNodes as flaky
  HDDS-11354. Mark TestOzoneManagerSnapshotAcl#testLookupKeyWithNotAllowedUserForPrefixAcl as flaky
  HDDS-11355. Mark TestMultiBlockWritesWithDnFailures#testMultiBlockWritesWithIntermittentDnFailures as flaky
  HDDS-11227. Use server default key provider to encrypt/decrypt keys from multiple OMs. (#7081)
  HDDS-11316. Improve Create Key and Chunk IO Dashboards (#7075)
  HDDS-11239. Fix KeyOutputStream's exception handling when calling hsync concurrently (#7047)
errose28 added a commit to errose28/ozone that referenced this pull request Aug 26, 2024
…an-on-error

* HDDS-10239-container-reconciliation: (428 commits)
  HDDS-11081. Use thread-local instance of FileSystem in Freon tests (apache#7091)
  HDDS-11333. Avoid hard-coded current version in upgrade/xcompat tests (apache#7089)
  Mark TestPipelineManagerMXBean#testPipelineInfo as flaky
  Mark TestAddRemoveOzoneManager#testForceBootstrap as flaky
  HDDS-11352. HDDS-11353. Mark TestOzoneManagerHAWithStoppedNodes as flaky
  HDDS-11354. Mark TestOzoneManagerSnapshotAcl#testLookupKeyWithNotAllowedUserForPrefixAcl as flaky
  HDDS-11355. Mark TestMultiBlockWritesWithDnFailures#testMultiBlockWritesWithIntermittentDnFailures as flaky
  HDDS-11227. Use server default key provider to encrypt/decrypt keys from multiple OMs. (apache#7081)
  HDDS-11316. Improve Create Key and Chunk IO Dashboards (apache#7075)
  HDDS-11239. Fix KeyOutputStream's exception handling when calling hsync concurrently (apache#7047)
  HDDS-11254. Reconcile commands should be handled by datanode ReplicationSupervisor (apache#7076)
  HDDS-11331. Fix Datanode unable to report for a long time (apache#7090)
  HDDS-11346. FS CLI gives incorrect recursive volume deletion prompt (apache#7102)
  HDDS-11349. Add NullPointer handling when volume/bucket tables are not initialized (apache#7103)
  HDDS-11209. Avoid insufficient EC pipelines in the container pipeline cache (apache#6974)
  HDDS-11284. refactor quota repair non-blocking while upgrade (apache#7035)
  HDDS-9790. Add tests for Overview page (apache#6983)
  HDDS-10904. [hsync] Enable PutBlock piggybacking and incremental chunk list by default (apache#7074)
  HDDS-11322. [hsync] Block ECKeyOutputStream from calling hsync and hflush (apache#7098)
  HDDS-11325. Intermittent failure in TestBlockOutputStreamWithFailures#testContainerClose (apache#7099)
  ...

Conflicts:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/OzoneContainer.java