Implement metrics for Zookeeper client #164

noblepaul · 2022-07-07T07:17:06Z

Do not commit, WIP

Please let me know if we wish to add more variables.

How to use?

Start your servers.
Hit the endpoint http://localhost:8983/solr/admin/metrics?key=solr.node:CONTAINER.zkClient
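
For reference, the same request can also be issued from Java. This is only a minimal sketch, assuming Java 11 or later (for `java.net.http`) and a node listening on localhost:8983; the class name `ZkClientMetricsFetch` and the helper `metricsUri` are made up for illustration and are not part of this PR.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class ZkClientMetricsFetch {
  // Metric key taken from the usage note above.
  static final String KEY = "solr.node:CONTAINER.zkClient";

  /** Builds the metrics URL for one node; host and port are assumptions for a local setup. */
  static URI metricsUri(String host, int port) {
    return URI.create("http://" + host + ":" + port + "/solr/admin/metrics?key=" + KEY);
  }

  public static void main(String[] args) throws InterruptedException {
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder(metricsUri("localhost", 8983)).GET().build();
    try {
      HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
      System.out.println(response.body()); // raw JSON returned by the metrics endpoint
    } catch (IOException e) {
      // No node is running, or it is not reachable on the assumed host and port.
      System.err.println("Could not reach the node: " + e.getMessage());
    }
  }
}
```

The response body is printed as-is; parsing it is left out to keep the sketch free of extra dependencies.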

Sample output:

{
  "metrics":{
    "solr.node:CONTAINER.zkClient":{
      "watchesFired":29,
      "reads":452,
      "writes":171,
      "bytesRead":1260447,
      "bytesWritten":483353,
      "multiOps":26,
      "cumulativeMultiOps":34,
      "childFetches":139,
      "cumulativeChildrenFetched":388,
      "existsChecks":427,
      "deletes":3}}}

magibney

Aside from the handful of minor comments/questions/suggestions, I wonder if a bit more consideration would be warranted wrt at a high level what we're seeking to capture with these metrics? One concern I have is that, based on where the stats are actually incremented, we're currently not capturing any info about connection errors or retries. Some stats are incremented before the operation and some after, which is a relevant distinction when the operation itself throws an exception. The metrics also don't currently capture anything wrt retryOnConnLoss, which might be useful.
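
To make the retry concern concrete, here is a rough sketch of what retry-aware counting could look like. It is not code from this PR: the class `RetryCountingSketch`, its counters (`attempts`, `retries`, `failures`), and the unchecked `ConnectionLoss` stand-in are all hypothetical, and the real client would go through its own retry executor and see a ZooKeeper `KeeperException` instead.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

class RetryCountingSketch {
  // Hypothetical counters; the metrics in this PR do not define these.
  final LongAdder attempts = new LongAdder();
  final LongAdder retries = new LongAdder();
  final LongAdder failures = new LongAdder();

  /** Unchecked stand-in for a connection-loss error, used only in this sketch. */
  static class ConnectionLoss extends RuntimeException {}

  /** Runs op up to maxAttempts times, counting every attempt, every retry, and a final failure. */
  <T> T retryOperation(Supplier<T> op, int maxAttempts) {
    for (int attempt = 1; ; attempt++) {
      attempts.increment(); // counted before the call, so failed attempts stay visible
      try {
        return op.get();
      } catch (ConnectionLoss e) {
        if (attempt >= maxAttempts) {
          failures.increment(); // gave up: the caller sees the exception
          throw e;
        }
        retries.increment(); // will try again
      }
    }
  }

  public static void main(String[] args) {
    RetryCountingSketch metrics = new RetryCountingSketch();
    int[] calls = {0};
    // Simulated operation: fails twice with a connection loss, then succeeds.
    String result = metrics.retryOperation(() -> {
      if (++calls[0] < 3) {
        throw new ConnectionLoss();
      }
      return "data";
    }, 5);
    System.out.println(result + " attempts=" + metrics.attempts.sum()
        + " retries=" + metrics.retries.sum() + " failures=" + metrics.failures.sum());
  }
}
```

A counter that is only incremented after a successful call would report a single operation here and hide the two lost connections.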


noblepaul · 2022-07-08T03:33:41Z

> I wonder if a bit more consideration would be warranted wrt at a high level what we're seeking to capture with these metrics?

This is why I have marked this as a WIP. We need to refine the scope of this ticket and add or remove metrics. Maybe @hiteshk25 will be able to add more here.

magibney

Thanks @noblepaul, this looks good. I think the remaining questions are more about the significance/utility of individual metrics (will discuss with @hiteshk25 et al).


hiteshk25 · 2022-07-08T21:14:49Z

solr/solrj/src/java/org/apache/solr/common/cloud/SolrZkClient.java

    }
+    metrics.reads.increment();



noblepaul · 2022-07-09T01:15:40Z

@hiteshk25

Some of these metrics are collected after the operation is done because we also need to collect data from its output.

For instance, when we read data, we need to collect both the number of reads and the number of bytes. The number of bytes is only available after the operation. OTOH, if we are writing data to ZK, we know the payload size beforehand.

hiteshk25 · 2022-07-11T16:18:17Z

> @hiteshk25
>
> Some of these metrics are collected after the operation is done because we also need to collect data from its output.
>
> For instance, when we read data, we need to collect both the number of reads and the number of bytes. The number of bytes is only available after the operation. OTOH, if we are writing data to ZK, we know the payload size beforehand.

Right. But look at the main ops counters, which are incremented inconsistently.

hiteshk25 · 2022-07-11T16:19:30Z

In general, it is better to increment a counter before its operation. Here is an example from Solr request handling:

lucene-solr/solr/core/src/java/org/apache/solr/handler/RequestHandlerBase.java

Line 182 in 0cb904f

requests.inc();

noblepaul · 2022-07-11T17:59:43Z

> In general, it is better to increment a counter before its operation. Here is an example from Solr request handling:
>
> lucene-solr/solr/core/src/java/org/apache/solr/handler/RequestHandlerBase.java
>
> Line 182 in 0cb904f
>
> requests.inc();

Ideally, I would do the increments before the operation. But if we also wish to record data from the result, what choice do we have? In places where we don't need to collect output data, we just increment at the beginning.

  public byte[] getData(final String path, final Watcher watcher, final Stat stat, boolean retryOnConnLoss)
      throws KeeperException, InterruptedException {
    byte[] result = null;
    if (retryOnConnLoss) {
      result = zkCmdExecutor.retryOperation(() -> keeper.getData(path, wrapWatcher(watcher), stat));
    } else {
      result = keeper.getData(path, wrapWatcher(watcher), stat);
    }
    metrics.reads.increment();
    if (result != null) {
      metrics.bytesRead.add(result.length);
    }
    return result;
  }

Look at the above method. We also want to keep track of bytesRead, which is only available after the call.

hiteshk25 · 2022-07-11T18:38:25Z

> > In general, it is better to increment a counter before its operation. Here is an example from Solr request handling:
> >
> > lucene-solr/solr/core/src/java/org/apache/solr/handler/RequestHandlerBase.java
> >
> > Line 182 in 0cb904f
> >
> > requests.inc();
>
> Ideally, I would do the increments before the operation. But if we also wish to record data from the result, what choice do we have? In places where we don't need to collect output data, we just increment at the beginning.
>
>   public byte[] getData(final String path, final Watcher watcher, final Stat stat, boolean retryOnConnLoss)
>       throws KeeperException, InterruptedException {
>     byte[] result = null;
>     if (retryOnConnLoss) {
>       result = zkCmdExecutor.retryOperation(() -> keeper.getData(path, wrapWatcher(watcher), stat));
>     } else {
>       result = keeper.getData(path, wrapWatcher(watcher), stat);
>     }
>     metrics.reads.increment();
>     if (result != null) {
>       metrics.bytesRead.add(result.length);
>     }
>     return result;
>   }
>
> Look at the above method. We also want to keep track of bytesRead, which is only available after the call.

I think those are two different metrics:

how many calls we made
how much data we fetched

(Errors are a third case, which requires a different metric, e.g. zk-errors.)
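
Roughly what I have in mind, as a sketch only and not the code in this PR: the call counter is incremented before the operation, the byte counter after it returns, and errors go to their own counter. The class `ReadMetricsSketch`, the wrapper `countedRead`, and the `readErrors` counter are hypothetical names, and the `Supplier` stands in for the actual ZooKeeper call.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

class ReadMetricsSketch {
  final LongAdder reads = new LongAdder();      // how many calls were attempted
  final LongAdder bytesRead = new LongAdder();  // how much data came back
  final LongAdder readErrors = new LongAdder(); // hypothetical separate error counter

  /** Wraps a read: counts the call first, the payload afterwards, and errors separately. */
  byte[] countedRead(Supplier<byte[]> zkRead) {
    reads.increment(); // before the op, so a call that throws is still counted
    try {
      byte[] result = zkRead.get();
      if (result != null) {
        bytesRead.add(result.length); // size is only known once the call returns
      }
      return result;
    } catch (RuntimeException e) {
      readErrors.increment();
      throw e;
    }
  }

  public static void main(String[] args) {
    ReadMetricsSketch metrics = new ReadMetricsSketch();
    metrics.countedRead(() -> new byte[42]);
    try {
      metrics.countedRead(() -> {
        throw new IllegalStateException("simulated connection loss");
      });
    } catch (IllegalStateException expected) {
      // The failed call is still visible in reads and readErrors.
    }
    System.out.println("reads=" + metrics.reads.sum()
        + " bytesRead=" + metrics.bytesRead.sum()
        + " readErrors=" + metrics.readErrors.sum());
  }
}
```

With this split, the number of calls and the number of bytes no longer have to be updated at the same point in the method.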

noblepaul · 2022-07-19T09:55:11Z

@hiteshk25 The JUnit test is added. Let me know if anything more is required.

hiteshk25 · 2022-08-18T19:23:18Z

These are the tests failing in CI:
https://app.circleci.com/pipelines/github/cowpaths/fs-solr/1684/workflows/0151f952-1d5c-4231-b64e-e23869575b2e/jobs/4036/parallel-runs/0/steps/0-104

   [junit4] 
   [junit4] Tests with failures [seed: AA7A22597286676]:
   [junit4]   - org.apache.solr.core.FSPRSTest (suite)
   [junit4]   - org.apache.solr.cloud.LegacyCloudClusterPropTest.testCreateCollectionSwitchLegacyCloud
   [junit4]   - org.apache.solr.handler.admin.AdminHandlersProxyTest.proxyMetricsHandlerAllNodes
   [junit4]   - org.apache.solr.metrics.reporters.solr.SolrCloudReportersTest.testDefaultP

hiteshk25 · 2022-08-18T21:28:57Z

Rebased from release/8.8. Now these tests are failing:
https://app.circleci.com/pipelines/github/cowpaths/fs-solr/1692/workflows/2bdd2469-c700-4b70-ac5a-ba0d6aff24d9/jobs/4075

 Tests with failures [seed: 52AD923C32304CBE]:
   [junit4]   - org.apache.solr.pkg.TestPackages.testCoreReloadingPlugin
   [junit4]   - org.apache.solr.handler.admin.AdminHandlersProxyTest.proxyMetricsHandlerAllNodes
   [junit4]   - org.apache.solr.cloud.LegacyCloudClusterPropTest.testCreateCollectionSwitchLegacyCloud
   [junit4]   - org.apache.solr.cloud.LeaderTragicEventTest.testLeaderFailsOver
   [junit4]   - org.apache.solr.cloud.TestWithCollection.testNodeAdded
   [junit4]   - org.apache.solr.handler.RequestHandlerMetricsTest.testAggregateNodeLevelMetrics

hiteshk25 · 2022-08-18T21:33:42Z

@noblepaul @chatman we need to look at the above test failures.

noblepaul requested review from chatman, hiteshk25, justinrsweeney and magibney July 7, 2022 07:17

magibney suggested changes Jul 7, 2022

magibney reviewed Jul 8, 2022

hiteshk25 reviewed Jul 8, 2022

solr/solrj/src/java/org/apache/solr/common/cloud/SolrZkClient.java (resolved)

hiteshk25 reviewed Jul 8, 2022

solr/solrj/src/java/org/apache/solr/common/cloud/SolrZkClient.java (resolved)

hiteshk25 reviewed Jul 8, 2022

solr/solrj/src/java/org/apache/solr/common/cloud/SolrZkClient.java

    }
+    metrics.reads.increment();

hiteshk25 Jul 8, 2022

same here!

hiteshk25 reviewed Jul 8, 2022

solr/solrj/src/java/org/apache/solr/common/cloud/SolrZkClient.java (resolved)

hiteshk25 reviewed Jul 8, 2022

solr/solrj/src/java/org/apache/solr/common/cloud/SolrZkClient.java (outdated, resolved)
noblepaul added 11 commits August 18, 2022 12:28

a4d6fcf added metrics for SolrZKclient
9f0858d added multi support
82a69f6 more fields added
f69a697 refactor
71d2f16 refactor
52fd8cd use LongAdder
49b3100 formatting
d811104 formatting
644617e unused imports
cc8e7fa refactor
e2b4345 added javadocs

noblepaul and others added 5 commits August 18, 2022 12:30

375f34e added total watches count
5d68114 every operation is consistently incremented after the op
21572c2 added JUnit for zk metrics
323a105 minor test change as it was not compiling in ci
953958f updated

hiteshk25 force-pushed the noble/zkMetrics branch from 2add79d to 953958f Compare August 18, 2022 19:37

noblepaul mentioned this pull request Aug 22, 2022

Implement metrics for Zookeeper client (after merge) #179

Merged
