Skip to content
This repository has been archived by the owner on Mar 27, 2021. It is now read-only.

Implement metric and log lines when a Bigtable request times out, per x-client-id #763

Merged
merged 5 commits into from
Mar 4, 2021

Conversation

sming
Copy link
Contributor

@sming sming commented Mar 1, 2021

We need to know how many requests are hitting Bigtable timeouts per x-client-id

  • I am a prism engineer
  • Who wants to know the impact of his/her changes on our user base
  • So that I can put out any fires efficiently, inform the user base if necessary, and protect heroic

Proposed Solution

  • decorate the exception-handling at the API request level: add a check for the Bigtable timeout exception string
  • if found, increment the according counter for the request's x-client-id
  • if found, emit a log message that details the x-client-id (and the other usual exception info)

Design & Implementation Notes

nothing unusual. Just added :

  1. detection logic,
  2. a new metric e.g. query-metrics-client-id-count-playerapp-timeout
  3. logging e.g. "bigtable-query-timeout...

@sming sming changed the title WIP BT timeout logging and metrics 🚨 DRAFT 🚨 Implement metric and log lines when a Bigtable request times out, per x-client-id Mar 1, 2021
@sming sming linked an issue Mar 1, 2021 that may be closed by this pull request
@codecov
Copy link

codecov bot commented Mar 2, 2021

Codecov Report

Merging #763 (4e88457) into master (903382d) will decrease coverage by 0.04%.
The diff coverage is 60.97%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #763      +/-   ##
============================================
- Coverage     55.14%   55.10%   -0.05%     
- Complexity     3163     3164       +1     
============================================
  Files           748      749       +1     
  Lines         20395    20427      +32     
  Branches       1339     1341       +2     
============================================
+ Hits          11247    11256       +9     
- Misses         8661     8683      +22     
- Partials        487      488       +1     
Impacted Files Coverage Δ Complexity Δ
...in/java/com/spotify/heroic/common/FailureType.java 0.00% <0.00%> (ø) 0.00 <0.00> (?)
...c/main/java/com/spotify/heroic/metric/NodeError.kt 20.00% <0.00%> (ø) 1.00 <0.00> (ø)
.../main/java/com/spotify/heroic/metric/ShardError.kt 0.00% <0.00%> (ø) 0.00 <0.00> (ø)
...oic/statistics/noop/NoopMetricBackendReporter.java 87.50% <0.00%> (-5.84%) 9.00 <0.00> (ø)
...tify/heroic/querylogging/noop/NoopQueryLogger.java 16.66% <0.00%> (-1.52%) 2.00 <0.00> (ø)
...potify/heroic/servlet/MandatoryClientIdFilter.java 100.00% <ø> (ø) 4.00 <0.00> (ø)
...spotify/heroic/test/AbstractMetadataBackendIT.java 98.40% <0.00%> (ø) 30.00 <0.00> (ø)
...istics/semantic/SemanticMetricBackendReporter.java 0.00% <0.00%> (ø) 0.00 <0.00> (ø)
.../com/spotify/heroic/metric/LocalMetricManager.java 64.67% <73.33%> (+0.08%) 4.00 <1.00> (ø)
.../spotify/heroic/querylogging/Slf4jQueryLogger.java 88.88% <93.75%> (-3.71%) 18.00 <8.00> (ø)
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 903382d...4e88457. Read the comment docs.

@@ -378,7 +378,7 @@ public void filterTest() throws Exception {
private void retrySome(final ThrowingRunnable action) throws Exception {
AssertionError error = null;

for (int i = 0; i < 10; i++) {
for (int i = 0; i < 50; i++) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: this was timing out in CI so I increased. This won't be any slower, it's just more permissive.

@sming sming changed the title 🚨 DRAFT 🚨 Implement metric and log lines when a Bigtable request times out, per x-client-id Implement metric and log lines when a Bigtable request times out, per x-client-id Mar 2, 2021
@sming sming marked this pull request as ready for review March 2, 2021 18:07
Copy link
Contributor

@malish8632 malish8632 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sming sming merged commit 2e63167 into master Mar 4, 2021
@delete-merged-branch delete-merged-branch bot deleted the feature/add-bt-timeout-metrics branch March 4, 2021 18:26
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Instrument Bigtable timeouts (add metrics and log messages)
2 participants