Skip to content
This repository is currently being migrated. It's locked while the migration is in progress.

add Prometheus to remaining sidecar types #752

Merged
merged 5 commits into from
Oct 24, 2022
Merged

add Prometheus to remaining sidecar types #752

merged 5 commits into from
Oct 24, 2022

Conversation

ghirsch-reddit
Copy link
Contributor

@ghirsch-reddit ghirsch-reddit commented Oct 10, 2022

💸 TL;DR

Reproducing #742. This makes all sidecar types expose Prometheus metrics.

📜 Details

JIRA

🧪 Testing Steps / Validation

  • tested with a service in snoodev - it worked insofar as it ran, though I'm not 100% sure that all of the behavior is unchanged since the service was not using the sidecars in dev before
  • a test in snoodev for services with each of the sidecar types
  • deploy a service for real with an override to use the new versions, and closely monitor it

and only then update infrared to use these by default

✅ Checks

  • CI tests (if present) are passing
  • Adheres to code style for repo
  • Contributor License Agreement (CLA) completed if not a Reddit employee

@ghirsch-reddit ghirsch-reddit force-pushed the BASE-122 branch 3 times, most recently from d17e243 to 107ff09 Compare October 10, 2022 23:01
@ghirsch-reddit ghirsch-reddit marked this pull request as ready for review October 10, 2022 23:06
@ghirsch-reddit ghirsch-reddit requested a review from a team as a code owner October 10, 2022 23:06
Copy link
Contributor

@nsheaps nsheaps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I recognize the prom metrics are basically in parity with the statsd metrics, I think we should be aiming for the proper metrics, rather than replacement metrics. Additionally, these need spec changes (even though the same sidecar are used for go and python), and we should ensure any clients used in the sidecars are the baseplate clients so we get that instrumentation for free

@ghirsch-reddit ghirsch-reddit force-pushed the BASE-122 branch 3 times, most recently from 8a10860 to 0491f16 Compare October 13, 2022 18:33
Comment on lines -108 to -110
self.session.headers[
"User-Agent"
] = f"baseplate.py-{self.__class__.__name__}/{baseplate_version}"
Copy link
Contributor Author

@ghirsch-reddit ghirsch-reddit Oct 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something, but it seems like this was getting overwritten below anyway (and same in the other file)

@ghirsch-reddit ghirsch-reddit force-pushed the BASE-122 branch 2 times, most recently from f6f0830 to 0959931 Compare October 13, 2022 18:44
@ghirsch-reddit
Copy link
Contributor Author

ghirsch-reddit commented Oct 13, 2022

Added the metrics to the spec here https://github.snooguts.net/reddit/baseplate.spec/pull/74

Copy link
Contributor

@nsheaps nsheaps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job! Do you have some example metrics from testing?

@ghirsch-reddit
Copy link
Contributor Author

Not good ones 😅 My plan now is to ask a team who actually uses these sidecars in a not-very-critical service if I can do better tests using their service

@ghirsch-reddit
Copy link
Contributor Author

Okey doke, finally tested it:

Example metrics from the trace_publisher test (forgot to record ones from the other test, but it was pretty similar)

# HELP zipkin_trace_publishes_total Multiprocess metric
# TYPE zipkin_trace_publishes_total counter
zipkin_trace_publishes_total 36.0
# HELP http_client_requests_total Multiprocess metric
# TYPE http_client_requests_total counter
http_client_requests_total{http_client_name="trace_collector",http_method="post",http_response_code="202",http_success="true"} 3.0
# HELP http_client_latency_seconds Multiprocess metric
# TYPE http_client_latency_seconds histogram
http_client_latency_seconds_sum{http_client_name="trace_collector",http_method="post",http_success="true"} 0.031773647060617805
http_client_latency_seconds_bucket{http_client_name="trace_collector",http_method="post",http_success="true",le="0.0001"} 0.0
http_client_latency_seconds_bucket{http_client_name="trace_collector",http_method="post",http_success="true",le="0.0005"} 0.0
http_client_latency_seconds_bucket{http_client_name="trace_collector",http_method="post",http_success="true",le="0.001"} 0.0
http_client_latency_seconds_bucket{http_client_name="trace_collector",http_method="post",http_success="true",le="0.0025"} 0.0
http_client_latency_seconds_bucket{http_client_name="trace_collector",http_method="post",http_success="true",le="0.005"} 0.0
http_client_latency_seconds_bucket{http_client_name="trace_collector",http_method="post",http_success="true",le="0.01"} 2.0
http_client_latency_seconds_bucket{http_client_name="trace_collector",http_method="post",http_success="true",le="0.025"} 3.0
http_client_latency_seconds_bucket{http_client_name="trace_collector",http_method="post",http_success="true",le="0.05"} 3.0
http_client_latency_seconds_bucket{http_client_name="trace_collector",http_method="post",http_success="true",le="0.1"} 3.0
http_client_latency_seconds_bucket{http_client_name="trace_collector",http_method="post",http_success="true",le="0.25"} 3.0
http_client_latency_seconds_bucket{http_client_name="trace_collector",http_method="post",http_success="true",le="0.5"} 3.0
http_client_latency_seconds_bucket{http_client_name="trace_collector",http_method="post",http_success="true",le="1.0"} 3.0
http_client_latency_seconds_bucket{http_client_name="trace_collector",http_method="post",http_success="true",le="5.0"} 3.0
http_client_latency_seconds_bucket{http_client_name="trace_collector",http_method="post",http_success="true",le="15.0"} 3.0
http_client_latency_seconds_bucket{http_client_name="trace_collector",http_method="post",http_success="true",le="30.0"} 3.0
http_client_latency_seconds_bucket{http_client_name="trace_collector",http_method="post",http_success="true",le="+Inf"} 3.0
http_client_latency_seconds_count{http_client_name="trace_collector",http_method="post",http_success="true"} 3.0
# HELP http_client_active_requests Multiprocess metric
# TYPE http_client_active_requests gauge
http_client_active_requests{http_client_name="trace_collector",http_method="post",pid="7"} 0.0

@ghirsch-reddit ghirsch-reddit merged commit eb7d24b into develop Oct 24, 2022
@ghirsch-reddit ghirsch-reddit deleted the BASE-122 branch October 24, 2022 22:50
ghirsch-reddit added a commit that referenced this pull request Dec 5, 2022
ghirsch-reddit added a commit that referenced this pull request Dec 5, 2022
* Revert "expose prometheus metrics for Zookeeper in live-data sidecar (#747)"

This reverts commit e49779e.

* Revert "add Prometheus to remaining sidecar types (#752)"

This reverts commit eb7d24b.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants