Commit 10914d7

Authored by Darun Seethammagari
feat: Instrument Runner Service (#602)
Runner is lacking instrumentation. It is responsible for many things, and it has become hard to understand which tasks contribute to the overall latency of an indexer. In addition, we are now at a point where we need to drive down latencies to facilitate new * indexer use cases such as access keys.

I've chosen to instrument Runner with OpenTelemetry. Tracing generally requires three components: an instrumented service, a trace collector, and a trace visualizer. The service is responsible for collecting trace data and transmitting it to the collector. The collector should be able to receive trace data with little fuss, so that it does not degrade the performance of the instrumented service. The collector then processes the trace data and transmits the processed data to the visualizer. The visualizer displays trace data and allows for filtering on traces.

The benefit of OpenTelemetry over other options like Zipkin and Jaeger is that GCP already supports ingesting OpenTelemetry data. As such, we don't need to provision a collector ourselves, and can instead leverage GCP's existing Tracing service as both collector and visualizer.

For local development, traces can be output to the console, to a Zipkin all-in-one container, or to GCP (which requires the Cloud Trace Agent role and a project ID). Switching between these is done by simply initializing the NodeSDK differently. In addition, we do not want to enable traces in prod yet, so no exporter is specified there. This creates a No-Op Trace Exporter, which won't attempt to record traces.

No changes were made to the code execution path. All tests pass with no changes, aside from having to replace snapshots due to changes in the tabbing of mutation strings. I have manually verified that the mutation strings are still the same by stripping whitespace and checking against the originals.
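As a rough illustration of the exporter selection described above, the sketch below maps the environment variables introduced in docker-compose.yml (TRACING_EXPORTER, ZIPKIN_ENDPOINT, GCP_PROJECT_ID) onto a typed configuration. The type TracingConfig, the function resolveTracingConfig, and the choice to treat a missing or unrecognized value as NONE are assumptions made for illustration, not code from this commit. The actual NodeSDK initialization is left out so that the sketch stays self-contained.

```typescript
// Hypothetical sketch: these names and shapes are illustrative and are not
// taken from the commit. Only the environment variable names come from the diff.
type TracingConfig =
  | { exporter: "CONSOLE" }
  | { exporter: "ZIPKIN"; endpoint: string }
  | { exporter: "GCP"; projectId: string }
  | { exporter: "NONE" };

type Env = Record<string, string | undefined>;

function resolveTracingConfig(env: Env): TracingConfig {
  // Assumption: a missing, empty, or unrecognized value disables tracing.
  const kind = (env["TRACING_EXPORTER"] ?? "NONE").trim().toUpperCase();

  switch (kind) {
    case "CONSOLE":
      return { exporter: "CONSOLE" };
    case "ZIPKIN": {
      const endpoint = env["ZIPKIN_ENDPOINT"];
      if (!endpoint) {
        throw new Error("ZIPKIN_ENDPOINT is required when TRACING_EXPORTER=ZIPKIN");
      }
      return { exporter: "ZIPKIN", endpoint };
    }
    case "GCP": {
      const projectId = env["GCP_PROJECT_ID"];
      if (!projectId) {
        throw new Error("GCP_PROJECT_ID is required when TRACING_EXPORTER=GCP");
      }
      return { exporter: "GCP", projectId };
    }
    default:
      // No exporter configured: the caller would start the SDK without one.
      return { exporter: "NONE" };
  }
}

// Usage with the values from the local docker-compose setup.
const localConfig = resolveTracingConfig({
  TRACING_EXPORTER: "ZIPKIN",
  ZIPKIN_ENDPOINT: "http://zipkin:9411/api/v2/spans",
});
```

A discriminated union keeps the per-exporter requirements (an endpoint for Zipkin, a project ID for GCP) visible to the type checker, so the code that eventually constructs the exporter cannot forget them.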
1 parent 6038fe6 commit 10914d7

11 files changed: +7287 -226 lines changed

docker-compose.yml

Lines changed: 14 additions & 1 deletion
@@ -56,6 +56,11 @@ services:
       AWS_ACCESS_KEY_ID:
       AWS_SECRET_ACCESS_KEY:
       GRPC_SERVER_PORT: 7001
+      PREFETCH_QUEUE_LIMIT: 10
+      TRACING_EXPORTER: ZIPKIN # CONSOLE, GCP, ZIPKIN, or NONE
+      ZIPKIN_ENDPOINT: http://zipkin:9411/api/v2/spans
+      GCP_PROJECT_ID:
+      TRACING_SAMPLE_RATE: 0.1
     ports:
       - "7001:7001"
 
@@ -104,6 +109,7 @@ services:
       HASURA_GRAPHQL_ENABLED_LOG_TYPES: startup, http-log, webhook-log, websocket-log, query-log
       HASURA_GRAPHQL_ADMIN_SECRET: myadminsecretkey
       HASURA_GRAPHQL_AUTH_HOOK: http://hasura-auth:4000/auth
+
   grafana:
     image: grafana/grafana
     volumes:
@@ -112,13 +118,20 @@ services:
       - "3000:3000"
     environment:
       - GF_SECURITY_ADMIN_PASSWORD=secret
-
+
   prometheus:
     image: prom/prometheus
     volumes:
       - ./prometheus.yml:/etc/prometheus/prometheus.yml
     ports:
       - "9090:9090"
+
+  zipkin:
+    image: openzipkin/zipkin
+    ports:
+      - "9411:9411"
+    environment:
+      - STORAGE_TYPE=mem
 
 volumes:
   postgres:
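
The TRACING_SAMPLE_RATE: 0.1 setting added above suggests that roughly one trace in ten is recorded locally. The sketch below shows one way such a value could be parsed and applied as a deterministic, trace-ID-based decision. The function names, the fallback of 1 when the variable is unset, and the bucketing scheme are all assumptions for illustration. It is not the sampler used by this commit, nor OpenTelemetry's own implementation.

```typescript
// Hypothetical sketch: illustrates ratio-based sampling in general terms.
// It is not the commit's code and not OpenTelemetry's sampler.

// Parse a sample rate from an environment string and clamp it to [0, 1].
// Assumption: an unset or unparsable value falls back to sampling everything.
function parseSampleRate(raw: string | undefined, fallback: number = 1): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (Number.isNaN(value)) return fallback;
  return Math.min(1, Math.max(0, value));
}

// Decide whether to keep a trace based on the first 32 bits of its hex ID.
// The same trace ID always yields the same decision, so every span of a
// trace is kept or dropped together.
function shouldSample(traceId: string, rate: number): boolean {
  const bucket = parseInt(traceId.slice(0, 8), 16); // 0 .. 2^32 - 1, or NaN
  return bucket < rate * 0x100000000; // NaN compares false, so bad IDs are dropped
}

// Usage with the value from docker-compose.yml.
const rate = parseSampleRate("0.1");
const keepLow = shouldSample("000000010000000000000000000000ab", rate);
const keepHigh = shouldSample("ffffffff0000000000000000000000ab", rate);
```

Deriving the decision from the trace ID rather than from a random draw per span means that a service seeing the same trace ID reaches the same verdict without any coordination.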
