Add continuous testing for logs to k3d smoke test environment (#2223)
Signed-off-by: Paschalis Tsilias <paschalis.tsilias@grafana.com>
tpaschalis authored Oct 3, 2022
1 parent 51dc11e commit 634feb6
Showing 9 changed files with 282 additions and 38 deletions.
19 changes: 15 additions & 4 deletions example/k3d/README.md
@@ -40,13 +40,13 @@ k3d cluster delete agent-k3d

## Smoke Test Environment

The smoke test environment is used to validate samples end to end.
The smoke test environment is used for end-to-end validation of all three observability signals.

### Running

The Smoke Test environment is invoked via `/scripts/smoke-test.bash`

This tool will spin up cluster of Grafana Agent, Cortex, Avalanche, Smoke and [Crow](../../tools/crow/README.md) instances. The Smoke deployment will then periodically kill instances and check for any failed alerts. At the end of the duration (default 3h) it will end the testing.
This tool will spin up a cluster of Grafana Agent, Cortex, Avalanche, Smoke, [Crow](../../tools/crow/README.md), [Canary](https://grafana.com/docs/loki/latest/operations/loki-canary/) and Vulture instances. The Smoke deployment will then periodically kill instances and check for any failed alerts. At the end of the duration (default 3h), it will end the testing.
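
A minimal sketch of kicking off a run, assuming it is executed from the repository root; the script path comes from this README, while the `smoke` namespace used for inspection is the one configured by the jsonnet in this environment:

```bash
# Start the smoke test environment (deploys the agent, Cortex, Loki, Tempo,
# Avalanche, Smoke, Crow, Canary and Vulture workloads).
./scripts/smoke-test.bash

# While it runs, the test workloads can be inspected with standard kubectl,
# e.g. listing pods in the `smoke` namespace used by this environment:
kubectl get pods -n smoke
```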

For users who do not have access to the `us.gcr.io/kubernetes-dev` container registry, do the following to run the smoke test:

@@ -67,6 +67,7 @@ These alerts are viewable [here](http://prometheus.k3d.localhost:50080/alerts).

Prometheus alerts are triggered:
- If any Crow instances are not running or Crow samples are not being propagated correctly.
- If any Canary instances are not running or Canary logs are not being propagated correctly.
- If any Vulture instances are not running or Vulture samples are not being propagated correctly.
- If any Grafana Agents are not running or Grafana Agent limits are outside their norm.

@@ -88,6 +89,10 @@ Changing the avalanche setting for label_count to 1000, located [here](../../pro

![](./assets/trigger_change.png)

For Loki Canary, the easiest way to trigger an alert is to edit its DaemonSet so that it queries for a label that doesn't exist.
![](./assets/trigger_logs_alerts.png)
![](./assets/logs_alerts.png)
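
As a hedged command-line sketch of the same step: edit the DaemonSet and point the canary at a label that is never attached to its log lines. The `-labelname` flag mirrors the `loki_canary_args` set in `example/k3d/smoke/main.jsonnet`; the DaemonSet name `loki-canary` and the replacement value are assumptions for illustration.

```bash
# Open the canary DaemonSet for editing in the smoke namespace
# (DaemonSet name assumed to be `loki-canary`).
kubectl -n smoke edit daemonset loki-canary

# In the container args, change the queried label from the configured one:
#   -labelname=instance
# to a label that is never attached to the scraped log lines, for example:
#   -labelname=nonexistent
# The canary's reads then miss, and the logs alerts should fire.
```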

### Architecture

By default, a k3d cluster will be created running the following instances
@@ -99,15 +104,21 @@ By default, a k3d cluster will be created running the following instances
- cortex
- avalanche - selection of avalanche instances serving traffic
- smoke - scales avalanche replicas and introduces chaos by deleting agent pods during testing
- vulture - emits traces and checks if are stored properly
- canary - emits logs and checks if they're stored properly
- loki
- vulture - emits traces and checks if they're stored properly
- tempo

Crow and Vulture instances will check to see if the metrics and traces that were scraped shows up in the prometheus endpoint and then will emit metrics on the success of those metrics. This success/failure result will trigger an alert if it is incorrect.
Crow, Canary and Vulture instances check whether the metrics, logs and traces that were collected show up in the Cortex, Loki and Tempo instances respectively, and then emit metrics reporting the success or failure of those checks. A failed check will trigger an alert.
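
To inspect those success/failure signals by hand, the monitoring Prometheus can be queried directly over its HTTP API; this is a sketch only, and the canary metric name used here (`loki_canary_missing_entries_total`) is an assumption about what loki-canary exposes rather than something this environment defines.

```bash
# Sketch: query the environment's Prometheus (URL taken from the alerts link
# above) for missing canary entries. A non-zero rate indicates lost logs.
curl -sG 'http://prometheus.k3d.localhost:50080/api/v1/query' \
  --data-urlencode 'query=sum(rate(loki_canary_missing_entries_total[5m]))'
```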

### Metrics Flow

![](./assets/metrics_flow.png)

### Logs Flow

![](./assets/logs_flow.png)

### Traces Flow

![](./assets/traces_flow.png)
Binary file added example/k3d/assets/logs_alerts.png
9 changes: 9 additions & 0 deletions example/k3d/assets/logs_flow.mermaid
@@ -0,0 +1,9 @@
sequenceDiagram
Canary ->>+ Canary: Logs to stdout
Agent ->>+ Canary: Reads pod logs through /var/log/pods/
Agent ->>+ Loki: Pushes logs (remote_write)
Canary ->>+ Loki: Queries logs storage
Canary ->>+ Canary: Exposes success/failure metrics
Agent ->>+ Canary: Scrapes /metrics
Agent ->>+ Prometheus: Sends success/failure metrics
Prometheus ->>+ Prometheus: Checks alerts
Binary file added example/k3d/assets/logs_flow.png
Binary file added example/k3d/assets/trigger_logs_alerts.png
9 changes: 9 additions & 0 deletions example/k3d/jsonnetfile.json
@@ -19,6 +19,15 @@
},
"version": "master"
},
{
"source": {
"git": {
"remote": "https://github.com/grafana/loki.git",
"subdir": "production/ksonnet/loki-canary"
}
},
"version": "main"
},
{
"source": {
"git": {
10 changes: 10 additions & 0 deletions example/k3d/jsonnetfile.lock.json
@@ -51,6 +51,16 @@
"version": "b9cc0f3529833096c043084c04bc7b3562a134c4",
"sum": "mtTAh8vSa4Eb8ojviyZ9zE2pPq5OgwhK75qsEWkifhI="
},
{
"source": {
"git": {
"remote": "https://github.com/grafana/loki.git",
"subdir": "production/ksonnet/loki-canary"
}
},
"version": "74c8cf03ba4fb2abd979a9af05bb945813a4505c",
"sum": "EIFf6m9IvdJbfGMXkzWYofoFSnHo8f+tVeUh3x/v+u0="
},
{
"source": {
"git": {
198 changes: 164 additions & 34 deletions example/k3d/smoke/main.jsonnet
@@ -1,13 +1,15 @@
local monitoring = import './monitoring/main.jsonnet';
local cortex = import 'cortex/main.libsonnet';
local avalanche = import 'grafana-agent/smoke/avalanche/main.libsonnet';
local crow = import 'grafana-agent/smoke/crow/main.libsonnet';
local canary = import 'github.com/grafana/loki/production/ksonnet/loki-canary/loki-canary.libsonnet';
local vulture = import 'github.com/grafana/tempo/operations/jsonnet/microservices/vulture.libsonnet';
local tempo = import 'github.com/grafana/tempo/operations/jsonnet/single-binary/tempo.libsonnet';
local avalanche = import 'grafana-agent/smoke/avalanche/main.libsonnet';
local crow = import 'grafana-agent/smoke/crow/main.libsonnet';
local etcd = import 'grafana-agent/smoke/etcd/main.libsonnet';
local smoke = import 'grafana-agent/smoke/main.libsonnet';
local gragent = import 'grafana-agent/v2/main.libsonnet';
local k = import 'ksonnet-util/kausal.libsonnet';
local loki = import 'loki/main.libsonnet';

local namespace = k.core.v1.namespace;
local pvc = k.core.v1.persistentVolumeClaim;
@@ -17,6 +19,7 @@ local statefulset = k.apps.v1.statefulSet;
local service = k.core.v1.service;
local configMap = k.core.v1.configMap;
local deployment = k.apps.v1.deployment;
local daemonSet = k.apps.v1.daemonSet;

local images = {
agent: 'grafana/agent:main',
@@ -61,11 +64,11 @@ local smoke = {
otlp: {
protocols: {
grpc: {
endpoint: "0.0.0.0:4317"
endpoint: '0.0.0.0:4317',
},
},
},
}
},
},
tempo_config+: {
querier: {
@@ -75,15 +78,38 @@
},
},
tempo_statefulset+:
statefulset.mixin.metadata.withNamespace("smoke"),
statefulset.mixin.metadata.withNamespace('smoke'),
tempo_service+:
service.mixin.metadata.withNamespace("smoke"),
service.mixin.metadata.withNamespace('smoke'),
tempo_headless_service+:
service.mixin.metadata.withNamespace("smoke"),
service.mixin.metadata.withNamespace('smoke'),
tempo_query_configmap+:
configMap.mixin.metadata.withNamespace("smoke"),
configMap.mixin.metadata.withNamespace('smoke'),
tempo_configmap+:
configMap.mixin.metadata.withNamespace("smoke")
configMap.mixin.metadata.withNamespace('smoke'),
},

loki: loki.new(namespace='smoke'),

// https://grafana.com/docs/loki/latest/operations/loki-canary/
canary: canary {
loki_canary_args+:: {
addr: 'loki:80',
port: '80',
tls: false,
labelname: 'instance',
labelvalue: '$(POD_NAME)',
interval: '1s',
'metric-test-interval': '30m',
'metric-test-range': '2h',
size: 1024,
wait: '3m',
},
_config+:: {
namespace: 'smoke',
},
loki_canary_daemonset+:
daemonSet.mixin.metadata.withNamespace('smoke'),
},

// Needed to run agent cluster
@@ -106,8 +132,8 @@

vulture: vulture {
_images+:: {
tempo_vulture: 'grafana/tempo-vulture:latest'
},
tempo_vulture: 'grafana/tempo-vulture:latest',
},
_config+:: {
vulture: {
replicas: 1,
Expand All @@ -118,10 +144,10 @@ local smoke = {
tempoSearchBackoffDuration: '0s',
tempoReadBackoffDuration: '10s',
tempoWriteBackoffDuration: '10s',
}
},
},
tempo_vulture_deployment+:
deployment.mixin.metadata.withNamespace("smoke")
deployment.mixin.metadata.withNamespace('smoke'),
},

local metric_instances(crow_name) = [{
@@ -258,6 +284,104 @@
}],
},
],
}, {
name: 'canary',
remote_write: [
{
url: 'http://cortex/api/prom/push',
write_relabel_configs: [
{
source_labels: ['__name__'],
regex: 'avalanche_.*',
action: 'drop',
},
],
},
{
url: 'http://smoke-test:19090/api/prom/push',
write_relabel_configs: [
{
source_labels: ['__name__'],
regex: 'avalanche_.*',
action: 'keep',
},
],
},
],
scrape_configs: [
{
job_name: 'canary',
kubernetes_sd_configs: [{ role: 'pod' }],
tls_config: {
ca_file: '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt',
},
bearer_token_file: '/var/run/secrets/kubernetes.io/serviceaccount/token',

relabel_configs: [
{
source_labels: ['__meta_kubernetes_namespace'],
regex: 'smoke',
action: 'keep',
},
{
source_labels: ['__meta_kubernetes_pod_container_name'],
regex: 'canary',
action: 'keep',
},
],
},
],
}],

local logs_instances() = [{
name: 'write-loki',
clients: [{
url: 'http://loki/loki/api/v1/push',
basic_auth: {
username: '104334',
password: 'noauth',
},
external_labels: {
cluster: 'grafana-agent',
},

}],
scrape_configs: [{
job_name: 'write-canary-output',
kubernetes_sd_configs: [{ role: 'pod' }],
pipeline_stages: [
{ cri: {} },
],
relabel_configs: [
{
source_labels: ['__meta_kubernetes_namespace'],
regex: 'smoke',
action: 'keep',
},
{
source_labels: ['__meta_kubernetes_pod_container_name'],
regex: 'loki-canary',
action: 'keep',
},
{
action: 'replace',
source_labels: ['__meta_kubernetes_pod_uid', '__meta_kubernetes_pod_container_name'],
target_label: '__path__',
separator: '/',
replacement: '/var/log/pods/*$1/*.log',
},
{
action: 'replace',
source_labels: ['__meta_kubernetes_pod_name'],
target_label: 'pod',
},
{
action: 'replace',
source_labels: ['__meta_kubernetes_pod_name'],
target_label: 'instance',
},
],
}],
}],

normal_agent:
@@ -278,8 +402,13 @@
gragent.withPortsMixin([
containerPort.new('thrift-grpc', 14250) + containerPort.withProtocol('TCP'),
]) +
gragent.withLogVolumeMounts() +
gragent.withAgentConfig({
server: { log_level: 'debug' },
logs: {
positions_directory: '/var/lib/agent/logs-positions',
configs: logs_instances(),
},

prometheus: {
global: {
@@ -293,28 +422,28 @@
},
traces: {
configs: [
{
name: "vulture",
receivers: {
jaeger: {
protocols: {
grpc: null
}
}
{
name: 'vulture',
receivers: {
jaeger: {
protocols: {
grpc: null,
},
remote_write: [
{
endpoint: "tempo:4317",
insecure: true
}
],
batch: {
timeout: "5s",
send_batch_size: 100
}
}
]
}
},
},
remote_write: [
{
endpoint: 'tempo:4317',
insecure: true,
},
],
batch: {
timeout: '5s',
send_batch_size: 100,
},
},
],
},
}),

cluster_agent:
@@ -332,6 +461,7 @@
) +
gragent.withVolumeMountsMixin([volumeMount.new('agent-cluster-wal', '/var/lib/agent')]) +
gragent.withService() +
gragent.withLogVolumeMounts() +
gragent.withAgentConfig({
server: { log_level: 'debug' },
