Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add filtering by job in stat, tap, top; fix panic #1904

Merged
merged 2 commits into from
Dec 3, 2018
Merged

Conversation

siggy
Copy link
Member

@siggy siggy commented Dec 1, 2018

Filtering by Kubernetes job was not supported. Also filtering by any unknown
type caused a panic.

Add filtering support by Kubernetes job, with special case mapping job to
k8s_job, to not conflict with Prometheus' job label.

Fix panic when unknown type specified as a --from or --to flag.

Fix job label from linkerd-proxy overwriting Prometheus job label at
collection time. This caused all metrics collected by proxy sidecars in
Kubernetes jobs to be collected into an incorrect Prometheus job, rather than
the expected linkerd-proxy Prometheus job.

Fix unsupported resource type tap error message incorrectly printing the
target resource rather than the destination.

Set --controller-log-level debug in install_test.go for easier debugging.

Expose slow-cooker's metrics via a k8s service in the tap integration test, to
validate proxy requests with a job as destination.

Fixes #1872
Part of #627

Signed-off-by: Andrew Seigner siggy@buoyant.io

Validating commands

Deploy integration tests

bin/fast-build
bin/test-run `pwd`/bin/linkerd

Send traffic to a Kubernetes job

kubectl -n linkerd-tap-test exec -it $(kubectl -n linkerd-tap-test get po --selector=app=t1 -o jsonpath='{.items[*].metadata.name}') -c t1 -- bash -c "while true; do curl slow-cooker:9999/metrics; sleep 1; done"

View traffic to/from Kubernetes job

$ bin/linkerd stat deploy -n linkerd-tap-test --to job/slow-cooker
NAME   MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TLS
t1        1/1   100.00%   1.0rps           4ms           9ms          10ms    0%

$ bin/linkerd stat deploy -n linkerd-tap-test --from job/slow-cooker
NAME      MESHED   SUCCESS      RPS   LATENCY_P50   LATENCY_P95   LATENCY_P99   TLS
gateway      1/1     0.00%   1.0rps          23ms          38ms          40ms    0%

$ bin/linkerd tap deploy -n linkerd-tap-test --to job/slow-cooker
req id=0:1259 proxy=out src=10.1.7.175:47410 dst=10.1.7.179:9999 tls=no_identity :method=GET :authority=slow-cooker:9999 :path=/metrics
rsp id=0:1259 proxy=out src=10.1.7.175:47410 dst=10.1.7.179:9999 tls=no_identity :status=200 latency=7696µs
end id=0:1259 proxy=out src=10.1.7.175:47410 dst=10.1.7.179:9999 tls=no_identity duration=279µs response-length=7630B

Filtering by Kubernetes job was not supported. Also filtering by any unknown
type caused a panic.

Add filtering support by Kubernetes job, with special case mapping `job` to
`k8s_job`, to not conflict with Prometheus' job label.

Fix panic when unknown type specified as a `--from` or `--to` flag.

Fix `job` label from `linkerd-proxy` overwriting Prometheus `job` label at
collection time. This caused all metrics collected by proxy sidecars in
Kubernetes jobs to be collected into an incorrect Prometheus job, rather than
the expected `linkerd-proxy` Prometheus job.

Fix `unsupported resource type` tap error message incorrectly printing the
target resource rather than the destination.

Set `--controller-log-level debug` in `install_test.go` for easier debugging.

Expose `slow-cooker`'s metrics via a k8s service in the tap integration test, to
validate proxy requests with a job as destination.

Fixes #1872
Part of #627

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Copy link
Contributor

@klingerf klingerf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⭐️ Huge improvement, and hooray for more tests! I just had a few small wording nits, but otherwise lgtm.

cli/cmd/tap.go Outdated
* deployments
* namespaces
* pods
* replicationcontrollers
* services (only supported as a "--to" resource)`,
* services (only supported as a --to resource)
* jobs (only supported as a --from or --to)`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with the services wording, I'd write this as:

  • jobs (only supported as a --from or --to resource)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, on closer inspection, it looks like tap and top only support a --to flag.

bin/linkerd -l kltest top deploy -n kltest-tap-test --from job/slow-cooker
Error: unknown flag: --from

Offhand it seems like it should be possible for them to support a --from flag, but I don't think we should implement that as part of this PR. In the meantime, can you change this to:

  • jobs (only supported as a --to resource)

@@ -491,6 +491,9 @@ data:
- source_labels: [__meta_kubernetes_pod_label_linkerd_io_proxy_job]
action: replace
target_label: k8s_job
# drop __meta_kubernetes_pod_label_linkerd_io_proxy_job
- action: labeldrop
regex: __meta_kubernetes_pod_label_linkerd_io_proxy_job
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Wow, nice catch.

cli/cmd/top.go Outdated
* deployments
* namespaces
* pods
* replicationcontrollers
* services (only supported as a "--to" resource)`,
* services (only supported as a --to resource)
* jobs (only supported as a --from or --to)`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here as in tap.go:

  • jobs (only supported as a --to resource)

if err != nil {
return pb.Resource{}, err
}

return res[0], err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😱

Signed-off-by: Andrew Seigner <siggy@buoyant.io>
@siggy siggy merged commit 37a5455 into master Dec 3, 2018
@siggy siggy deleted the siggy/job-panic branch December 3, 2018 23:34
@siggy siggy restored the siggy/job-panic branch December 3, 2018 23:45
@siggy siggy deleted the siggy/job-panic branch December 3, 2018 23:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants