-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fully-implemented, resource-aware conduit stat
#627
Comments
Some examples to validate the query patterns described above: conduit stat -n emojivoto deploy/emoji# Deployment => ReplicaSet selector
selector=$(kubectl -n emojivoto get deploy/emoji -o json | jq -r '.spec.selector.matchLabels | to_entries[] | .key+"="+.value')
# ReplicaSet => pod-template-hash
# select ReplicaSet from Deployment selector, verify ownerReference back to Deployment, parse out pod-template-hash
podtemplatehash=$(kubectl -n emojivoto get rs -l $selector -o json | jq -r '.items[] | select(.metadata.ownerReferences[].controller == true and .metadata.ownerReferences[].kind == "Deployment" and .metadata.ownerReferences[].name == "emoji") | .spec.selector.matchLabels."pod-template-hash"')
# pod-template-hash => Prometheus request volume query
curl "http://localhost:9090/api/v1/query?query=sum(request_total%7Bpod_template_hash%3D%22$podtemplatehash%22%7D)" |
@olix0r Overall LGTM, the only question I have is about the different resource types. So far we have only dealt with |
* Define a new telemetry Stat API Proposal definition for a new Stat API, for the purposes of satisfying the queries proposed in #627. StatSummary will replace Stat once implemented and the original Stat deleted.
This is a follow-up from my comment on #663. @olix0r Can you provide a bit more clarity on the The description for
The description for Based on the examples, it looks like the stat command returns inbound stats by default, but setting any of the I'm not super familiar with the intended use cases, but whether or not we return inbound or outbound stats feels like a binary flag to me, whereas I would expect for us to be able to add filtering regardless of whether or not we're returning inbound or outbound stats. |
The intention is that these are both outbound in.
That's probably copypasta
Definitely copypasta
Correct. The main resource -- with no outs specified -- selects inbound stats. Because we don't have source annotations on inbound stats, no further filtering is supported on inbound. |
That all makes sense, thanks. With that in mind, I'd like to recommend that we revise the inbound filtering flags as follows:
And the outbound filtering as follows:
Personally I'd also drop all of the |
This seems like a fine simplification for now. We may find a need to add it later, but simple is good. |
Ok, have updated accordingly. |
Start implementing new conduit stat summary endpoint. Changes the public-api to call prometheus directly instead of the telemetry service. Wired through to `api/stat` on the web server, as well as `conduit statsummary` on the CLI. Works for deployments only. Current implementation just retrieves requests and mesh/total pod count (so latency stats are always 0). Uses API defined in #663 Example queries the stat endpoint will eventually satisfy in #627 This branch includes commits from @klingerf * run ./bin/dep ensure * run ./bin/update-go-deps-shas
The cli and public-api only supported deployments as a resource type. This change adds support for namespace as a resource type in the cli and public-api. This also change includes: - the cli statsummary command now prints `-`'s when objects are not in the mesh - removed `out-` from cli statsummary flags, and analagous proto changes - switched public-api to use native prometheus label types - misc error handling and logging fixes Part of #627 Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The cli and public-api only supported deployments as a resource type. This change adds support for namespace as a resource type in the cli and public-api. This also change includes: - cli statsummary now prints `-`'s when objects are not in the mesh - cli statsummary prints `No resources found.` when applicable - removed `out-` from cli statsummary flags, and analagous proto changes - switched public-api to use native prometheus label types - misc error handling and logging fixes Part of #627 Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The cli and public-api only supported deployments as a resource type. This change adds support for namespace as a resource type in the cli and public-api. This also change includes: - cli statsummary now prints `-`'s when objects are not in the mesh - cli statsummary prints `No resources found.` when applicable - removed `out-` from cli statsummary flags, and analagous proto changes - switched public-api to use native prometheus label types - misc error handling and logging fixes Part of #627 Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The cli and public-api only supported deployments as a resource type. This change adds support for namespace as a resource type in the cli and public-api. This also change includes: - cli statsummary now prints `-`'s when objects are not in the mesh - cli statsummary prints `No resources found.` when applicable - removed `out-` from cli statsummary flags, and analagous proto changes - switched public-api to use native prometheus label types - misc error handling and logging fixes Part of #627 Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The cli and public-api only supported deployments as a resource type. This change adds support for namespace as a resource type in the cli and public-api. This also change includes: - cli statsummary now prints `-`'s when objects are not in the mesh - cli statsummary prints `No resources found.` when applicable - removed `out-` from cli statsummary flags, and analagous proto changes - switched public-api to use native prometheus label types - misc error handling and logging fixes Part of #627 Signed-off-by: Andrew Seigner <siggy@buoyant.io>
The cli and public-api only supported deployments as a resource type. This change adds support for namespace as a resource type in the cli and public-api. This also change includes: - cli statsummary now prints `-`'s when objects are not in the mesh - cli statsummary prints `No resources found.` when applicable - removed `out-` from cli statsummary flags, and analagous proto changes - switched public-api to use native prometheus label types - misc error handling and logging fixes Part of #627 Signed-off-by: Andrew Seigner <siggy@buoyant.io>
* Add namespace as a resource type in public-api The cli and public-api only supported deployments as a resource type. This change adds support for namespace as a resource type in the cli and public-api. This also change includes: - cli statsummary now prints `-`'s when objects are not in the mesh - cli statsummary prints `No resources found.` when applicable - removed `out-` from cli statsummary flags, and analagous proto changes - switched public-api to use native prometheus label types - misc error handling and logging fixes Part of #627 Signed-off-by: Andrew Seigner <siggy@buoyant.io> * Refactor filter and groupby label formulation Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Rename stat_summary.go to stat.go in cli Signed-off-by: Kevin Lingerfelt <kl@buoyant.io> * Update rbac privileges for namespace stats Signed-off-by: Kevin Lingerfelt <kl@buoyant.io>
@klingerf what's still missing from this issue? |
@grampelberg I'd be fine with closing this, but I don't think we have issues tracking the remaining resources from the description for which we don't have stats. Specifically, we aren't exposing stats yet for:
|
conduit stat
conduit stat
command
conduit stat
commandconduit stat
@grampelberg Works for me -- have updated the title and description accordingly and included a task list. |
@klingerf you're a gentleman and a scholar. Thank you! |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions. |
Filtering by Kubernetes job was not supported. Also filtering by any unknown type caused a panic. Add filtering support by Kubernetes job, with special case mapping `job` to `k8s_job`, to not conflict with Prometheus' job label. Fix panic when unknown type specified as a `--from` or `--to` flag. Fix `job` label from `linkerd-proxy` overwriting Prometheus `job` label at collection time. This caused all metrics collected by proxy sidecars in Kubernetes jobs to be collected into an incorrect Prometheus job, rather than the expected `linkerd-proxy` Prometheus job. Fix `unsupported resource type` tap error message incorrectly printing the target resource rather than the destination. Set `--controller-log-level debug` in `install_test.go` for easier debugging. Expose `slow-cooker`'s metrics via a k8s service in the tap integration test, to validate proxy requests with a job as destination. Fixes #1872 Part of #627 Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Filtering by Kubernetes job was not supported. Also filtering by any unknown type caused a panic. Add filtering support by Kubernetes job, with special case mapping `job` to `k8s_job`, to not conflict with Prometheus' job label. Fix panic when unknown type specified as a `--from` or `--to` flag. Fix `job` label from `linkerd-proxy` overwriting Prometheus `job` label at collection time. This caused all metrics collected by proxy sidecars in Kubernetes jobs to be collected into an incorrect Prometheus job, rather than the expected `linkerd-proxy` Prometheus job. Fix `unsupported resource type` tap error message incorrectly printing the target resource rather than the destination. Set `--controller-log-level debug` in `install_test.go` for easier debugging. Expose `slow-cooker`'s metrics via a k8s service in the tap integration test, to validate proxy requests with a job as destination. Fixes #1872 Part of #627 Signed-off-by: Andrew Seigner <siggy@buoyant.io>
Tracking ticket for a fully-implemented, resource-aware
conduit stat
command.Tasks
RFC
conduit help stat
Example output:
@adleong @siggy
The text was updated successfully, but these errors were encountered: