-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the ability to query stats by authority #1181
Conversation
Add documentation to Resource so we can use it for authority Handle non-k8s resource requests Rewrite stat summary fetching and parsing to handle non-k8s resources Requesting an authority as primary breakout now works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Excited to see these stats. I had a few bits of miscellaneous feedback.
I'm a bit worried that it's going to get confusing to have both a CanonicalResourceType
func and a CanonicalKubernetesNameFromFriendlyName
func in the long run. I think my preference would be to just modify and rename the existing methods in pkg/k8s
, and not add new methods that wrap those methods. It's ok by me that "authority" isn't actually a kubernetes resource. If it would be less confusion, we could move all of the code and constants to a new package, like pkg/resource
or something like that.
controller/api/public/test_helper.go
Outdated
TimeWindow: "1m", | ||
} | ||
|
||
if !counts.OmitPodCounts { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIOLI I think you could avoid needing this OmitPodCounts
field altogether if you updated this method to take a pointer to PodCounts
:
func GenStatSummaryResponse(resName, resType, resNs string, counts *PodCounts) pb.StatSummaryResponse {
And then here just checked to make sure it isn't nil:
if counts != nil {
statTableRow.MeshedPodCount = counts.MeshedPods
statTableRow.RunningPodCount = counts.RunningPods
statTableRow.FailedPodCount = counts.FailedPods
}
That way, in tests where you don't want counts, you could just pass nil
instead of PodCounts{OmitPodCounts: true}
.
// This can be: | ||
// - "all" -- includes all Kubernetes resource types only | ||
// - "authority" -- a special resource type derived from request `:authority` values | ||
// - Otherwise, the resource type may be any Kubernetes resource (e.g. "namespace", "deployment"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
controller/api/util/api_utils.go
Outdated
|
||
// An authority can only receive traffic, not send it | ||
func ValidateToResourceType(resourceType string) (string, error) { | ||
return CanonicalResourceType(resourceType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like ValidateToResourceType
and CanonicalResourceType
are exact aliases of the same function. What's the goal of having both of these? Maybe we can just use CanonicalResourceType
everywhere?
cli/cmd/stat.go
Outdated
return k8s.ShortNameFromCanonicalKubernetesName(resourceType) + "/" | ||
// the type doesn't come from the user here; skip err check | ||
canonicalType, _ := util.CanonicalResourceType(resourceType) | ||
return canonicalType + "/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe we should be using util.ShortNameFromCanonicalName
here? That seems more equivalent to what we were doing previously. Plus it looks like that func is unused.
With this change, we're now seeing the full resource name in the stat all
output:
$ bin/conduit -c kltest stat all -n kltest
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 SECURED
deployments/controller 1/1 100.00% 1.2rps 5ms 10ms 10ms 0%
deployments/grafana 1/1 100.00% 0.3rps 5ms 10ms 10ms 0%
deployments/prometheus 1/1 100.00% 0.6rps 5ms 10ms 10ms 0%
deployments/web 1/1 100.00% 0.3rps 5ms 10ms 10ms 0%
Compared to:
$ conduit -c kltest stat all -n kltest
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 SECURED
deploy/controller 1/1 100.00% 1.2rps 5ms 10ms 10ms 0%
deploy/grafana 1/1 100.00% 0.3rps 5ms 10ms 10ms 0%
deploy/prometheus 1/1 100.00% 0.5rps 5ms 9ms 10ms 0%
deploy/web 1/1 100.00% 0.3rps 5ms 10ms 10ms 0%
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah! yeah! thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created ShortNameFromCanonicalName for this purpose and somehow forgot to use it :/
controller/api/util/api_utils.go
Outdated
return CanonicalResourceType(resourceType) | ||
} | ||
|
||
func ValidateFromResourceType(resourceType string) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like ValidateToResourceType
and ValidateFromResourceType
methods don't need to be exported, since they're not called outside of this package.
controller/api/util/api_utils.go
Outdated
@@ -176,6 +179,30 @@ func BuildStatSummaryRequest(p StatSummaryRequestParams) (*pb.StatSummaryRequest | |||
return statRequest, nil | |||
} | |||
|
|||
// Ensures a valid resource type - a canonical k8s object, or 'authority' | |||
func CanonicalResourceType(resourceType string) (string, error) { | |||
if resourceType == Authority { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency with CanonicalKubernetesNameFromFriendlyName
, I think this method should also recognize "au" as a valid resourceType string.
Yeah, that's true. It would be less confusing in the long run to do away with the wrapper methods. I'll fold those into the k8s methods. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making those updates! I had one pretty substantial refactor to suggest in stat_summary.go, but the rest of this setup is looking good, and these authority stats are incredibly awesome.
$ ./bin/go-run cli stat au -n conduit-egress-test --to au
NAME MESHED SUCCESS RPS LATENCY_P50 LATENCY_P95 LATENCY_P99 SECURED
www.httpbin.org 0/0 100.00% 0.5rps 150ms 195ms 199ms 0%
controller/api/util/api_utils.go
Outdated
@@ -21,6 +21,9 @@ import ( | |||
var ( | |||
defaultMetricTimeWindow = "1m" | |||
|
|||
Authority = "authority" // non-k8s resource type | |||
AuthorityShortName = "au" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not re-define these strings outside of pkg/k8s/k8s.go
. Instead, I think you can remove them if you make the change to validateFromResourceType
that I suggest below.
controller/api/util/api_utils.go
Outdated
if resourceType == Authority || resourceType == AuthorityShortName { | ||
return "", errors.New("cannot query traffic --from an authority") | ||
} | ||
return k8s.CanonicalResourceNameFromFriendlyName(resourceType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can avoid defining the Authority
and AuthorityShortName
consts in this file if we change this method to:
func validateFromResourceType(resourceType string) (string, error) {
name, err := k8s.CanonicalResourceNameFromFriendlyName(resourceType)
if err != nil {
return "", err
}
if name == k8s.Authority {
return "", errors.New("cannot query traffic --from an authority")
}
return name, nil
}
@@ -128,37 +131,50 @@ func statSummaryError(req *pb.StatSummaryRequest, message string) *pb.StatSummar | |||
} | |||
} | |||
|
|||
func (s *grpcServer) resourceQuery(ctx context.Context, req *pb.StatSummaryRequest) resourceResult { | |||
func (s *grpcServer) getKubernetesObjectStats(req *pb.StatSummaryRequest) (map[pb.Resource]k8sStat, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a valiant effort to reuse all of the existing k8s resource code, but I think that authorities are different enough from k8s resources that we'd be better off just forking to a different method entirely in the case of authorities. I recommend leaving the resourceQuery
method mostly in tact, but renaming it to k8sResourceQuery
or something similar. And then we could add a nonK8sResourceQuery
method that looks like this:
func (s *grpcServer) nonK8sResourceQuery(ctx context.Context, req *pb.StatSummaryRequest) resourceResult {
rows := make([]*pb.StatTable_PodGroup_Row, 0)
requestMetrics, err := s.getPrometheusMetrics(ctx, req, req.TimeWindow)
if err != nil {
return resourceResult{err: err}
}
for key, stats := range requestMetrics {
resource := key
row := pb.StatTable_PodGroup_Row{
Resource: &resource,
TimeWindow: req.TimeWindow,
Stats: stats,
}
rows = append(rows, &row)
}
rsp := pb.StatTable{
Table: &pb.StatTable_PodGroup_{
PodGroup: &pb.StatTable_PodGroup{
Rows: rows,
},
},
}
return resourceResult{res: &rsp}
}
And then above, on line 96 where we fire off the queries in separate goroutines, we can do something like:
go func() {
if resource == k8s.Authority {
resultChan <- s.nonK8sResourceQuery(ctx, statReq)
} else {
resultChan <- s.k8sResourceQuery(ctx, statReq)
}
}()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah! You're right, just splitting these out into separate cases makes them easier to follow, and we don't have to keep doing the k8s/nonk8s checks everywhere
@@ -292,19 +364,21 @@ func buildRequestLabels(req *pb.StatSummaryRequest) (labels model.LabelSet, labe | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we also need to make a special exception when the ToResource
is an authority, since we won't have any dst_authority
labels, and we won't have dst_ns
labels when the authority is external to the k8s cluster. So I recommend changing the ToResource
case to:
case *pb.StatSummaryRequest_ToResource:
labelNames = promLabelNames(req.Selector.Resource)
if req.GetSelector().GetResource().GetType() != k8s.Authority {
labels = labels.Merge(promDstLabels(out.ToResource))
}
labels = labels.Merge(promLabels(req.Selector.Resource))
labels = labels.Merge(promDirectionLabels("outbound"))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've refactored some of the label methods, maybe this will help. They are kind of complicated.
I think we can defer the setting of the labels to promDstLabels
, since if we're querying a ToAuthority
, we'd still need to set the label {authority=to_authority_name}
, which won't be set by req.Selector.Resource.Name
(we need to get it from out.ToResource.Name
)
Currently promDstLabels does take care of checking to make sure we don't add a dst_authority
or a dst_ns
in the case of a ToAuthority
query. And it sets the {authority=to_authority}
that we want to filter by:
// query a named resource
func promDstLabels(resource *pb.Resource) model.LabelSet {
set := model.LabelSet{}
if resource.Name != "" {
if resource.Type == k8s.Authority {
set[promResourceType(resource)] = model.LabelValue(resource.Name)
} else {
set["dst_"+promResourceType(resource)] = model.LabelValue(resource.Name)
if shouldAddNamespaceLabel(resource) {
set[dstNamespaceLabel] = model.LabelValue(resource.Namespace)
}
}
}
return set
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making those updates! I had just a few more comments below, and then this should be good to go.
@@ -303,7 +303,8 @@ func getNamePrefix(resourceType string) string { | |||
if resourceType == "" { | |||
return "" | |||
} else { | |||
return k8s.ShortNameFromCanonicalKubernetesName(resourceType) + "/" | |||
canonicalType := k8s.ShortNameFromCanonicalResourceName(resourceType) | |||
return canonicalType + "/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also update the long description for the stat command to include "authority" as a valid resource type. On line 62 of this file:
* authorities
* deployments
* namespaces
* pods
* replicationcontrollers
* services (only supported if a --from is also specified, or as a --to)
* all (all resource types, not supported in --from or --to)
Can you also remove the quotes around --from and --to in the description for "services", to match the description for "all"?
Also, while you're there, can you fix this issue with the examples:
Examples:
* deploy
* deploy/my-deploy
* deploy my-deploy
* ns/my-ns
* all
"deploy/my-deploy" is listed twice. Maybe that should be "rc/my-replication-controller"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related: do you think it makes sense to add authority to the output of conduit stat all
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, yeah, I think it does!
} | ||
return strings.Join(values, "/") | ||
|
||
key.Name = resourceKeys[resourceName] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this method is properly formulating keys for namespace-type resources. The stat integration test is failing on this branch with:
--- FAIL: TestCliStatForConduitNamespace (20.21s)
--- PASS: TestCliStatForConduitNamespace/test_conduit_stat_deploy (0.09s)
--- FAIL: TestCliStatForConduitNamespace/test_conduit_stat_namespace (20.12s)
stat_test.go:88: Expected success rate [100.00%] for [kltest], got [-]
The issue is that this method returns an empty key for the conduit namespace, whereas is should be returning a key with {Name: <conduit-ns>}
.
I think we could fix and simplify this with something like:
func metricToKey(metric model.Metric, groupBy model.LabelNames) pb.Resource {
// i.e. produce a key using the resource name and namespace
key := pb.Resource{}
for _, k := range groupBy {
item := string(k)
if _, ok := k8s.ProxyLabelsToResourceTypes[item]; ok {
if item == "namespace" && len(groupBy) == 2 {
key.Namespace = string(metric[k])
} else {
key.Name = string(metric[k])
}
}
}
return key
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, and actually, if you make that change, then I don't think you need ProxyLabelsToResourceTypes
at all, since groupBy
will always contain either "namespace", or the string representing the name. In which case you could simplify even further to:
func metricToKey(metric model.Metric, groupBy model.LabelNames) pb.Resource {
key := pb.Resource{}
for _, k := range groupBy {
if string(k) == "namespace" && len(groupBy) == 2 {
key.Namespace = string(metric[k])
} else {
key.Name = string(metric[k])
}
}
return key
}
That would also allow you to remove ProxyLabelsToResourceTypes
and reverseMap
from pkg/k8s/k8s.go
.
Added to
|
* Enable Authorities in web UI, under /authorities
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⭐️ Awesome. I just had a few more tiny nits, but this looks great!
@@ -200,8 +203,12 @@ func writeStatsToBuffer(resp *pb.StatSummaryResponse, reqResourceType string, w | |||
maxNamespaceLength = len(namespace) | |||
} | |||
|
|||
meshedCount := fmt.Sprintf("%d/%d", r.MeshedPodCount, r.RunningPodCount) | |||
if resourceKey == k8s.Authorities { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this! Rather than checking the resource type, I think it might be more future-proof to check if r.RunningPodCount == 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's fair. I'd originally wanted to keep the 0/0
as a clearer indicator of 0 running pods, as opposed to -
which I wanted to imply N/A
. (Like, a 0 value vs a null value) What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yeah, I hadn't thought about that. Like, if we scaled a deployment down to 0 replicas? In that case I think you're totally right that 0/0 would be more useful. Totally fine to leave this as is then.
this.api.ConduitLink | ||
)); | ||
|
||
let locale = { | ||
emptyText: `No ${this.props.resource}s detected.` | ||
emptyText: `No ${resource}s detected.` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use friendlyTitle
here?
web/app/js/components/Sidebar.jsx
Outdated
<Icon type="cloud-o" /> | ||
<span>Authorities</span> | ||
</ConduitLink> | ||
</Menu.Item> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than adding this as a top-level menu item, it feels like it belongs in the resource sub-menu, alongside deployments, replication controllers, and pods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to see integration tests! I had just a few more comments about getting the integration tests to pass in all environments.
} | ||
if resource.Type != k8s.Namespaces && resource.Namespace != "" { | ||
set[dstNamespaceLabel] = model.LabelValue(resource.Namespace) | ||
if resource.Type == k8s.Authorities { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIOLI you could use the isNonK8sResourceQuery
method that you're adding here.
for _, k := range groupBy { | ||
// return namespace/resource | ||
values = append(values, string(metric[k])) | ||
if string(k) == "namespace" && len(groupBy) == 2 && resourceNs == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm unfortunately having a pretty hard time reasoning about this method. I think we should go back to assuming that the groupBy
slice is ordered, and that the namespace returned with the metric is the namespace to use when formulating the key. That will allow you to always use the last element in the slice as the key for the Name
field, without needing to worry about the value of any of the elements in the slice. You could then rewrite as:
func metricToKey(req *pb.StatSummaryRequest, metric model.Metric, groupBy model.LabelNames) pb.Resource {
key := pb.Resource{
Type: req.GetSelector().GetResource().GetType(),
Name: string(metric[groupBy[len(groupBy)-1]]),
}
if len(groupBy) == 2 {
key.Namespace = string(metric[groupBy[0]])
}
return key
}
That's more consistent with the original implementation, and afaict it works for all types including authorities.
test/stat/stat_test.go
Outdated
@@ -96,6 +105,18 @@ func TestCliStatForConduitNamespace(t *testing.T) { | |||
TestHelper.GetConduitNamespace(): "4/4", | |||
}, | |||
}, | |||
{ | |||
args: []string{"stat", "po", "-n", TestHelper.GetConduitNamespace(), "--to", "au/prometheus.conduit.svc.cluster.local:9090"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately "au/prometheus.conduit.svc.cluster.local:9090" only works when conduit is running in the default "conduit" namespace, which won't be the case in CI. Instead, I think you should define a separate prometheusAuthority
var above:
prometheusAuthority := "prometheus." + TestHelper.GetConduitNamespace() + ".svc.cluster.local:9090"
And then use that here:
args: []string{"stat", "po", "-n", TestHelper.GetConduitNamespace(), "--to", "au/" + prometheusAuthority},
test/stat/stat_test.go
Outdated
expectedRows: map[string]string{ | ||
"prometheus.conduit.svc.cluster.local:9090": "-", | ||
}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test case would be more valuable if it were the inverse of the previous test. Specifically, the previous test requests stats from pods to a specified authority, and I think this one should test stats from authorities to a specified pod. I'd rewrite this test case as:
{
args: []string{"stat", "au", "-n", TestHelper.GetConduitNamespace(), "--to", "po/" + prometheusPod},
expectedRows: map[string]string{
prometheusAuthority: "-",
},
},
test/stat/stat_test.go
Outdated
@@ -53,6 +53,15 @@ func TestCliStatForConduitNamespace(t *testing.T) { | |||
} | |||
prometheusPod := pods[0] | |||
|
|||
controllerPods, err := TestHelper.GetPodsForDeployment(TestHelper.GetConduitNamespace(), "controller") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super minor, but you could reuse the pods
and err
variables from above, avoiding the needs for the new controllerPods
var. I'd rewrite this as:
pods, err = TestHelper.GetPodsForDeployment(TestHelper.GetConduitNamespace(), "controller")
if err != nil {
t.Fatalf("Failed to get pods for controller: %s", err)
}
if len(pods) != 1 {
t.Fatalf("Expected 1 pod for controller, got %d", len(pods))
}
controllerPod := pods[0]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⭐️ Thanks for adding those additional tests! This looks great.
}, | ||
mockPromResponse: model.Vector{ | ||
genPromSample("10.1.1.239:9995", "authority", "conduit", "success"), | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIOLI, since genPromSample
doesn't generate data with dst_*
labels, the data that it does generate isn't returned. So I think it might be clearer to not generate any prometheus data at all. The test will still pass if you change this too:
mockPromResponse: model.Vector{},
Same goes for the second test that you added too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahhhh thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⭐️ Looks great, thanks for fixing that link issue!
Resource
ofType
authority. This is the first non-k8s resource we've added.stat_summary.go
since our current method of returning results is very dependent on kubernetes objects.stat_summary.go
to try to clean it up before adding authorityCLI examples:
Questions:
namespace
be set at all? Authorities can be non-k8s and have no nsWill generate the prometheus query
This is because we set dst_namespace to the namespace of the request, but for an authority, I'm not sure we need the dst_namespace as well.
Follow ups:
Fixes #1147