-
Notifications
You must be signed in to change notification settings - Fork 80
enhance health scope with informative health condition #194
enhance health scope with informative health condition #194
Conversation
captainroy-hy
commented
Sep 3, 2020
- Make health scope show more information in HealthCondition.
- Add CheckByHealthCheckTrait interface
- Modify e2e-test & unit tests
add unit test Signed-off-by: roy wang <seiwy2010@gmail.com>
Signed-off-by: roy wang <seiwy2010@gmail.com>
Add CheckByHealthCheckTrait interface Signed-off-by: roy wang <seiwy2010@gmail.com>
f0851b3
to
528ec10
Compare
Signed-off-by: roy wang <seiwy2010@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job
r.Target.UID = cwObj.GetUID() | ||
compName, exist := cwObj.GetLabels()[oam.LabelAppComponent] | ||
if !exist { | ||
compName = "UNKNOWN NAME" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep it empty is better if there's not label found
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I will fix it.
// TODO(artursouza): enable this check once crossplane is updated. | ||
//if len(healthScope.Spec.WorkloadReferences) == 0 { | ||
// return false | ||
//} | ||
|
||
return healthScope.Status.Health == "healthy" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't keep the compatibility here?
@@ -253,21 +249,6 @@ var _ = Describe("Test GetScopeHealthStatus", func() { | |||
return nil | |||
}, | |||
}, | |||
{ | |||
caseName: "2 general workloads but one is unhealthy", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why delete this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because the general check logic (check any workloads having status.readyReplicas
) is removed, so this case doesn't make sense anymore.
errUpdateHealthScopeStatus = "cannot update health scope status" | ||
) | ||
|
||
const ( | ||
infoFmtScopeDiagnosis = "total workloads: %d, healthy workloads: %d, unhealthy workloads: %d, unknown workloads: %d" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we can also provide a structural way here, makes it easy for other traits re-use this info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it.
@@ -41,7 +53,22 @@ type HealthScopeSpec struct { | |||
type HealthScopeStatus struct { | |||
runtimev1alpha1.ConditionedStatus `json:",inline"` | |||
|
|||
Health string `json:"health"` | |||
// ScopeHealthCondition represents health condition summary of the scope | |||
ScopeHealthCondition HealthCondition `json:"scopeHealthCondition"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they should be different struct as some of the field like componentName don't work here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, let me add a specific struct for ScopeHealthCondition, we can replace
infoFmtScopeDiagnosis = "total workloads: %d, healthy workloads: %d, unhealthy workloads: %d, unknown workloads: %d"
with the new struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wonderflow any suggestion on the scope health condition struct? e.g. additional info that may be consumed by traits.
// ScopeHealthCondition represents health condition summary of a scope.
type ScopeHealthCondition struct {
HealthStatus HealthStatus `json:"healthStatus"`
Total int64 `json:"total,omitempty"`
HealthyWorkloads int64 `json:"healthyWorkloads,omitempty"`
UnhealthyWorkloads int64 `json:"unhealthyWorkloads,omitempty"`
UnknownWorkloads int64 `json:"unknownWorkloads,omitempty"`
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The struct is good
compName, exist := deployment.GetLabels()[oam.LabelAppComponent] | ||
if !exist { | ||
compName = "UNKNOWN NAME" | ||
} | ||
r.ComponentName = compName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this code twice, can we make it a common function, just put the runtime.Object
as argument
compName, exist := statefulset.GetLabels()[oam.LabelAppComponent] | ||
if !exist { | ||
compName = "UNKNOWN NAME" | ||
} | ||
r.ComponentName = compName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again here
compName, exist := daemonset.GetLabels()[oam.LabelAppComponent] | ||
if !exist { | ||
compName = "UNKNOWN NAME" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here
wl := &unstructured.Unstructured{} | ||
wl.SetGroupVersionKind(wlRef.GroupVersionKind()) | ||
if err := c.Get(ctx, client.ObjectKey{Namespace: ns, Name: wlRef.Name}, wl); err != nil { | ||
healthCondition.Diagnosis += errors.Wrap(err, errHealthCheck).Error() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not combine string, I don't see we have more than one error message to combine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will fix it.
var wlStatusC *HealthCondition | ||
var wlHealthCondition *HealthCondition | ||
|
||
wlHealthCondition = r.traitChecker.Check(ctx, r.client, resRef, healthScope.GetNamespace()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why traitChecker has higher priority than built-in ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's possible for user to override health check logic on these "built-in" workloads.
modify e2e-test & unit test Signed-off-by: roy wang <seiwy2010@gmail.com>
ping @artursouza this PR may contain some un-compatible changes, please confirm if it's OK? Does healthscope already be in use in any cases of MSFT? |
It is OK to break compatibility. Let me look at the PR now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great!
if deployment.Status.ReadyReplicas == 0 { | ||
r.Diagnosis = fmt.Sprintf(errFmtResourceNotReady, deployment.Status) | ||
// Health criteria | ||
if deployment.Status.ReadyReplicas != requiredReplicas { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if there are more ReadyReplicas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can treat this situation as "unhealthy" just like this case "readyReplicas greater than replicas" ?
8e7c793
to
8b62ad5
Compare
hi @wonderflow @ryanzhang-oss May I trouble you to help review this pr? thx! |
} | ||
|
||
func getComponentNameFromLabel(o metav1.Object) string { | ||
compName, exist := o.GetLabels()[oam.LabelAppComponent] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we have risk of panic here? if o.GetLabels return nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nil of map type would not lead a panic but nil of metav1.Object interface would. I should add nil check for o metav1.Object
} | ||
|
||
// CheckUnknownWorkload handles unknown type workloads. | ||
func CheckUnknownWorkload(ctx context.Context, c client.Client, wlRef runtimev1alpha1.TypedReference, ns string) *WorkloadHealthCondition { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see any test case for this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's tested here. Let me separate it into healthscope_test.go
Signed-off-by: roy wang <seiwy2010@gmail.com>
ac0a7b1
to
6ad4984
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: roy wang <seiwy2010@gmail.com>