-
Notifications
You must be signed in to change notification settings - Fork 387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update Egress status when configuring static Egress #2444
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2444 +/- ##
==========================================
+ Coverage 59.78% 64.99% +5.20%
==========================================
Files 284 284
Lines 22265 25637 +3372
==========================================
+ Hits 13312 16662 +3350
+ Misses 7535 7423 -112
- Partials 1418 1552 +134
Flags with carried forward coverage won't be shown. Click here to find out more.
|
a4cba02
to
6fc5982
Compare
test/e2e/egress_test.go
Outdated
@@ -167,6 +167,8 @@ ip netns exec %[1]s /agnhost netexec | |||
defer data.crdClient.CrdV1alpha2().Egresses().Delete(context.TODO(), egress.Name, metav1.DeleteOptions{}) | |||
assertClientIP(localPod, egressNodeIP) | |||
assertClientIP(remotePod, egressNodeIP) | |||
egressState, _ := data.crdClient.CrdV1alpha2().Egresses().Get(context.TODO(), egress.Name, metav1.GetOptions{}) | |||
assert.Equal(t, egressNode, egressState.Status.EgressNode, "Egress status not match") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it cause the test flaky? The status update is asynchrounous, the client IP check is quite quick. Normally the test uses wait.Poll or wait.PollImmediate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should remove this assertion? otherwise it will still fail the test
It seems all comments have been addressed. PTAL. @tnqn |
if egress.Status.EgressNode == nodeName { | ||
func (c *EgressController) updateEgressStatus(egressName, nodeName string) error { | ||
if err := retry.RetryOnConflict(retry.DefaultRetry, func() error { | ||
egress, err := c.crdClient.CrdV1alpha2().Egresses().Get(context.TODO(), egressName, metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally the Egress in Lister is in sync, it should only get the value from API when an update meets conflict (which means the cache in Lister is out of sync for this resource). Otherwise it would waste one GET request for each update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
return err | ||
} | ||
actualStatus := egress.Status.EgressNode | ||
if actualStatus == nodeName || (nodeName == "" && actualStatus != c.nodeName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This special check worths a comment. otherwise what the method does may be surprizing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about changing the argument to a bool isLocal
to make the method more readable?
If it's true, the method needs to ensure egress.Status.EgressNode = c.nodeName
.
If it's false, the method needs to ensure egress.Status.EgressNode != c.nodeName
by updating it to ""
.
return fmt.Errorf("updating Egress %s status error: %v", egress.Name, err) | ||
func (c *EgressController) updateEgressStatus(egressName, nodeName string) error { | ||
if err := retry.RetryOnConflict(retry.DefaultRetry, func() error { | ||
egress, err := c.egressLister.Get(egressName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is some misunderstanding. I meant it should first use the value in cache, and get latest version from API if there is a conflict. See an example https://github.com/kubernetes/kubernetes/blob/d92b788faa2521a937acc5fdcb66bfdb960dbd48/pkg/controller/daemon/daemon_controller.go#L1056.
return err | ||
} | ||
actualStatus := egress.Status.EgressNode | ||
if actualStatus == nodeName || (nodeName == "" && actualStatus != c.nodeName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about changing the argument to a bool isLocal
to make the method more readable?
If it's true, the method needs to ensure egress.Status.EgressNode = c.nodeName
.
If it's false, the method needs to ensure egress.Status.EgressNode != c.nodeName
by updating it to ""
.
return nil | ||
} | ||
nodeNameToUpdate = "" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be moved to the loop below and handle isLocal=true
case as well, it doesn't make to separate the two cases and add a special check for isLocal
case outside of the loop alone.
test/e2e/egress_test.go
Outdated
@@ -167,6 +167,8 @@ ip netns exec %[1]s /agnhost netexec | |||
defer data.crdClient.CrdV1alpha2().Egresses().Delete(context.TODO(), egress.Name, metav1.DeleteOptions{}) | |||
assertClientIP(localPod, egressNodeIP) | |||
assertClientIP(remotePod, egressNodeIP) | |||
egressState, _ := data.crdClient.CrdV1alpha2().Egresses().Get(context.TODO(), egress.Name, metav1.GetOptions{}) | |||
assert.Equal(t, egressNode, egressState.Status.EgressNode, "Egress status not match") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should remove this assertion? otherwise it will still fail the test
test/e2e/egress_test.go
Outdated
}) | ||
assert.NoError(t, err, "Egress failed to reach expected status") | ||
|
||
egress, err = data.crdClient.CrdV1alpha2().Egresses().Get(context.TODO(), egress.Name, metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why it needs to get it again after the above loop has verified its status?
You can declare egress out of the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
klog.V(2).InfoS("Updating Egress status", "Egress", egress.Name, "oldNode", egress.Status.EgressNode, "newNode", toUpdate.Status.EgressNode) | ||
_, err := c.crdClient.CrdV1alpha2().Egresses().UpdateStatus(context.TODO(), toUpdate, metav1.UpdateOptions{}) | ||
if err != nil && errors.IsConflict(err) { | ||
if toUpdate, err = c.crdClient.CrdV1alpha2().Egresses().Get(context.TODO(), egress.Name, metav1.GetOptions{}); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized if you override err
here, RetryOnConflict
won't retry if Get
succeeds. I think this needs an unit test which makes the first UpdateStatus call fail to verify it can retry correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About unit test, is there an easy way to mock concurrent control and incremental resource version in the crd fake clientset?
kubernetes/kubernetes#72353
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you don't have concurrent case. Refer to https://github.com/kubernetes/kubernetes/pull/99398/files to fake error only for first N calls.
_, updateErr = c.crdClient.CrdV1alpha2().Egresses().UpdateStatus(context.TODO(), toUpdate, metav1.UpdateOptions{}) | ||
if updateErr != nil && errors.IsConflict(updateErr) { | ||
if toUpdate, getErr = c.crdClient.CrdV1alpha2().Egresses().Get(context.TODO(), egress.Name, metav1.GetOptions{}); getErr != nil { | ||
// If the GET fails we can't trust status.Replicas anymore. This error is bound to be more interesting than the update failure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy-pasted comment..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reminding, will remove it.
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
"k8s.io/apimachinery/pkg/runtime" | ||
"k8s.io/apimachinery/pkg/util/sets" | ||
"k8s.io/apimachinery/pkg/util/wait" | ||
testing2 "k8s.io/client-go/testing" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testing2 "k8s.io/client-go/testing" | |
k8stesting "k8s.io/client-go/testing" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
assert.Equal(t, c.nodeName, egress.Status.EgressNode, "Egress status not match") | ||
} else { | ||
assert.Equal(t, "", egress.Status.EgressNode, "Egress status not match") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unit test shouln't calculate the desired status based on the actual status, which basically repeats the main code and not tests it.
You can add expectedEgresses
and set the expected Node name there, and checks if the got egress equals the expected one. This will be useful later when we have more fields.
for _, expectedEgress := range tt.expectedEgresses {
gotEgress, err := c.crdClient.CrdV1alpha2().Egresses().Get(context.TODO(), expectedEgress.Name, metav1.GetOptions{})
require.NoError(t, err)
assert.Equal(t, expectedEgress, gotEgress)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
expectedUpdateCalled: 1, | ||
expectedGetCalled: 0, | ||
expectedError: nil, | ||
updateFailure: updateConflictError, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be nil or unset to avoid confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
expectedUpdateCalled int | ||
expectedGetCalled int | ||
expectedError error | ||
updateFailure error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is input rather than expected result, please move it around updateErrorNum
to avoid confusion. And better to name it updateError
accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
updateFailure: updateConflictError, | ||
}, | ||
{ | ||
name: "fail after one update failures", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name: "fail after one update failures", | |
name: "fail after one update failure", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Signed-off-by: Wenqi Qiu <wenqiq@vmware.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks
@wenqiq Please add more details in the PR description or file an issue about what this PR is for, so other members can understand it without digging into the code. |
/test-all |
I just noticed that this was merged as a commit with a message that simply says: "Fix Egress status updating". This is misleading because this is not a bug fix IMO. It also does not provide any useful information about the change. Please be mindful in the future that the squashed commit matches the PR description. Thanks! |
Thanks for noticing this and sorry my carelessness, can we revert this merge or add some comments in the commit or force rewrite the commit? @antoninbas @tnqn |
We cannot change anything now, but it's ok we'll keep it in mind for future merges |
Update Egress status when configuring a static Egress.
For example:
Create an
Egress
resources, assign EgressIP field and the EgressIP has been assigned to Node 'kind-worker' manually.List the
Egress
resource with kubectl.It is an additional PR for #2186, and makes the Egress status more user friendly.
Signed-off-by: Wenqi Qiu wenqiq@vmware.com