
Update Egress status when configuring static Egress #2444

Merged
merged 1 commit into from
Aug 5, 2021

wenqiq
Contributor

@wenqiq wenqiq commented Jul 21, 2021

Update Egress status when configuring a static Egress.

For example:

Create an Egress resource that sets the egressIP field, where the EgressIP has already been assigned manually to Node 'kind-worker'.

apiVersion: crd.antrea.io/v1alpha2
kind: Egress
metadata:
  name: egress-web-ipv4
spec:
  appliedTo:
    podSelector:
      matchLabels:
        role: web
    namespaceSelector:
      matchLabels:
        env: prod
  egressIP: 10.10.10.10

List the Egress resource with kubectl.

# kubectl get egress
NAME                 EGRESSIP       AGE   NODE
egress-web-ipv4      10.10.10.10    1m    kind-worker

This is a follow-up to #2186 and makes the Egress status more user-friendly.

Signed-off-by: Wenqi Qiu <wenqiq@vmware.com>

@codecov-commenter

codecov-commenter commented Jul 21, 2021

Codecov Report

Merging #2444 (e6414b9) into main (028393d) will increase coverage by 5.20%.
The diff coverage is 90.47%.


@@            Coverage Diff             @@
##             main    #2444      +/-   ##
==========================================
+ Coverage   59.78%   64.99%   +5.20%     
==========================================
  Files         284      284              
  Lines       22265    25637    +3372     
==========================================
+ Hits        13312    16662    +3350     
+ Misses       7535     7423     -112     
- Partials     1418     1552     +134     
Flag              Coverage Δ
e2e-tests         55.86% <71.42%> (?)
kind-e2e-tests    47.18% <0.00%> (+0.31%) ⬆️
unit-tests        42.20% <85.00%> (-0.03%) ⬇️

Flags with carried-forward coverage are not shown.

Impacted Files                                      Coverage Δ
pkg/agent/controller/egress/egress_controller.go    72.43% <90.47%> (+36.91%) ⬆️
pkg/controller/egress/ipallocator/allocator.go      67.82% <0.00%> (-15.16%) ⬇️
pkg/controller/networkpolicy/endpoint_querier.go    77.64% <0.00%> (-13.79%) ⬇️
pkg/apis/controlplane/v1beta1/conversion.go         72.44% <0.00%> (-11.89%) ⬇️
pkg/legacyapis/core/v1alpha2/register.go            69.23% <0.00%> (-10.77%) ⬇️
pkg/controller/egress/controller.go                 76.76% <0.00%> (-10.44%) ⬇️
pkg/apis/stats/register.go                          71.42% <0.00%> (-10.39%) ⬇️
pkg/legacyapis/stats/register.go                    71.42% <0.00%> (-10.39%) ⬇️
pkg/ovs/openflow/ofctrl_meter.go                    33.84% <0.00%> (-10.16%) ⬇️
pkg/legacyapis/security/v1alpha1/register.go        73.33% <0.00%> (-10.00%) ⬇️
... and 266 more

@wenqiq wenqiq force-pushed the egress-fix branch 2 times, most recently from a4cba02 to 6fc5982 on July 21, 2021 15:36
@@ -167,6 +167,8 @@ ip netns exec %[1]s /agnhost netexec
defer data.crdClient.CrdV1alpha2().Egresses().Delete(context.TODO(), egress.Name, metav1.DeleteOptions{})
assertClientIP(localPod, egressNodeIP)
assertClientIP(remotePod, egressNodeIP)
egressState, _ := data.crdClient.CrdV1alpha2().Egresses().Get(context.TODO(), egress.Name, metav1.GetOptions{})
assert.Equal(t, egressNode, egressState.Status.EgressNode, "Egress status not match")
Member

Could this make the test flaky? The status update is asynchronous, while the client IP check completes quickly. Tests normally use wait.Poll or wait.PollImmediate for this.
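A stdlib-only sketch of the polling pattern suggested here. `pollImmediate` is a hypothetical stand-in for `wait.PollImmediate` from `k8s.io/apimachinery/pkg/util/wait` (same shape: an interval, a timeout, and a condition returning `(bool, error)`), and `getEgressNode` is a hypothetical stand-in for reading `Status.EgressNode` through the CRD client; the interval and timeout values are arbitrary. Declaring the observed value outside the closure also means the test does not need a second GET once polling succeeds.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is returned when the condition is not met before the timeout.
var errTimeout = errors.New("timed out waiting for the condition")

// pollImmediate is a hypothetical stand-in for wait.PollImmediate: it checks
// the condition immediately, then once per interval, until the condition
// returns true, returns an error, or the timeout elapses.
func pollImmediate(interval, timeout time.Duration, condition func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := condition()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if time.Now().After(deadline) {
			return errTimeout
		}
		time.Sleep(interval)
	}
}

// waitForEgressNode polls getEgressNode (hypothetical) until it reports
// expected, and returns the last value it observed.
func waitForEgressNode(getEgressNode func() (string, error), expected string) (string, error) {
	var observed string // declared outside the closure so it outlives polling
	err := pollImmediate(10*time.Millisecond, time.Second, func() (bool, error) {
		node, err := getEgressNode()
		if err != nil {
			return false, err
		}
		observed = node
		return node == expected, nil
	})
	return observed, err
}

func main() {
	// Simulate an asynchronous status update that becomes visible on the third read.
	reads := 0
	getter := func() (string, error) {
		reads++
		if reads < 3 {
			return "", nil
		}
		return "kind-worker", nil
	}
	node, err := waitForEgressNode(getter, "kind-worker")
	fmt.Printf("node=%q err=%v reads=%d\n", node, err, reads)
}
```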

Contributor Author

done

Member

Shouldn't you remove this assertion? Otherwise it will still fail the test.

@wenqiq wenqiq changed the title "Updating Egress status when configuring static Egress" to "Update Egress status when configuring static Egress" Jul 23, 2021
@wenqiq
Contributor Author

wenqiq commented Jul 23, 2021

It seems all comments have been addressed. PTAL. @tnqn

if egress.Status.EgressNode == nodeName {
func (c *EgressController) updateEgressStatus(egressName, nodeName string) error {
if err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
egress, err := c.crdClient.CrdV1alpha2().Egresses().Get(context.TODO(), egressName, metav1.GetOptions{})
Member

Normally the Egress in the Lister is in sync, so the method should only get the value from the API when an update hits a conflict (which means the Lister cache is out of sync for this resource). Otherwise it wastes one GET request per update.

Contributor Author

done

return err
}
actualStatus := egress.Status.EgressNode
if actualStatus == nodeName || (nodeName == "" && actualStatus != c.nodeName) {
Member

This special check deserves a comment; otherwise what the method does may be surprising.

Contributor Author

done

Member

How about changing the argument to a bool isLocal to make the method more readable?
If it's true, the method needs to ensure egress.Status.EgressNode = c.nodeName.
If it's false, the method needs to ensure egress.Status.EgressNode != c.nodeName by updating it to "".
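A minimal sketch of the `isLocal` semantics described above, written as a hypothetical pure helper (`desiredEgressNode` does not appear in the PR). It returns the value `Status.EgressNode` should take and whether an update is needed at all. When `isLocal` is false it only clears the field if the field currently names the local Node, which matches the special check in the diff, where `nodeName == "" && actualStatus != c.nodeName` means there is nothing to do.

```go
package main

import "fmt"

// desiredEgressNode is a hypothetical helper for the agent on localNode.
// isLocal reports whether localNode currently holds the Egress IP.
//   - isLocal == true: Status.EgressNode must equal localNode.
//   - isLocal == false: Status.EgressNode must not equal localNode; it is
//     cleared to "" only if it currently names localNode, so a value written
//     by another Node is left untouched.
//
// It returns the value to write and whether an update is needed at all.
func desiredEgressNode(isLocal bool, current, localNode string) (string, bool) {
	if isLocal {
		return localNode, current != localNode
	}
	if current == localNode {
		return "", true
	}
	return current, false
}

func main() {
	for _, c := range []struct {
		isLocal bool
		current string
	}{
		{true, ""},              // IP just moved here: claim the status
		{true, "kind-worker"},   // already correct: no update
		{false, "kind-worker"},  // IP moved away: clear our own claim
		{false, "kind-worker2"}, // another Node owns it: leave it alone
	} {
		node, update := desiredEgressNode(c.isLocal, c.current, "kind-worker")
		fmt.Printf("isLocal=%v current=%q -> %q update=%v\n", c.isLocal, c.current, node, update)
	}
}
```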

return fmt.Errorf("updating Egress %s status error: %v", egress.Name, err)
func (c *EgressController) updateEgressStatus(egressName, nodeName string) error {
if err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
egress, err := c.egressLister.Get(egressName)
Member

I think there is some misunderstanding. I meant it should first use the value in the cache, and get the latest version from the API only if there is a conflict. See an example: https://github.com/kubernetes/kubernetes/blob/d92b788faa2521a937acc5fdcb66bfdb960dbd48/pkg/controller/daemon/daemon_controller.go#L1056.
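A self-contained sketch of the cache-first pattern, in the spirit of the linked DaemonSet controller code: the Lister's copy is used for the first attempt, and a GET is issued only after `UpdateStatus` reports a conflict. Everything here is hypothetical and stdlib-only: `fakeAPI` models the apiserver's optimistic concurrency with an integer `ResourceVersion`, and `egress` keeps only the fields the example needs.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict mimics the apiserver's 409 Conflict for a stale resourceVersion.
var errConflict = errors.New("the object has been modified")

// egress is a hypothetical, trimmed-down Egress with only the fields used here.
type egress struct {
	Name            string
	ResourceVersion int
	EgressNode      string
}

// fakeAPI is a hypothetical in-memory apiserver with optimistic concurrency.
type fakeAPI struct {
	stored   egress
	getCalls int
}

func (a *fakeAPI) Get(name string) (egress, error) {
	a.getCalls++
	return a.stored, nil
}

// UpdateStatus rejects writes based on a stale ResourceVersion.
func (a *fakeAPI) UpdateStatus(e egress) error {
	if e.ResourceVersion != a.stored.ResourceVersion {
		return errConflict
	}
	e.ResourceVersion++
	a.stored = e
	return nil
}

// updateStatus starts from the cached (Lister) copy and only issues a GET to
// the API after an update fails with a conflict, i.e. when the cache turned
// out to be stale.
func updateStatus(api *fakeAPI, cached egress, node string, maxAttempts int) error {
	toUpdate := cached
	for i := 0; i < maxAttempts; i++ {
		toUpdate.EgressNode = node
		updateErr := api.UpdateStatus(toUpdate)
		if updateErr == nil || !errors.Is(updateErr, errConflict) {
			return updateErr
		}
		latest, getErr := api.Get(cached.Name)
		if getErr != nil {
			return getErr
		}
		toUpdate = latest
	}
	return fmt.Errorf("updating Egress %s status: %w", cached.Name, errConflict)
}

func main() {
	api := &fakeAPI{stored: egress{Name: "egress-web-ipv4", ResourceVersion: 2}}

	// In-sync cache: one UpdateStatus call, no GET.
	err := updateStatus(api, api.stored, "kind-worker", 3)
	fmt.Printf("in-sync: err=%v gets=%d\n", err, api.getCalls)

	// Stale cache: the first update conflicts and one GET refreshes the copy.
	stale := egress{Name: "egress-web-ipv4", ResourceVersion: 1}
	err = updateStatus(api, stale, "kind-worker2", 3)
	fmt.Printf("stale: err=%v gets=%d node=%s\n", err, api.getCalls, api.stored.EgressNode)
}
```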

return err
}
actualStatus := egress.Status.EgressNode
if actualStatus == nodeName || (nodeName == "" && actualStatus != c.nodeName) {
return nil
}
nodeNameToUpdate = ""
}
Member

This should be moved into the loop below and handle the isLocal=true case as well. It doesn't make sense to separate the two cases and add a special check for the isLocal case alone outside of the loop.
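A sketch of what moving the check into the loop could look like, assuming hypothetical `refresh` and `update` callbacks in place of the CRD client. The "anything to do?" check runs on every attempt and covers both `isLocal` values, so if a conflict reveals that another Node has already claimed the status, the next iteration returns without writing.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("the object has been modified")

// syncEgressNode is a hypothetical sketch of a single retry loop. current
// starts as the cached value; refresh re-reads it from the API after a
// conflict; update writes the desired Status.EgressNode.
func syncEgressNode(isLocal bool, localNode, current string,
	refresh func() (string, error), update func(string) error, maxAttempts int) error {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		var desired string
		switch {
		case isLocal && current == localNode:
			return nil // already reports this Node
		case isLocal:
			desired = localNode
		case current != localNode:
			return nil // empty or owned by another Node: not ours to clear
		default:
			desired = "" // this Node no longer holds the IP
		}
		err := update(desired)
		if err == nil || !errors.Is(err, errConflict) {
			return err
		}
		var getErr error
		if current, getErr = refresh(); getErr != nil {
			return getErr
		}
	}
	return errConflict
}

func main() {
	// The cache says this Node owns the status, but another agent has just
	// claimed it: the first update conflicts, the refresh shows "kind-worker2",
	// and the second attempt exits without writing.
	updates := 0
	refresh := func() (string, error) { return "kind-worker2", nil }
	update := func(string) error { updates++; return errConflict }
	err := syncEgressNode(false, "kind-worker", "kind-worker", refresh, update, 3)
	fmt.Printf("err=%v updates=%d\n", err, updates)
}
```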

})
assert.NoError(t, err, "Egress failed to reach expected status")

egress, err = data.crdClient.CrdV1alpha2().Egresses().Get(context.TODO(), egress.Name, metav1.GetOptions{})
Member

Why does it need to get the Egress again after the loop above has verified its status?
You can declare egress outside the loop.

Contributor Author

done

klog.V(2).InfoS("Updating Egress status", "Egress", egress.Name, "oldNode", egress.Status.EgressNode, "newNode", toUpdate.Status.EgressNode)
_, err := c.crdClient.CrdV1alpha2().Egresses().UpdateStatus(context.TODO(), toUpdate, metav1.UpdateOptions{})
if err != nil && errors.IsConflict(err) {
if toUpdate, err = c.crdClient.CrdV1alpha2().Egresses().Get(context.TODO(), egress.Name, metav1.GetOptions{}); err != nil {
Member

I just realized that if you override err here, RetryOnConflict won't retry when Get succeeds. I think this needs a unit test that makes the first UpdateStatus call fail, to verify that it retries correctly.

Contributor Author

About the unit test: is there an easy way to mock concurrency control and incrementing resource versions in the CRD fake clientset?
kubernetes/kubernetes#72353

Member

@tnqn tnqn Aug 2, 2021

I don't think you have a concurrent case here. Refer to https://github.com/kubernetes/kubernetes/pull/99398/files to fake an error only for the first N calls.
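A minimal, stdlib-only sketch of the "fail only the first N calls" technique combined with a table-driven test. The fake client and `updateWithRetry` are hypothetical simplifications (the linked Kubernetes PR does this with a reactor on the client-go fake clientset). The test-case field names `updateErrorNum`, `updateError`, `expectedUpdateCalled`, `expectedGetCalled` and `expectedError` follow the ones discussed in this review, with the inputs grouped before the expectations.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errConflict = errors.New("the object has been modified")
	errOther    = errors.New("internal error")
)

// fakeClient is a hypothetical fake whose UpdateStatus returns updateError
// for the first updateErrorNum calls and succeeds afterwards.
type fakeClient struct {
	updateErrorNum int
	updateError    error
	updateCalled   int
	getCalled      int
}

func (c *fakeClient) UpdateStatus() error {
	c.updateCalled++
	if c.updateCalled <= c.updateErrorNum {
		return c.updateError
	}
	return nil
}

func (c *fakeClient) Get() error {
	c.getCalled++
	return nil
}

// updateWithRetry is a simplified stand-in for the code under test: retry the
// update on conflict, refreshing with a GET after each conflict.
func updateWithRetry(c *fakeClient, maxAttempts int) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		if err = c.UpdateStatus(); err == nil || !errors.Is(err, errConflict) {
			return err
		}
		if getErr := c.Get(); getErr != nil {
			return getErr
		}
	}
	return err
}

// testCase lists inputs first, then the expected results.
type testCase struct {
	name                 string
	updateErrorNum       int
	updateError          error
	expectedUpdateCalled int
	expectedGetCalled    int
	expectedError        error
}

var cases = []testCase{
	{name: "succeed immediately", expectedUpdateCalled: 1},
	{name: "succeed after one update failure", updateErrorNum: 1, updateError: errConflict,
		expectedUpdateCalled: 2, expectedGetCalled: 1},
	{name: "fail after max attempts", updateErrorNum: 3, updateError: errConflict,
		expectedUpdateCalled: 3, expectedGetCalled: 3, expectedError: errConflict},
	{name: "no retry on non-conflict error", updateErrorNum: 1, updateError: errOther,
		expectedUpdateCalled: 1, expectedError: errOther},
}

// runCase returns a descriptive error if the observed behavior differs.
func runCase(tc testCase) error {
	c := &fakeClient{updateErrorNum: tc.updateErrorNum, updateError: tc.updateError}
	err := updateWithRetry(c, 3)
	if !errors.Is(err, tc.expectedError) {
		return fmt.Errorf("%s: got error %v, want %v", tc.name, err, tc.expectedError)
	}
	if c.updateCalled != tc.expectedUpdateCalled || c.getCalled != tc.expectedGetCalled {
		return fmt.Errorf("%s: got %d updates and %d gets, want %d and %d", tc.name,
			c.updateCalled, c.getCalled, tc.expectedUpdateCalled, tc.expectedGetCalled)
	}
	return nil
}

func main() {
	for _, tc := range cases {
		if err := runCase(tc); err != nil {
			fmt.Println("FAIL:", err)
			continue
		}
		fmt.Println("PASS:", tc.name)
	}
}
```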

_, updateErr = c.crdClient.CrdV1alpha2().Egresses().UpdateStatus(context.TODO(), toUpdate, metav1.UpdateOptions{})
if updateErr != nil && errors.IsConflict(updateErr) {
if toUpdate, getErr = c.crdClient.CrdV1alpha2().Egresses().Get(context.TODO(), egress.Name, metav1.GetOptions{}); getErr != nil {
// If the GET fails we can't trust status.Replicas anymore. This error is bound to be more interesting than the update failure.
Member

Copy-pasted comment.

Contributor Author

Thanks for the reminder, will remove it.

metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/util/sets"
"k8s.io/apimachinery/pkg/util/wait"
testing2 "k8s.io/client-go/testing"
Member

Suggested change
testing2 "k8s.io/client-go/testing"
k8stesting "k8s.io/client-go/testing"

Contributor Author

done

assert.Equal(t, c.nodeName, egress.Status.EgressNode, "Egress status not match")
} else {
assert.Equal(t, "", egress.Status.EgressNode, "Egress status not match")
}
Member

The unit test shouldn't calculate the desired status from the actual status; that basically repeats the main code rather than testing it.
You can add expectedEgresses, set the expected Node name there, and check whether each Egress you get equals the expected one. This will be useful later when we have more fields.

for _, expectedEgress := range tt.expectedEgresses {
    gotEgress, err := c.crdClient.CrdV1alpha2().Egresses().Get(context.TODO(), expectedEgress.Name, metav1.GetOptions{})
    require.NoError(t, err)
    assert.Equal(t, expectedEgress, gotEgress)
}

Contributor Author

done

expectedUpdateCalled: 1,
expectedGetCalled: 0,
expectedError: nil,
updateFailure: updateConflictError,
Member

I think it should be nil or unset to avoid confusion.

Contributor Author

done

expectedUpdateCalled int
expectedGetCalled int
expectedError error
updateFailure error
Member

This is an input rather than an expected result; please move it next to updateErrorNum to avoid confusion. It would also be better to name it updateError accordingly.

Contributor Author

done

updateFailure: updateConflictError,
},
{
name: "fail after one update failures",
Member

Suggested change
name: "fail after one update failures",
name: "fail after one update failure",

Contributor Author

done

Signed-off-by: Wenqi Qiu <wenqiq@vmware.com>
Member

@tnqn tnqn left a comment

LGTM, thanks

@tnqn
Member

tnqn commented Aug 4, 2021

@wenqiq Please add more details in the PR description or file an issue about what this PR is for, so other members can understand it without digging into the code.

@tnqn tnqn closed this Aug 4, 2021
@tnqn tnqn reopened this Aug 4, 2021
@tnqn
Member

tnqn commented Aug 4, 2021

/test-all

@tnqn tnqn merged commit 11777b1 into antrea-io:main Aug 5, 2021
@antoninbas
Contributor

I just noticed that this was merged as a commit with a message that simply says: "Fix Egress status updating". This is misleading because this is not a bug fix IMO. It also does not provide any useful information about the change. Please be mindful in the future that the squashed commit matches the PR description. Thanks!

@wenqiq
Contributor Author

wenqiq commented Aug 6, 2021

Thanks for noticing this, and sorry for my carelessness. Can we revert this merge, add some comments to the commit, or force-rewrite the commit? @antoninbas @tnqn

@antoninbas
Contributor

> Thanks for noticing this, and sorry for my carelessness. Can we revert this merge, add some comments to the commit, or force-rewrite the commit? @antoninbas @tnqn

We cannot change anything now, but it's OK; we'll keep it in mind for future merges.
