Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix e2e TestAntctl on windows testbed #4663

Merged
merged 1 commit into from
Mar 9, 2023
Merged

Conversation

Atish-iaf
Copy link
Contributor

@Atish-iaf Atish-iaf commented Feb 28, 2023

TestAntctl/testAntctlControllerRemoteAccess failed
on all windows testbed because it cannot get
output of antctl get memberlist while collecting
supportbundle from outside on windows testbed as
memberlist is not running.

This PR handles the above case as success on linux
and writes output to file that memberlist is not running.

On Windows, don't collect memberlist as it never runs on Windows.

Fixes #4659

@Atish-iaf Atish-iaf changed the title Fix e2e TestAntctl on windows testbed [WIP] Fix e2e TestAntctl on windows testbed Feb 28, 2023
@Atish-iaf
Copy link
Contributor Author

/test-windows-e2e

@XinShuYang
Copy link
Contributor

XinShuYang commented Mar 1, 2023

/test-windows-containerd-e2e
/test-windows-e2e
/test-windows-proxyall-e2e

@Atish-iaf
Copy link
Contributor Author

/test-windows-e2e
/test-windows-proxyall-e2e

@XinShuYang
Copy link
Contributor

/test-windows-proxyall-e2e
/test-windows-e2e

@Atish-iaf
Copy link
Contributor Author

/test-windows-e2e

@Atish-iaf Atish-iaf marked this pull request as ready for review March 1, 2023 13:29
@Atish-iaf Atish-iaf changed the title [WIP] Fix e2e TestAntctl on windows testbed Fix e2e TestAntctl on windows testbed Mar 1, 2023
@luolanzone luolanzone added the area/test/e2e Issues or PRs related to Antrea specific end-to-end testing. label Mar 1, 2023
@luolanzone luolanzone requested a review from wenyingd March 1, 2023 14:17
@@ -293,7 +295,13 @@ func (d *agentDumper) DumpAgentInfo(basedir string) error {
}

func (d *agentDumper) DumpMemberlist(basedir string) error {
return dumpAntctlGet(d.fs, d.executor, "memberlist", basedir)
var ml []memberlist.Response
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some unit tests for the change?

@@ -293,7 +295,13 @@ func (d *agentDumper) DumpAgentInfo(basedir string) error {
}

func (d *agentDumper) DumpMemberlist(basedir string) error {
return dumpAntctlGet(d.fs, d.executor, "memberlist", basedir)
var ml []memberlist.Response
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real issue is memberlistCluster is nil on windows. The change in the PR can't resolve the problem.

The problem also exists on Linux when Egress feature is disabled. memberlist.HandleFunc should check whether GetMemberlistCluster() return nil before calling its AliveNodes method. If the interface is nil, it should return a proper http code and message to client indicating memberlist is not running.

@Atish-iaf
Copy link
Contributor Author

/test-windows-all

@Atish-iaf
Copy link
Contributor Author

/test-windows-all

@Atish-iaf
Copy link
Contributor Author

/test-windows-e2e

@Atish-iaf Atish-iaf marked this pull request as ready for review March 3, 2023 02:28
@Atish-iaf Atish-iaf requested a review from tnqn March 3, 2023 05:48
Comment on lines 51 to 57
if aq.GetMemberlistCluster() == nil {
http.Error(w, "memberlist is not running", http.StatusServiceUnavailable)
return
}
var memberlist []Response
allNodes, _ := aq.GetNodeLister().List(labels.Everything())
aliveNodes := aq.GetMemberlistCluster().AliveNodes()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if aq.GetMemberlistCluster() == nil {
http.Error(w, "memberlist is not running", http.StatusServiceUnavailable)
return
}
var memberlist []Response
allNodes, _ := aq.GetNodeLister().List(labels.Everything())
aliveNodes := aq.GetMemberlistCluster().AliveNodes()
memberlisterCluster := aq.GetMemberlistCluster()
if memberlisterCluster == nil {
http.Error(w, "memberlist is not running", http.StatusServiceUnavailable)
return
}
var memberlist []Response
allNodes, _ := aq.GetNodeLister().List(labels.Everything())
aliveNodes := memberlisterCluster.AliveNodes()

Comment on lines 300 to 314
handler := memberlist.HandleFunc(d.aq)
req, err := http.NewRequest(http.MethodGet, "", nil)
if err != nil {
return err
}
recorder := httptest.NewRecorder()
handler.ServeHTTP(recorder, req)
var received []memberlist.Response
if recorder.Code == http.StatusOK {
err = json.Unmarshal(recorder.Body.Bytes(), &received)
if err != nil {
return err
}
}
return writeYAMLFile(d.fs, filepath.Join(basedir, "memberlist"), "memberlist", received)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it still needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because windows tests fail with same reason if we keep existing code (even after making changes in pkg/agent/apiserver/handlers/memberlist/handler.go to check for memberlist is running or not). I am not aware of the reason for this but i guess there maybe some history of running antctl cmds on windows as I see DumpAgentInfo func in dump.go also uses similar approach instead of calling dumpAntctlGet function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should understand why it fails if it's not supposed to. I don't know why DumpAgentInfo implements it in a different way but at least it doesn't create a new http handler and client to fetch the result, which looks redundant.

I think the failure on windows of the command is still expected, as the server the http response code is http.StatusServiceUnavailable, which will be converted an error by exec.Cmd. The underlying difference is, previously antrea-agent panic if it receives this http request, now it returns an error to client to indicate memberlist is not running. I think the code should handle the failure as an expected case just like what you did in L308.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tnqn Do you know the reason for Use internal queriers instead of antctl to retrieve networkpolicies and agentinfo. ? as written in commit message of this PR #820

pkg/agent/apiserver/handlers/memberlist/handler_test.go Outdated Show resolved Hide resolved
@Atish-iaf
Copy link
Contributor Author

/test-windows-all

@Atish-iaf
Copy link
Contributor Author

/test-windows-all

@Atish-iaf
Copy link
Contributor Author

/test-windows-e2e
/test-windows-proxyall-e2e

@Atish-iaf
Copy link
Contributor Author

/test-windows-proxyall-e2e

@Atish-iaf Atish-iaf marked this pull request as ready for review March 6, 2023 06:50
@Atish-iaf Atish-iaf requested a review from tnqn March 6, 2023 06:51
@Atish-iaf
Copy link
Contributor Author

/test-windows-proxyall-e2e

@Atish-iaf
Copy link
Contributor Author

/test-windows-all

@@ -293,7 +293,11 @@ func (d *agentDumper) DumpAgentInfo(basedir string) error {
}

func (d *agentDumper) DumpMemberlist(basedir string) error {
return dumpAntctlGet(d.fs, d.executor, "memberlist", basedir)
err := dumpAntctlGet(d.fs, d.executor, "memberlist", basedir)
if err != nil && strings.Contains(err.Error(), "error when dumping memberlist:") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would just ignore all errors as error returned by dumpAntctlGet always include this string. Can it get the target message "memberlist is not running" or the target status code http.StatusServiceUnavailable from the error in some way?

Copy link
Contributor Author

@Atish-iaf Atish-iaf Mar 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dumpAntctlGet returns broadly two kinds of error -

1A. error due to memberlist not running , status code http.StatusServiceUnavailable.
1B. error when encoding Memberlist to json, status code http.StatusInternalServerError
These two cases mentioned above will fall in this if case.

  1. error when writing memberlist to file.
    This case will NOT fall in this if case.

@tnqn Do you suggest to catch only case 1A in this if case (which writes no data to file)?

EDIT : done, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only 1A indicates things work correctly, so we should process 1A like success.

@Atish-iaf
Copy link
Contributor Author

/test-windows-all

pkg/support/dump.go Outdated Show resolved Hide resolved
pkg/support/dump.go Outdated Show resolved Hide resolved
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@XinShuYang
Copy link
Contributor

/test-windows-all

@XinShuYang
Copy link
Contributor

TestAntctl e2e test still failed.

@@ -48,9 +48,14 @@ func generateResponse(node *v1.Node, aliveNodes sets.String) Response {
// HandleFunc returns the function which can handle queries issued by the memberlist command.
func HandleFunc(aq querier.AgentQuerier) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
memberlistCluster := aq.GetMemberlistCluster()
if memberlistCluster == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An interface assigned with a concrete type's value will not be nil regardless whether the concrete type is nil or not.
Try this code: https://go.dev/play/p/rkyXVVwDjw1.

The nil check should use reflect.ValueOf(multiclusterInterface).IsNil()

@Atish-iaf
Copy link
Contributor Author

/test-windows-all

@tnqn
Copy link
Member

tnqn commented Mar 9, 2023

/test-windows-e2e

@tnqn
Copy link
Member

tnqn commented Mar 9, 2023

/test-windows-containerd-e2e

@tnqn
Copy link
Member

tnqn commented Mar 9, 2023

/test-windows-proxyall-e2e

1 similar comment
@XinShuYang
Copy link
Contributor

/test-windows-proxyall-e2e

@XinShuYang
Copy link
Contributor

/test-windows-proxyall-e2e

TestAntctl/testAntctlControllerRemoteAccess failed
on all windows testbed because it cannot get
output of `antctl get memberlist` while collecting
supportbundle from outside on windows testbed as
memberlist is not running.

This PR handles the above case as success on linux
and writes output to file that memberlist is not running.

On Windows, don't collect memberlist as it never runs on Windows.

Fixes antrea-io#4659

Signed-off-by: Kumar Atish <atish.iaf@gmail.com>
@Atish-iaf
Copy link
Contributor Author

/test-windows-proxyall-e2e

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member

tnqn commented Mar 9, 2023

/skip-all
The windows e2e failed because of other failures, TestAntctl/testAntctlControllerRemoteAccess has succeded.

@tnqn tnqn merged commit dd564e6 into antrea-io:main Mar 9, 2023
jainpulkit22 pushed a commit to urharshitha/antrea that referenced this pull request Apr 28, 2023
TestAntctl/testAntctlControllerRemoteAccess failed
on all windows testbed because it cannot get
output of `antctl get memberlist` while collecting
supportbundle from outside on windows testbed as
memberlist is not running.

This PR handles the above case as success on linux
and writes output to file that memberlist is not running.

On Windows, don't collect memberlist as it never runs on Windows.

Fixes antrea-io#4659

Signed-off-by: Kumar Atish <atish.iaf@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/test/e2e Issues or PRs related to Antrea specific end-to-end testing.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TestAntctl failed on windows testbed
5 participants