-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix e2e TestAntctl on windows testbed #4663
Conversation
/test-windows-e2e |
/test-windows-containerd-e2e |
/test-windows-e2e |
/test-windows-proxyall-e2e |
/test-windows-e2e |
pkg/support/dump.go
Outdated
@@ -293,7 +295,13 @@ func (d *agentDumper) DumpAgentInfo(basedir string) error { | |||
} | |||
|
|||
func (d *agentDumper) DumpMemberlist(basedir string) error { | |||
return dumpAntctlGet(d.fs, d.executor, "memberlist", basedir) | |||
var ml []memberlist.Response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add some unit tests for the change?
pkg/support/dump.go
Outdated
@@ -293,7 +295,13 @@ func (d *agentDumper) DumpAgentInfo(basedir string) error { | |||
} | |||
|
|||
func (d *agentDumper) DumpMemberlist(basedir string) error { | |||
return dumpAntctlGet(d.fs, d.executor, "memberlist", basedir) | |||
var ml []memberlist.Response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The real issue is memberlistCluster is nil on windows. The change in the PR can't resolve the problem.
The problem also exists on Linux when Egress feature is disabled. memberlist.HandleFunc
should check whether GetMemberlistCluster()
return nil before calling its AliveNodes
method. If the interface is nil, it should return a proper http code and message to client indicating memberlist is not running.
/test-windows-all |
/test-windows-all |
/test-windows-e2e |
if aq.GetMemberlistCluster() == nil { | ||
http.Error(w, "memberlist is not running", http.StatusServiceUnavailable) | ||
return | ||
} | ||
var memberlist []Response | ||
allNodes, _ := aq.GetNodeLister().List(labels.Everything()) | ||
aliveNodes := aq.GetMemberlistCluster().AliveNodes() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if aq.GetMemberlistCluster() == nil { | |
http.Error(w, "memberlist is not running", http.StatusServiceUnavailable) | |
return | |
} | |
var memberlist []Response | |
allNodes, _ := aq.GetNodeLister().List(labels.Everything()) | |
aliveNodes := aq.GetMemberlistCluster().AliveNodes() | |
memberlisterCluster := aq.GetMemberlistCluster() | |
if memberlisterCluster == nil { | |
http.Error(w, "memberlist is not running", http.StatusServiceUnavailable) | |
return | |
} | |
var memberlist []Response | |
allNodes, _ := aq.GetNodeLister().List(labels.Everything()) | |
aliveNodes := memberlisterCluster.AliveNodes() |
pkg/support/dump.go
Outdated
handler := memberlist.HandleFunc(d.aq) | ||
req, err := http.NewRequest(http.MethodGet, "", nil) | ||
if err != nil { | ||
return err | ||
} | ||
recorder := httptest.NewRecorder() | ||
handler.ServeHTTP(recorder, req) | ||
var received []memberlist.Response | ||
if recorder.Code == http.StatusOK { | ||
err = json.Unmarshal(recorder.Body.Bytes(), &received) | ||
if err != nil { | ||
return err | ||
} | ||
} | ||
return writeYAMLFile(d.fs, filepath.Join(basedir, "memberlist"), "memberlist", received) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because windows tests fail with same reason if we keep existing code (even after making changes in pkg/agent/apiserver/handlers/memberlist/handler.go to check for memberlist is running or not). I am not aware of the reason for this but i guess there maybe some history of running antctl cmds on windows as I see DumpAgentInfo
func in dump.go also uses similar approach instead of calling dumpAntctlGet
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should understand why it fails if it's not supposed to. I don't know why DumpAgentInfo
implements it in a different way but at least it doesn't create a new http handler and client to fetch the result, which looks redundant.
I think the failure on windows of the command is still expected, as the server the http response code is http.StatusServiceUnavailable
, which will be converted an error by exec.Cmd. The underlying difference is, previously antrea-agent panic if it receives this http request, now it returns an error to client to indicate memberlist is not running. I think the code should handle the failure as an expected case just like what you did in L308.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/test-windows-all |
/test-windows-all |
/test-windows-e2e |
/test-windows-proxyall-e2e |
/test-windows-proxyall-e2e |
/test-windows-all |
pkg/support/dump.go
Outdated
@@ -293,7 +293,11 @@ func (d *agentDumper) DumpAgentInfo(basedir string) error { | |||
} | |||
|
|||
func (d *agentDumper) DumpMemberlist(basedir string) error { | |||
return dumpAntctlGet(d.fs, d.executor, "memberlist", basedir) | |||
err := dumpAntctlGet(d.fs, d.executor, "memberlist", basedir) | |||
if err != nil && strings.Contains(err.Error(), "error when dumping memberlist:") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would just ignore all errors as error returned by dumpAntctlGet always include this string. Can it get the target message "memberlist is not running" or the target status code http.StatusServiceUnavailable
from the error in some way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dumpAntctlGet returns broadly two kinds of error -
1A. error due to memberlist not running , status code http.StatusServiceUnavailable
.
1B. error when encoding Memberlist to json, status code http.StatusInternalServerError
These two cases mentioned above will fall in this if case.
- error when writing memberlist to file.
This case will NOT fall in this if case.
@tnqn Do you suggest to catch only case 1A in this if case (which writes no data to file)?
EDIT : done, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only 1A indicates things work correctly, so we should process 1A like success.
/test-windows-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/test-windows-all |
TestAntctl e2e test still failed. |
@@ -48,9 +48,14 @@ func generateResponse(node *v1.Node, aliveNodes sets.String) Response { | |||
// HandleFunc returns the function which can handle queries issued by the memberlist command. | |||
func HandleFunc(aq querier.AgentQuerier) http.HandlerFunc { | |||
return func(w http.ResponseWriter, r *http.Request) { | |||
memberlistCluster := aq.GetMemberlistCluster() | |||
if memberlistCluster == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An interface assigned with a concrete type's value will not be nil
regardless whether the concrete type is nil or not.
Try this code: https://go.dev/play/p/rkyXVVwDjw1.
The nil check should use reflect.ValueOf(multiclusterInterface).IsNil()
/test-windows-all |
/test-windows-e2e |
/test-windows-containerd-e2e |
/test-windows-proxyall-e2e |
1 similar comment
/test-windows-proxyall-e2e |
/test-windows-proxyall-e2e |
TestAntctl/testAntctlControllerRemoteAccess failed on all windows testbed because it cannot get output of `antctl get memberlist` while collecting supportbundle from outside on windows testbed as memberlist is not running. This PR handles the above case as success on linux and writes output to file that memberlist is not running. On Windows, don't collect memberlist as it never runs on Windows. Fixes antrea-io#4659 Signed-off-by: Kumar Atish <atish.iaf@gmail.com>
/test-windows-proxyall-e2e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/skip-all |
TestAntctl/testAntctlControllerRemoteAccess failed on all windows testbed because it cannot get output of `antctl get memberlist` while collecting supportbundle from outside on windows testbed as memberlist is not running. This PR handles the above case as success on linux and writes output to file that memberlist is not running. On Windows, don't collect memberlist as it never runs on Windows. Fixes antrea-io#4659 Signed-off-by: Kumar Atish <atish.iaf@gmail.com>
TestAntctl/testAntctlControllerRemoteAccess failed
on all windows testbed because it cannot get
output of
antctl get memberlist
while collectingsupportbundle from outside on windows testbed as
memberlist is not running.
This PR handles the above case as success on linux
and writes output to file that memberlist is not running.
On Windows, don't collect memberlist as it never runs on Windows.
Fixes #4659