-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix network plugin detection #1061
Conversation
I've tested by below way and it seems to work well: [Build environment]
(Copy bin/subctl and /tmp/submariner-operator.tar to test environment.) [Test environment (after k3s is deployed)]
(Change
|
pkg/discovery/network/generic.go
Outdated
} | ||
} | ||
|
||
return "", fmt.Errorf("no node with Spec.PodCIDR found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should return an error here if not found. We should let the caller handle not found as it sees fit (it so happens this is the last resort for the caller but that could change).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see inline comments, but the new service heuristic is good IMO, "cluster-dump" equivalent is what we do on the other heuristics.
3866d7a
to
d89fbc6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've fixed almost as suggested. Please see my inline comments.
pkg/discovery/network/generic.go
Outdated
@@ -89,7 +97,7 @@ func findClusterIPRangeSvcCreation(clientSet kubernetes.Interface) (string, erro | |||
|
|||
// creating invalid svc didn't fail as expected | |||
if err == nil { | |||
return "", fmt.Errorf("creating invalid service(%v) didn't fail", invalidSvcSpec) | |||
return "", nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping this return error makes unit tests complex. All existing tests for network plugin detection will need to rely on creating svc always return error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this means the API server didn't behave as we expected, likely something changed. I think it would be useful to at least log a message:
status.QueueWarningMessage("Could not determine the service IP range via service creation - the expected error was not returned")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it to return an error properly and changed unit tests as well.
pkg/discovery/network/generic.go
Outdated
}, | ||
}, | ||
} | ||
_, err := clientSet.CoreV1().Services("submariner-operator").Create(invalidSvcSpec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating svc in submariner-operator
namespace might fail if it is called before the namespace is created. It seems to happen even on subctl join
command. Other candidate will be kube-system
, default
, and random namespace created just for this check. Any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could use empty namespace but, if not, default
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is called from both command line and operator.
subctl actually calls it on subctl join before submariner-namespace is created, so it shouldn't create service in submariner-operator namespace. On the other hand, the operator is only allowed to create resources inside submariner-operator namespace, so it shouldn't create service in default namespace.
Therefore, the code was changed to check whether it is running in operator and to create service in the proper namespace.
pkg/discovery/network/generic.go
Outdated
}, | ||
}, | ||
} | ||
_, err := clientSet.CoreV1().Services("submariner-operator").Create(invalidSvcSpec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could use empty namespace but, if not, default
.
d89fbc6
to
a6ef93b
Compare
Thank you so much for your review with detailed suggestions. PTAL |
} | ||
|
||
// decide which namespace to create the service | ||
ns := os.Getenv("WATCH_NAMESPACE") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is WATCH_NAMESPACE
and how is it set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is set here as an environment variable of the operator pod. I think that this is used by operator-sdk's GetWatchNamespace
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - could you add a comment explaining this? eg "WATCH_NAMESPACE will be present if running in the operator pod".
Another approach is to pass in the NS so we don't rely on the presence of an env var. The operator code implicitly knows its own namespace. I think this would make it clearer and although it would mean passing a param down the chain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed and squashed.
pkg/discovery/network/generic.go
Outdated
// decide which namespace to create the service | ||
ns := os.Getenv("WATCH_NAMESPACE") | ||
// use "submariner-operator" if WATCH_NAMESPACE is "submariner-operator" (in operator) | ||
if ns != "submariner-operator" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a constant for this in root.go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that importing constant from pkg/subctl/cmd
to pkg/discovery/network/generic.go
makes import cycle not allowed
error. Is it OK to just define constant in pkg/discovery/network/generic.go
or do we need to move the constant to the other place to share it?
Note that as explained, subctl create the operator pod in the namespace and pass the env variable via the downward API (metadata.namespace
). Then inside the operator, it checks whether env variable is set to check if it runs inside the operator. So, this constant is not directly shared across these functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we pass the NS into the function as described above, this becomes moot.
|
||
// creating invalid service didn't fail as expected | ||
if err == nil { | ||
return "", fmt.Errorf("creating invalid service(%v) didn't fail", invalidSvcSpec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer this the way you had it, ie not return error here or on L128. This makes it consistent with the other discovery methods and, on join, will result in the user being prompted. But, as I noted earlier, I think it's worth logging a message as the API server didn't behave as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The callers of discoverGenericNetwork
are as follows:
[Operator]
Reconcile
discoverNetwork
getClusterNetwork
Discover
discoverGenericNetwork
[Command line]
getNetworkDetails
Discover
discoverGenericNetwork
if err != nil {
status.QueueWarningMessage(fmt.Sprintf("Error trying to discover network details: %s", err))
}
So, as for command line case, the error is already been passed to status.QueueWarningMessage
.
And, other functions called from discoverGenericNetwork
, like findClusterIPRangeFromApiserver
actually returns error and not catching it with status.QueueWarningMessage
.
Therefore, the code should be already as you expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point was about whether it should return an error in this case, ie should we treat it like we didn't find the data similar to the other discovery methods. So for subclt join
, this would at least allow it to prompt the user as a last resort. After all, this is just another attempt to find the data. This seems reasonable but it would be beneficial to print a warning. @mangelajo WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Waiting for @mangelajo 's comment. But I'm now inclined to return nil and log error here, only for this case.
Meanwhile, I would like to confirm that it is safe to call status.QueueWarningMessage
outside CLI's context, or from operator. It looks like a package under pkg/internal/cli
and I'm not familiar with its implementation details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately status.QueueWarningMessage
wouldn't be appropriate in the operator. The operator uses a different logging API. We could pass in a function to log warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added logging code in controllers/submariner/submariner_networkdiscovery.go
for operator to log it (See the first commit). Also, re-organized and squashed commits to review easier.
pkg/discovery/network/canal_test.go
Outdated
}) | ||
}) | ||
|
||
When("There is a kubeapi pod at least ", func() { | ||
When("There is a kube-api pod at least ", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When("There is a kube-api pod at least ", func() { | |
When("There is a kube-api pod", func() { |
ed5a74c
to
06ad059
Compare
var clusterNet *ClusterNetwork | ||
|
||
BeforeEach(func() { | ||
clusterNet = testDiscoverGenericWith( | ||
fakePod("kube-controller", []string{"kube-controller", "--cluster-ABCD=1.2.3.4"}, []v1.EnvVar{}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fakePod("kube-controller", []string{"kube-controller", "--cluster-ABCD=1.2.3.4"}, []v1.EnvVar{}), | |
fakePod("kube-controller-manager", []string{"kube-controller-manager", "--cluster-ABCD=1.2.3.4"}, []v1.EnvVar{}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the code, we are actually checking for component kube-controller-manager
and this test-case is about validating the arguments (i.e., no expected params).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing it out. Fixed.
var clusterNet *ClusterNetwork | ||
|
||
BeforeEach(func() { | ||
clusterNet = testDiscoverGenericWith( | ||
fakePod("kube-controller", []string{"kube-api", "--cluster-ABCD=1.2.3.4"}, []v1.EnvVar{}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fakePod("kube-controller", []string{"kube-api", "--cluster-ABCD=1.2.3.4"}, []v1.EnvVar{}), | |
fakePod("kube-apiserver", []string{"kube-apiserver", "--cluster-ABCD=1.2.3.4"}, []v1.EnvVar{}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed as well.
Signed-off-by: Masaki Kimura <masaki.kimura@hitachivantara.com>
Signed-off-by: Masaki Kimura <masaki.kimura@hitachivantara.com>
Signed-off-by: Masaki Kimura <masaki.kimura@hitachivantara.com>
Signed-off-by: Masaki Kimura <masaki.kimura@hitachivantara.com>
Signed-off-by: Masaki Kimura <masaki.kimura@hitachivantara.com>
Signed-off-by: Masaki Kimura <masaki.kimura@hitachivantara.com>
06ad059
to
317aaf6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR and also addressing the comments @mkimuram
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me now, the additional heuristics will make this much more robust. Thank you @mkimuram .
@tpantelis have an eye, I belive mkimura handled all your comments. |
Mostly. So we're going to fail fast on |
This PR fixes network plugin detection by:
Closes: submariner-io/submariner#1116