-
-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Abort Kubernetes Ingress update if Kubernetes API call fails #1295
Abort Kubernetes Ingress update if Kubernetes API call fails #1295
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- IMHO that's the right approach. See my comment though on making an error/missing distinction when we get the endpoints.
- Generally the test approach is fine. I made a few suggestions though on how to improve the test implementation.
- Does my code comment answer this question?
provider/kubernetes_test.go
Outdated
apiEndpointsError: true, | ||
} | ||
provider = Kubernetes{} | ||
_, err = provider.loadIngresses(client) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we convert the test into a table-driven one using sub-tests. Among several benefits, this should improve readability and reduce the amount of test code needed.
provider/kubernetes_test.go
Outdated
_, err := provider.loadIngresses(client) | ||
|
||
if err == nil { | ||
t.Fatal("Should have raised error when loading services but did not.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be t.Error
in my view.
provider/kubernetes_test.go
Outdated
@@ -1543,6 +1612,10 @@ func (c clientMock) GetIngresses(namespaces k8s.Namespaces) []*v1beta1.Ingress { | |||
} | |||
|
|||
func (c clientMock) GetService(namespace, name string) (*v1.Service, bool, error) { | |||
if c.apiServiceError { | |||
return &v1.Service{}, true, c.err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conventionally, if an error occurs in a function returning multiple values, the non-error
parameters are set to their default values. Thus, I'd use false
for the boolean return value.
Slightly related/unrelated: I don't know why we return true
in line 1624 when we seemingly can't find the service.
provider/kubernetes_test.go
Outdated
@@ -1552,6 +1625,10 @@ func (c clientMock) GetService(namespace, name string) (*v1.Service, bool, error | |||
} | |||
|
|||
func (c clientMock) GetEndpoints(namespace, name string) (*v1.Endpoints, bool, error) { | |||
if c.apiEndpointsError { | |||
return &v1.Endpoints{}, true, c.err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also false
here.
provider/kubernetes_test.go
Outdated
ObjectMeta: v1.ObjectMeta{ | ||
Namespace: "testing", | ||
Annotations: map[string]string{ | ||
"traefik.frontend.passHostHeader": "true", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this annotation for the tests at hand?
I'm always in favor of keeping test fixtures as minimized as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to move it to a separate test specifically for checking the !exists cases.
provider/kubernetes_test.go
Outdated
Spec: v1beta1.IngressSpec{ | ||
Rules: []v1beta1.IngressRule{ | ||
{ | ||
Host: "foo", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
return nil, err | ||
} | ||
|
||
if !exists { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a test for this case. I suppose it's already covered by a pre-existing test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the does not exist part? No that was on my todo list, it is actually why the ingress part of the new test has sample data. Was thinking of moving it to a separate test though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate test sounds good. 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added tests for this and the !exists endpoints
provider/kubernetes.go
Outdated
@@ -196,10 +199,10 @@ func (provider *Kubernetes) loadIngresses(k8sClient k8s.Client) (*types.Configur | |||
endpoints, exists, err := k8sClient.GetEndpoints(service.ObjectMeta.Namespace, service.ObjectMeta.Name) | |||
if err != nil || !exists { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason we not splitting up the two cases here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bad rebase is why actually. Specifically wrote some test code for it. Will fix,
provider/kubernetes_test.go
Outdated
type clientMock struct { | ||
ingresses []*v1beta1.Ingress | ||
services []*v1.Service | ||
endpoints []*v1.Endpoints | ||
watchChan chan interface{} | ||
|
||
err error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can get rid of this variable by changing the type of api{Service,Endpoints}Error
to error
and returning the error if it's non-nil in the stubbed getter methods.
provider/kubernetes_test.go
Outdated
_, err := provider.loadIngresses(client) | ||
|
||
if err == nil { | ||
t.Fatal("Should have raised error when loading services but did not.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of just checking for the error being nil
, what I think we can do here is pass a custom error variable to the mock (e.g., errors.New("failed")
) and check against this very error. It shouldn't matter what error value we inject since our code doesn't depend on any specific error -- we just want to verify that we return the error to the caller that the (mocked) API gives us.
Similar for the endpoint case.
I hope this addresses your third question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should work great. Will make the change.
Tests SHOULD be passing now. Still need to move the tests over to using the table driven test method, need to add tests for the !exists code paths, and am not sure how we want to handle when things don't exist. For example when a service doesn't exist do we still add the frontend? The two cases of this right now are when services or endpoints don't exist. Both cases can easily be someone putting a typo in the service name or port name. |
AFAICS, the only alternative we have to skipping frontends with missing services/endpoints is to revert the entire configuration (i.e., apply the error case resolution). IMHO, we shouldn't do that since such misses are usually due to a user mistake (as you said) and would consequently "punish" all other frontends that did everything correct. I believe the focus of this PR should be to fix #1240, which is to protect against the Kube API temporarily being unavailable. By handling that case gracefully (through reverting to the previous configuration) and maintaining the "missing" behavior as-is, I think we have satisfied the goal at hand. Potential improvements to the missing case, especially if the direction isn't super clear, can be handled by follow-up issues/PRs. Just my 2 cents. :) |
Was reviewing this PR this morning and noticed that the condition in my new test to check for failure was actually wrong and would be passing even if it shouldn't. Fixed that and added test data back in. Without the test data the loadIngress logic doesn't get far enough along to trigger the errors we want to test for. So added the test data back in. Very sorry about that! The logic in loadIngress was fine and I didn't have to make any changes there. It was just the tests not actually doing anything useful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more comment.
Also, anxiously waiting for that table-driven version. :-)
provider/kubernetes_test.go
Outdated
provider := Kubernetes{} | ||
_, err := provider.loadIngresses(client) | ||
|
||
if err == nil || err.Error() != "failed service call" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can simplify the assertion by comparing error values:
if err != client.apiServiceError {
...
}
Once you turn the test into a table-driven one, define and inject an error variable var apiErr = errors.New("error")
and compare against that in each test case.
provider/kubernetes_test.go
Outdated
provider = Kubernetes{} | ||
_, err = provider.loadIngresses(client) | ||
|
||
if err == nil || err.Error() != "failed endpoints call" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same pattern as above.
provider/kubernetes_test.go
Outdated
|
||
endpoints := []*v1.Endpoints{} | ||
watchChan := make(chan interface{}) | ||
apiErr := errors.New("failed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment, but I would use something more random than failed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @regner :)
Could you squash your commits?
Apart from minor comments, LGTM.
Currently if a Kubernetes API call fails we potentially remove a working service from Traefik. This changes it so if a Kubernetes API call fails we abort out of the ingress update and use the current working config. Github issue: traefik#1240 Also added a test to cover when requested resources (services and endpoints) that the user has specified don’t exist.
c96f69f
to
fab3cb3
Compare
Commits all squashed and that final test case added. @timoreimann be aware, as this isn't obvious with the squash, that I renamed the test from I also changed Other then that I only added the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found one tiny test coding bug and filed two suggestions. :-)
provider/kubernetes_test.go
Outdated
} | ||
|
||
for _, tc := range testCases { | ||
t.Run(tc.desc, func(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to capture the loop variable by an assignment tc := tc
since we run the tests in parallel (as documented). Otherwise, chances are we will be testing a single case only.
provider/kubernetes_test.go
Outdated
}, | ||
}, | ||
{ | ||
Host: "bar", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Could we change the host name to something like "host_with_missing_service" or place an according comment to indicate that the purpose of this Ingress resource test-wise is to not be found because of that reason?
It takes a bit of back of forth scrolling to understand the purpose of the individual objects this test is setting up. I think purposeful names / comments could alleviate that.
provider/kubernetes_test.go
Outdated
}, | ||
}, | ||
{ | ||
Host: "tar", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, something that indicates this Ingress' Service's Endpoints are missing?
The new |
…hat instead of nil.
My most recent commit where I add a blank Servers attribute the "missing_service" expected output is required to pass the diff. @timoreimann helped me figure this out, but essentially without it we end up comparing a nil to a blank Servers definition.
One of the unfortunate parts of that is the normal logging we do for the failed comparison doesn't show this difference. The above output was retrieved when @timoreimann made the following change:
Something to keep in mind in the future. |
… test to help ensure we have useful output incase of failures
…verride the return value for if the endpoints exist. After the 1.2 release the use of properExists should be removed and the GetEndpoints function should return false for the second value indicating the endpoint doesn’t exist. However at this time that would break a lot of the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- we did it, @regner. :-)
Major thanks for bearing with me for so long and helping to discover a few issues with the existing tests. This should prove valuable in improving the correcting the tests in the future, which is vital for our code base to stay healthy.
Again: Great job! 🎉
@regner I pushed a small commit that includes a link to #1307. The tests went green before, so I think that we can squash and merge as soon as we have @emilevauge's approval. 🎉 |
Good call on adding the link |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks a lot @regner & @timoreimann :)
* Abort Kubernetes Ingress update if Kubernetes API call fails Currently if a Kubernetes API call fails we potentially remove a working service from Traefik. This changes it so if a Kubernetes API call fails we abort out of the ingress update and use the current working config. Github issue: #1240 Also added a test to cover when requested resources (services and endpoints) that the user has specified don’t exist. * Specifically capturing the tc range as documented here: https://blog.golang.org/subtests * Updating service names in the mock data to be more clear * Updated expected data to match what currently happens in the loadIngress * Adding a blank Servers to the expected output so we compare against that instead of nil. * Replacing the JSON test output with spew for the TestMissingResources test to help ensure we have useful output incase of failures * Adding a temporary fix to the GetEndoints mocked function so we can override the return value for if the endpoints exist. After the 1.2 release the use of properExists should be removed and the GetEndpoints function should return false for the second value indicating the endpoint doesn’t exist. However at this time that would break a lot of the tests. * Adding quick TODO line about removing the properExists property * Link to issue 1307 re: properExists flag.
* Abort Kubernetes Ingress update if Kubernetes API call fails Currently if a Kubernetes API call fails we potentially remove a working service from Traefik. This changes it so if a Kubernetes API call fails we abort out of the ingress update and use the current working config. Github issue: #1240 Also added a test to cover when requested resources (services and endpoints) that the user has specified don’t exist. * Specifically capturing the tc range as documented here: https://blog.golang.org/subtests * Updating service names in the mock data to be more clear * Updated expected data to match what currently happens in the loadIngress * Adding a blank Servers to the expected output so we compare against that instead of nil. * Replacing the JSON test output with spew for the TestMissingResources test to help ensure we have useful output incase of failures * Adding a temporary fix to the GetEndoints mocked function so we can override the return value for if the endpoints exist. After the 1.2 release the use of properExists should be removed and the GetEndpoints function should return false for the second value indicating the endpoint doesn’t exist. However at this time that would break a lot of the tests. * Adding quick TODO line about removing the properExists property * Link to issue 1307 re: properExists flag.
* Abort Kubernetes Ingress update if Kubernetes API call fails Currently if a Kubernetes API call fails we potentially remove a working service from Traefik. This changes it so if a Kubernetes API call fails we abort out of the ingress update and use the current working config. Github issue: #1240 Also added a test to cover when requested resources (services and endpoints) that the user has specified don’t exist. * Specifically capturing the tc range as documented here: https://blog.golang.org/subtests * Updating service names in the mock data to be more clear * Updated expected data to match what currently happens in the loadIngress * Adding a blank Servers to the expected output so we compare against that instead of nil. * Replacing the JSON test output with spew for the TestMissingResources test to help ensure we have useful output incase of failures * Adding a temporary fix to the GetEndoints mocked function so we can override the return value for if the endpoints exist. After the 1.2 release the use of properExists should be removed and the GetEndpoints function should return false for the second value indicating the endpoint doesn’t exist. However at this time that would break a lot of the tests. * Adding quick TODO line about removing the properExists property * Link to issue 1307 re: properExists flag.
* Abort Kubernetes Ingress update if Kubernetes API call fails Currently if a Kubernetes API call fails we potentially remove a working service from Traefik. This changes it so if a Kubernetes API call fails we abort out of the ingress update and use the current working config. Github issue: #1240 Also added a test to cover when requested resources (services and endpoints) that the user has specified don’t exist. * Specifically capturing the tc range as documented here: https://blog.golang.org/subtests * Updating service names in the mock data to be more clear * Updated expected data to match what currently happens in the loadIngress * Adding a blank Servers to the expected output so we compare against that instead of nil. * Replacing the JSON test output with spew for the TestMissingResources test to help ensure we have useful output incase of failures * Adding a temporary fix to the GetEndoints mocked function so we can override the return value for if the endpoints exist. After the 1.2 release the use of properExists should be removed and the GetEndpoints function should return false for the second value indicating the endpoint doesn’t exist. However at this time that would break a lot of the tests. * Adding quick TODO line about removing the properExists property * Link to issue 1307 re: properExists flag.
* Abort Kubernetes Ingress update if Kubernetes API call fails Currently if a Kubernetes API call fails we potentially remove a working service from Traefik. This changes it so if a Kubernetes API call fails we abort out of the ingress update and use the current working config. Github issue: #1240 Also added a test to cover when requested resources (services and endpoints) that the user has specified don’t exist. * Specifically capturing the tc range as documented here: https://blog.golang.org/subtests * Updating service names in the mock data to be more clear * Updated expected data to match what currently happens in the loadIngress * Adding a blank Servers to the expected output so we compare against that instead of nil. * Replacing the JSON test output with spew for the TestMissingResources test to help ensure we have useful output incase of failures * Adding a temporary fix to the GetEndoints mocked function so we can override the return value for if the endpoints exist. After the 1.2 release the use of properExists should be removed and the GetEndpoints function should return false for the second value indicating the endpoint doesn’t exist. However at this time that would break a lot of the tests. * Adding quick TODO line about removing the properExists property * Link to issue 1307 re: properExists flag.
Currently if a Kubernetes API call fails we potentially remove a working service from Traefik. This changes it so if a Kubernetes API call fails we abort out of the ingress update and use the current working config. Github issue: #1240
Tests are currently failing but wanted to get the PR up for a few reasons:
Fixes #1240