Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid flapping of multiple Ingress definitions #3862

Merged
merged 5 commits into from
Oct 5, 2018
Merged

Avoid flapping of multiple Ingress definitions #3862

merged 5 commits into from
Oct 5, 2018

Conversation

rtreffer
Copy link
Contributor

@rtreffer rtreffer commented Sep 6, 2018

What does this PR do?

If multiple ingress objects define the same frontend (host + path) and one of the ingresses is broken (e.g. referenced service does not exist) then the frontend will be flapping (appear, disappear, ...).

Motivation

We found this in production and I've extracted the problem into a testcase...

Given 2 ingresses:

- apiVersion: extensions/v1beta1
  kind: Ingress
  metadata:
    namespace: testing
    name: web-a
  spec:
    rules:
    - host: host-a
      http:
        paths:
        - backend:
            serviceName: service1
            servicePort: 80
- apiVersion: extensions/v1beta1
  kind: Ingress
  metadata:
    namespace: testing
    name: web-b
  spec:
    rules:
    - host: host-a
      http:
        paths:
        - backend:
            serviceName: missing
            servicePort: 80

The missing service will cause the whole host-a frontend to disappear if the ingress objects are processed in order [ingress/web-a, ingress/web-b]. We have seen in production that the ordering can be random, thus causing the frontend to become unavailable.

More

  • Added/updated tests
  • Added/updated documentation

Additional Notes

Instead if writing to the Frontends map directly this patch checks out the current frontend and writes it to the map only if basic validations pass.

There was a second test-failure after applying this patch. The root2 testcase has 2 ingress definitions, one of those is broken.
The behavior now changed to accept the valid one and keep the frontend.

This is a rebase of #3857 on v1.6
This version does not require the following testcase patch: 5b5e8dc

@rtreffer rtreffer requested a review from a team as a code owner September 6, 2018 15:06
@traefiker traefiker added this to the 1.6 milestone Sep 6, 2018
@rtreffer rtreffer changed the title Fix missing frontend if the last ingress is broken Multiple kubernetes ingress definitions may cause flapping frontend Sep 6, 2018
@rtreffer
Copy link
Contributor Author

rtreffer commented Sep 6, 2018

Applying this PR on newer branches may require 5b5e8dc

@dtomcej
Copy link
Contributor

dtomcej commented Sep 6, 2018

@timoreimann I guess the only question I have is: If you have two identical ingresses witht he same host/path combo, but different services, are we supposed to merge the backends if they are valid?

If so, then this resolution is fine. Otherwise, we should define what should happen.

I agree having the frontend appear/disappear is not ideal, but that doesn't mean that merging all overlaps is the only other option.

Thoughts?

@rtreffer
Copy link
Contributor Author

rtreffer commented Sep 6, 2018

We had an outage because the same host/path was defined by 2 ingress in 2 namespaces (in that particular case even for valid reasons). That caused an outage as only one of those was correctly scaled and traffic was routed to one of the two namespaces (choosen randomly on each reload).

We have since then a testcase and see it as a must that an ingress merges those. Even if it is regarded as a misconfiguration I would prefer anything to taking the frontend offline.

I see the point that merging ingress definitions is not desirable (annotation? restrictions? auth? ....) but what are non-disruptive alternatives?

@timoreimann
Copy link
Contributor

IMHO, the host/path tuple is a unique identifier for an Ingress'ed resource. As such, I feel it'd be natural to merge -- it seems like there should be no difference between an Ingress spec holding all backends and two Ingress specs holding all backends together.

The only alternative I could see is to reject Ingresses that refer an already existing host/path pair, or make things configurable. I don't really have a use case for that, other than to protect Ingress objects from each other. If people want that, however, they can control access to Ingress resources more tightly (which is what my org does) or run separate Traefik instances per namespace.

@dtomcej WDYT?

@timoreimann timoreimann changed the title Multiple kubernetes ingress definitions may cause flapping frontend Avoid flapping of multiple Ingress definitions Sep 12, 2018
@dtomcej
Copy link
Contributor

dtomcej commented Sep 24, 2018

Sorry for the delayed response.

I agree @timoreimann that the tuple should be unique. In the case at hand, the annotations from the ingress resource first loaded would be applied. I do think that this case may be noteworthy as far as documentation goes, since it could lead to non-deterministic configuration.

If @timoreimann is ok with this non-deterministic configuration, I am as well.

@timoreimann
Copy link
Contributor

Documenting sounds like a good idea. I'd also emphasize that we now merge configurations that map to the same Ingress resource.

@rtreffer would you do us the documentation honor? 🙂

@rtreffer
Copy link
Contributor Author

Sure. Merging is not new (it is already happening on 2 service backed ingress definitions) but I couldn't find it in the docs. I'll add a paragraph about it in the kubernetes user-guide.

Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor comment left.

Other than that, I find your addendum to the documentation quite cheesy -- exactly the way I like it. 🧀

Træfik will now look for cheddar service endpoints (ports on healthy pods) in both the cheese and the default namespace. Deploying cheddar into the cheese namespace and afterwards shutting down cheddar in the default namespace is enough to migrate the traffic.

!!! note
The kubernetes documentation does not specify this merging behavior. The reference nginx implementation has undefined behavior: If two ingress objects define the same host/port then one of them will randomly win on reload.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd omit the description pertaining to Nginx's behavior (second sentence to the end of the paragraph) as it may change in the future, leaving our docs outdated without us being able to notice the fact easily.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, added it to explain the motivation, but that's not helpful for someone just reading the docs 🤷‍♂️

Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, LGTM. 👍 Time to :shipit:

Copy link
Contributor

@dtomcej dtomcej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
:shipit:

@ldez ldez added kind/enhancement a new or improved feature. and removed kind/enhancement a new or improved feature. labels Oct 4, 2018
@mmatur mmatur requested a review from ldez October 5, 2018 14:31
rtreffer-sc and others added 4 commits October 5, 2018 17:04
A frontend would be flapping between available and unavailable if
1. The hostname/path was defined by multiple ingress objects
2. One of the ingress objects references a non-existing service

The flapping was caused by undefined ordering in a golang map: if the
last processed ingress was broken a delete was issued on the frontends
map - even if previous runs succeeded.

The new logic only adds a frontend to the map of frontends if the basic
validation passed. The frontend will thus always be available if at
least one ingress definition is not broken.
@ldez ldez modified the milestones: 1.6, 1.7 Oct 5, 2018
@rtreffer rtreffer requested review from a team as code owners October 5, 2018 15:18
@ldez ldez changed the base branch from v1.6 to v1.7 October 5, 2018 15:18
@ldez ldez removed request for a team October 5, 2018 15:18
@ldez ldez removed the bot/no-merge label Oct 5, 2018
Copy link
Contributor

@ldez ldez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants