Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KIC doesn't detect Gateway's faulty condition while pushing configuration #3099

Closed
1 task done
czeslavo opened this issue Oct 25, 2022 · 8 comments
Closed
1 task done
Labels
area/kong Issue with Kong proxy behavior, rather than the controller bug Something isn't working
Milestone

Comments

@czeslavo
Copy link
Contributor

czeslavo commented Oct 25, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

  1. Customer pushed some configuration to k8s.
  2. KIC sent that config to the Gateway, Gateway got stalled, and the user got no feedback in logs.
  3. Customer pushed some other configuration to k8s.
  4. KIC failed to push that other config, but the user got no feedback in logs again.

Expected Behavior

It's expected that in 2) and 4) KIC should detect Gateway faulty condition and log an error.

Steps To Reproduce

  1. Configure an Ingress with an incompatible regex path
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: echo
  name: test-default
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: echo
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: echo
    spec:
      containers:
        - env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: spec.nodeName
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: POD_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
          image: gcr.io/kubernetes-e2e-test-images/echoserver:2.2
          imagePullPolicy: IfNotPresent
          name: echo
          ports:
            - containerPort: 8080
              protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /etc/nginx/nginx.conf
              name: nginx-conf
              subPath: nginx.conf
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
        - configMap:
            defaultMode: 420
            items:
              - key: nginx.conf
                path: nginx.conf
            name: echo-nginx-conf
          name: nginx-conf
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    ingress.kubernetes.io/service-upstream: "true"
    konghq.com/override: test-default
  name: test-default
  namespace: default
spec:
  clusterIP: 172.20.88.85
  clusterIPs:
    - 172.20.88.85
  internalTrafficPolicy: Cluster
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
  ports:
    - name: high
      port: 8080
      protocol: TCP
      targetPort: 8080
    - name: low
      port: 80
      protocol: TCP
      targetPort: 8080
  selector:
    app: echo
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    konghq.com/override: test-default
    konghq.com/snis: <abc.example.com>
  labels:
    PathConfig: Default
  name: bad-config
  namespace: default
spec:
  ingressClassName: <kong>
  rules:
    - host: <abc.example.com>
      http:
        paths:
          - backend:
              service:
                name: test-default
                port:
                  number: 8080
            path: /~/\/*$
            pathType: ImplementationSpecific
---
apiVersion: configuration.konghq.com/v1
kind: KongIngress
metadata:
  name: test-default
  namespace: default
proxy: {}
route:
  https_redirect_status_code: 308
  preserve_host: false
  protocols:
    - https
  strip_path: false
upstream:
  host_header: test-default.default.svc
  slots: 5000
  1. Gateway should get stalled (@mflendrich - what do we exactly mean by 'stalled' here?)
  2. Fix the Ingress' path manually in the manifests.
  3. KIC doesn't propagate the fixed configuration to the Gateway and reports no errors in logs.

Kong Ingress Controller version

KIC: v2.7.0 
Gateway: v3.0.0
KONG_ROUTER_FLAVOR=traditional_compatible

Kubernetes version

No response

Anything else?

No response

@czeslavo czeslavo added bug Something isn't working area/kong Issue with Kong proxy behavior, rather than the controller labels Oct 25, 2022
@mflendrich
Copy link
Contributor

This is the config I have used to reproduce this issue on KIC 2.7.0 Gateway 3.0.0 (kong router flavor traditional_compatible)

Step 1: apply this

apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin-deployment
  labels:
    app: httpbin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpbin
  template:
    metadata:
      labels:
        app: httpbin
    spec:
      containers:
      - name: httpbin
        image: kong/httpbin:0.1.0
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: httpbin
  name: httpbin-deployment
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: httpbin
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: httpbin-ingress
  annotations:
    httpbin.ingress.kubernetes.io/rewrite-target: /
    kubernetes.io/ingress.class: "kong"
    konghq.com/strip-path: 'true'
spec:
  rules:
  - http:
      paths:
      - path: /foo
        pathType: Prefix
        backend:
          service:
            name: httpbin-deployment
            port:
              number: 80

Observe that the app is up and running, and that going with your web browser to /foo shows the httpbin index page.

Step 2: apply the breaking config

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: httpbin-ingress-2
  annotations:
    httpbin.ingress.kubernetes.io/rewrite-target: /
    kubernetes.io/ingress.class: "kong"
spec:
  rules:
  - http:
      paths:
      - path: '/~/delay/(?<delay>[^\/]+)$'
        pathType: ImplementationSpecific
        backend:
          service:
            name: httpbin-deployment
            port:
              number: 80

Step 3: once KIC has had enough time to send httpbin-ingress-2 to Kong, try making a change in httpbin-ingress, for example kubectl edit ingress httpbin-ingress and change spec.rules[0].http.paths[0].path from /foo to /bar.

Then navigate your browser to foo and then to /bar.
Desired behavior: /foo: Kong 404 no Route matched with those values, /bar: the httpbin page
Actual behavior: /foo: the httpbin page, /bar: Kong 404 no Route matched with those values

@mflendrich
Copy link
Contributor

Gateway should get stalled (@mflendrich - what do we exactly mean by 'stalled' here?)

The observed behavior is that KIC loses the ability to change any value in Kong's Admin API. In the above comment, this means that:

  • the /foo -> /bar route path change never gets visibly configured on Kong
  • there is no log feedback that something bad happened

@mflendrich
Copy link
Contributor

As @czeslavo pointed out: what needs to happen here is:

  • reproduce the issue from the above comments
  • figure out how to make logs show that reconciliation stopped happening
  • implement the above.

@mflendrich mflendrich added this to the KIC v2.8.0 milestone Oct 25, 2022
@rainest
Copy link
Contributor

rainest commented Nov 9, 2022

This is the same problem we've always had with broken configuration (currently tracked in FTI-4350), correct?

I can't actually replicate the specific issue (the first comment suggests the issue is with paths, but both /~/delay/(?<delay>[^\/]+)$' and /~/\/*$ are accepted, and their presence does not appear to interrupt router rebuilds or otherwise prevent changes from other Ingresses from taking effect).

Generally, if KIC fails to push configuration, it does log that. You'll get something like

time="2022-11-09T22:33:54Z" level=error msg="could not update kong admin" error="posting new config to /config: HTTP status 400 (message: \"declarative config is invalid: {services={{routes={[2]={methods=\\\"cannot set 'methods' when 'protocols' is 'grpc' or 'grpcs'\\\"}}}}}\")" subsystem=dataplane-synchronizer

after PerformUpdate bubbles up the error through KongClient.Update(), and finally to the sync loop, which logs it. The Prometheus push success/fail indicator will show the last push status as well.

@czeslavo
Copy link
Contributor Author

I think there are more details in the FTI-4444 discussion. What I understood was that there was a possibility to turn the Gateway into some faulty state which wouldn't allow us to fix it, without restarting the Gateway (even when deleting the faulty k8s resources which originally caused the problem). @mflendrich has reproduced it as described in the first comment. It might be already fixed on the Gateway side so probably in order to reproduce it we should use the same version of KIC and Gateway as in the FTI.

@rainest
Copy link
Contributor

rainest commented Nov 10, 2022

Ah, okay, this is the case where the gateway does accept the configuration and then doesn't rebuild the router properly, and I didn't see it because I missed the traditional_compatible part. That'd presumably affect users who either use flat manifests and change the controller version in their existing Deployment or who are using an older version of the chart with a newer controller release--the current defaults will set traditional unless you override it.

Errors of that class we need some change on the gateway side, either to ingest such routes gracefully or reject them up front--we can't reliably know that they're broken if Kong doesn't say they are. It looks like Kong/kong#9480 will still let the problem routes through, but limit their impact to that route only.

@czeslavo
Copy link
Contributor Author

Errors of that class we need some change on the gateway side, either to ingest such routes gracefully or reject them up front--we can't reliably know that they're broken if Kong doesn't say they are. It looks like Kong/kong#9480 will still let the problem routes through, but limit their impact to that route only.

Given that I guess we can close this issue as there's nothing we can do about it on the KIC side, right?

@rainest
Copy link
Contributor

rainest commented Nov 14, 2022

AFAIK yeah. Someone reopen if I'm wrong and there's something additional we wanted to do on our end.

@rainest rainest closed this as completed Nov 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kong Issue with Kong proxy behavior, rather than the controller bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants