
failed to start eventsource server: timed out waiting for the condition #1351

Closed
nicon89 opened this issue Sep 20, 2021 · 16 comments
Labels
bug Something isn't working

Comments

@nicon89

nicon89 commented Sep 20, 2021

Describe the bug
I'm unable to start the github-eventsource pod on a Kubernetes cluster with Istio enabled.

Environment (please complete the following information):

  • Kubernetes: v1.19.11
  • Argo: v3.1.11
  • Argo Events: v1.4.1

Additional context
Originally I was unable to access eventbus-default-stan service, but as suggested in #1311 I updated the service to:

$ k get svc -o yaml eventbus-default-stan-svc 
apiVersion: v1
kind: Service
metadata:
  annotations:
    resource-spec-hash: "744782701"
  creationTimestamp: "2021-09-16T14:37:29Z"
  labels:
    controller: eventbus-controller
    eventbus-name: default
    owner-name: default
    stan: "yes"
  name: eventbus-default-stan-svc
  namespace: argo-events
  ownerReferences:
  - apiVersion: argoproj.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: EventBus
    name: default
    uid: 7cd4b511-ebdc-474a-8f42-e93c1dbc0e70
  resourceVersion: "17087296"
  selfLink: /api/v1/namespaces/argo-events/services/eventbus-default-stan-svc
  uid: 525d6589-ea83-42b9-a80c-86aa45d051c9
spec:
  clusterIP: None
  ports:
  - name: tcp-client
    port: 4222
    protocol: TCP
    targetPort: 4222
  - name: cluster
    port: 6222
    protocol: TCP
    targetPort: 6222
  - name: monitor
    port: 8222
    protocol: TCP
    targetPort: 8222
  selector:
    controller: eventbus-controller
    eventbus-name: default
    owner-name: default
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

Now I'm able to reach it with curl from a testing container (the curl error below is expected, since NATS does not speak HTTP; it shows the TCP connection goes through):

/ $ curl eventbus-default-stan-svc:4222
curl: (1) Received HTTP/0.9 when not allowed

but github-eventsource is still unable to start:

nicon@5CG10725TV-W10:~/.../prod/airflow$ k logs github-eventsource-8w5n2-65b768f64b-dg7dm main
{"level":"info","ts":1632155363.8100798,"logger":"argo-events.eventsource","caller":"cmd/start.go:63","msg":"starting eventsource server","eventSourceName":"github","version":"v1.4.1"}
{"level":"info","ts":1632155363.8101642,"logger":"argo-events.eventsource","caller":"eventsources/eventing.go:309","msg":"Starting event source server...","eventSourceName":"github"}
{"level":"info","ts":1632155363.8110776,"logger":"argo-events.eventsource","caller":"metrics/metrics.go:172","msg":"starting metrics server","eventSourceName":"github"}
{"level":"info","ts":1632155363.8148913,"logger":"argo-events.eventsource","caller":"driver/nats.go:93","msg":"NATS auth strategy: Token","eventSourceName":"github","clientID":"client-github-eventsource-8w5n2-65b768f64b-dg7dm-481"}   
{"level":"error","ts":1632155365.820021,"logger":"argo-events.eventsource","caller":"driver/nats.go:102","msg":"Failed to connect to NATS server","eventSourceName":"github","clientID":"client-github-eventsource-8w5n2-65b768f64b-dg7dm-481","error":"read tcp 10.99.1.185:33986->10.99.1.191:4222: i/o timeout","stacktrace":"github.com/argoproj/argo-events/eventbus/driver.(*natsStreaming).Connect\n\t/home/runner/work/argo-events/argo-events/eventbus/driver/nats.go:102\ngithub.com/argoproj/argo-events/eventsources.(*EventSourceAdaptor).run.func1\n\t/home/runner/work/argo-events/argo-events/eventsources/eventing.go:317\ngithub.com/argoproj/argo-events/common.Connect.func1\n\t/home/runner/work/argo-events/argo-events/common/retry.go:107\nk8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection\n\t/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.19.7-rc.0/pkg/util/wait/wait.go:211\nk8s.io/apimachinery/pkg/util/wait.ExponentialBackoff\n\t/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.19.7-rc.0/pkg/util/wait/wait.go:399\ngithub.com/argoproj/argo-events/common.Connect\n\t/home/runner/work/argo-events/argo-events/common/retry.go:106\ngithub.com/argoproj/argo-events/eventsources.(*EventSourceAdaptor).run\n\t/home/runner/work/argo-events/argo-events/eventsources/eventing.go:316\ngithub.com/argoproj/argo-events/eventsources.(*EventSourceAdaptor).Start\n\t/home/runner/work/argo-events/argo-events/eventsources/eventing.go:284\ngithub.com/argoproj/argo-events/eventsources/cmd.Start\n\t/home/runner/work/argo-events/argo-events/eventsources/cmd/start.go:65\ngithub.com/argoproj/argo-events/cmd/commands.NewEventSourceCommand.func1\n\t/home/runner/work/argo-events/argo-events/cmd/commands/eventsource.go:14\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:846\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/runner/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:950\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home
/runner/go/pkg/mod/github.com/spf13/cobra@v1.0.0/command.go:887\ngithub.com/argoproj/argo-events/cmd/commands.Execute\n\t/home/runner/work/argo-events/argo-events/cmd/commands/root.go:19\nmain.main\n\t/home/runner/work/argo-events/argo-events/cmd/main.go:8\nruntime.main\n\t/opt/hostedtoolcache/go/1.15.15/x64/src/runtime/proc.go:204"}  

Can you please help me with that?


Message from the maintainers:

If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.

@nicon89 nicon89 added the bug Something isn't working label Sep 20, 2021
@antoniomo
Contributor

What's the difference between your testing container and the github eventsource? Different namespace? Are they both with the Istio sidecar?

@oursland

@nicon89

This is a shot in the dark, but is it possible that the Istio sidecar is not ready? (My go-to question for all Istio problems...)

In my applications I set the meshConfig.defaultConfig.holdApplicationUntilProxyStarts option to true to prevent attempts to use the network before the routing tables have been configured.
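For reference, the mesh-wide version of that setting lives under `meshConfig.defaultConfig` in the Istio install configuration. A minimal sketch, assuming the standard IstioOperator install API:

```yaml
# Sketch: mesh-wide setting to delay application container start until the
# Envoy sidecar is ready (assumes the IstioOperator install API).
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      holdApplicationUntilProxyStarts: true
```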

The documentation suggests this can also be enabled on a per-pod basis with an annotation, but I have not tested it:

annotations:
  proxy.istio.io/config: |
    holdApplicationUntilProxyStarts: true

@nicon89
Author

nicon89 commented Sep 21, 2021

What's the difference between your testing container and the github eventsource? Different namespace? Are they both with the Istio sidecar?

Both have the Istio sidecar, no difference at all, and they are in the same namespace.

This is a shot in the dark, but is it possible that the Istio Sidecar is not ready? (My go-to question with all problems Istio...)

It is ready. In fact, I was able to execute curl from the sidecar of the failing container:

nicon@5CG10725TV-W10:~/.../staging/argo-events$ k get pods
NAME                                        READY   STATUS             RESTARTS   AGE
eventbus-controller-79bcdb87ff-chmbk        2/2     Running            1          15h
eventbus-default-stan-0                     3/3     Running            0          14h
eventbus-default-stan-1                     3/3     Running            0          14h
eventbus-default-stan-2                     3/3     Running            3          3d17h
events-webhook-f9f546984-hm8fm              2/2     Running            1          15h
eventsource-controller-8d57cccb8-x5t28      2/2     Running            1          15h
github-eventsource-8w5n2-65b768f64b-25v74   1/2     CrashLoopBackOff   164        15h
sensor-controller-759b7b8bbb-vgtbq          2/2     Running            2          15h
sleep-769579785f-ljkcj                      2/2     Running            0          15h
nicon@5CG10725TV-W10:~/.../staging/argo-events$ k exec -it github-eventsource-8w5n2-65b768f64b-25v74 -c istio-proxy bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
istio-proxy@github-eventsource-8w5n2-65b768f64b-25v74:/$ curl eventbus-default-stan-svc:4222
INFO {"server_id":"NDHR24WPDHILASYW7ZNYLWWEGNOTXP2LRUJOE4LU3SFWFZX4HDVNZDWW","server_name":"NDHR24WPDHILASYW7ZNYLWWEGNOTXP2LRUJOE4LU3SFWFZX4HDVNZDWW","version":"2.3.3","proto":1,"git_commit":"aaba459","go":"go1.16.6","host":"0.0.0.0","port":4222,"headers":true,"auth_required":true,"max_payload":1048576,"client_id":15,"client_ip":"127.0.0.1","cluster":"t2ZebT6onqQN5Pi4FTG68T","connect_urls":["10.99.1.12:4222","10.99.0.199:4222","10.99.0.184:4222"]}
-ERR 'Authorization Violation'
istio-proxy@github-eventsource-8w5n2-65b768f64b-25v74:/$ 

@antoniomo
Contributor

Another shot in the dark, but what about a rollout restart of the eventbus service?

@nicon89
Author

nicon89 commented Sep 21, 2021

It didn't make any difference.

@whynowy
Member

whynowy commented Sep 21, 2021

@antoniomo - does it work in your case?

@antoniomo
Contributor

Yep, we have a bunch of argo and argo-sensor installations working with Istio with a similar change. Our argo version is 3.0.8, and argo-events is at 1.4.0.

We make use of the SQS and Webhook event sources. We only have Istio on the webhook ones, for ingress (virtual service) purposes.

Some more information on our setup; excerpts from the manifests follow.

On the SQS event-sources we disable it:

apiVersion: argoproj.io/v1alpha1
kind: EventSource
metadata:
  name: sqs
  namespace: foobar
spec:
  template:
    serviceAccountName: foobar
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
  sqs:
    ...

The eventbus service and the event-sources sit on the same namespace, as we are using namespaced argo and argo-events installations, which forces this setup.

The namespace is istio-enabled:

  labels:
    istio-injection: enabled

The eventbus service:

apiVersion: v1
kind: Service
metadata:
  labels:
    controller: eventbus-controller
    eventbus-name: default
    owner-name: default
    stan: "yes"
  name: eventbus-default-stan-svc
  namespace: foobar
spec:
  clusterIP: None
  ports:
  - name: tcp-client
    port: 4222
    protocol: TCP
    targetPort: 4222
  - name: cluster
    port: 6222
    protocol: TCP
    targetPort: 6222
  - name: monitor
    port: 8222
    protocol: TCP
    targetPort: 8222
  selector:
    controller: eventbus-controller
    eventbus-name: default
    owner-name: default
  sessionAffinity: None
  type: ClusterIP

Not sure if any of that might help.

@nicon89
Author

nicon89 commented Sep 22, 2021

@antoniomo are your eventbus pods started with Envoy injected?

@antoniomo
Contributor

@antoniomo are your eventbus pods started with Envoy injected?

Actually no, I should have shared our manifest instead of the `describe` output of the service!

Here it is:

apiVersion: argoproj.io/v1alpha1
kind: EventBus
metadata:
  name: default
spec:
  nats:
    native:
      metadata:
        annotations:
          sidecar.istio.io/inject: "false"
      replicas: 3 # optional, defaults to 3, and requires minimal 3
      auth: token # optional, default to none
      persistence: # optional
        # storageClassName: standard  # Default of the cluster
        accessMode: ReadWriteOnce
        volumeSize: 10Gi
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchLabels:
                    controller: eventbus-controller
                    eventbus-name: default
                topologyKey: kubernetes.io/hostname
              weight: 100  # Best-effort anti-affinity

We only have the istio sidecar in the webhook event source (as we need a virtual service there for ingress purposes).

@nicon89
Author

nicon89 commented Sep 22, 2021

Can you check if it works when you have istio sidecar enabled?

@antoniomo
Contributor

Not during the week; this is a production setup, so I would have to replicate it locally. However, we don't run Istio anywhere it is not required, as it consumes quite a bit of resources. Hence we only run it on the webhook event source, as explained.

The only reason that event source is in the same namespace as the eventbus is our namespaced argo-events installation, which forces this setup.

@antoniomo
Contributor

Can you test with/without the annotation to disable istio sidecar on the event bus, while keeping istio in your event source?

@nicon89
Author

nicon89 commented Sep 22, 2021

Yes, without the Istio sidecar on the event bus it connects fine. Unfortunately, Istio is a requirement in our organization :(

@antoniomo
Contributor

antoniomo commented Sep 22, 2021

Unfortunately #1311 won't fix it for you then, the tcp-client patch won't be enough :(

Basically that enables istio/istio#28623, that is, calling NATS (the EventBus) from an Istio-enabled namespace/from your event sources. But that doesn't enable NATS itself (the EventBus) to have an istio proxy.

For that, the solution could be to create a VirtualService on top of NATS, like so: nats-io/nats-operator#88 (comment). Edited: that doesn't look so promising. If I read the thread and the follow-up correctly, the recommended way is just the `tcp-` prefix on the port name, as we are already doing. It could be your Kubernetes/Istio versions; are you running somewhat older ones?

Unfortunately, NATS and Istio don't play so well together, that's just one of several threads about it :/

@nicon89
Author

nicon89 commented Sep 24, 2021

The issue was caused by a DestinationRule that was disabling mTLS.
Without it, everything works just fine.
Thank you!
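For anyone hitting the same symptom: the kind of DestinationRule that causes this forces plain-text traffic to the eventbus while the mesh expects mutual TLS. A hypothetical example (the rule name and TLS mode are illustrative; the original rule from this cluster was not posted):

```yaml
# Hypothetical DestinationRule of the kind that caused the i/o timeout:
# it disables mTLS for traffic to the eventbus service.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: disable-mtls-eventbus
  namespace: argo-events
spec:
  host: eventbus-default-stan-svc.argo-events.svc.cluster.local
  trafficPolicy:
    tls:
      mode: DISABLE  # removing this rule (or using ISTIO_MUTUAL) resolved the issue
```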

@nicon89 nicon89 closed this as completed Sep 24, 2021
@antoniomo
Contributor

Awesome! I'm sure this thread will help more people in the future, as well!
