Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Argo server logs print "token not valid for running mode" every 30s #5697

Closed
avifried1 opened this issue Apr 17, 2021 · 15 comments · Fixed by #8333
Closed

Argo server logs print "token not valid for running mode" every 30s #5697

avifried1 opened this issue Apr 17, 2021 · 15 comments · Fixed by #8333

Comments

@avifried1
Copy link

Summary

The argo server logs are bombarded with the following error every 30s or so:

time="2021-04-17T23:23:10.983Z" level=info msg="finished streaming call with code Unauthenticated" error="rpc error: code = Unauthenticated desc = token not valid for running mode" grpc.code=Unauthenticated grpc.method=WatchWorkflows grpc.service=workflow.WorkflowService grpc.start_time="2021-04-17T23:23:10Z" grpc.time_ms=0.064 span.kind=server system=grpc

Running with both SSO & client auth modes. Both work fine in UI & API calls (with authentication header with Bearer TOKEN)
We haven't seen this before v3 (upgraded from v2.12.11).

Diagnostics

Running on self-managed k8s cluster in AWS. Server behind an ingress, running insecure with TLS termination done in LB.
Argo workflows v3.0.1

Server YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: argo-server
  namespace: prod
spec:
  replicas: 2
  selector:
    matchLabels:
      app: argo-server
  template:
    metadata:
      labels:
        app: argo-server
    spec:
      nodeSelector:
        general-use: 'true'
      containers:
        - name: argo-server
          image: argoproj/argocli:v3.0.1
          args:
            - server
            - --namespaced
            - --managed-namespace
            - workflows
            - --namespace
            - workflows
            - --configmap
            - workflow-controller-configmap
            - --auth-mode
            - sso
            - --auth-mode
            - client
            - --secure=false
            - --x-frame-options
            - SAMEORIGIN
          ports:
            - containerPort: 2746
              name: web
          securityContext:
            capabilities:
              drop:
                - ALL
          readinessProbe:
            httpGet:
              path: /
              port: 2746
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 20
          volumeMounts:
            - mountPath: /tmp
              name: tmp
      volumes:
        - name: tmp
          emptyDir: { }
      securityContext:
        runAsNonRoot: true
      serviceAccountName: argo-server-ui

SSO:

  sso: |
    issuer: https://accounts.google.com
    clientId:
      name: argo-secrets
      key: oidcClientId
    clientSecret:
      name: argo-secrets
      key: oidcClientSecret
    redirectUrl: https://our.redirect.domain/oauth2/callback
    scopes:
      - email
    rbac:
      enabled: true

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

@alexec
Copy link
Contributor

alexec commented Apr 18, 2021

You most likely have a user who's browser window is open - but they have been logged out.

@avifried1
Copy link
Author

it really pollutes the logs. seems like the UI shouldn't need to poll the server if user's logged off.

@toninis
Copy link

toninis commented Apr 27, 2021

Check also if there is a healtcheck from kubernetes

@avifried1
Copy link
Author

there's only the readiness probe:

readinessProbe:
            httpGet:
              path: /
              port: 2746
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 20

but the frequency of these info-level log lines doesn't suggest it is that.

@alexec alexec removed the type/bug label May 4, 2021
@syu-lk4b
Copy link

I am having the same error

time="2021-05-19T04:35:52.989Z" level=info msg="SSO enabled"
time="2021-05-19T04:35:52.992Z" level=info msg="Starting Argo Server" instanceID= version=v3.0.2
time="2021-05-19T04:35:52.992Z" level=info msg="Creating DB session"
time="2021-05-19T04:35:53.002Z" level=info msg="Node status offloading config" ttl=5m0s
time="2021-05-19T04:35:53.002Z" level=info msg="Creating event controller" operationQueueSize=16 workerCount=4
time="2021-05-19T04:35:53.006Z" level=info msg="Argo Server started successfully on https://localhost:2746"
time="2021-05-19T04:37:48.141Z" level=info msg="finished unary call with code Unauthenticated" error="rpc error: code = Unauthenticated desc = token not valid for running mode" grpc.code=Unauthenticated grpc.method=ListWorkflowTemplates grpc.service=workflowtemplate.WorkflowTemplateService grpc.start_time="2021-05-19T04:37:48Z" grpc.time_ms=0.069 span.kind=server system=grpc
time="2021-05-19T04:37:48.151Z" level=info msg="finished unary call with code Unauthenticated" error="rpc error: code = Unauthenticated desc = token not valid for running mode" grpc.code=Unauthenticated grpc.method=ListCronWorkflows grpc.service=cronworkflow.CronWorkflowService grpc.start_time="2021-05-19T04:37:48Z" grpc.time_ms=0.035 span.kind=server system=grpc
time="2021-05-19T04:37:48.151Z" level=info msg="finished unary call with code Unauthenticated" error="rpc error: code = Unauthenticated desc = token not valid for running mode" grpc.code=Unauthenticated grpc.method=ListWorkflows grpc.service=workflow.WorkflowService grpc.start_time="2021-05-19T04:37:48Z" grpc.time_ms=0.037 span.kind=server system=grpc
time="2021-05-19T04:37:50.488Z" level=info msg="finished unary call with code Unauthenticated" error="rpc error: code = Unauthenticated desc = token not valid for running mode" grpc.code=Unauthenticated grpc.method=GetInfo grpc.service=info.InfoService grpc.start_time="2021-05-19T04:37:50Z" grpc.time_ms=0.05 span.kind=server system=grpc
time="2021-05-19T04:37:56.925Z" level=info msg="finished unary call with code Unauthenticated" error="rpc error: code = Unauthenticated desc = token not valid for running mode" grpc.code=Unauthenticated grpc.method=GetInfo grpc.service=info.InfoService grpc.start_time="2021-05-19T04:37:56Z" grpc.time_ms=0.051 span.kind=server system=grpc

and here my config
- args:
- server
- --namespaced
- --auth-mode
- sso

@syu-lk4b
Copy link

any one knows this is the problem from the keycloak or argo setup itself?

@stale
Copy link

stale bot commented Jul 18, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jul 18, 2021
@stale stale bot closed this as completed Jul 25, 2021
@rchojn
Copy link

rchojn commented Aug 28, 2021

Hello, any update on this issue? @avifried1 have you resolved it?

@vitalyrychkov
Copy link
Contributor

Hello, we have the same issue with Argo Wfl 3.1.3 TLS and SSO with Keycloak as OIDC provider.
msg="finished streaming call with code Unauthenticated" error="rpc error: code = Unauthenticated desc = token not valid for running mode"
Argo Wfl documentation does not help.

@tomashtg
Copy link

tomashtg commented Feb 3, 2022

Hello, for us the issue remains the same, any updates on this?

@alexec alexec added type/bug area/ui good first issue Good for newcomers and removed wontfix labels Feb 3, 2022
@alexec alexec reopened this Feb 3, 2022
@alexec
Copy link
Contributor

alexec commented Feb 3, 2022

I believe this issue happens because the user interface retries on error. This can happen when you were logged in, then login expires. It is either retrying too often, or should stop if the error is Unauthorised.

@alexec
Copy link
Contributor

alexec commented Feb 7, 2022

@sarabala1979 I've kept this one for @dpadhiar as I think we should fix it.

@alexec alexec changed the title Argo server logs bombarded with "token not valid for running mode" error Argo server logs print "token not valid for running mode" every 30s Feb 9, 2022
@alexec
Copy link
Contributor

alexec commented Feb 9, 2022

@sarabala1979 @dpadhiar going into the details of this, it seem "bombarded" does not mean ever 30ms, but every 30s. I think this is fine for most users and not worthwhile fixing today.

@stale
Copy link

stale bot commented Mar 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the problem/stale This has not had a response in some time label Mar 2, 2022
@alexec
Copy link
Contributor

alexec commented Apr 7, 2022

To fix this:

const reconnectAfterMs = 3000;

Add some kind of backoff.

@stale stale bot removed the problem/stale This has not had a response in some time label Apr 7, 2022
alexec added a commit to alexec/argo-workflows that referenced this issue Apr 7, 2022
Signed-off-by: Alex Collins <alex_collins@intuit.com>
alexec added a commit that referenced this issue Apr 12, 2022
Signed-off-by: Alex Collins <alex_collins@intuit.com>
@sarabala1979 sarabala1979 mentioned this issue Apr 14, 2022
85 tasks
sarabala1979 pushed a commit that referenced this issue Apr 18, 2022
Signed-off-by: Alex Collins <alex_collins@intuit.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants