
Sluggish performance & timeouts with RBAC for Sessions enabled #12498

Closed
deusxanima opened this issue May 6, 2022 · 4 comments
Labels
audit-log (Issues related to Teleport's Audit Log), bug, rbac (Issues related to Role Based Access Control)

Comments

@deusxanima
Contributor

deusxanima commented May 6, 2022

Expected behavior:
Using RBAC for sessions should not cause the Session Recordings page to display an endless loading spinner or take 30-60 seconds to pull up a session recording on an otherwise low-traffic cluster.

Current behavior:
If a role is configured with RBAC for sessions, navigating to or refreshing the Session Recordings page at times causes the page to display an endless loading spinner. This behavior is observed even on a cluster with a single user and a single recording, where the Session Recordings page may take 30-60 seconds to pull up the recording. DynamoDB table limits are far above the read/write load observed during this time. Logs on the auth/proxy server also lag during this period and are not updated until the page has fully refreshed.

Bug details:

  • Teleport version 9.0.1+
  • Recreation steps
  1. Configure a role with RBAC for Sessions for an OIDC/SAML user (a complete role sketch built around this rule follows these steps):

     - resources:
       - session
       verbs:
       - list
       - read
       where: contains(session.participants, user.metadata.name)
    
  2. Log in as the user and create a brief session by SSH-ing into a node and exiting after issuing a sample command (ex: echo test)

  3. Navigate to the Session Recordings page and observe the loading spinner being displayed for an inordinate amount of time, even if no other activity or user-specific sessions are present on the cluster. Note: you may have to refresh a few times to hit it because the behavior is not always consistent.
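
For reference, a minimal complete role built around this rule might look like the sketch below. This is an illustrative example, not the exact role from the report: the role name, login, and node labels are placeholders, and the full roles actually used are posted in the comments further down.

kind: role
metadata:
  name: session-rbac-test   # placeholder name, not from the report
spec:
  allow:
    logins:
    - ubuntu                 # placeholder login
    node_labels:
      '*': '*'               # placeholder labels
    rules:
    - resources:
      - session
      verbs:
      - list
      - read
      # restrict visibility to sessions the user participated in
      where: contains(session.participants, user.metadata.name)
  deny: {}
version: v5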

  • Debug logs
May 06 15:28:53 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:53-04:00 DEBU             [SAML] ACS: https://proxy.example.com:443/v1/webapi/saml/acs services/saml.go:100
May 06 15:28:53 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:53-04:00 DEBU             [SAML] SSO: https://portal.sso.us-east-2.amazonaws.com/saml/assertion/Nzc5Mzg4NzQ2ODMzX2lucy0yZTk3MTAzOWFjYmYxYmY4 services/saml.go:98
May 06 15:28:53 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:53-04:00 DEBU             [SAML] Issuer: https://portal.sso.us-east-2.amazonaws.com/saml/assertion/Nzc5Mzg4NzQ2ODMzX2lucy0yZTk3MTAzOWFjYmYxYmY4 services/saml.go:99
May 06 15:28:53 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:53-04:00 DEBU             [SAML] ACS: https://proxy.example.com/v1/webapi/saml/acs services/saml.go:100
May 06 15:28:53 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:53-04:00 DEBU             [SAML] SSO: https://sso.jumpcloud.com/saml2/teleport services/saml.go:98
May 06 15:28:53 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:53-04:00 DEBU             [SAML] Issuer: JumpCloud services/saml.go:99
May 06 15:28:53 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:53-04:00 DEBU             [SAML] ACS: https://proxy.example.com/v1/webapi/saml/acs services/saml.go:100
May 06 15:28:53 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:53-04:00 DEBU             [SAML] SSO: https://user.onelogin.com/trust/saml2/http-redirect/sso/43480663-1d59-4a1f-8146-803b1e097797 services/saml.go:98
May 06 15:28:53 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:53-04:00 DEBU             [SAML] Issuer: https://app.onelogin.com/saml/metadata/43480663-1d59-4a1f-8146-803b1e097797 services/saml.go:99
May 06 15:28:53 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:53-04:00 DEBU             [SAML] ACS: https://rc.ateleport.ga:443/v1/webapi/saml/acs services/saml.go:100
May 06 15:28:54 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:54-04:00 DEBU             [SAML] Successfully fetched entity descriptor from "https://user.okta.com/app/xxx/sso/saml/metadata" services/saml.go:62
May 06 15:28:54 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:54-04:00 DEBU             [SAML] SSO: https://user.okta.com/app/user_teleport/xxx/sso/saml services/saml.go:98
May 06 15:28:54 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:54-04:00 DEBU             [SAML] Issuer: http://www.okta.com/xxx services/saml.go:99
May 06 15:28:54 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:54-04:00 DEBU             [SAML] ACS: https://proxy.example.com:443/v1/webapi/saml/acs services/saml.go:100
May 06 15:28:54 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:54-04:00 DEBU             [SAML] SSO: https://keycloak.ateleport.ga/auth/realms/master/protocol/saml services/saml.go:98
May 06 15:28:54 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:54-04:00 DEBU             [SAML] Issuer: https://rc.ateleport.ga/web/login services/saml.go:99
May 06 15:28:54 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:54-04:00 DEBU             [SAML] ACS: https://proxy.example.com/v1/webapi/saml/acs services/saml.go:100
May 06 15:28:54 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:54-04:00 DEBU [AUTH:1]    Server certificate cert(cc.ip-172-31-72-82-ec2-internal issued by ip-172-31-72-82-ec2-internal:1234). auth/middleware.go:313
May 06 15:28:54 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:54-04:00 DEBU [AUTH:1]    Server certificate cert(cc.ip-172-31-72-82-ec2-internal issued by ip-172-31-72-82-ec2-internal:1234). auth/middleware.go:313
May 06 15:28:54 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:54-04:00 DEBU [AUDIT]     SearchSessionEvents(2022-05-06 04:00:00 +0000 UTC, 2022-05-07 03:59:59.999 +0000 UTC, 5000) events/auditlog.go:1042
May 06 15:28:54 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:54-04:00 DEBU [RBAC]      Check access to role(tmp) rc(ip-172-31-18-91-us-east-2-compute-internal, labels=map[type:tc]) matchLabels=true, msg=matched, err=<nil> allow=map[*:[*]] rcLabels=map[type:tc] services/role.go:1358
May 06 15:28:54 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:54-04:00 DEBU [RBAC]      Check access to role(Proxy) rc(ip-172-31-18-91-us-east-2-compute-internal, labels=map[type:tc]) matchLabels=true, msg=matched, err=<nil> allow=map[*:[*]] rcLabels=map[type:tc] services/role.go:1358
May 06 15:28:56 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:56-04:00 DEBU [SSH:PROXY] Incoming connection 99.1.1.1:35118 -> 172.1.1.1:443 version: SSH-2.0-Go, certtype: "host" sshutils/server.go:466
May 06 15:28:56 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:56-04:00 DEBU [PROXY:1]   Transport request: teleport-transport. reversetunnel/srv.go:607
May 06 15:28:56 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:56-04:00 DEBU [PROXY:1]   Received out-of-band proxy transport request for @remote-auth-server []. reversetunnel/transport.go:207
May 06 15:28:56 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:56-04:00 DEBU [PROXY:1]   Attempting to dial through tunnel with server ID "". reversetunnel/transport.go:375
May 06 15:28:56 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:56-04:00 DEBU [PROXY:1]   Attempting to dial directly [172.1.1.1:3025]. reversetunnel/transport.go:392
May 06 15:28:56 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:56-04:00 DEBU [PROXY:1]   Returning direct dialed connection to [172.1.1.1:3025]. reversetunnel/transport.go:398
May 06 15:28:56 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:56-04:00 DEBU [PROXY:1]   Successfully dialed to @remote-auth-server "", start proxying. reversetunnel/transport.go:311
May 06 15:28:56 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:56-04:00 DEBU [AUTH:1]    Server certificate cert(cc.ip-172-31-72-82-ec2-internal issued by ip-172-31-72-82-ec2-internal:1234). auth/middleware.go:313
May 06 15:28:59 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:28:59-04:00 DEBU [RBAC]      Check access to role(Proxy) rc(ip-172-31-18-91-us-east-2-compute-internal, labels=map[type:tc]) matchLabels=true, msg=matched, err=<nil> allow=map[*:[*]] rcLabels=map[type:tc] services/role.go:1358
May 06 15:29:01 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:29:01-04:00 DEBU             [SAML] SSO: https://user.okta.com/app/user_teleport/xxx/sso/saml services/saml.go:98
May 06 15:29:01 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:29:01-04:00 DEBU             [SAML] Issuer: http://www.okta.com/xxx services/saml.go:99
May 06 15:29:01 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:29:01-04:00 DEBU             [SAML] ACS: https://proxy.example.com:443/v1/webapi/saml/acs services/saml.go:100
May 06 15:29:01 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:29:01-04:00 DEBU             [SAML] SSO: https://portal.sso.us-east-2.amazonaws.com/saml/assertion/xxx services/saml.go:98
May 06 15:29:01 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:29:01-04:00 DEBU             [SAML] Issuer: https://portal.sso.us-east-2.amazonaws.com/saml/assertion/xxx services/saml.go:99
May 06 15:29:01 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:29:01-04:00 DEBU             [SAML] ACS: https://proxy.example.com/v1/webapi/saml/acs services/saml.go:100
May 06 15:29:01 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:29:01-04:00 DEBU             [SAML] SSO: https://sso.jumpcloud.com/saml2/teleport services/saml.go:98
May 06 15:29:01 rc-node-1 /usr/local/bin/teleport[17247]: 2022-05-06T15:29:01-04:00 DEBU             [SAML] Issuer: JumpCloud services/saml.go:99

gz#5050

@deusxanima added the bug label May 6, 2022
@zmb3 added the rbac label May 6, 2022
@deusxanima
Contributor Author

Responding to @russjones' comments from the weekly prod sync: both the customer's tests and mine were done on self-hosted clusters. No DynamoDB throttling was observed in either case (in my cluster specifically, I ran the test against a single node with only one user, myself, and no other traffic, with 20 read/write units provisioned in DynamoDB). This is the RBAC role I tested with:

kind: role
metadata:
  labels:
    description: this is a test
  name: tmp
spec:
  allow:
    app_labels:
      '*': '*'
    cluster_labels:
      '*': '*'
    db_labels:
      '*': '*'
    db_names:
    - '*'
    db_users:
    - '*'
    kubernetes_groups:
    - read
    - '{{internal.kubernetes_groups}}'
    kubernetes_labels:
      '*': '*'
    logins:
    - '{{internal.logins}}'
    - ubuntu
    node_labels:
      env: db
    rules:
    - resources:
      - access_request
      verbs:
      - list
      - read
      - update
      - delete
      - create
    - resources:
      - applications
      verbs:
      - list
      - read
      - update
      - delete
    - resources:
      - role
      verbs:
      - list
      - create
      - read
      - update
      - delete
    - resources:
      - auth_connector
      verbs:
      - list
      - create
      - read
      - update
      - delete
    - resources:
      - session
      verbs:
      - list
      - read
      where: contains(session.participants, user.metadata.name)
    - resources:
      - trusted_cluster
      verbs:
      - list
      - create
      - read
      - update
      - delete
    - resources:
      - event
      verbs:
      - list
      - read
    - resources:
      - user
      verbs:
      - list
      - create
      - read
      - update
      - delete
    - resources:
      - token
      verbs:
      - list
      - create
      - read
      - update
      - delete
    - resources:
      - remote_cluster
      verbs:
      - list
      - create
      - read
      - update
      - delete
    - resources:
      - node
      verbs:
      - list
      - create
      - read
      - update
      - delete
    - resources:
      - kube_service
      verbs:
      - delete
    windows_desktop_labels:
      '*': '*'
    windows_desktop_logins:
    - Administrator
    - '{{internal.windows_logins}}'
  deny: {}
  options:
    cert_format: standard
    client_idle_timeout: 1h0m0s
    desktop_clipboard: true
    enhanced_recording:
    - command
    - network
    forward_agent: true
    max_session_ttl: 8h0m0s
    port_forwarding: true
    record_session:
      desktop: true
version: v5

I'll gather the role file from the customer environment as well. Is there any additional info that would be helpful in this instance?

@deusxanima
Contributor Author

Customer role:

kind: role
metadata:
  name: teleport-view
spec:
  allow:
    rules:
      - resources:
          - access_request
          - event
          - role
          - user
        verbs:
          - list
          - read
      - resources:
          - session
        verbs:
          - list
      - resources:
          - session
        verbs:
          - read
        where: contains(session.participants, user.metadata.name)
  deny: {}
  options:
    max_session_ttl: 12h0m0s
version: v4

The customer confirmed that removing the where conditional clears the issue. I observed the same behavior in my cluster.
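
In other words, the workaround amounts to dropping the conditional from the session rule, roughly as sketched below (based on the customer role above; note this widens visibility to all session recordings rather than only the user's own):

- resources:
  - session
  verbs:
  - list
  - read
  # where clause removed as a workaround; all recordings become visible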

@zmb3
Collaborator

zmb3 commented Aug 10, 2022

@AHARIC this should be fixed now; can you confirm whether there is still an issue?

@deusxanima
Contributor Author

Confirmed in my env. Closing out.
