client_idle_timeout does not work #2166

Closed
datasage opened this issue Aug 12, 2018 · 11 comments

@datasage

What happened:
I upgraded my cluster to 2.7.3 (from 2.4.7, following the expected upgrade path) so that I could enable the idle timeout. In my testing, no matter what value I set for client_idle_timeout, the timeout was always disabled.

I am currently using the community version of Teleport. The cluster is a single instance serving as both auth and proxy server. Storage is set up to use a file directory.

From what I could tell, the Teleport auth service does not seem to use config values after it is first initialized. I dug through the cache files, and cluster_configuration always had client_idle_timeout set to 0, regardless of what the config says.

What you expected to happen:
The client should disconnect when the idle timeout period is reached.

How to reproduce it (as minimally and precisely as possible):

  1. I started at version 2.4.7 and upgraded to 2.7.3 following the upgrade procedure.
  2. I set client_idle_timeout to 1m.
  3. I connected to the cluster and waited 1m.
  4. The client did not disconnect.

Environment:

  • Teleport version (use teleport version): 2.7.3
  • Tsh version (use tsh version): 2.7.3
  • OS (e.g. from /etc/os-release): Amazon Linux AMI 2018.03

Relevant Debug Logs If Applicable
I ran with debug logging and never saw the debug output from the client idle checks. It appears that the config value is being read as 0.

datasage changed the title from "client_idle_timeout does not seem to work." to "client_idle_timeout does not work" on Aug 12, 2018
@klizhentas
Contributor

klizhentas commented Aug 12, 2018

If you change the settings in the teleport.yaml config file, you have to restart or reload the server for the changes to take effect.

@datasage
Author

@klizhentas I have done that plenty of times. I've also disabled it, and run it manually with --debug on to see if I could get more information.

@klizhentas
Contributor

OK, we will take a look. Thanks for your bug report.

@klizhentas
Contributor

@datasage I've looked into it and could not reproduce it. Can you paste your configuration here (removing all specifics, of course) and list the places you are looking at?

@datasage
Author

datasage commented Aug 14, 2018

Starting with the config:

teleport:
    nodename: bastion.mydomain.com
    data_dir: /var/lib/teleport
    advertise_ip: 10.0.0.1
    connection_limits:
        max_connections: 1000
        max_users: 250
    log:
        output: stderr
        severity: ERROR
    storage:
        region: us-east-1
        audit_sessions_uri: s3://my-session-bucket/
    ciphers:
      - aes128-ctr
      - aes192-ctr
      - aes256-ctr
      - aes128-gcm@openssh.com
    kex_algos:
      - curve25519-sha256@libssh.org
      - ecdh-sha2-nistp256
      - ecdh-sha2-nistp384
      - ecdh-sha2-nistp521
    mac_algos:
      - hmac-sha2-256-etm@openssh.com
      - hmac-sha2-256

auth_service:
    enabled: yes
    client_idle_timeout: 15m
    disconnect_expired_cert: yes
    authentication:
        type: local
        second_factor: otp
    listen_addr: 0.0.0.0:3025
    tokens:
        - "node:xxxxxxxxxxxxxxxxxxxx"
    
    cluster_name: "production"

ssh_service:
    enabled: yes
    listen_addr: 0.0.0.0:3022
    labels:
        role: my-role
        type: my-type
    commands:
    - name: awsid
      command: [curl, "http://169.254.169.254/latest/meta-data/instance-id"]
      period: 1h0m0s
    - name: version
      command: [/usr/local/bin/teleport, "version"]
      period: 1h0m0s

proxy_service:
    enabled: yes
    listen_addr: 0.0.0.0:3023
    tunnel_listen_addr: 0.0.0.0:3024
    web_listen_addr: 0.0.0.0:3080
    https_key_file: /etc/teleport/teleport.key
    https_cert_file: /etc/teleport/teleport.crt

I updated the session storage setting to S3 and that took effect right away. Are auth service settings initialized once and then stored in the cluster state? The code I looked at seems to indicate that.

Cluster config from cache file (cache/auth/cluster_configuration):

{"kind":"cluster_config","version":"v3","metadata":{"name":"cluster-config"},"spec":{"session_recording":"node","cluster_id":"cluster-uuid","proxy_checks_host_keys":"yes","audit":{"region":"us-east-1","audit_sessions_uri":"s3://my-session-bucket/"},"client_idle_timeout":"0s","disconnect_expired_cert":false}}
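
The cached value above is stored as a Go-style duration string ("0s"). As a rough illustration, the Python sketch below parses such strings and reads client_idle_timeout out of a cluster_config document. The helper names (parse_duration, idle_timeout_from_cluster_config) are hypothetical and not part of Teleport, the sample document is trimmed to fields that appear in the cache file above, and treating a zero duration as "idle timeout disabled" is an assumption based on the behavior reported in this thread.

```python
import json
import re
from datetime import timedelta

# Seconds per unit for the duration suffixes handled by this sketch.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}
# Two-letter suffixes are listed first so "ms" is not read as "m" then "s".
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|us|ns|h|m|s)")


def parse_duration(text):
    """Parse a duration such as '15m0s' or '0s' into a timedelta."""
    pos = 0
    total_seconds = 0.0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:
            raise ValueError("unexpected characters in duration: %r" % text)
        total_seconds += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError("not a duration: %r" % text)
    return timedelta(seconds=total_seconds)


def idle_timeout_from_cluster_config(doc_text):
    """Return client_idle_timeout from a cluster_config JSON document."""
    spec = json.loads(doc_text)["spec"]
    # Assumption: a missing field is treated the same as "0s".
    return parse_duration(spec.get("client_idle_timeout", "0s"))


# Trimmed copy of the cached document (only fields shown in this thread).
CACHED = (
    '{"kind":"cluster_config","version":"v3",'
    '"spec":{"client_idle_timeout":"0s","disconnect_expired_cert":false}}'
)

if __name__ == "__main__":
    timeout = idle_timeout_from_cluster_config(CACHED)
    # Assumption: zero means the idle check never fires.
    print("idle timeout:", timeout, "| enabled:", timeout > timedelta(0))
```

Run against the cached document, this reports a zero timeout, which would be consistent with the idle checks never appearing in the debug log.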

@datasage
Author

I originally set up the cluster with 2.3.x, so it uses the boltdb backend by default. I was able to find a way to read that db, and it does show the correct values.

{"kind":"cluster_config","version":"v3","metadata":{"name":"cluster-config"},"spec":{"session_recording":"node","cluster_id":"cluster-uuid","proxy_checks_host_keys":"yes","audit":{"region":"us-east-1","audit_sessions_uri":"s3://my-session-bucket/"},"client_idle_timeout":"15m0s","disconnect_expired_cert":true}}

I am not sure why the cache file shows a different value, or which value the system uses to decide when to terminate an idle session. I've never seen any of the idle session entries in the debug log, so I assume the value being used on a given connection is 0.
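
To make the mismatch between the two documents easier to see, here is a small Python sketch that compares their spec sections field by field. The function name diff_spec is hypothetical; the two JSON strings are copied from the cache file and the boltdb backend output quoted in this thread.

```python
import json

# Document read from cache/auth/cluster_configuration (from this thread).
CACHE_DOC = (
    '{"kind":"cluster_config","version":"v3","metadata":{"name":"cluster-config"},'
    '"spec":{"session_recording":"node","cluster_id":"cluster-uuid",'
    '"proxy_checks_host_keys":"yes",'
    '"audit":{"region":"us-east-1","audit_sessions_uri":"s3://my-session-bucket/"},'
    '"client_idle_timeout":"0s","disconnect_expired_cert":false}}'
)

# Document read from the boltdb backend (from this thread).
BACKEND_DOC = (
    '{"kind":"cluster_config","version":"v3","metadata":{"name":"cluster-config"},'
    '"spec":{"session_recording":"node","cluster_id":"cluster-uuid",'
    '"proxy_checks_host_keys":"yes",'
    '"audit":{"region":"us-east-1","audit_sessions_uri":"s3://my-session-bucket/"},'
    '"client_idle_timeout":"15m0s","disconnect_expired_cert":true}}'
)


def diff_spec(cache_text, backend_text):
    """Return {field: (cache_value, backend_value)} for spec fields that differ."""
    cache_spec = json.loads(cache_text)["spec"]
    backend_spec = json.loads(backend_text)["spec"]
    fields = sorted(set(cache_spec) | set(backend_spec))
    return {
        name: (cache_spec.get(name), backend_spec.get(name))
        for name in fields
        if cache_spec.get(name) != backend_spec.get(name)
    }


if __name__ == "__main__":
    for name, (cached, stored) in diff_spec(CACHE_DOC, BACKEND_DOC).items():
        print("%s: cache=%r backend=%r" % (name, cached, stored))
```

Only client_idle_timeout and disconnect_expired_cert differ between the two documents; the audit settings match.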

@klizhentas
Contributor

cache/node or cache/proxy will be using this setting, not cache/auth. Can you take a look there as well? You may also want to set it to 30 seconds to quickly see if it works.

@datasage
Author

The state is the same for both node and proxy. I have tried a low timeout, 60 seconds in my case, and it did not disconnect the client.

I recently changed the s3 storage location and that updated in the cache, but the idle timeout and client expiration settings did not.

klizhentas added a commit that referenced this issue Aug 14, 2018
This commit fixes #2166
klizhentas added a commit that referenced this issue Aug 14, 2018
This PR fixes #2166, adds suite tests.
klizhentas added a commit that referenced this issue Aug 14, 2018
This commit fixes #2166
@elg0ch0

elg0ch0 commented Oct 22, 2018

Hi @datasage, is your issue solved?

I'm having issues with client_idle_timeout too, but the behavior is a little different (I'm using tsh). Since it is also an idle timeout issue, might they be related?

  1. If client_idle_timeout < 5m, the timeout works as expected.
  2. If client_idle_timeout > 5m, the shell becomes unresponsive after 5m idle, and once I type anything it gets disconnected a few seconds later.

Regards,

@datasage
Author

My issues have been solved, but I primarily use the Web UI.

This sounds like an idle connection issue to me. A router or firewall is dropping idle connections after 5 minutes.
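
As a rough way to reason about this hypothesis, the Python sketch below models two independent idle limits, the configured client_idle_timeout and a network device that drops idle connections, and reports which one is reached first. The function name first_cutoff and the 5 minute device limit are illustrative assumptions taken from the numbers in this thread, not measured values, and treating a zero timeout as disabled is also an assumption.

```python
from datetime import timedelta


def first_cutoff(client_idle_timeout, middlebox_idle_drop=None):
    """Return (source, delay) for whichever idle limit is reached first.

    Assumption: a zero client_idle_timeout means the check is disabled.
    Returns None when no limit applies.
    """
    candidates = []
    if client_idle_timeout > timedelta(0):
        candidates.append((client_idle_timeout, "client_idle_timeout"))
    if middlebox_idle_drop is not None:
        candidates.append((middlebox_idle_drop, "middlebox"))
    if not candidates:
        return None
    delay, source = min(candidates)
    return source, delay


if __name__ == "__main__":
    drop = timedelta(minutes=5)  # hypothetical router or firewall limit
    for minutes in (1, 15):
        print(minutes, "->", first_cutoff(timedelta(minutes=minutes), drop))
```

Under these assumptions, a 1m timeout is reached before the device limit, while a 15m timeout never gets the chance because the connection is dropped at 5m. That would match the two cases described above, though it does not explain the difference between tsh versions.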

@elg0ch0

elg0ch0 commented Oct 22, 2018

I just found that it might be related to the tsh version: I tried v2.5.6 and it worked properly, but tsh v3.0.1 did not.
I'll use that as a workaround (Teleport server running v3.0.1 and tsh v2.5.6).

Thank you anyway!
