monitor setup failed: both scp : STDERR: Pseudo-terminal will not be allocated because stdin is not a terminal. & rsync failed: ssh_exchange_identification: Connection closed by remote host #1631

Closed
1 of 3 tasks
amoskong opened this issue Dec 27, 2019 · 2 comments

Comments

amoskong commented Dec 27, 2019

Prerequisites

  • Are you rebased to master?
  • Is it reproducible?
  • Did you perform a cursory search to check that this issue isn't already open?

Versions

  • SCT: branch-2019.1

Logs

Description

Monitor setup failed.

Both scp and rsync failed:

  • scp : STDERR: Pseudo-terminal will not be allocated because stdin is not a terminal.
  • rsync failed: ssh_exchange_identification: Connection closed by remote host

We need to address the underlying SSH error. Retrying might work around the problem, but it is not a good solution.

Steps to Reproduce

Run the upgrade job with branch-2019.1. The problem does not occur every time.

Expected behavior:

Monitor setup succeeds.

Actual behavior:

< t:2019-12-27 09:09:59,306 f:common.py       l:139  c:utils                p:DEBUG > END: install_scylla_monitoring <MonitorSetGCE> (ran 102.018696s)
< t:2019-12-27 09:09:59,542 f:connectionpool.py l:393  c:urllib3.connectionpool p:DEBUG > https://www.googleapis.com:443 "GET /compute/v1/projects/skilled-adapter-452/zones/us-east1-b/instances/rolling-upgrade-upgrade--centos-db-node-e477c721-0-1 HTTP/1.1" 200 None
< t:2019-12-27 09:09:59,874 f:connectionpool.py l:393  c:urllib3.connectionpool p:DEBUG > https://www.googleapis.com:443 "GET /compute/v1/projects/skilled-adapter-452/aggregated/disks?maxResults=500 HTTP/1.1" 200 None
< t:2019-12-27 09:10:00,069 f:connectionpool.py l:393  c:urllib3.connectionpool p:DEBUG > https://www.googleapis.com:443 "GET /compute/v1/projects/skilled-adapter-452/zones/us-east1-b/instances/rolling-upgrade-upgrade--centos-db-node-e477c721-0-2 HTTP/1.1" 200 None
< t:2019-12-27 09:10:00,301 f:connectionpool.py l:393  c:urllib3.connectionpool p:DEBUG > https://www.googleapis.com:443 "GET /compute/v1/projects/skilled-adapter-452/aggregated/disks?maxResults=500 HTTP/1.1" 200 None
< t:2019-12-27 09:10:00,522 f:connectionpool.py l:393  c:urllib3.connectionpool p:DEBUG > https://www.googleapis.com:443 "GET /compute/v1/projects/skilled-adapter-452/zones/us-east1-b/instances/rolling-upgrade-upgrade--centos-db-node-e477c721-0-3 HTTP/1.1" 200 None
< t:2019-12-27 09:10:00,764 f:connectionpool.py l:393  c:urllib3.connectionpool p:DEBUG > https://www.googleapis.com:443 "GET /compute/v1/projects/skilled-adapter-452/aggregated/disks?maxResults=500 HTTP/1.1" 200 None
< t:2019-12-27 09:10:01,004 f:cluster.py      l:1281 c:sdcm.cluster         p:DEBUG > 019-12-27T09:09:50+00:00  rolling-upgrade-upgrade--centos-db-node-e477c721-0-1 !NOTICE  | sudo: scylla-test : TTY=unknown ; PWD=/home/scylla-test ; USER=root ; COMMAND=/bin/coredumpctl --no-pager --no-legend
< t:2019-12-27 09:10:01,004 f:cluster.py      l:1281 c:sdcm.cluster         p:DEBUG > 2019-12-27T09:09:50+00:00  rolling-upgrade-upgrade--centos-db-node-e477c721-0-1 !INFO    | sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
< t:2019-12-27 09:10:01,004 f:cluster.py      l:1281 c:sdcm.cluster         p:DEBUG > 2019-12-27T09:09:50+00:00  rolling-upgrade-upgrade--centos-db-node-e477c721-0-1 !INFO    | sudo: pam_unix(sudo:session): session closed for user root
< t:2019-12-27 09:10:01,016 f:connectionpool.py l:393  c:urllib3.connectionpool p:DEBUG > https://www.googleapis.com:443 "GET /compute/v1/projects/skilled-adapter-452/zones/us-east1-b/instances/rolling-upgrade-upgrade--centos-db-node-e477c721-0-4 HTTP/1.1" 200 None
< t:2019-12-27 09:10:01,226 f:connectionpool.py l:393  c:urllib3.connectionpool p:DEBUG > https://www.googleapis.com:443 "GET /compute/v1/projects/skilled-adapter-452/aggregated/disks?maxResults=500 HTTP/1.1" 200 None
< t:2019-12-27 09:10:01,238 f:remote.py       l:319  c:sdcm.remote          p:DEBUG > RemoteCmdRunner [scylla-test@10.142.0.78]: Receive files (src) /home/scylla-test/scylla-monitoring-branch-3.0/prometheus/prometheus.yml.template -> (dst) /tmp/tmpsapDTR/prometheus.yml.template.orig


< t:2019-12-27 09:10:01,238 f:remote.py       l:237  c:sdcm.remote          p:DEBUG > RemoteCmdRunner [scylla-test@10.142.0.78]: Running command "rsync --version"...
< t:2019-12-27 09:10:01,264 f:remote.py       l:697  c:sdcm.remote          p:INFO  > RemoteCmdRunner [scylla-test@10.142.0.78]: rsync  version 3.1.2  protocol version 31
< t:2019-12-27 09:10:01,265 f:remote.py       l:697  c:sdcm.remote          p:INFO  > RemoteCmdRunner [scylla-test@10.142.0.78]: Copyright (C) 1996-2015 by Andrew Tridgell, Wayne Davison, and others.
< t:2019-12-27 09:10:01,265 f:remote.py       l:697  c:sdcm.remote          p:INFO  > RemoteCmdRunner [scylla-test@10.142.0.78]: Web site: http://rsync.samba.org/
< t:2019-12-27 09:10:01,266 f:remote.py       l:697  c:sdcm.remote          p:INFO  > RemoteCmdRunner [scylla-test@10.142.0.78]: Capabilities:
< t:2019-12-27 09:10:01,266 f:remote.py       l:697  c:sdcm.remote          p:INFO  > RemoteCmdRunner [scylla-test@10.142.0.78]:     64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
< t:2019-12-27 09:10:01,266 f:remote.py       l:697  c:sdcm.remote          p:INFO  > RemoteCmdRunner [scylla-test@10.142.0.78]:     socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
< t:2019-12-27 09:10:01,266 f:remote.py       l:697  c:sdcm.remote          p:INFO  > RemoteCmdRunner [scylla-test@10.142.0.78]:     append, ACLs, xattrs, iconv, symtimes, prealloc
< t:2019-12-27 09:10:01,266 f:remote.py       l:697  c:sdcm.remote          p:INFO  > RemoteCmdRunner [scylla-test@10.142.0.78]: 
< t:2019-12-27 09:10:01,266 f:remote.py       l:697  c:sdcm.remote          p:INFO  > RemoteCmdRunner [scylla-test@10.142.0.78]: rsync comes with ABSOLUTELY NO WARRANTY.  This is free software, and you
< t:2019-12-27 09:10:01,266 f:remote.py       l:697  c:sdcm.remote          p:INFO  > RemoteCmdRunner [scylla-test@10.142.0.78]: are welcome to redistribute it under certain conditions.  See the GNU
< t:2019-12-27 09:10:01,266 f:remote.py       l:697  c:sdcm.remote          p:INFO  > RemoteCmdRunner [scylla-test@10.142.0.78]: General Public Licence for details.
< t:2019-12-27 09:10:01,268 f:remote.py       l:122  c:sdcm.remote          p:INFO  > RemoteCmdRunner [scylla-test@10.142.0.78]: Command "rsync --version" finished with status 0
< t:2019-12-27 09:10:01,270 f:config.py       l:273  c:fabric               p:DEBUG > File not found, skipping
< t:2019-12-27 09:10:01,270 f:config.py       l:271  c:fabric               p:DEBUG > Loaded 2 new ssh_config rules from '/etc/ssh/ssh_config'
< t:2019-12-27 09:10:01,273 f:remote.py       l:148  c:sdcm.remote          p:DEBUG > LocalCmdRunner [jenkins@public-jenkins-builder4-qavpc]: Running command "which ssh"...
< t:2019-12-27 09:10:01,308 f:remote.py       l:122  c:sdcm.remote          p:INFO  > LocalCmdRunner [jenkins@public-jenkins-builder4-qavpc]: Command "which ssh" finished with status 0
< t:2019-12-27 09:10:01,309 f:config.py       l:273  c:fabric               p:DEBUG > File not found, skipping
< t:2019-12-27 09:10:01,310 f:config.py       l:271  c:fabric               p:DEBUG > Loaded 2 new ssh_config rules from '/etc/ssh/ssh_config'
< t:2019-12-27 09:10:01,313 f:remote.py       l:148  c:sdcm.remote          p:DEBUG > LocalCmdRunner [jenkins@public-jenkins-builder4-qavpc]: Running command "rsync -L  --timeout=300 --rsh='/usr/bin/ssh -t -a -x  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/tmp5GO7UC -o BatchMode=yes -o ConnectTimeout=300 -o ServerAliveInterval=300 -l scylla-test -p 22 -i /jenkins/.ssh/scylla-test' -az scylla-test@[10.142.0.78]:"/home/scylla-test/scylla-monitoring-branch-3.0/prometheus/prometheus.yml.template" /tmp/tmpsapDTR/prometheus.yml.template.orig"...
< t:2019-12-27 09:10:01,356 f:remote.py       l:126  c:sdcm.remote          p:ERROR > LocalCmdRunner [jenkins@public-jenkins-builder4-qavpc]: Error executing command: "rsync -L  --timeout=300 --rsh='/usr/bin/ssh -t -a -x  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/tmp5GO7UC -o BatchMode=yes -o ConnectTimeout=300 -o ServerAliveInterval=300 -l scylla-test -p 22 -i /jenkins/.ssh/scylla-test' -az scylla-test@[10.142.0.78]:"/home/scylla-test/scylla-monitoring-branch-3.0/prometheus/prometheus.yml.template" /tmp/tmpsapDTR/prometheus.yml.template.orig"; Exit status: 255
< t:2019-12-27 09:10:01,357 f:remote.py       l:130  c:sdcm.remote          p:DEBUG > LocalCmdRunner [jenkins@public-jenkins-builder4-qavpc]: STDERR: Pseudo-terminal will not be allocated because stdin is not a terminal.
< t:2019-12-27 09:10:01,357 f:remote.py       l:130  c:sdcm.remote          p:DEBUG > ssh_exchange_identification: Connection closed by remote host
< t:2019-12-27 09:10:01,357 f:remote.py       l:130  c:sdcm.remote          p:DEBUG > rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
< t:2019-12-27 09:10:01,357 f:remote.py       l:130  c:sdcm.remote          p:DEBUG > rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.2]
< t:2019-12-27 09:10:01,357 f:remote.py       l:339  c:sdcm.remote          p:WARNING > RemoteCmdRunner [scylla-test@10.142.0.78]: Trying scp, rsync failed: Encountered a bad command exit code!
< t:2019-12-27 09:10:01,357 f:remote.py       l:339  c:sdcm.remote          p:WARNING > 
< t:2019-12-27 09:10:01,357 f:remote.py       l:339  c:sdcm.remote          p:WARNING > Command: u'rsync -L  --timeout=300 --rsh=\'/usr/bin/ssh -t -a -x  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/tmp5GO7UC -o BatchMode=yes -o ConnectTimeout=300 -o ServerAliveInterval=300 -l scylla-test -p 22 -i /jenkins/.ssh/scylla-test\' -az scylla-test@[10.142.0.78]:"/home/scylla-test/scylla-monitoring-branch-3.0/prometheus/prometheus.yml.template" /tmp/tmpsapDTR/prometheus.yml.template.orig'
< t:2019-12-27 09:10:01,357 f:remote.py       l:339  c:sdcm.remote          p:WARNING > 
< t:2019-12-27 09:10:01,357 f:remote.py       l:339  c:sdcm.remote          p:WARNING > Exit code: 255
< t:2019-12-27 09:10:01,357 f:remote.py       l:339  c:sdcm.remote          p:WARNING > 
< t:2019-12-27 09:10:01,357 f:remote.py       l:339  c:sdcm.remote          p:WARNING > Stdout:
< t:2019-12-27 09:10:01,357 f:remote.py       l:339  c:sdcm.remote          p:WARNING > 
< t:2019-12-27 09:10:01,357 f:remote.py       l:339  c:sdcm.remote          p:WARNING > 
< t:2019-12-27 09:10:01,357 f:remote.py       l:339  c:sdcm.remote          p:WARNING > 
< t:2019-12-27 09:10:01,357 f:remote.py       l:339  c:sdcm.remote          p:WARNING > Stderr:
< t:2019-12-27 09:10:01,357 f:remote.py       l:339  c:sdcm.remote          p:WARNING > 
< t:2019-12-27 09:10:01,357 f:remote.py       l:339  c:sdcm.remote          p:WARNING > Pseudo-terminal will not be allocated because stdin is not a terminal.
< t:2019-12-27 09:10:01,357 f:remote.py       l:339  c:sdcm.remote          p:WARNING > ssh_exchange_identification: Connection closed by remote host
< t:2019-12-27 09:10:01,357 f:remote.py       l:339  c:sdcm.remote          p:WARNING > rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
< t:2019-12-27 09:10:01,357 f:remote.py       l:339  c:sdcm.remote          p:WARNING > rsync error: unexplained error (code 255) at io.c(226) [Receiver=3.1.2]
< t:2019-12-27 09:10:01,357 f:remote.py       l:339  c:sdcm.remote          p:WARNING > 
< t:2019-12-27 09:10:01,359 f:config.py       l:273  c:fabric               p:DEBUG > File not found, skipping
< t:2019-12-27 09:10:01,359 f:config.py       l:271  c:fabric               p:DEBUG > Loaded 2 new ssh_config rules from '/etc/ssh/ssh_config'
< t:2019-12-27 09:10:01,362 f:remote.py       l:148  c:sdcm.remote          p:DEBUG > LocalCmdRunner [jenkins@public-jenkins-builder4-qavpc]: Running command "scp -r -o StrictHostKeyChecking=no -o BatchMode=yes -o ConnectTimeout=300 -o ServerAliveInterval=300 -o UserKnownHostsFile=/tmp/tmp5GO7UC -P 22 -i /jenkins/.ssh/scylla-test scylla-test@[10.142.0.78]:"/home/scylla-test/scylla-monitoring-branch-3.0/prometheus/prometheus.yml.template" '/tmp/tmpsapDTR/prometheus.yml.template.orig'"...
< t:2019-12-27 09:10:01,405 f:remote.py       l:126  c:sdcm.remote          p:ERROR > LocalCmdRunner [jenkins@public-jenkins-builder4-qavpc]: Error executing command: "scp -r -o StrictHostKeyChecking=no -o BatchMode=yes -o ConnectTimeout=300 -o ServerAliveInterval=300 -o UserKnownHostsFile=/tmp/tmp5GO7UC -P 22 -i /jenkins/.ssh/scylla-test scylla-test@[10.142.0.78]:"/home/scylla-test/scylla-monitoring-branch-3.0/prometheus/prometheus.yml.template" '/tmp/tmpsapDTR/prometheus.yml.template.orig'"; Exit status: 1
< t:2019-12-27 09:10:01,405 f:remote.py       l:130  c:sdcm.remote          p:DEBUG > LocalCmdRunner [jenkins@public-jenkins-builder4-qavpc]: STDERR: ssh_exchange_identification: Connection closed by remote host

....

< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR > Traceback (most recent call last):
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >   File "/sct/sdcm/cluster.py", line 2537, in node_setup
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >     cl_inst.node_setup(node, **setup_kwargs)
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >   File "/sct/sdcm/cluster.py", line 3530, in node_setup
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >     self.configure_scylla_monitoring(node)
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >   File "/sct/sdcm/cluster.py", line 3657, in configure_scylla_monitoring
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >     dst=local_template_tmp)
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >   File "/sct/sdcm/remote.py", line 356, in receive_files
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >     result = LocalCmdRunner().run(scp)
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >   File "/sct/sdcm/remote.py", line 155, in run
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >     env=os.environ, replace_env=True)
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >   File "/usr/lib/python2.7/site-packages/fabric/connection.py", line 748, in local
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >     return super(Connection, self).run(*args, **kwargs)
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >   File "/usr/lib/python2.7/site-packages/invoke/context.py", line 94, in run
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >     return self._run(runner, command, **kwargs)
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >   File "/usr/lib/python2.7/site-packages/invoke/context.py", line 101, in _run
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >     return runner.run(command, **kwargs)
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >   File "/usr/lib/python2.7/site-packages/invoke/runners.py", line 291, in run
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >     return self._run_body(command, **kwargs)
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >   File "/usr/lib/python2.7/site-packages/invoke/runners.py", line 442, in _run_body
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR >     raise UnexpectedExit(result)
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR > UnexpectedExit: Encountered a bad command exit code!
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR > 
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR > Command: u'scp -r -o StrictHostKeyChecking=no -o BatchMode=yes -o ConnectTimeout=300 -o ServerAliveInterval=300 -o UserKnownHostsFile=/tmp/tmp5GO7UC -P 22 -i /jenkins/.ssh/scylla-test scylla-test@[10.142.0.78]:"/home/scylla-test/scylla-monitoring-branch-3.0/prometheus/prometheus.yml.template" \'/tmp/tmpsapDTR/prometheus.yml.template.orig\''
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR > 
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR > Exit code: 1
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR > 
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR > Stdout:
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR > 
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR > 
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR > 
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR > Stderr:
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR > 
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR > ssh_exchange_identification: Connection closed by remote host
< t:2019-12-27 09:10:02,300 f:cluster.py      l:2539 c:sdcm.cluster         p:ERROR > 

/CC @bentsi @aleksbykov @roydahan

bentsi commented Dec 29, 2019

@amoskong you are tagging the wrong Alex ;)

@amoskong

@amoskong you are tagging the wrong Alex ;)

Sorry, I have made the same mistake many times.

@aleksbykov

bentsi pushed a commit to bentsi/scylla-cluster-tests that referenced this issue Feb 18, 2020
We can safely retry the command when it didn't run on remote.
This situation can happen when SSH/channel connection was not
successfully initiated.
Related issues: scylladb#1793, scylladb#1631, scylladb#1815
bentsi pushed a commit to bentsi/scylla-cluster-tests that referenced this issue Feb 19, 2020
We can safely retry the command when it didn't run on remote.
This situation can happen when SSH/channel connection was not
successfully initiated.
Related issues: scylladb#1793, scylladb#1631, scylladb#1815
bentsi pushed a commit that referenced this issue Feb 20, 2020
We can safely retry the command when it didn't run on remote.
This situation can happen when SSH/channel connection was not
successfully initiated.
Related issues: #1793, #1631, #1815
@bentsi bentsi closed this as completed Feb 23, 2020
bentsi pushed a commit that referenced this issue Feb 24, 2020
We can safely retry the command when it didn't run on remote.
This situation can happen when SSH/channel connection was not
successfully initiated.
Related issues: #1793, #1631, #1815

(cherry picked from commit 5503f25)
amoskong pushed a commit that referenced this issue Feb 28, 2020
We can safely retry the command when it didn't run on remote.
This situation can happen when SSH/channel connection was not
successfully initiated.
Related issues: #1793, #1631, #1815

(cherry picked from commit 5503f25)
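
The commit messages above describe retrying a command only when it never ran on the remote, for example when the SSH connection was not successfully initiated. The following is a minimal sketch of that idea and not the code from the referenced commit: it uses Python 3's standard `subprocess` module instead of the fabric/invoke runners shown in the traceback, and the names `CONNECT_FAILURE_MARKERS`, `is_connect_failure` and `run_with_retry` are hypothetical. It assumes that the `ssh_exchange_identification` message seen in the logs means the session was never established, so repeating the copy is safe.

```python
import subprocess
import time
from typing import Callable, Sequence

# Assumption: stderr fragments indicating the SSH session was never set up,
# so the remote side cannot have executed anything. Taken from the log above.
CONNECT_FAILURE_MARKERS = (
    "ssh_exchange_identification: Connection closed by remote host",
)


def is_connect_failure(stderr: str) -> bool:
    """Return True if stderr suggests the connection was never initiated."""
    return any(marker in stderr for marker in CONNECT_FAILURE_MARKERS)


def run_with_retry(
    cmd: Sequence[str],
    attempts: int = 3,
    delay: float = 1.0,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> subprocess.CompletedProcess:
    """Run cmd, retrying only on connection-initiation failures.

    Returns the last CompletedProcess; the caller inspects returncode.
    Any other kind of failure is returned immediately without a retry.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if not is_connect_failure(result.stderr or ""):
            return result  # the command may have run remotely; do not repeat it
        if attempt < attempts:
            time.sleep(delay)
    return result
```

A caller would pass the scp or rsync argument list as `cmd` and check `returncode` on the result. Restricting the retry to connection-initiation errors keeps commands that may have partially run on the remote from being executed twice.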