sonic-cfggen fails to connect to /var/run/redis/redis.sock #5277

Closed
akokhan opened this issue Aug 31, 2020 · 6 comments · Fixed by #5289

@akokhan
Contributor

akokhan commented Aug 31, 2020

Description
The sonic-cfggen utility fails to connect to /var/run/redis/redis.sock:

sudo systemctl restart bgp

admin@sonic:~$ sudo cat /var/log/syslog | grep -B 30 -A 5 /var/run/redis/redis.sock
Aug 20 10:55:05.963130 sonic NOTICE admin: Stopped bgp service...
Aug 20 10:55:05.967556 sonic INFO systemd[1]: bgp.service: Succeeded.
Aug 20 10:55:05.969013 sonic INFO systemd[1]: Stopped BGP container.
Aug 20 10:55:05.972528 sonic INFO systemd[1]: Starting BGP container...
Aug 20 10:55:05.979468 sonic NOTICE admin: Starting bgp service...
Aug 20 10:55:06.231106 sonic NOTICE admin: Warm boot flag: bgp false.
Aug 20 10:55:06.238290 sonic NOTICE admin: Fast boot flag: bgp .
Aug 20 10:55:06.734560 sonic INFO bgp.sh[28088]: Traceback (most recent call last):
Aug 20 10:55:06.734827 sonic INFO bgp.sh[28088]:   File "/usr/local/bin/sonic-cfggen", line 416, in <module>
Aug 20 10:55:06.735145 sonic INFO bgp.sh[28088]:     main()
Aug 20 10:55:06.735326 sonic INFO bgp.sh[28088]:   File "/usr/local/bin/sonic-cfggen", line 343, in main
Aug 20 10:55:06.735498 sonic INFO bgp.sh[28088]:     configdb.connect()
Aug 20 10:55:06.735671 sonic INFO bgp.sh[28088]:   File "/usr/local/lib/python2.7/dist-packages/swsssdk/configdb.py", line 74, in connect
Aug 20 10:55:06.735854 sonic INFO bgp.sh[28088]:     self.db_connect('CONFIG_DB', wait_for_init, retry_on)
Aug 20 10:55:06.736030 sonic INFO bgp.sh[28088]:   File "/usr/local/lib/python2.7/dist-packages/swsssdk/configdb.py", line 69, in db_connect
Aug 20 10:55:06.736202 sonic INFO bgp.sh[28088]:     SonicV2Connector.connect(self, self.db_name, retry_on)
Aug 20 10:55:06.736377 sonic INFO bgp.sh[28088]:   File "/usr/local/lib/python2.7/dist-packages/swsssdk/dbconnector.py", line 250, in connect
Aug 20 10:55:06.736549 sonic INFO bgp.sh[28088]:     self.dbintf.connect(db_id, retry_on)
Aug 20 10:55:06.736778 sonic INFO bgp.sh[28088]:   File "/usr/local/lib/python2.7/dist-packages/swsssdk/interface.py", line 171, in connect
Aug 20 10:55:06.736941 sonic INFO bgp.sh[28088]:     self._onetime_connect(db_id)
Aug 20 10:55:06.737094 sonic INFO bgp.sh[28088]:   File "/usr/local/lib/python2.7/dist-packages/swsssdk/interface.py", line 183, in _onetime_connect
Aug 20 10:55:06.737249 sonic INFO bgp.sh[28088]:     client.config_set('notify-keyspace-events', self.KEYSPACE_EVENTS)
Aug 20 10:55:06.737408 sonic INFO bgp.sh[28088]:   File "/usr/local/lib/python2.7/dist-packages/redis/client.py", line 1243, in config_set
Aug 20 10:55:06.737562 sonic INFO bgp.sh[28088]:     return self.execute_command('CONFIG SET', name, value)
Aug 20 10:55:06.737714 sonic INFO bgp.sh[28088]:   File "/usr/local/lib/python2.7/dist-packages/redis/client.py", line 898, in execute_command
Aug 20 10:55:06.737872 sonic INFO bgp.sh[28088]:     conn = self.connection or pool.get_connection(command_name, **options)
Aug 20 10:55:06.738027 sonic INFO bgp.sh[28088]:   File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 1192, in get_connection
Aug 20 10:55:06.738180 sonic INFO bgp.sh[28088]:     connection.connect()
Aug 20 10:55:06.738334 sonic INFO bgp.sh[28088]:   File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 563, in connect
Aug 20 10:55:06.738487 sonic INFO bgp.sh[28088]:     raise ConnectionError(self._error_message(e))
Aug 20 10:55:06.738642 sonic INFO bgp.sh[28088]: redis.exceptions.ConnectionError: Error 13 connecting to unix socket: /var/run/redis/redis.sock. Permission denied.
Aug 20 10:55:06.836439 sonic INFO bgp.sh[28088]: Removing obsolete bgp container with HWSKU montara
Aug 20 10:55:06.902063 sonic INFO bgp.sh[28088]: bgp
Aug 20 10:55:06.905879 sonic INFO bgp.sh[28088]: Creating new bgp container with HWSKU
Aug 20 10:55:07.085741 sonic INFO bgp.sh[28088]: a9f9487d078ec6ae0d3ca4c793bcfa41975d0c50a7decf48331f50a162d0e438
Aug 20 10:55:07.203221 sonic INFO containerd[505]: time="2020-08-20T10:55:07.202160280Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/a9f9487d078ec6ae0d3ca4c793bcfa41975d0c50a7decf48331f50a162d0e438/shim.sock" debug=false pid=28176
admin@sonic:~$ 

The failing call chain is:

bgp.service ->  /usr/local/bin/bgp.sh ->  /usr/bin/bgp.sh ->  start() ->   HWSKU=${HWSKU:-`$SONIC_CFGGEN -d -v 'DEVICE_METADATA["localhost"]["hwsku"]'`}

The likely cause is User=admin in bgp.service, which makes bgp.sh (and the sonic-cfggen call inside it) run as admin:

[Service]
User=admin
ExecStartPre=/usr/local/bin/bgp.sh start
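
To check the permission theory, one could inspect the socket file from the account the service runs as. The following is a minimal diagnostic sketch, not part of SONiC: the helper name `describe_socket_access` is hypothetical, and it assumes Linux semantics, where connecting to a Unix domain socket needs write permission on the socket file.

```python
import os
import stat


def describe_socket_access(path):
    """Summarise ownership and mode of `path` and whether we may connect.

    Hypothetical helper for illustration. On Linux, connect() on a Unix
    domain socket requires write permission on the socket file and search
    permission on its parent directories.
    """
    try:
        st = os.stat(path)
    except OSError as exc:
        # Missing file, or a parent directory we are not allowed to search.
        return {"path": path, "error": os.strerror(exc.errno)}

    return {
        "path": path,
        "is_socket": stat.S_ISSOCK(st.st_mode),
        "mode": oct(stat.S_IMODE(st.st_mode)),
        "owner_uid": st.st_uid,
        "owner_gid": st.st_gid,
        "current_uid": os.getuid(),
        # os.access() tests against the real uid/gid of this process.
        "can_connect": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    # Socket path taken from the traceback above.
    print(describe_socket_access("/var/run/redis/redis.sock"))
```

Running it once as root and once as admin (for example via `sudo -u admin python3 ...`) should show whether `can_connect` differs between the two accounts.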

Steps to reproduce the issue:

  1. sudo systemctl restart bgp
  2. sudo cat /var/log/syslog | grep -B 30 -A 5 /var/run/redis/redis.sock

Describe the results you received:
The sonic-cfggen utility fails to connect to /var/run/redis/redis.sock.
The errors shown above are written to the syslog.

Describe the results you expected:
No errors in logs

Additional information you deem important (e.g. issue happens only occasionally):
Environment:

SONiC Software Version: SONiC.HEAD.740-dirty-20200813.174118
Distribution: Debian 10.5
Kernel: 4.19.0-9-2-amd64
Build commit: 9c22d19b

Platform: x86_64-arista_7170_64c
HwSKU: Arista-7170-64C
ASIC: barefoot

The issue was introduced by https://github.com/Azure/sonic-buildimage/pull/4941/files?diff=unified&w=1#diff-2901ad8ea8e7ba16059aa09588944b30L300

The following PR may cause the same issue in SONiC 201911: #5200

@tahmed-dev
Contributor

Thanks @akokhan for creating this issue. The issue was masked on my end because I had /etc/sonic/sonic-environment present; in that case, no call is made to sonic-cfggen when creating or starting services. I've opened PR #5289 with a suggested fix. @lguohan, @qiluo-msft, can you please have a look?
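
For readers unfamiliar with the `${HWSKU:-...}` idiom in bgp.sh, here is a rough Python sketch of the same fallback logic. It illustrates why a pre-set value hides the failure: the sonic-cfggen call is only reached when `HWSKU` is unset or empty. The function names are hypothetical, and how `HWSKU` gets populated from /etc/sonic/sonic-environment is assumed rather than shown in this thread.

```python
import os
import subprocess


def query_cfggen(expr, cfggen="sonic-cfggen"):
    """Read one value from ConfigDB via sonic-cfggen (the call that failed)."""
    out = subprocess.check_output([cfggen, "-d", "-v", expr])
    return out.decode().strip()


def resolve_hwsku(environ=None, fallback=query_cfggen):
    """Return HWSKU from the environment, else ask the fallback.

    Mirrors the shell expansion ${HWSKU:-`...`}: the fallback runs only
    when the variable is unset or empty.
    """
    env = os.environ if environ is None else environ
    value = env.get("HWSKU", "")
    if value:
        return value  # Redis is never contacted on this path.
    return fallback('DEVICE_METADATA["localhost"]["hwsku"]')
```

With `HWSKU` already in the environment, `resolve_hwsku()` returns without touching Redis, so a permission problem on the socket would go unnoticed.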

@anshuv-mfst

@lguohan to follow up with Tamer.

@lguohan
Collaborator

lguohan commented Sep 2, 2020

Why is the KVM test not failing if the service cannot be started? @tahmed-dev

@tahmed-dev
Contributor

Why is the KVM test not failing if the service cannot be started? @tahmed-dev

@lguohan it is not clear to me why it did not fail on KVM. Based on my findings from a real DUT, sonic-cfggen does run as the user defined in the .service file. Below is a log from an instrumented sonic-cfggen; the user ID it reports matches the one defined in each service's unit file.

Sep  2 21:26:58.736051 str-s6000-acs-14 INFO sonic-cfggen: [bgp]user id: admin with commands: ['/usr/local/bin/sonic-cfggen', '-H', '-v', 'DEVICE_METADATA.localhost.platform']
Sep  2 21:26:59.912399 str-s6000-acs-14 INFO sonic-cfggen: [rsyslog-config]user id: root with commands: ['/usr/local/bin/sonic-cfggen', '-d', '-t', '/usr/share/sonic/templates/rsyslog.conf.j2', '-a', '{"udp_server_ip": "127.0.0.1"}']
Sep  2 21:27:00.017489 str-s6000-acs-14 INFO sonic-cfggen: [pmon]user id: admin with commands: ['/usr/local/bin/sonic-cfggen', '-d', '-v', 'DEVICE_METADATA["localhost"]["hwsku"]']
Sep  2 21:27:01.314639 str-s6000-acs-14 INFO sonic-cfggen: [bgp]user id: admin with commands: ['/usr/local/bin/sonic-cfggen', '-d', '-v', 'DEVICE_METADATA["localhost"]["hwsku"]']
Sep  2 21:27:10.246820 str-s6000-acs-14 INFO sonic-cfggen: [swss]user id: root with commands: ['/usr/local/bin/sonic-cfggen', '-H', '-v', 'DEVICE_METADATA.localhost.platform']
Sep  2 21:27:11.426660 str-s6000-acs-14 INFO sonic-cfggen: [swss]user id: root with commands: ['/usr/local/bin/sonic-cfggen', '-d', '-v', 'DEVICE_METADATA["localhost"]["hwsku"]']
Sep  2 21:28:20.156303 str-s6000-acs-14 INFO sonic-cfggen: [ntp]user id: root with commands: ['/usr/local/bin/sonic-cfggen', '-d', '-v', 'MGMT_VRF_CONFIG["vrf_global"]["mgmtVrfEnabled"]']
Sep  2 21:28:24.007019 str-s6000-acs-14 INFO sonic-cfggen: [database]user id: root with commands: ['/usr/local/bin/sonic-cfggen', '-H', '-v', 'DEVICE_METADATA.localhost.platform']
Sep  2 21:28:29.943282 str-s6000-acs-14 INFO sonic-cfggen: [platform-modules-s6000]user id: root with commands: ['/usr/local/bin/sonic-cfggen', '-H', '-v', 'DEVICE_METADATA.localhost.platform']
Sep  2 21:28:35.329346 str-s6000-acs-14 INFO sonic-cfggen: [database]user id: root with commands: ['/usr/local/bin/sonic-cfggen', '-j', '/etc/sonic/init_cfg.json', '-j', '/etc/sonic/config_db.json', '--write-to-db']
Sep  2 21:28:37.769355 str-s6000-acs-14 INFO sonic-cfggen: [config-setup]user id: root with commands: ['/usr/local/bin/sonic-cfggen', '-H', '-v', 'DEVICE_METADATA.localhost.platform']
Sep  2 21:28:40.800758 str-s6000-acs-14 INFO sonic-cfggen: [rsyslog-config]user id: root with commands: ['/usr/local/bin/sonic-cfggen', '-H', '-v', 'DEVICE_METADATA.localhost.platform']
Sep  2 21:28:40.882057 str-s6000-acs-14 INFO sonic-cfggen: [pmon]user id: admin with commands: ['/usr/local/bin/sonic-cfggen', '-H', '-v', 'DEVICE_METADATA.localhost.platform']
Sep  2 21:28:40.952449 str-s6000-acs-14 INFO sonic-cfggen: [hostname-config]user id: root with commands: ['/usr/local/bin/sonic-cfggen', '-d', '-v', "DEVICE_METADATA['localhost']['hostname']"]
Sep  2 21:28:43.382098 str-s6000-acs-14 INFO sonic-cfggen: [interfaces-config]user id: root with commands: ['/usr/local/bin/sonic-cfggen', '-d', '-j', '/tmp/ztp_input.json', '-t', '/usr/share/sonic/templates/interfaces.j2,/etc/network/interfaces', '-t', '/usr/share/sonic/templates/90-dhcp6-systcl.conf.j2,/etc/sysc

I'll dig further into why KVM does not catch this.
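
The instrumentation itself is not shown in this thread. As a hedged sketch of how similar log lines could be produced, the snippet below logs the effective user and the command line to syslog. The names `format_invocation`, `current_user`, and `log_invocation` are hypothetical, and the way the bracketed service tag is obtained is an assumption (here it is simply passed in). It is written for Python 3, whereas the traceback above comes from Python 2.7.

```python
import os
import pwd
import sys
import syslog


def format_invocation(tag, user, argv):
    """Build a line in the same shape as the log excerpt above."""
    return "[{}]user id: {} with commands: {}".format(tag, user, list(argv))


def current_user():
    """Name of the effective user, falling back to the numeric uid."""
    uid = os.geteuid()
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:  # uid has no passwd entry
        return str(uid)


def log_invocation(tag):
    """Write one INFO line identifying who invoked this process and how."""
    syslog.openlog("sonic-cfggen")
    syslog.syslog(syslog.LOG_INFO,
                  format_invocation(tag, current_user(), sys.argv))
```

Calling `log_invocation("bgp")` at the top of the script's entry point would emit one line per invocation, which is enough to compare the reported user against the `User=` setting of each unit.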

@tahmed-dev
Contributor

Reopening to track why KVM did not show this issue.

@tahmed-dev
Contributor

This issue is fixed. The KVM issue will be tracked via #5323.
