[all linecards] platform_tests/test_reload_config.py::test_reload_configuration_checks failure #91

Closed
wenyiz2021 opened this issue Jun 16, 2023 · 5 comments

@wenyiz2021

Hi @patrickmacarthur @Staphylo @kenneth-arista

Running 'sudo config reload -y' immediately after the DUT reboots does not return the expected 'Retry later' message. Instead, the reload runs to completion with rc 0.
Failure point:
https://github.com/sonic-net/sonic-mgmt/blob/0d6fedb76aa3b95ce5b6cbd44c528b0dd7ffbfcd/tests/platform_tests/test_reload_config.py#L102

Output of the shell command on the Arista linecard:

(Pdb) out
{'stderr_lines': [], u'cmd': u'sudo config reload -y', u'end': u'2023-06-16 23:21:19.811575', '_ansible_no_log': False, u'stdout': u'Disabling container monitoring ...\nStopping SONiC target ...\nRunning command: /usr/local/bin/sonic-cfggen  -j /etc/sonic/init_cfg.json  -j /etc/sonic/config_db.json  --write-to-db\nRunning command: /usr/local/bin/db_migrator.py -o migrate\nRunning command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment\nRestarting SONiC target ...\nEnabling container monitoring ...\nReloading Monit configuration ...\nReinitializing monit daemon', u'changed': True, u'rc': 0, u'start': u'2023-06-16 23:20:45.459575', u'stderr': u'', u'delta': u'0:00:34.352000', u'invocation': {u'module_args': {u'creates': None, u'executable': u'/bin/bash', u'_uses_shell': True, u'strip_empty_ends': True, u'_raw_params': u'sudo config reload -y', u'removes': None, u'argv': None, u'warn': True, u'chdir': None, u'stdin_add_newline': True, u'stdin': None}}, 'stdout_lines': [u'Disabling container monitoring ...', u'Stopping SONiC target ...', u'Running command: /usr/local/bin/sonic-cfggen  -j /etc/sonic/init_cfg.json  -j /etc/sonic/config_db.json  --write-to-db', u'Running command: /usr/local/bin/db_migrator.py -o migrate', u'Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment', u'Restarting SONiC target ...', u'Enabling container monitoring ...', u'Reloading Monit configuration ...', u'Reinitializing monit daemon'], u'warnings': [u"Consider using 'become', 'become_method', and 'become_user' rather than running sudo"], 'failed': False}

Expected:

(Pdb) out
{'stderr_lines': [], u'changed': True, u'end': u'2023-06-16 22:39:27.768113', '_ansible_no_log': False, u'stdout': u'Relevant services are not up. Retry later or use -f to avoid system checks', u'cmd': u'sudo config reload -y', u'msg': u'non-zero return code', u'rc': 1, u'start': u'2023-06-16 22:39:26.168914', u'warnings': [u"Consider using 'become', 'become_method', and 'become_user' rather than running sudo"], u'delta': u'0:00:01.599199', u'invocation': {u'module_args': {u'creates': None, u'executable': u'/bin/bash', u'_uses_shell': True, u'strip_empty_ends': True, u'_raw_params': u'sudo config reload -y', u'removes': None, u'argv': None, u'warn': True, u'chdir': None, u'stdin_add_newline': True, u'stdin': None}}, 'stdout_lines': [u'Relevant services are not up. Retry later or use -f to avoid system checks'], u'stderr': u'', 'failed': True}
(Pdb) out['stdout']
u'Relevant services are not up. Retry later or use -f to avoid system checks'

This happens on all linecards (CL2 and wolverine).

@wenyiz2021
Author

I tried setting module_ignore_errors to False. The terminal then shows the command failing, but the output of the shell command still says failed = False:

(Pdb) out = duthost.shell("sudo config reload -y", executable="/bin/bash", module_ignore_errors=False)
Friday 16 June 2023  23:48:14 +0000 (0:00:46.822)       0:09:26.849 *********** 
*** RunAnsibleModuleFail: run module shell failed, Ansible Results =>
{"changed": true, "cmd": "sudo config reload -y", "delta": "0:00:00.448957", "end": "2023-06-16 23:48:15.978370", "failed": true, "msg": "non-zero return code", "rc": 1, "start": "2023-06-16 23:48:15.529413", "stderr": "", "stderr_lines": [], "stdout": "SwSS container is not ready. Retry later or use -f to avoid system checks", "stdout_lines": ["SwSS container is not ready. Retry later or use -f to avoid system checks"], "warnings": ["Consider using 'become', 'become_method', and 'become_user' rather than running sudo"]}
(Pdb) out
{'stderr_lines': [], u'cmd': u'sudo config reload -y', u'end': u'2023-06-16 23:48:04.137881', '_ansible_no_log': False, u'stdout': u'Disabling container monitoring ...\nStopping SONiC target ...\nRunning command: /usr/local/bin/sonic-cfggen  -j /etc/sonic/init_cfg.json  -j /etc/sonic/config_db.json  --write-to-db\nRunning command: /usr/local/bin/db_migrator.py -o migrate\nRunning command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment\nRestarting SONiC target ...\nEnabling container monitoring ...\nReloading Monit configuration ...\nReinitializing monit daemon', u'changed': True, u'rc': 0, u'start': u'2023-06-16 23:47:28.697315', u'stderr': u'', u'delta': u'0:00:35.440566', u'invocation': {u'module_args': {u'creates': None, u'executable': u'/bin/bash', u'_uses_shell': True, u'strip_empty_ends': True, u'_raw_params': u'sudo config reload -y', u'removes': None, u'argv': None, u'warn': True, u'chdir': None, u'stdin_add_newline': True, u'stdin': None}}, 'stdout_lines': [u'Disabling container monitoring ...', u'Stopping SONiC target ...', u'Running command: /usr/local/bin/sonic-cfggen  -j /etc/sonic/init_cfg.json  -j /etc/sonic/config_db.json  --write-to-db', u'Running command: /usr/local/bin/db_migrator.py -o migrate', u'Running command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment', u'Restarting SONiC target ...', u'Enabling container monitoring ...', u'Reloading Monit configuration ...', u'Reinitializing monit daemon'], u'warnings': [u"Consider using 'become', 'become_method', and 'become_user' rather than running sudo"], 'failed': False}
(Pdb) out['stdout']
u'Disabling container monitoring ...\nStopping SONiC target ...\nRunning command: /usr/local/bin/sonic-cfggen  -j /etc/sonic/init_cfg.json  -j /etc/sonic/config_db.json  --write-to-db\nRunning command: /usr/local/bin/db_migrator.py -o migrate\nRunning command: /usr/local/bin/sonic-cfggen -d -y /etc/sonic/sonic_version.yml -t /usr/share/sonic/templates/sonic-environment.j2,/etc/sonic/sonic-environment\nRestarting SONiC target ...\nEnabling container monitoring ...\nReloading Monit configuration ...\nReinitializing monit daemon'
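One possible reading of the session above, judging only from the timestamps: the RunAnsibleModuleFail was raised for a run that started at 23:48:15, while the out printed afterwards started at 23:47:28 and ended at 23:48:04, so it looks like the result of an earlier, successful reload. In Python, if the right-hand side of an assignment raises, the name keeps its previous value. The sketch below reproduces that behavior; fake_shell and this RunAnsibleModuleFail class are hypothetical stand-ins, not the sonic-mgmt implementations.

```python
class RunAnsibleModuleFail(Exception):
    """Hypothetical stand-in for the sonic-mgmt exception of the same name."""

    def __init__(self, results):
        super().__init__("run module shell failed")
        self.results = results  # in this sketch, the failing result travels on the exception


def fake_shell(results, module_ignore_errors=False):
    """Hypothetical stand-in for duthost.shell: return the result dict,
    or raise when the module failed and errors are not ignored."""
    if results["failed"] and not module_ignore_errors:
        raise RunAnsibleModuleFail(results)
    return results


# Earlier call: the reload went through and returned rc 0.
out = fake_shell({"rc": 0, "failed": False,
                  "stdout": "Reinitializing monit daemon"})

# Later call: the reload is rejected, so fake_shell raises before
# the new value is ever assigned to `out`.
try:
    out = fake_shell({"rc": 1, "failed": True,
                      "stdout": "SwSS container is not ready. Retry later "
                                "or use -f to avoid system checks"},
                     module_ignore_errors=False)
except RunAnsibleModuleFail as exc:
    failed_results = exc.results

# `out` still holds the earlier, successful result.
print(out["failed"], out["rc"])    # False 0
print(failed_results["failed"])    # True
```

If that reading is right, the failed = False seen in Pdb comes from the previous call rather than from the rejected one.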

cc @arlakshm

@wenyiz2021
Author

The expectation is:

  1. Recognize that the command failed to execute -> 'failed' = True
  2. Capture the error message in out['stdout']
  3. When module_ignore_errors is changed to False, the error message is caught, yet the returned output still says failed = False, which is inconsistent.
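The first two expectations can be written as a single check on the result dict, using the field names visible in the Pdb output (rc, failed, stdout). The helper name below is hypothetical, and matching on the 'Retry later' substring is an assumption based on the two rejection messages seen in this thread ('Relevant services are not up...' and 'SwSS container is not ready...').

```python
def reload_was_rejected(result):
    """Hypothetical check: True when `config reload` refused to run and
    asked the user to retry, judged from the Ansible shell result dict."""
    return (
        result.get("failed") is True
        and result.get("rc") != 0
        and "Retry later" in result.get("stdout", "")
    )


# Trimmed versions of the two results shown above.
observed = {
    "rc": 0,
    "failed": False,
    "stdout": "Disabling container monitoring ...\nReinitializing monit daemon",
}
expected = {
    "rc": 1,
    "failed": True,
    "stdout": "Relevant services are not up. Retry later or use -f to avoid system checks",
}

print(reload_was_rejected(observed))  # False
print(reload_was_rejected(expected))  # True
```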

I am unsure whether this is an Ansible issue or a hardware issue. @Staphylo @kenneth-arista, can you please help confirm?

@patrickmacarthur
Contributor

able to recognize this cmd failed to execute -> 'failed' = True

By default, Ansible considers only the return code when deciding whether a command succeeded or failed. From the perspective of the config command, returning success appears appropriate if it did not detect that the system was still booting.
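To make the return-code point concrete: Ansible's command/shell modules mark a task failed with msg 'non-zero return code' when rc is not 0 (as in the expected output above), and they do not look at stdout unless the task adds its own condition, for example via the failed_when keyword. The function below is a simplified model of that default rule, not Ansible's code.

```python
def default_failed(result):
    """Simplified model of the default rule used by Ansible's
    command/shell modules: only a non-zero rc marks the task failed.
    stdout and stderr are not inspected."""
    return result["rc"] != 0


results = [
    # Reload ran to completion (the observed case): rc 0, not failed.
    {"rc": 0, "stdout": "Restarting SONiC target ...\nReinitializing monit daemon"},
    # Reload refused by the system check (the expected case): rc 1, failed.
    {"rc": 1, "stdout": "Relevant services are not up. Retry later or use -f to avoid system checks"},
    # Hypothetical: the same words with rc 0 would still not count as failed.
    {"rc": 0, "stdout": "Retry later"},
]

for result in results:
    print(result["rc"], default_failed(result))
# 0 False
# 1 True
# 0 False
```

So 'failed' = False in the observed output follows directly from config reload exiting with 0; the question is why its boot check did not trigger.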

I'm looking into why the config command isn't detecting that the system is still booting.

@patrickmacarthur
Contributor

I haven't been able to reproduce this locally, but my theory is that this is caused by either (1) the switch booting faster than the test reaches this check, so the system is already running, or (2) a service failing during startup and leaving the system in a degraded state, which may override the started state that config reload is looking for.

If you encounter this issue again, it would be useful to run systemctl status on the DUT to rule out (2).
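Alongside systemctl status, the overall state printed by systemctl is-system-running (starting, running, degraded, and so on) can help tell the two theories apart. The sketch below is a hypothetical helper, not part of sonic-mgmt. It assumes it runs directly on the DUT, and it assumes that config reload's boot check keys on the system still starting up; that is the theory above, not something confirmed from the sonic-utilities source.

```python
import subprocess

# States that `systemctl is-system-running` can print (see systemctl(1)).
KNOWN_STATES = {
    "initializing", "starting", "running", "degraded",
    "maintenance", "stopping", "offline", "unknown",
}


def read_system_state():
    """Return the systemd manager state on the local host (run on the DUT).

    The command exits non-zero for every state except 'running', so the
    exit code is deliberately not treated as an error here."""
    completed = subprocess.run(
        ["systemctl", "is-system-running"],
        capture_output=True, text=True, check=False,
    )
    return completed.stdout.strip()


def classify_system_state(state):
    """Map a state string onto the two theories in this thread.

    Assumption: the boot check in config reload only fires while the
    system is still starting, so 'degraded' (boot finished, but at least
    one unit failed) would let the reload proceed."""
    state = state.strip()
    if state in ("initializing", "starting"):
        return "booting"   # under the assumption, a reload here should be refused
    if state == "degraded":
        return "degraded"  # theory (2): `systemctl --failed` lists the failed units
    if state == "running":
        return "running"   # theory (1): boot had already finished
    if state in KNOWN_STATES:
        return "other"
    return "unrecognized"
```

For example, classify_system_state(read_system_state()) run on the DUT right after the failing reload would show whether it landed in case (1) or case (2).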

@wenyiz2021
Author

This is fixed in sonic-net/sonic-mgmt#7953.
