[hostcfgd] hostcfgd doesn't honor CFG DB updates if they arrive in a specific time interval #8619
Vivek will fix this issue.
qiluo-msft pushed a commit that referenced this issue on Oct 28, 2021
…Table (#8861)

#### Why I did it
Fixes #8619

#### How I did it
1) Listening to CFG_DB notifications was migrated from ConfigDBConnector to SubscriberStateTable & Select
2) This change in design let me remove `update_all_features_config`, which took roughly 5-10 seconds to execute and was the reason for the blackout
3) Edited the FeatureHandler, Feature & NtpCfgd classes to suit this design
4) Added corresponding mocks and UTs

**Changes made to classes other than HostConfigDaemon:**
With the previous design, the data initially read from the config db was applied using hardcoded methods even before config_db.listen() was called. For example: `update_all_features_config` for FeatureHandler, load() methods for NtpCfgd, etc.
With this design, since the existing data is read and delivered as a notification by SubscriberStateTable, I've mostly removed these hardcoded methods. The changes made to these classes therefore only adapt them to the new design, with no change in the actual functionality.

#### How to verify it
UTs:
```
tests/determine-reboot-cause_test.py ......... [ 29%]
tests/procdockerstatsd_test.py . [ 32%]
tests/caclmgrd/caclmgrd_dhcp_test.py ...... [ 51%]
tests/hostcfgd/hostcfgd_radius_test.py .. [ 58%]
tests/hostcfgd/hostcfgd_test.py ............. [100%]
```
Verified manually,
```
Sep 10 22:53:25.662621 sonic INFO systemd[1]: hostcfgd.service: Succeeded.
Sep 10 22:55:04.127719 sonic INFO /hostcfgd: ConfigDB connect success
Sep 10 22:55:04.128108 sonic INFO /hostcfgd: KdumpCfg init ...
Sep 10 22:55:04.148819 sonic INFO /hostcfgd: Waiting for systemctl to finish initialization
Sep 10 22:55:04.163452 sonic INFO /hostcfgd: systemctl has finished initialization -- proceeding ...
Sep 10 22:55:04.163834 sonic INFO /hostcfgd: Kdump handler...
Sep 10 22:55:04.164019 sonic INFO /hostcfgd: Kdump global configuration update
Sep 10 22:55:04.758784 sonic INFO hostcfgd[184471]: kdump is already disabled
Sep 10 22:55:04.758876 sonic INFO hostcfgd[184471]: Kdump is already disabled
Sep 10 22:55:05.182021 sonic INFO hostcfgd[184511]: Kdump configuration has been updated in the startup configuration
Sep 10 22:55:05.596919 sonic INFO hostcfgd[184528]: Kdump configuration has been updated in the startup configuration
Sep 10 22:55:06.140627 sonic INFO /hostcfgd: Feature nat is stopped and disabled
Sep 10 22:55:06.642629 sonic INFO /hostcfgd: Feature telemetry is enabled and started
Sep 10 22:55:07.101297 sonic INFO /hostcfgd: Feature pmon is enabled and started
Sep 10 22:55:07.554366 sonic INFO /hostcfgd: Feature database is enabled and started
Sep 10 22:55:08.009329 sonic INFO /hostcfgd: Feature mgmt-framework is enabled and started
Sep 10 22:55:08.394952 sonic INFO /hostcfgd: Feature macsec is stopped and disabled
Sep 10 22:55:08.782853 sonic INFO /hostcfgd: Feature snmp is enabled and started
Sep 10 22:55:09.205381 sonic INFO /hostcfgd: Feature teamd is enabled and started
Sep 10 22:55:09.224877 sonic INFO /hostcfgd: Feature what-just-happened is enabled and started
Sep 10 22:55:09.627929 sonic INFO /hostcfgd: Feature lldp is enabled and started
Sep 10 22:55:10.086993 sonic INFO /hostcfgd: Feature swss is enabled and started
Sep 10 22:55:10.170312 sonic INFO /hostcfgd: cmd - service aaastatsd stop
Sep 10 22:55:11.012236 sonic INFO /hostcfgd: cmd - service aaastatsd stop
Sep 10 22:55:12.225946 sonic INFO /hostcfgd: Feature bgp is enabled and started
Sep 10 22:55:12.712792 sonic INFO /hostcfgd: Feature dhcp_relay is enabled and started
Sep 10 22:55:13.166656 sonic INFO /hostcfgd: Feature sflow is stopped and disabled
Sep 10 22:55:13.593639 sonic INFO /hostcfgd: Feature radv is enabled and started
Sep 10 22:55:14.034106 sonic INFO /hostcfgd: Feature syncd is enabled and started
Sep 10 22:55:14.113064 sonic INFO /hostcfgd: cmd - service aaastatsd stop
Sep 10 22:55:14.863601 sonic INFO /hostcfgd: RADIUS_SERVER update: key: 10.10.10.1, op: SET, data: {'auth_type': 'pap', 'passkey': 'p*****', 'retransmit': '1', 'timeout': '1'}
Sep 10 22:55:14.938605 sonic INFO /hostcfgd: cmd - service aaastatsd stop
Sep 10 22:55:15.667545 sonic INFO /hostcfgd: RADIUS_SERVER update: key: 10.10.10.3, op: SET, data: {'auth_type': 'chap', 'passkey': 'p*****', 'retransmit': '2', 'timeout': '2'}
Sep 10 22:55:15.667801 sonic INFO /hostcfgd: RADIUS (NAS) IP change - key:eth0, current global info {}
Sep 10 22:55:15.746531 sonic INFO /hostcfgd: cmd - service aaastatsd stop
Sep 10 23:04:47.435340 sonic INFO /hostcfgd: ntp server update key 0.debian.pool.ntp.org
Sep 10 23:04:47.435661 sonic INFO /hostcfgd: ntp server update, restarting ntp-config, ntp servers configured {'0.debian.pool.ntp.org'}
Sep 10 23:04:47.866394 sonic INFO /hostcfgd: NTP GLOBAL Update
Sep 10 23:04:47.866557 sonic INFO /hostcfgd: ntp global update for source intf old {''} new {'eth0', 'Loopback0'}, restarting ntp-config
Sep 10 23:16:25.157600 sonic INFO /hostcfgd: Running cmd: 'sudo systemctl unmask sflow.service'
Sep 10 23:16:25.178472 sonic INFO hostcfgd[192106]: Removed /etc/systemd/system/sflow.service.
Sep 10 23:16:25.582018 sonic INFO /hostcfgd: Running cmd: 'sudo systemctl enable sflow.service'
Sep 10 23:16:25.604534 sonic INFO hostcfgd[192123]: Created symlink /etc/systemd/system/sonic.target.wants/sflow.service → /lib/systemd/system/sflow.service.
Sep 10 23:16:26.029416 sonic INFO /hostcfgd: Running cmd: 'sudo systemctl start sflow.service'
Sep 10 23:16:26.691927 sonic INFO /hostcfgd: Feature sflow is enabled and started
```
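To illustrate why the subscribe-first design described in the commit message closes the gap, here is a minimal simulation. It does not use the real swsscommon API: `FakeTable`, `subscribe`, and `run` are hypothetical stand-ins that only mimic the behavior the commit attributes to SubscriberStateTable, namely that existing entries are delivered as notifications and later updates are queued behind them.

```python
from collections import deque


class FakeTable:
    """Hypothetical in-memory stand-in for a CONFIG_DB table (not swsscommon)."""

    def __init__(self, initial):
        self.data = dict(initial)
        self.subscribers = []

    def set(self, key, value):
        # Store the value and notify every subscriber queue.
        self.data[key] = value
        for queue in self.subscribers:
            queue.append((key, "SET", value))

    def subscribe(self):
        # Existing entries are replayed as notifications, and the queue is
        # registered so that later updates land behind them.
        queue = deque((k, "SET", v) for k, v in self.data.items())
        self.subscribers.append(queue)
        return queue


def run(table, mid_startup_update):
    """Drain notifications; inject one update while initial entries are still being processed."""
    applied = {}
    queue = table.subscribe()
    injected = False
    while queue:
        key, _op, value = queue.popleft()
        applied[key] = value  # stands in for mask/unmask/enable/disable
        if not injected:
            # Simulates "config feature state sflow enabled" arriving mid-startup.
            table.set(*mid_startup_update)
            injected = True
    return applied


table = FakeTable({"sflow": "disabled", "bgp": "enabled"})
applied = run(table, ("sflow", "enabled"))
print(applied)
```

In this sketch the mid-startup update is queued rather than dropped, so the final applied state for `sflow` is `enabled`. The real daemon's ordering and coalescing behavior may differ; the point is only that there is no window in which nobody is listening.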
ganglyu pushed a commit to sonic-net/sonic-host-services that referenced this issue on Jul 12, 2022
…Table (#8861)
Description
We have an internal test which invokes "config feature state sflow enabled" at around 92 sec after boot, and the update was never honored. The issue was narrowed down to the delayed start of hostcfgd (which starts at the 90th sec). I'll explain with an example.
Let's say the system boots up at the 0th sec and hostcfgd starts at the 90th sec. hostcfgd reads the FEATURE|* tables at the 90th sec and acts on them for the next 10 seconds (i.e. mask/unmask/enable/disable). It only starts listening to notifications of the feature table from the 100th sec.
So, an update made before the 90th sec or after the 100th sec is honored. Updates made between the 90th and 100th sec are not.
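The timing window can be written down as a small model. This is only a sketch of the example above: the class name `OldHostcfgdTimeline` and its fields are hypothetical, and the 90 sec start and 10 sec apply time are the illustrative figures from this description, not measured constants.

```python
from dataclasses import dataclass


@dataclass
class OldHostcfgdTimeline:
    """Hypothetical model of the old startup sequence (times in seconds after boot)."""

    start: float = 90.0           # hostcfgd starts and reads FEATURE|*
    apply_duration: float = 10.0  # time spent acting on the initial read

    def is_honored(self, update_time: float) -> bool:
        listen_from = self.start + self.apply_duration
        # Before start: the update is part of the initial read.
        # From listen_from onward: the update arrives as a notification.
        # In between: nobody is listening, so the update is missed.
        return update_time < self.start or update_time >= listen_from


daemon = OldHostcfgdTimeline()
results = {t: daemon.is_honored(t) for t in (60, 92, 101)}
print(results)  # {60: True, 92: False, 101: True}
```

The 92 sec case corresponds to the internal test mentioned above: it falls inside the window and is not honored.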
Steps to reproduce the issue:
Describe the results you received:
The update will not be honored
Describe the results you expected:
The update should be honored
Output of `show version`:
Output of `show techsupport`:
Additional information you deem important (e.g. issue happens only occasionally):