dhcp_relay service stopped with "systemctl stop swss" but not restarted with "systemctl restart swss" #2752
Comments
@jipanyang: Which branch is this image built from? master? 201803? 201811?
@jleveque master branch
Thanks for the info. I am currently working on a solution to auto-restart the swss container upon a critical process crash in the master branch (similar to what I did for the 201803 branch here: #2546). I believe I fixed this problem as part of that solution. Is there any chance you can test an image built from the head of the 201803 branch and confirm whether or not it's fixed there?
looks good to me on 201803 branch:
Great! Thanks for confirming, Jipan! I will be raising a PR soon to implement similar changes in the master branch, and they will be cherry-picked into the 201811 branch, as well.
Update sonic-utilities submodule pointer to include the following:
* 88ffb167 [config]config reload should generate sysinfo if missing ([sonic-net#2778](sonic-net/sonic-utilities#2778))
* 7443b9e5 [sonic-package-manager] support extension with multiple YANG modules ([sonic-net#2752](sonic-net/sonic-utilities#2752))
* 522c3a9e [sonic-package-manager] add support for multiple CLI plugin files ([sonic-net#2753](sonic-net/sonic-utilities#2753))
* b38fcfd1 [show][muxcable] fix RC ([sonic-net#2812](sonic-net/sonic-utilities#2812))
* 7e24463f [chassis]: remote cli commands infra for sonic chassis ([sonic-net#2701](sonic-net/sonic-utilities#2701))
* bee593e4 [DPB]Fixing typo in config breakout output ([sonic-net#2802](sonic-net/sonic-utilities#2802))
* ada603c5 [config]Support multi-asic Golden Config override ([sonic-net#2738](sonic-net/sonic-utilities#2738))
* 88a7daa8 [show][barefoot] replace shell=True ([sonic-net#2699](sonic-net/sonic-utilities#2699))
* 5e99edb5 [sonic_package_manager] replace shell=True ([sonic-net#2726](sonic-net/sonic-utilities#2726))
* b547bb45 [acl-loader] Only add default deny rule when table is L3 or L3V6 ([sonic-net#2796](sonic-net/sonic-utilities#2796))

Signed-off-by: dprital <drorp@nvidia.com>
Why I did it
69abbc3c - (HEAD, origin/master, origin/HEAD) Revert "[GCU] Complete RDMA Platform Validation Checks (#2791)" (#2854) (8 minutes ago)
4fead896 - [sonic-package-manager] fix CLI plugin compatibility issue (#2842) (27 hours ago)
db61efca - [vlan][dhcp_relay] Clear dhcpv6 relay counter while deleting vlan (#2852) (33 hours ago)
d5544b4a - [config] Generate sysinfo as needed when override config (#2836) (6 days ago)
f258e2a3 - [GCU] Complete RDMA Platform Validation Checks (#2791) (6 days ago)
b4f4e63e - Revert "Revert frr route check (#2761)" (#2762) (7 days ago)
3d89589f - Update pcieutil error message on loading common pcie module (#2786) (11 days ago)
e6aacd37 - Update TRANSCEIVER_INFO table after CDB FW upgrade (#2837) (2 weeks ago)
33d665c4 - replace shell=True, replace xml, and replace exit() (#2664) (2 weeks ago)
9e510a83 - [chassis][voq[Add "config fabric port ..." commands and tests. (#2730) (2 weeks ago)
aeb0dbc1 - Fix the invalid variable issue when set-fips in uboot (#2834) (3 weeks ago)
1e73632d - [test]: add UT coverage for GCU (#2818) (3 weeks ago)
3a9995b6 - [config]Support multi-asic Golden Config override with fix (#2825) (3 weeks ago)
3fb32588 - Revert "[chassis]: remote cli commands infra for sonic chassis (#2701)" (#2832) (3 weeks ago)
2ffe6e37 - [show][mlnx] replace shell=True, replace xml (#2700) (3 weeks ago)
a5091bba - [sonic_sku_create] remove shell=True, replace exit() with sys.exit() (#2816) (3 weeks ago)
71ef4f16 - [build] Fix base OS compilation issue caused by incompatibility with requests >= 2.29.0. (#2830) (3 weeks ago)
1097373b - [show] Added alias interface mode support for 'show interfaces counters ...' command (#2468) (4 weeks ago) <Julian Chang - TW>
589375fc - correctly parsing complete ipv6 vnet info (#2827) (4 weeks ago)
634ac77c - LAG keepalive script to reduce lacp session wait during warm-reboot (#2806) (4 weeks ago)
331c9de0 - [config]: Dynamically start and stop ndppd (#2814) (4 weeks ago)
d1f307d0 - [GCU]Fix rdma check failure (#2824) (4 weeks ago)
ce81a340 - Revert "[config]Support multi-asic Golden Config override (#2738)" (#2823) (4 weeks ago)
61e0e810 - Added platform plugin support in load_minigraph (#2808) (4 weeks ago)
d4355a96 - Change default CDB run mode to non-hitless (#2817) (4 weeks ago)
88ffb167 - [config]config reload should generate sysinfo if missing (#2778) (4 weeks ago)
7443b9e5 - [sonic-package-manager] support extension with multiple YANG modules (#2752) (4 weeks ago)
522c3a9e - [sonic-package-manager] add support for multiple CLI plugin files (#2753) (4 weeks ago)
b38fcfd1 - [show][muxcable] fix show mux hwmode muxdirection RC (#2812) (5 weeks ago)
7e24463f - [chassis]: remote cli commands infra for sonic chassis (#2701) (6 weeks ago)
bee593e4 - [DPB]Fixing typo in config breakout output (#2802) (6 weeks ago)
ada603c5 - [config]Support multi-asic Golden Config override (#2738) (6 weeks ago)
88a7daa8 - [show][barefoot] replace shell=True (#2699) (6 weeks ago)
5e99edb5 - [sonic_package_manager] replace shell=True (#2726) (6 weeks ago)
b547bb45 - [acl-loader] Only add default deny rule when table is L3 or L3V6 (#2796) (6 weeks ago)
… fetching (sonic-net#2752)

What I did

Optimize QoS operations:
* Cache queue information to avoid fetching it from SAI every time
  * The cache is created when a queue's information is fetched for the first time
  * Avoid calling the SAI API to fetch queue information if it exists in the cache
  * The cache is cleared for the queues of a port when that port is removed
* Apply buffer items (tables: BUFFER_QUEUE, BUFFER_PG, BUFFER_PORT_INGRESS_PROFILE_LIST, BUFFER_PORT_EGRESS_PROFILE_LIST) only if they are updated
  * The items in all of these tables have only one attribute, profile or profile_list, and it is stored in BufferOrch::m_buffer_type_maps. We can therefore check whether the new value is the same as the one stored in the mapping and apply it to SAI only if it differs.
  * For the BUFFER_QUEUE table, a retry may be needed when a PFC storm is detected on the queue. A new set, m_partiallyAppliedQueues, is introduced to handle this case.
  * In any case, if the SAI API call fails, we do not repeat it when the buffer table is set with the same attribute value, because it is the user's responsibility to correct the configuration.

Signed-off-by: Stephen Sun stephens@nvidia.com

Why I did it

Theoretically, both operations should be fast. But there is a mutex in sairedis enforcing a critical section for all SAI APIs. If another SAI API call is ongoing, e.g. fetching a counter, the new call has to wait for it to finish, which can take additional milliseconds. This occurs frequently when a large number of buffer PG or queue items are being set, and the accumulated time is significant. In this scenario, two threads run in parallel and compete for the critical section:
* the syncd main thread, in which the buffer PG setting, queue setting, or queue info getting API runs
* the FlexCounter thread, in which the counter is fetched

How I verified it

* Mock test
* Regression test

Details if related

An example of queue information fetching. For each queue, the information is fetched 5 times, which consumes ~0.25 seconds. With the caching logic, it will be fetched only once.

2023-04-20.18:01:00.634562|a|INIT_VIEW
2023-04-20.18:01:00.635586|A|SAI_STATUS_SUCCESS
--
2023-04-20.18:01:43.290205|g|SAI_OBJECT_TYPE_QUEUE:oid:0x15000000000549|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_ALL|SAI_QUEUE_ATTR_INDEX=205
2023-04-20.18:01:43.331625|G|SAI_STATUS_SUCCESS|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_UNICAST|SAI_QUEUE_ATTR_INDEX=4
--
2023-04-20.18:01:46.420931|g|SAI_OBJECT_TYPE_QUEUE:oid:0x15000000000549|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_ALL|SAI_QUEUE_ATTR_INDEX=0
2023-04-20.18:01:46.422113|G|SAI_STATUS_SUCCESS|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_UNICAST|SAI_QUEUE_ATTR_INDEX=4
--
2023-04-20.18:01:56.825879|g|SAI_OBJECT_TYPE_QUEUE:oid:0x15000000000549|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_ALL|SAI_QUEUE_ATTR_INDEX=24
2023-04-20.18:01:56.866720|G|SAI_STATUS_SUCCESS|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_UNICAST|SAI_QUEUE_ATTR_INDEX=4
--
2023-04-20.18:02:37.248679|a|APPLY_VIEW
2023-04-20.18:02:37.249435|A|SAI_STATUS_SUCCESS
--
2023-04-20.18:02:54.824194|g|SAI_OBJECT_TYPE_QUEUE:oid:0x15000000000549|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_ALL|SAI_QUEUE_ATTR_INDEX=205
2023-04-20.18:02:54.866955|G|SAI_STATUS_SUCCESS|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_UNICAST|SAI_QUEUE_ATTR_INDEX=4
--
2023-04-20.18:02:54.932174|g|SAI_OBJECT_TYPE_QUEUE:oid:0x15000000000549|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_ALL|SAI_QUEUE_ATTR_INDEX=205
2023-04-20.18:02:54.965082|G|SAI_STATUS_SUCCESS|SAI_QUEUE_ATTR_TYPE=SAI_QUEUE_TYPE_UNICAST|SAI_QUEUE_ATTR_INDEX=4
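The two optimizations described in this commit message can be illustrated with a small Python sketch. This is not the sonic-swss implementation (which is C++); `QueueInfoCache`, `BufferApplier`, the `fetch` and `apply` callables, and the port name used below are all hypothetical stand-ins for the SAI calls, and the PFC-storm retry path (`m_partiallyAppliedQueues`) is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

QueueInfo = Tuple[str, int]  # (queue type, queue index), as in the log above


@dataclass
class QueueInfoCache:
    # Hypothetical stand-in for the slow SAI "get queue attributes" call.
    fetch: Callable[[int], QueueInfo]
    _by_queue: Dict[int, QueueInfo] = field(default_factory=dict)
    _by_port: Dict[str, Set[int]] = field(default_factory=dict)

    def get(self, port: str, queue_oid: int) -> QueueInfo:
        # Only the first lookup of a queue reaches the slow call.
        if queue_oid not in self._by_queue:
            self._by_queue[queue_oid] = self.fetch(queue_oid)
            self._by_port.setdefault(port, set()).add(queue_oid)
        return self._by_queue[queue_oid]

    def remove_port(self, port: str) -> None:
        # Drop the cached entries for every queue of the removed port.
        for oid in self._by_port.pop(port, set()):
            self._by_queue.pop(oid, None)


class BufferApplier:
    def __init__(self, apply: Callable[[str, str], bool]) -> None:
        # Hypothetical stand-in for the slow SAI "set" call.
        self.apply = apply
        self.current: Dict[str, str] = {}

    def set(self, key: str, profile: str) -> bool:
        """Return True if the slow call was made, False if it was skipped."""
        if self.current.get(key) == profile:
            return False
        # Recorded before the call, so a failed call is not repeated
        # for the same value (the behaviour the commit message describes).
        self.current[key] = profile
        self.apply(key, profile)
        return True
```

With a cache like this, repeated lookups of the same queue OID, such as the five `g` operations in the log above, would reach the slow call only once until the owning port is removed.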
radv and telemetry have the same issue.