SAI_STATUS_TABLE_FULL and swss:orchagent shutdown #2125
Comments
As per the current design, orchagent crashes when there is a "table full" error. This is the expected behavior. However, in this case it looks like the CRM available count is returning an incorrect value (4 instead of 0). The available count is returned by the SAI vendor based on their table size. If this happens consistently, we would need to take it up with Broadcom.
Is there any plan to stop orchagent from crashing on this "table full" error? We will work with Broadcom if the incorrect CRM available count becomes an issue for us. Thanks for your prompt follow-up, Sunny!
Not planned for any immediate release!
OK, Sunny, could you help add this as a critical issue to be fixed soon and let us know when the fix will be available? SONiC no longer works after orchagent crashes, and this has a critical impact on users. Thanks!
Please enable ALPM; otherwise you are likely to hit the routing table issue in the near future.
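To make the request earlier in this thread concrete (not crashing on "table full"), here is a minimal sketch of one possible behavior: defer the route that does not fit and retry it once space is freed. This is a toy model, not SONiC or SAI code. `RouteProgrammer`, `Status`, and the in-memory table are hypothetical names introduced only for illustration, and `Status.TABLE_FULL` merely stands in for SAI_STATUS_TABLE_FULL.

```python
from collections import deque
from enum import Enum


class Status(Enum):
    SUCCESS = 0
    TABLE_FULL = 1  # hypothetical stand-in for SAI_STATUS_TABLE_FULL


class RouteProgrammer:
    """Toy model: a fixed-capacity table plus a queue of deferred routes."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.installed = set()
        self.pending = deque()

    def _program(self, prefix: str) -> Status:
        # Stand-in for the hardware call; fails once the table is full.
        if len(self.installed) >= self.capacity:
            return Status.TABLE_FULL
        self.installed.add(prefix)
        return Status.SUCCESS

    def add_route(self, prefix: str) -> Status:
        status = self._program(prefix)
        if status is Status.TABLE_FULL:
            # Defer instead of exiting, so installed routes keep working.
            self.pending.append(prefix)
        return status

    def remove_route(self, prefix: str) -> None:
        self.installed.discard(prefix)
        # Space may have been freed: retry deferred routes in arrival order.
        while self.pending and len(self.installed) < self.capacity:
            self._program(self.pending.popleft())
```

The trade-off in this sketch is that a deferred route is silently absent from the table until space frees up, so a real implementation would also need to log and surface the pending state to the operator.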
We are testing SONiC by adding 8k+ IPv4 routes on a Broadcom BCM56960 switch. CRM showed 8192 ipv4_route entries available. We could add up to 8188. Then, when adding the 8189th route, we hit the following syslog errors (see "* syslog:" below) with SAI_STATUS_TABLE_FULL, and syncd called exit_and_notify() to shut down orchagent running in the swss container. We have to run "config reload" or reboot the system to recover.
I saw that a similar issue was opened before:
[syncd][topology t0] exit_and_notify after processing the event of SAI_STATUS_TABLE_FULL #654
sonic-net/sonic-mgmt#654
Is there any way to avoid hitting this condition, e.g. having the SONiC code in RouteOrch::addRoute check the available routes before actually adding a route? It looks like SAI_STATUS_TABLE_FULL and the orchagent shutdown would apply to all resources listed in CRM whenever more than the allowed resources are used, e.g. ipv6_route, ipv4_neighbor, etc., see below. Is there any plan to enhance this and avoid shutting down orchagent in the SAI_STATUS_TABLE_FULL case?
Thanks!
Wilson
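The pre-check asked about above could look roughly like the sketch below. It is written in Python for brevity and is not SONiC code. `CrmCounter`, `can_add_route`, `add_route`, and the `safety_margin` default are all hypothetical. The margin reflects the observation in this thread that the reported available count can be off by a few entries (4 reported when the table was already full).

```python
from dataclasses import dataclass


@dataclass
class CrmCounter:
    """Hypothetical snapshot of one CRM resource, e.g. ipv4_route."""
    used: int
    available: int


def can_add_route(counter: CrmCounter, safety_margin: int = 8) -> bool:
    """Return True only if the reported headroom exceeds the safety margin.

    The margin is an assumption: it guards against a vendor-reported
    available count that is slightly higher than the real free space.
    """
    return counter.available > safety_margin


def add_route(table: set, prefix: str, counter: CrmCounter) -> bool:
    """Add prefix to the in-memory table if the pre-check passes."""
    if prefix in table:
        return True  # already programmed; nothing to do
    if not can_add_route(counter):
        return False  # reject instead of risking a table-full error
    table.add(prefix)
    counter.used += 1
    counter.available -= 1
    return True
```

With the numbers from this report (8188 used, 4 reported available), `add_route` would return `False` for the 8189th route rather than pass it down to the hardware.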