-
Notifications
You must be signed in to change notification settings - Fork 539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stale Counter db table exists even after rif is deleted. #2193
Comments
I am submitting a PR for this issue |
sumanbrcm
added a commit
to sumanbrcm/sonic-swss
that referenced
this issue
Mar 25, 2022
sumanbrcm
added a commit
to sumanbrcm/sonic-swss
that referenced
this issue
Apr 1, 2022
sumanbrcm
added a commit
to sumanbrcm/sonic-swss
that referenced
this issue
Apr 1, 2022
sumanbrcm
added a commit
to sumanbrcm/sonic-swss
that referenced
this issue
Apr 1, 2022
prsunny
pushed a commit
that referenced
this issue
May 12, 2022
preetham-singh
pushed a commit
to preetham-singh/sonic-swss
that referenced
this issue
Aug 6, 2022
To handle this , whenever RIF is deleted (removeRouterIntfs) , we ensure that these COUNTER_DB tables are deleted(handled in cleanUpRifFromCounterDb) .
sumanbrcm
added a commit
to sumanbrcm/sonic-swss
that referenced
this issue
Oct 13, 2022
* Fixing issue #11621 * The cleanup code for stale rif counters are now moved to syncd . Earlier as part of fix for issue sonic-net#2193 the cleanup for stale rif counters was added. * But it could create a race condition between orchagent removes RIF rate counters from DB and lua script fetching them. * So as a fix all such cleanup has been moved to syncd. Signed-off-by: suman.kumar@broadcom.com
sumanbrcm
added a commit
to sumanbrcm/sonic-swss
that referenced
this issue
Oct 13, 2022
* Fixing issue #11621 * The cleanup code for stale rif counters are now moved to syncd . Earlier as part of fix for issue sonic-net#2193 the cleanup for stale rif counters was added. * But it could create a race condition between orchagent removes RIF rate counters from DB and lua script fetching them. * So as a fix all such cleanup has been moved to syncd. Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
liat-grozovik
pushed a commit
that referenced
this issue
Nov 8, 2022
The cleanup code for stale rif counters are now moved to syncd . Earlier as part of fix for issue #2193 the cleanup for stale rif counters was added. But it could create a race condition between orchagent removes RIF rate counters from DB and lua script fetching them. So as a fix all such cleanup has been moved to syncd. - What I did Fix for sonic-net/sonic-buildimage#11621 As a past fix which aimed at removing stale rif counters (#2199) , there is a chance of race condition and it leads to lua script reporting error. To handle this , the rif counters cleanup code(handled in cleanUpRifFromCounterDb) is now called from syncd ( removeCounter ) to avoid such race condition. - Why I did it The operations in Orchagent and syncd is not synchronous, so while Orchagent deletes the rif counters from Counters Db, the syncd could still access it. In race conditions the lua script trying to fetch rif counters will have errors syslog for such access as it was already deleted by orchagent. The cleanup code is removed from orchagent is added in syncd - it will make sure no such race condition would get hit. - How I verified it Followed the steps in (sonic-net/sonic-buildimage#11621) : Create RIF in SONiC, wait till RIF rates are populated in COUNTERS DB Remove RIF Repeat the steps multiple times and check if any error syslog is seen (No error syslog is seen) Also checked cleanup for rif counters. After RIF creation derived info of oid for RIF from "COUNTERS_RIF_NAME_MAP" 127) "Vlan100" 128) "oid:0x6000000000aa5" Checked all the tabled in COUNTER_DB which has same OID in keys 127.0.0.1:6379[2]> keys 6000000000aa5 1) "RATES:oid:0x6000000000aa5:RIF" 2) "COUNTERS:oid:0x6000000000aa5" 3) "RATES:oid:0x6000000000aa5" 127.0.0.1:6379[2]> Deleted the RIF by removing the ip on the intf. Checked COUNTER_DB again with same OID if there are stale entries or not. No stale entries exist now. 127.0.0.1:6379[2]> keys 6000000000aa5 (empty array) 127.0.0.1:6379[2]> Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
yxieca
pushed a commit
that referenced
this issue
Nov 10, 2022
The cleanup code for stale rif counters are now moved to syncd . Earlier as part of fix for issue #2193 the cleanup for stale rif counters was added. But it could create a race condition between orchagent removes RIF rate counters from DB and lua script fetching them. So as a fix all such cleanup has been moved to syncd. - What I did Fix for sonic-net/sonic-buildimage#11621 As a past fix which aimed at removing stale rif counters (#2199) , there is a chance of race condition and it leads to lua script reporting error. To handle this , the rif counters cleanup code(handled in cleanUpRifFromCounterDb) is now called from syncd ( removeCounter ) to avoid such race condition. - Why I did it The operations in Orchagent and syncd is not synchronous, so while Orchagent deletes the rif counters from Counters Db, the syncd could still access it. In race conditions the lua script trying to fetch rif counters will have errors syslog for such access as it was already deleted by orchagent. The cleanup code is removed from orchagent is added in syncd - it will make sure no such race condition would get hit. - How I verified it Followed the steps in (sonic-net/sonic-buildimage#11621) : Create RIF in SONiC, wait till RIF rates are populated in COUNTERS DB Remove RIF Repeat the steps multiple times and check if any error syslog is seen (No error syslog is seen) Also checked cleanup for rif counters. After RIF creation derived info of oid for RIF from "COUNTERS_RIF_NAME_MAP" 127) "Vlan100" 128) "oid:0x6000000000aa5" Checked all the tabled in COUNTER_DB which has same OID in keys 127.0.0.1:6379[2]> keys 6000000000aa5 1) "RATES:oid:0x6000000000aa5:RIF" 2) "COUNTERS:oid:0x6000000000aa5" 3) "RATES:oid:0x6000000000aa5" 127.0.0.1:6379[2]> Deleted the RIF by removing the ip on the intf. Checked COUNTER_DB again with same OID if there are stale entries or not. No stale entries exist now. 127.0.0.1:6379[2]> keys 6000000000aa5 (empty array) 127.0.0.1:6379[2]> Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
oleksandrivantsiv
pushed a commit
to oleksandrivantsiv/sonic-swss
that referenced
this issue
Mar 1, 2023
* Fixing issue #11621 * The cleanup code for stale rif counters are now moved to syncd . Earlier as part of fix for issue sonic-net#2193 the cleanup for stale rif counters was added. * But it could create a race condition between orchagent removes RIF rate counters from DB and lua script fetching them. * So as a fix all such cleanup has been moved to syncd. Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Description
SONiC maintains rif counter statistics in COUNTER_DB with the the help of 4 tables. They are needed for displaying packets/bytes as well as rates in tx/tx direction. These tables are accessed by object-id(OID) for the corresponding RIF. The RIF - OID is maintained in another counter-db table COUNTERS_RIF_NAME_MAP. But once RIF is deleted, although the mapping is deleted , but the 4 tables do not get deleted and remain as stale.
Steps to reproduce the issue:
Describe the results you received:
Upon execution of the steps , after RIF was deleted the stale table still existed in COUNTERS DB. For clarity the exact CLI is also shown as below . After deletion of RIF, the stale table exists which is shown in step 6 .
sudo config vlan member add 100 Ethernet48
HGETALL "COUNTERS_RIF_NAME_MAP"
127) "Vlan100"
128) "oid:0x6000000000ba6"
1) "COUNTERS:oid:0x6000000000ba6"
2) "RATES:oid:0x6000000000ba6"
3) "RATES:oid:0x6000000000ba6:RIF"
127.0.0.1:6379[2]>
1) "COUNTERS:oid:0x6000000000ba6"
2) "RATES:oid:0x6000000000ba6"
3) "RATES:oid:0x6000000000ba6:RIF"
Describe the results you expected:
The expectation is there should not be any stale table for any deleted/non-existing RIF.
Output of show version:
admin@sonic:~$ show version
SONiC Software Version: SONiC.master.0-9ac289a88
Distribution: Debian 11.2
Kernel: 5.10.0-8-2-amd64
Build commit: 9ac289a88
Build date: Tue Mar 22 11:08:10 UTC 2022
Built by: sk408115@sonic-lvn-csg-004
Platform: x86_64-accton_as7816_64x-r0
HwSKU: Accton-AS7816-64X
ASIC: broadcom
ASIC Count: 1
Serial Number: N/A
Model Number: N/A
Hardware Revision: N/A
Uptime: 18:49:58 up 33 min, 1 user, load average: 0.15, 0.61, 0.70
Docker images:
REPOSITORY TAG IMAGE ID SIZE
docker-orchagent latest da83f8d4ea98 462MB
docker-orchagent master.0-9ac289a88 da83f8d4ea98 462MB
docker-fpm-frr latest 3ee865fd5b1c 460MB
docker-fpm-frr master.0-9ac289a88 3ee865fd5b1c 460MB
docker-nat latest a69d62fac6ba 445MB
docker-nat master.0-9ac289a88 a69d62fac6ba 445MB
docker-macsec latest 5a6aa61a8668 445MB
docker-macsec master.0-9ac289a88 5a6aa61a8668 445MB
docker-sflow latest 8bf63e3ef39a 443MB
docker-sflow master.0-9ac289a88 8bf63e3ef39a 443MB
docker-teamd latest 1ee6c94c589b 442MB
docker-teamd master.0-9ac289a88 1ee6c94c589b 442MB
docker-platform-monitor latest 5effe289b674 541MB
docker-platform-monitor master.0-9ac289a88 5effe289b674 541MB
docker-snmp latest a6442486551c 471MB
docker-snmp master.0-9ac289a88 a6442486551c 471MB
docker-mux latest 530e7ff28531 480MB
docker-mux master.0-9ac289a88 530e7ff28531 480MB
docker-dhcp-relay latest 3af842692b56 449MB
docker-sonic-telemetry latest 10893f4fc878 528MB
docker-sonic-telemetry master.0-9ac289a88 10893f4fc878 528MB
docker-database latest 2af56fc5bcab 439MB
docker-database master.0-9ac289a88 2af56fc5bcab 439MB
docker-syncd-brcm latest 15125617b736 805MB
docker-syncd-brcm master.0-9ac289a88 15125617b736 805MB
docker-gbsyncd-credo latest f2fb6158c6ba 443MB
docker-gbsyncd-credo master.0-9ac289a88 f2fb6158c6ba 443MB
docker-sonic-mgmt-framework latest 2e24cd014e00 582MB
docker-sonic-mgmt-framework master.0-9ac289a88 2e24cd014e00 582MB
docker-lldp latest 4aab8886498e 467MB
docker-lldp master.0-9ac289a88 4aab8886498e 467MB
docker-router-advertiser latest f4666d5d47ed 427MB
docker-router-advertiser master.0-9ac289a88 f4666d5d47ed 427MB
admin@sonic:~$
Output of show techsupport:
The text was updated successfully, but these errors were encountered: