Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sairedis: Fixing race condition for rif counters #1136

Merged
merged 1 commit into from
Nov 10, 2022
Merged

sairedis: Fixing race condition for rif counters #1136

merged 1 commit into from
Nov 10, 2022

Conversation

sumanbrcm
Copy link
Contributor

  • Fixing issue #11621
  • The cleanup code for stale rif counters are now moved to syncd . Earlier as part of fix for issue #2193 the cleanup for stale rif counters was added.
  • But it could create a race condition between orchagent removes RIF rate counters from DB and lua script fetching them.
  • So as a fix all such cleanup has been moved to syncd.

Signed-off-by: Suman Kumar suman.kumar@broadcom.com

All the details are mentioned in :
sonic-net/sonic-swss#2488

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Oct 13, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: sumanbrcm / name: Suman Kumar (d664f98)

@sumanbrcm
Copy link
Contributor Author

Added @prsunny for review

@sumanbrcm
Copy link
Contributor Author

This PR along with (sonic-net/sonic-swss#2488) handles the below issue :
sonic-net/sonic-buildimage#11621

return "RATES:" + key + ":RIF";
}

void FlexCounter::cleanUpRifFromCounterDb(_In_ sai_object_id_t rifId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please try reuse removeDataFromCountersDB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Junchao-Mellanox , Sure , I will use as suggested and check/UT . If it works fine, will submit a patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Junchao-Mellanox , The changes you suggested worked fine , submitting a new patch for this.

@kcudnik
Copy link
Collaborator

kcudnik commented Oct 14, 2022

is this backward compatible?

@sumanbrcm
Copy link
Contributor Author

@kcudnik , it is backward compatible. The stale rif counter db table/entries was cleaned up in orchagent in previous version while now it will be cleaned up in syncd.

@kcudnik
Copy link
Collaborator

kcudnik commented Oct 17, 2022

can you solve build issues?

@sumanbrcm
Copy link
Contributor Author

sure , will resolve it (Please note the new patch will be submitted if it works fine with @Junchao-Mellanox suggestion)

* Fixing issue #11621
* The cleanup code for stale rif counters are now moved to syncd . Earlier as part of fix for issue #2193 the cleanup for stale rif counters was added.
* But it could create a race condition between orchagent removes RIF rate counters from DB and lua script fetching them.
* So as a fix all such cleanup has been moved to syncd.

Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
@sumanbrcm
Copy link
Contributor Author

/azpw run Azure.sonic-sairedis

@mssonicbld
Copy link
Collaborator

/AzurePipelines run Azure.sonic-sairedis

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@sumanbrcm
Copy link
Contributor Author

/AzurePipelines run Azure.sonic-sairedis

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 1136 in repo sonic-net/sonic-sairedis

@sumanbrcm
Copy link
Contributor Author

/AzurePipelines run Azure.sonic-sairedis

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 1136 in repo sonic-net/sonic-sairedis

@sumanbrcm
Copy link
Contributor Author

/AzurePipelines run Azure.sonic-sairedis

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 1136 in repo sonic-net/sonic-sairedis

@sumanbrcm
Copy link
Contributor Author

/AzurePipelines run Azure.sonic-sairedis

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 1136 in repo sonic-net/sonic-sairedis

@sumanbrcm
Copy link
Contributor Author

/AzurePipelines run Azure.sonic-sairedis

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 1136 in repo sonic-net/sonic-sairedis

@sumanbrcm
Copy link
Contributor Author

/AzurePipelines run

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 1136 in repo sonic-net/sonic-sairedis

@sumanbrcm
Copy link
Contributor Author

/AzurePipelines run Azure.sonic-sairedis

@sumanbrcm
Copy link
Contributor Author

/AzurePipelines run Azure.sonic-sairedis

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 1136 in repo sonic-net/sonic-sairedis

@prsunny
Copy link
Contributor

prsunny commented Nov 4, 2022

Please try with "/azpw run"

@sumanbrcm
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@sumanbrcm
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@sumanbrcm
Copy link
Contributor Author

@prsunny @liat-grozovik @kcudnik
This is request for merge/related-approval.

@prsunny prsunny merged commit 2f2698c into sonic-net:master Nov 10, 2022
richardyu-ms pushed a commit to richardyu-ms/sonic-sairedis that referenced this pull request Nov 15, 2022
* Fixing issue #11621
* The cleanup code for stale rif counters are now moved to syncd . Earlier as part of fix for issue #2193 the cleanup for stale rif counters was added.
* But it could create a race condition between orchagent removes RIF rate counters from DB and lua script fetching them.
* So as a fix all such cleanup has been moved to syncd.

Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
richardyu-ms pushed a commit that referenced this pull request Nov 16, 2022
* Fixing issue #11621
* The cleanup code for stale rif counters are now moved to syncd . Earlier as part of fix for issue #2193 the cleanup for stale rif counters was added.
* But it could create a race condition between orchagent removes RIF rate counters from DB and lua script fetching them.
* So as a fix all such cleanup has been moved to syncd.

Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
pettershao-ragilenetworks pushed a commit to pettershao-ragilenetworks/sonic-sairedis that referenced this pull request Nov 18, 2022
* Fixing issue #11621
* The cleanup code for stale rif counters are now moved to syncd . Earlier as part of fix for issue #2193 the cleanup for stale rif counters was added.
* But it could create a race condition between orchagent removes RIF rate counters from DB and lua script fetching them.
* So as a fix all such cleanup has been moved to syncd.

Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
@stepanblyschak
Copy link
Contributor

@kcudnik @yxieca Could you please help to assign label "Request for 202205"? This fix is neccessary for 202205

@yxieca
Copy link
Contributor

yxieca commented Nov 29, 2022

@sumanbrcm this change cannot be cherry-picked cleanly, can you help enter an new PR for 202205 branch?

pettershao-ragilenetworks pushed a commit to pettershao-ragilenetworks/sonic-sairedis that referenced this pull request Nov 30, 2022
* Fixing issue #11621
* The cleanup code for stale rif counters are now moved to syncd . Earlier as part of fix for issue #2193 the cleanup for stale rif counters was added.
* But it could create a race condition between orchagent removes RIF rate counters from DB and lua script fetching them.
* So as a fix all such cleanup has been moved to syncd.

Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
pettershao-ragilenetworks pushed a commit to pettershao-ragilenetworks/sonic-sairedis that referenced this pull request Nov 30, 2022
* Fixing issue #11621
* The cleanup code for stale rif counters are now moved to syncd . Earlier as part of fix for issue #2193 the cleanup for stale rif counters was added.
* But it could create a race condition between orchagent removes RIF rate counters from DB and lua script fetching them.
* So as a fix all such cleanup has been moved to syncd.

Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
@yxieca
Copy link
Contributor

yxieca commented Feb 2, 2023

@adyeung can you help ping the author? This change cannot be cherry-picked cleanly to 202205 branch. a separate PR is needed.

@adyeung
Copy link

adyeung commented Feb 2, 2023

Requested submitter @sumanbrcm to prioritize next wk

sumanbrcm added a commit to sumanbrcm/sonic-sairedis that referenced this pull request Feb 6, 2023
Changes for 202205 branch

Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
@sumanbrcm
Copy link
Contributor Author

@yxieca , @adyeung
The PR for 202205 branch : -
#1202

sumanbrcm added a commit to sumanbrcm/sonic-sairedis that referenced this pull request Feb 7, 2023
Changes for 202205 branch

Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
sumanbrcm added a commit to sumanbrcm/sonic-sairedis that referenced this pull request Feb 7, 2023
Changes for 202205 branch

Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
yxieca pushed a commit that referenced this pull request Feb 9, 2023
Changes for 202205 branch

Fixing issue #11621
All the details of the fix in master branch is in the PR : -
sairedis: Fixing race condition for rif counters #1136
swss: Fixing race condition for rif counters sonic-swss#2488
This change is already merged to master branch , but sonic-sairedis change (sairedis: Fixing race condition for rif counters #1136) could not be cherry picked to 202205 branch as the base file FlexCounter.cpp differs in both the branches. Also, the API removeDataFromCountersDB is not available in 202205 branch for the cleanup. Hence the current fix takes care of cleaning up the stale rif counters issue through newly added API cleanUpRifFromCounterDb .
Unit Tests:-
The steps followed are same as in sonic-net/sonic-buildimage#11621

Create RIF in SONiC, wait till RIF rates are populated in COUNTERS DB
Remove RIF
Repeat the steps multiple times and check if any error syslog is seen (No error syslog is seen)
Also checked cleanup for rif counters.
After RIF creation derived info of oid for RIF from "COUNTERS_RIF_NAME_MAP"
127) "Vlan100"
128) "oid:0x6000000000aa5"

Checked all the tabled in COUNTER_DB which has same OID in keys
127.0.0.1:6379[2]> keys 6000000000aa5

"RATES:oid:0x6000000000aa5:RIF"
"COUNTERS:oid:0x6000000000aa5"
"RATES:oid:0x6000000000aa5"
127.0.0.1:6379[2]>
Deleted the RIF by removing the ip on the intf.

Checked COUNTER_DB again with same OID if there are stale entries or not. No stale entries exist now.
127.0.0.1:6379[2]> keys 6000000000aa5
(empty array)
127.0.0.1:6379[2]>
Signed-off-by: Suman Kumar suman.kumar@broadcom.com

Signed-off-by: Suman Kumar <suman.kumar@broadcom.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants