-
Notifications
You must be signed in to change notification settings - Fork 670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[route_check.py] - update includes checks on subscriptions #1344
[route_check.py] - update includes checks on subscriptions #1344
Conversation
…commodate the possible latency between APPL & ASIC DB. Switched to swsscommon & python3 Added unit tests
With the following diff, this can be cherry picked onto 201911.
|
retest this please |
|
Please note that many of the PRs under sonic-utilities are failing on the 'default' and it seems to be general issue with the build system. |
2) Read APPL-DB & ASIC-DB | ||
3) Get the diff. | ||
4) If any diff, | ||
4.1) Collect subscribe messages for a second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should you verify it persist for 3 consecutive reads instead of 1 read?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is another scheme. 3 consecutive reads and look for diff that is consistently same across all to imply that as possible failures. But this has a hole if a a route was getting removed & added quickly multiple times, then it could be a false positive.
Here we look for SET/DEL in subscription messages to see, if a missed ASIC entry matches with a SET subscribe message, which would have added to ASIC. Similarly look for DEL message for an unaccounted ASIC entry as that would have removed it.
No logical code change.
retest this please |
1 similar comment
retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.. minor comments..
scripts/route_check.py
Outdated
@@ -45,6 +94,12 @@ def set_level(lvl, log_to_syslog): | |||
|
|||
|
|||
def print_message(lvl, *args): | |||
""" | |||
print and log the message for give level. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given?
print_message(syslog.LOG_ERR, "Failed. Look at reported mismatches above") | ||
return -1 | ||
print_message(syslog.LOG_ERR, "add: {", json.dumps(adds, indent=4), "}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking, do we need this add/del json dumps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When failure is reported, the log of add/del during that one sec period, will be very useful for looking in-depth.
- [route_check.py] - update includes checks on subscriptions (sonic-net/sonic-utilities#1344) - Validations checks while adding a member to PortChannel and removing a member from a Portchannel (sonic-net/sonic-utilities#1328) - [show] Add subcommand to show midplane status for modular chassis (sonic-net/sonic-utilities#1267) - [pytest][qos][config] Added pytests for "config qos reload" commands" (sonic-net/sonic-utilities#1346) - Drop explict 3 seconds pause between two object updates/deletes. (sonic-net/sonic-utilities#1359) - [show]fix for show muxcable status by replacing "hostname" to "peer_switch" for deriving tor ipv4_address (sonic-net/sonic-utilities#1360) - [PFCWD] Fix 'start' pfcwd command (sonic-net/sonic-utilities#1345) Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
- [route_check.py] - update includes checks on subscriptions (sonic-net/sonic-utilities#1344) - Validations checks while adding a member to PortChannel and removing a member from a Portchannel (sonic-net/sonic-utilities#1328) - [show] Add subcommand to show midplane status for modular chassis (sonic-net/sonic-utilities#1267) - [pytest][qos][config] Added pytests for "config qos reload" commands" (sonic-net/sonic-utilities#1346) - Drop explict 3 seconds pause between two object updates/deletes. (sonic-net/sonic-utilities#1359) - [show]fix for show muxcable status by replacing "hostname" to "peer_switch" for deriving tor ipv4_address (sonic-net/sonic-utilities#1360) - [PFCWD] Fix 'start' pfcwd command (sonic-net/sonic-utilities#1345) Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
@renukamanavalan Please create PR for 201911. Cherry-pick has conflcit. |
@renukamanavalan please create pr for 201911 |
…#1344) Summary: Improve the tool to handle the possible latency between APPL-DB & ASIC-DB by looking at subscription messages. - What I did The routes flow from APPL-DB to ASIC-DB, via orchagent. This tool's job is to verify that all routes added to APPL-DB do get into ASIC-DB and all routes removed from APPL-DB are deleted from ASIC-DB. - How I did it NOTE: The flow from APPL-DB to ASIC-DB takes non zero milliseconds. 1) Initiate subscribe for ASIC-DB updates. 2) Read APPL-DB & ASIC-DB 3) Get the diff. 4) If any diff, 4.1) Collect subscribe messages for a second 4.2) check diff against the subscribe messages 5) Rule out local interfaces & default routes 6) If still outstanding diffs, report failure. - How to verify it Run this tool in SONiC switch and watch the result. In case of failure checkout the result to validate the failure. To simulate failure: Stop Orchagent. Run this tool, and likely you would see some failures. You could potentially remove / add routes in APPL / ASIC DBs with orchagent down to ensure failure. Analyze the reported failures to match expected. You may use the exit code to verify the result as success or not.
…#1344) Summary: Improve the tool to handle the possible latency between APPL-DB & ASIC-DB by looking at subscription messages. - What I did The routes flow from APPL-DB to ASIC-DB, via orchagent. This tool's job is to verify that all routes added to APPL-DB do get into ASIC-DB and all routes removed from APPL-DB are deleted from ASIC-DB. - How I did it NOTE: The flow from APPL-DB to ASIC-DB takes non zero milliseconds. 1) Initiate subscribe for ASIC-DB updates. 2) Read APPL-DB & ASIC-DB 3) Get the diff. 4) If any diff, 4.1) Collect subscribe messages for a second 4.2) check diff against the subscribe messages 5) Rule out local interfaces & default routes 6) If still outstanding diffs, report failure. - How to verify it Run this tool in SONiC switch and watch the result. In case of failure checkout the result to validate the failure. To simulate failure: Stop Orchagent. Run this tool, and likely you would see some failures. You could potentially remove / add routes in APPL / ASIC DBs with orchagent down to ensure failure. Analyze the reported failures to match expected. You may use the exit code to verify the result as success or not.
- What I did
The routes flow from APPL-DB to ASIC-DB, via orchagent.
This tool's job is to verify that all routes added to APPL-DB do
get into ASIC-DB and all routes removed from APPL-DB are deleted from ASIC-DB.
- How I did it
NOTE: The flow from APPL-DB to ASIC-DB takes non zero milliseconds.
1) Initiate subscribe for ASIC-DB updates.
2) Read APPL-DB & ASIC-DB
3) Get the diff.
4) If any diff,
4.1) Collect subscribe messages for a second
4.2) check diff against the subscribe messages
5) Rule out local interfaces & default routes
6) If still outstanding diffs, report failure.
- How to verify it
Run this tool in SONiC switch and watch the result. In case of failure
checkout the result to validate the failure.
To simulate failure:
Stop Orchagent.
Run this tool, and likely you would see some failures.
You could potentially remove / add routes in APPL / ASIC DBs with orchagent
down to ensure failure.
Analyze the reported failures to match expected.
You may use the exit code to verify the result as success or not.