-
Notifications
You must be signed in to change notification settings - Fork 529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Route bulk] Fix bugs in case a SET operation follows a DEL operation in the same bulk #2086
Merged
qiluo-msft
merged 3 commits into
sonic-net:master
from
stephenxs:fix-route-bulk-del-set
Dec 22, 2021
Merged
[Route bulk] Fix bugs in case a SET operation follows a DEL operation in the same bulk #2086
qiluo-msft
merged 3 commits into
sonic-net:master
from
stephenxs:fix-route-bulk-del-set
Dec 22, 2021
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Originally, the route entry SET operation can be ignored if it follows a DEL operation with the same prefix and nexthop. For example, the following operations ``` 2021-12-18.06:17:54.793234|ROUTE_TABLE:1.1.1.0/24|SET|nexthop:10.0.0.33|ifname:Ethernet64 2021-12-18.06:19:53.187294|ROUTE_TABLE:1.1.1.0/24|DEL 2021-12-18.06:20:10.279850|ROUTE_TABLE:1.1.1.0/24|SET|nexthop:10.0.0.33|ifname:Ethernet64 ``` result in the next sai operations ``` 2021-12-18.06:19:31.203092|C|SAI_OBJECT_TYPE_ROUTE_ENTRY||{"dest":"1.1.1.0/24","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000010"}|SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000aae 2021-12-18.06:20:23.813475|R|SAI_OBJECT_TYPE_ROUTE_ENTRY||{"dest":"1.1.1.0/24","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000010"} ``` and the last SET is lost. This is because: Both DEL and the 2nd SET operations were bundled into bulk operations. There is a logic which is not to program ASIC if the next hop information in the route entry update isn't changed, which is done by comparing the next hop information between route entry update and m_syncdRoutes. However, in bulk mode, a route entry that will be removed will not be removed from m_syncdRoutes until the bulk is flushed, The bulk has not been flushed when the 2nd SET is handled, which means the comparison returns "not changed" and the 2nd SET operation is ignored. The fix is to check whether the prefix is in gRouteBulker.removing_entries and treat it as non-exist if yes. Signed-off-by: Stephen Sun <stephens@nvidia.com>
This is because: 1. packet forward action is initialized when route entry is created 2. the default route will be set to DROP if application asks for removing it 3. if the default route is readded later in the same bulk session, it doesn't update packet forward action but provides a valid nexthop 4. SAI complains if both DROP and a valid next hop are provided Solution: Always set packet action for default route Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
5 tasks
stephenxs
requested review from
prsunny,
qiluo-msft and
shi-su
and removed request for
prsunny
December 20, 2021 10:52
/azpw run |
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azpw run |
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azpw run |
/AzurePipelines run |
Azure Pipelines successfully started running 1 pipeline(s). |
qiluo-msft
reviewed
Dec 21, 2021
@shi-su Could you help review? |
shi-su
reviewed
Dec 21, 2021
@shi-su @qiluo-msft can we merge this pr if no other concern? We need it to fix azure pipeline checker failures. Thanks |
shi-su
approved these changes
Dec 22, 2021
qiluo-msft
approved these changes
Dec 22, 2021
Do we need to backport this PR to old branches? |
I suggest doing so as it is a production issue. We need #2071 first |
liat-grozovik
pushed a commit
to sonic-net/sonic-buildimage
that referenced
this pull request
Dec 28, 2021
691c37b [Route bulk] Fix bugs in case a SET operation follows a DEL operation in the same bulk (sonic-net/sonic-swss#2086) a4c80c3 patch for issue sonic-net/sonic-swss#1971 - enable Rx Drop handling for cisco-8000 (sonic-net/sonic-swss#2041) 71751d1 [macsec] Support setting IPG by gearbox_config.json (sonic-net/sonic-swss#2051) 5d5c169 [bulk mode] Fix bulk conflict when in case there are both remove and set operations (sonic-net/sonic-swss#2071) 8bbdbd2 Fix SRV6 NHOP CRM object type (sonic-net/sonic-swss#2072) ef5b35f [vstest] VS test failure fix after fabric port orch PR merge (sonic-net/sonic-swss#1811) 89ea538 Supply the missing ingress/egress port profile list in document (sonic-net/sonic-swss#2064) 8123437 [pfc_detect] fix RedisReply errors (sonic-net/sonic-swss#2040) b38f527 [swss][CRM][MPLS] MPLS CRM Nexthop - switch back to using SAI OBJECT rather than SWITCH OBJECT ae061e5 create debug_shell_enable config to enable debug shell (sonic-net/sonic-swss#2060) 45e446d [cbf] Fix max FC value (sonic-net/sonic-swss#2049) b1b5b29 Initial p4orch pytest code. (sonic-net/sonic-swss#2054) d352d5a Update default route status to state DB (sonic-net/sonic-swss#2009) 24a64d6 Orchagent: Integrate P4Orch (sonic-net/sonic-swss#2029) 15a3b6c Delete the IPv6 link-local Neighbor when ipv6 link-local mode is disabled (sonic-net/sonic-swss#1897) ed783e1 [orchagent] Add trap flow counter support (sonic-net/sonic-swss#1951) e9b05a3 [vnetorch] ECMP for vnet tunnel routes with endpoint health monitor (sonic-net/sonic-swss#1955) bcb7d61 P4Orch: inital add of source (sonic-net/sonic-swss#1997) f6f6f86 [mclaglink] fix acl out ports (sonic-net/sonic-swss#2026) fd887bf [Reclaim buffer] Reclaim unused buffer for dynamic buffer model (sonic-net/sonic-swss#1910) 9258978 [orchagent, cfgmgr] Add response publisher and state recording (sonic-net/sonic-swss#1992) 3d862a7 Fixing subport vs test script for subport under VNET (sonic-net/sonic-swss#2048) fb0a5fd Don't handle buffer pool watermark during warm reboot reconciling (sonic-net/sonic-swss#1987) 16d4bcd Routed subinterface enhancements (sonic-net/sonic-swss#1907) 9639db7 [vstest/subintf] Add vs test to validate sub interface ingress to a vnet (sonic-net/sonic-swss#1642) Signed-off-by: Stephen Sun stephens@nvidia.com
stephenxs
added a commit
to stephenxs/sonic-buildimage
that referenced
this pull request
Jan 6, 2022
691c37b [Route bulk] Fix bugs in case a SET operation follows a DEL operation in the same bulk (sonic-net/sonic-swss#2086) a4c80c3 patch for issue sonic-net/sonic-swss#1971 - enable Rx Drop handling for cisco-8000 (sonic-net/sonic-swss#2041) 71751d1 [macsec] Support setting IPG by gearbox_config.json (sonic-net/sonic-swss#2051) 5d5c169 [bulk mode] Fix bulk conflict when in case there are both remove and set operations (sonic-net/sonic-swss#2071) 8bbdbd2 Fix SRV6 NHOP CRM object type (sonic-net/sonic-swss#2072) ef5b35f [vstest] VS test failure fix after fabric port orch PR merge (sonic-net/sonic-swss#1811) 89ea538 Supply the missing ingress/egress port profile list in document (sonic-net/sonic-swss#2064) 8123437 [pfc_detect] fix RedisReply errors (sonic-net/sonic-swss#2040) b38f527 [swss][CRM][MPLS] MPLS CRM Nexthop - switch back to using SAI OBJECT rather than SWITCH OBJECT ae061e5 create debug_shell_enable config to enable debug shell (sonic-net/sonic-swss#2060) 45e446d [cbf] Fix max FC value (sonic-net/sonic-swss#2049) b1b5b29 Initial p4orch pytest code. (sonic-net/sonic-swss#2054) d352d5a Update default route status to state DB (sonic-net/sonic-swss#2009) 24a64d6 Orchagent: Integrate P4Orch (sonic-net/sonic-swss#2029) 15a3b6c Delete the IPv6 link-local Neighbor when ipv6 link-local mode is disabled (sonic-net/sonic-swss#1897) ed783e1 [orchagent] Add trap flow counter support (sonic-net/sonic-swss#1951) e9b05a3 [vnetorch] ECMP for vnet tunnel routes with endpoint health monitor (sonic-net/sonic-swss#1955) bcb7d61 P4Orch: inital add of source (sonic-net/sonic-swss#1997) f6f6f86 [mclaglink] fix acl out ports (sonic-net/sonic-swss#2026) fd887bf [Reclaim buffer] Reclaim unused buffer for dynamic buffer model (sonic-net/sonic-swss#1910) 9258978 [orchagent, cfgmgr] Add response publisher and state recording (sonic-net/sonic-swss#1992) 3d862a7 Fixing subport vs test script for subport under VNET (sonic-net/sonic-swss#2048) fb0a5fd Don't handle buffer pool watermark during warm reboot reconciling (sonic-net/sonic-swss#1987) 16d4bcd Routed subinterface enhancements (sonic-net/sonic-swss#1907) 9639db7 [vstest/subintf] Add vs test to validate sub interface ingress to a vnet (sonic-net/sonic-swss#1642) Signed-off-by: Stephen Sun stephens@nvidia.com
5 tasks
judyjoseph
pushed a commit
to sonic-net/sonic-buildimage
that referenced
this pull request
Jan 6, 2022
691c37b [Route bulk] Fix bugs in case a SET operation follows a DEL operation in the same bulk (sonic-net/sonic-swss#2086) a4c80c3 patch for issue sonic-net/sonic-swss#1971 - enable Rx Drop handling for cisco-8000 (sonic-net/sonic-swss#2041) 71751d1 [macsec] Support setting IPG by gearbox_config.json (sonic-net/sonic-swss#2051) 5d5c169 [bulk mode] Fix bulk conflict when in case there are both remove and set operations (sonic-net/sonic-swss#2071) 8bbdbd2 Fix SRV6 NHOP CRM object type (sonic-net/sonic-swss#2072) ef5b35f [vstest] VS test failure fix after fabric port orch PR merge (sonic-net/sonic-swss#1811) 89ea538 Supply the missing ingress/egress port profile list in document (sonic-net/sonic-swss#2064) 8123437 [pfc_detect] fix RedisReply errors (sonic-net/sonic-swss#2040) b38f527 [swss][CRM][MPLS] MPLS CRM Nexthop - switch back to using SAI OBJECT rather than SWITCH OBJECT ae061e5 create debug_shell_enable config to enable debug shell (sonic-net/sonic-swss#2060) 45e446d [cbf] Fix max FC value (sonic-net/sonic-swss#2049) b1b5b29 Initial p4orch pytest code. (sonic-net/sonic-swss#2054) d352d5a Update default route status to state DB (sonic-net/sonic-swss#2009) 24a64d6 Orchagent: Integrate P4Orch (sonic-net/sonic-swss#2029) 15a3b6c Delete the IPv6 link-local Neighbor when ipv6 link-local mode is disabled (sonic-net/sonic-swss#1897) ed783e1 [orchagent] Add trap flow counter support (sonic-net/sonic-swss#1951) e9b05a3 [vnetorch] ECMP for vnet tunnel routes with endpoint health monitor (sonic-net/sonic-swss#1955) bcb7d61 P4Orch: inital add of source (sonic-net/sonic-swss#1997) f6f6f86 [mclaglink] fix acl out ports (sonic-net/sonic-swss#2026) fd887bf [Reclaim buffer] Reclaim unused buffer for dynamic buffer model (sonic-net/sonic-swss#1910) 9258978 [orchagent, cfgmgr] Add response publisher and state recording (sonic-net/sonic-swss#1992) 3d862a7 Fixing subport vs test script for subport under VNET (sonic-net/sonic-swss#2048) fb0a5fd Don't handle buffer pool watermark during warm reboot reconciling (sonic-net/sonic-swss#1987) 16d4bcd Routed subinterface enhancements (sonic-net/sonic-swss#1907) 9639db7 [vstest/subintf] Add vs test to validate sub interface ingress to a vnet (sonic-net/sonic-swss#1642) Signed-off-by: Stephen Sun stephens@nvidia.com
preetham-singh
pushed a commit
to preetham-singh/sonic-swss
that referenced
this pull request
Aug 6, 2022
… in the same bulk (sonic-net#2086) **What I did** Fix 2 issues in route bulk operation in the scenario where a SET operation follows a DEL operation on the same prefix. 1. The route entry SET operation can be ignored if it follows a DEL operation with the same prefix and next-hop. 2. DEL and then SET default route results in orchagent aborted Added test case to cover the scenarios **How I verified it** Manually test Unit test **Details if related** ***The route entry SET operation can be ignored if it follows a DEL operation with the same prefix and next-hop.*** For example, the following operations ``` 2021-12-18.06:17:54.793234|ROUTE_TABLE:1.1.1.0/24|SET|nexthop:10.0.0.33|ifname:Ethernet64 2021-12-18.06:19:53.187294|ROUTE_TABLE:1.1.1.0/24|DEL 2021-12-18.06:20:10.279850|ROUTE_TABLE:1.1.1.0/24|SET|nexthop:10.0.0.33|ifname:Ethernet64 ``` result in the next sai operations ``` 2021-12-18.06:19:31.203092|C|SAI_OBJECT_TYPE_ROUTE_ENTRY||{"dest":"1.1.1.0/24","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000010"}|SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000aae 2021-12-18.06:20:23.813475|R|SAI_OBJECT_TYPE_ROUTE_ENTRY||{"dest":"1.1.1.0/24","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000010"} ``` and the last SET is lost. This is because: Both DEL and the 2nd SET operations were bundled into bulk operations. There is a logic that is not to program ASIC if the next-hop information in the route entry update isn't changed, which is done by comparing the next-hop information between route entry update and m_syncdRoutes. However, in bulk mode, a route entry that will be removed will not be removed from m_syncdRoutes until the bulk is flushed, The bulk has not been flushed when the 2nd SET is handled, which means the comparison returns "not changed" and the 2nd SET operation is ignored. The fix is to check whether the prefix is in gRouteBulker.removing_entries and treat it as non-exist if yes. ***DEL and then SET default route results in orchagent aborted*** This is because: 1. packet forward action is initialized when route entry is created 2. the default route will be set to DROP if the application asks for removing it 3. if the default route is readded later in the same bulk session, it doesn't update packet forward action but provides a valid next-hop 4. SAI complains if both DROP and a valid next hop are provided Solution: Always set packet action for the default route
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What I did
Fix 2 issues in route bulk operation in the scenario where a SET operation follows a DEL operation on the same prefix.
Added test case to cover the scenarios
Signed-off-by: Stephen Sun stephens@nvidia.com
Why I did it
How I verified it
Manually test
Unit test
Details if related
The route entry SET operation can be ignored if it follows a DEL operation with the same prefix and next-hop.
For example, the following operations
result in the next sai operations
and the last SET is lost.
This is because:
Both DEL and the 2nd SET operations were bundled into bulk operations.
There is a logic that is not to program ASIC if the next-hop information in the route entry update isn't changed,
which is done by comparing the next-hop information between route entry update and m_syncdRoutes.
However, in bulk mode, a route entry that will be removed will not be removed from m_syncdRoutes until the bulk is flushed,
The bulk has not been flushed when the 2nd SET is handled,
which means the comparison returns "not changed" and the 2nd SET operation is ignored.
The fix is to check whether the prefix is in gRouteBulker.removing_entries and treat it as non-exist if yes.
DEL and then SET default route results in orchagent aborted
This is because:
it doesn't update packet forward action but provides a valid next-hop
Solution:
Always set packet action for the default route