Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Route bulk] Fix bugs in case a SET operation follows a DEL operation in the same bulk #2086

Merged
merged 3 commits into from
Dec 22, 2021

Conversation

stephenxs
Copy link
Collaborator

@stephenxs stephenxs commented Dec 20, 2021

What I did

Fix 2 issues in route bulk operation in the scenario where a SET operation follows a DEL operation on the same prefix.

  1. The route entry SET operation can be ignored if it follows a DEL operation with the same prefix and next-hop.
  2. DEL and then SET default route results in orchagent aborted

Added test case to cover the scenarios

Signed-off-by: Stephen Sun stephens@nvidia.com

Why I did it

How I verified it

Manually test
Unit test

Details if related
The route entry SET operation can be ignored if it follows a DEL operation with the same prefix and next-hop.
For example, the following operations

2021-12-18.06:17:54.793234|ROUTE_TABLE:1.1.1.0/24|SET|nexthop:10.0.0.33|ifname:Ethernet64
2021-12-18.06:19:53.187294|ROUTE_TABLE:1.1.1.0/24|DEL
2021-12-18.06:20:10.279850|ROUTE_TABLE:1.1.1.0/24|SET|nexthop:10.0.0.33|ifname:Ethernet64

result in the next sai operations

2021-12-18.06:19:31.203092|C|SAI_OBJECT_TYPE_ROUTE_ENTRY||{"dest":"1.1.1.0/24","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000010"}|SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000aae
2021-12-18.06:20:23.813475|R|SAI_OBJECT_TYPE_ROUTE_ENTRY||{"dest":"1.1.1.0/24","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000010"}

and the last SET is lost.

This is because:
Both DEL and the 2nd SET operations were bundled into bulk operations.
There is a logic that is not to program ASIC if the next-hop information in the route entry update isn't changed,
which is done by comparing the next-hop information between route entry update and m_syncdRoutes.
However, in bulk mode, a route entry that will be removed will not be removed from m_syncdRoutes until the bulk is flushed,
The bulk has not been flushed when the 2nd SET is handled,
which means the comparison returns "not changed" and the 2nd SET operation is ignored.

The fix is to check whether the prefix is in gRouteBulker.removing_entries and treat it as non-exist if yes.

DEL and then SET default route results in orchagent aborted

This is because:

  1. packet forward action is initialized when route entry is created
  2. the default route will be set to DROP if the application asks for removing it
  3. if the default route is readded later in the same bulk session,
    it doesn't update packet forward action but provides a valid next-hop
  4. SAI complains if both DROP and a valid next hop are provided

Solution:
Always set packet action for the default route

Originally, the route entry SET operation can be ignored if it follows a DEL operation with the same prefix and nexthop.
For example, the following operations
```
2021-12-18.06:17:54.793234|ROUTE_TABLE:1.1.1.0/24|SET|nexthop:10.0.0.33|ifname:Ethernet64
2021-12-18.06:19:53.187294|ROUTE_TABLE:1.1.1.0/24|DEL
2021-12-18.06:20:10.279850|ROUTE_TABLE:1.1.1.0/24|SET|nexthop:10.0.0.33|ifname:Ethernet64
```
result in the next sai operations
```
2021-12-18.06:19:31.203092|C|SAI_OBJECT_TYPE_ROUTE_ENTRY||{"dest":"1.1.1.0/24","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000010"}|SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000aae
2021-12-18.06:20:23.813475|R|SAI_OBJECT_TYPE_ROUTE_ENTRY||{"dest":"1.1.1.0/24","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000010"}
```
and the last SET is lost.

This is because:
Both DEL and the 2nd SET operations were bundled into bulk operations.
There is a logic which is not to program ASIC if the next hop information in the route entry update isn't changed,
which is done by comparing the next hop information between route entry update and m_syncdRoutes.
However, in bulk mode, a route entry that will be removed will not be removed from m_syncdRoutes until the bulk is flushed,
The bulk has not been flushed when the 2nd SET is handled,
which means the comparison returns "not changed" and the 2nd SET operation is ignored.

The fix is to check whether the prefix is in gRouteBulker.removing_entries and treat it as non-exist if yes.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
This is because:
1. packet forward action is initialized when route entry is created
2. the default route will be set to DROP if application asks for removing it
3. if the default route is readded later in the same bulk session,
   it doesn't update packet forward action but provides a valid nexthop
4. SAI complains if both DROP and a valid next hop are provided
Solution:
Always set packet action for default route

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs stephenxs requested review from prsunny, qiluo-msft and shi-su and removed request for prsunny December 20, 2021 10:52
@stephenxs stephenxs marked this pull request as ready for review December 20, 2021 10:55
@stephenxs
Copy link
Collaborator Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs
Copy link
Collaborator Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs
Copy link
Collaborator Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@qiluo-msft
Copy link
Contributor

@shi-su Could you help review?

orchagent/routeorch.cpp Show resolved Hide resolved
orchagent/routeorch.cpp Show resolved Hide resolved
@stephenxs
Copy link
Collaborator Author

@shi-su @qiluo-msft can we merge this pr if no other concern? We need it to fix azure pipeline checker failures. Thanks

@qiluo-msft qiluo-msft merged commit 691c37b into sonic-net:master Dec 22, 2021
@qiluo-msft
Copy link
Contributor

Do we need to backport this PR to old branches?

@stephenxs stephenxs deleted the fix-route-bulk-del-set branch December 22, 2021 23:08
@stephenxs
Copy link
Collaborator Author

Do we need to backport this PR to old branches?

I suggest doing so as it is a production issue. We need #2071 first

liat-grozovik pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Dec 28, 2021
691c37b [Route bulk] Fix bugs in case a SET operation follows a DEL operation in the same bulk (sonic-net/sonic-swss#2086)
a4c80c3 patch for issue sonic-net/sonic-swss#1971 - enable Rx Drop handling for cisco-8000 (sonic-net/sonic-swss#2041)
71751d1 [macsec] Support setting IPG by gearbox_config.json (sonic-net/sonic-swss#2051)
5d5c169 [bulk mode] Fix bulk conflict when in case there are both remove and set operations (sonic-net/sonic-swss#2071)
8bbdbd2 Fix SRV6 NHOP CRM object type (sonic-net/sonic-swss#2072)
ef5b35f [vstest] VS test failure fix after fabric port orch PR merge (sonic-net/sonic-swss#1811)
89ea538 Supply the missing ingress/egress port profile list in document (sonic-net/sonic-swss#2064)
8123437 [pfc_detect] fix RedisReply errors (sonic-net/sonic-swss#2040)
b38f527 [swss][CRM][MPLS] MPLS CRM Nexthop - switch back to using SAI OBJECT rather than SWITCH OBJECT
ae061e5 create debug_shell_enable config to enable debug shell (sonic-net/sonic-swss#2060)
45e446d [cbf] Fix max FC value (sonic-net/sonic-swss#2049)
b1b5b29 Initial p4orch pytest code. (sonic-net/sonic-swss#2054)
d352d5a Update default route status to state DB (sonic-net/sonic-swss#2009)
24a64d6 Orchagent: Integrate P4Orch (sonic-net/sonic-swss#2029)
15a3b6c Delete the IPv6 link-local Neighbor when ipv6 link-local mode is disabled (sonic-net/sonic-swss#1897)
ed783e1 [orchagent] Add trap flow counter support (sonic-net/sonic-swss#1951)
e9b05a3 [vnetorch] ECMP for vnet tunnel routes with endpoint health monitor (sonic-net/sonic-swss#1955)
bcb7d61 P4Orch: inital add of source (sonic-net/sonic-swss#1997)
f6f6f86 [mclaglink] fix acl out ports (sonic-net/sonic-swss#2026)
fd887bf [Reclaim buffer] Reclaim unused buffer for dynamic buffer model (sonic-net/sonic-swss#1910)
9258978 [orchagent, cfgmgr] Add response publisher and state recording (sonic-net/sonic-swss#1992)
3d862a7 Fixing subport vs test script for subport under VNET (sonic-net/sonic-swss#2048)
fb0a5fd Don't handle buffer pool watermark during warm reboot reconciling (sonic-net/sonic-swss#1987)
16d4bcd Routed subinterface enhancements (sonic-net/sonic-swss#1907)
9639db7 [vstest/subintf] Add vs test to validate sub interface ingress to a vnet (sonic-net/sonic-swss#1642)

Signed-off-by: Stephen Sun stephens@nvidia.com
stephenxs added a commit to stephenxs/sonic-buildimage that referenced this pull request Jan 6, 2022
691c37b [Route bulk] Fix bugs in case a SET operation follows a DEL operation in the same bulk (sonic-net/sonic-swss#2086)
a4c80c3 patch for issue sonic-net/sonic-swss#1971 - enable Rx Drop handling for cisco-8000 (sonic-net/sonic-swss#2041)
71751d1 [macsec] Support setting IPG by gearbox_config.json (sonic-net/sonic-swss#2051)
5d5c169 [bulk mode] Fix bulk conflict when in case there are both remove and set operations (sonic-net/sonic-swss#2071)
8bbdbd2 Fix SRV6 NHOP CRM object type (sonic-net/sonic-swss#2072)
ef5b35f [vstest] VS test failure fix after fabric port orch PR merge (sonic-net/sonic-swss#1811)
89ea538 Supply the missing ingress/egress port profile list in document (sonic-net/sonic-swss#2064)
8123437 [pfc_detect] fix RedisReply errors (sonic-net/sonic-swss#2040)
b38f527 [swss][CRM][MPLS] MPLS CRM Nexthop - switch back to using SAI OBJECT rather than SWITCH OBJECT
ae061e5 create debug_shell_enable config to enable debug shell (sonic-net/sonic-swss#2060)
45e446d [cbf] Fix max FC value (sonic-net/sonic-swss#2049)
b1b5b29 Initial p4orch pytest code. (sonic-net/sonic-swss#2054)
d352d5a Update default route status to state DB (sonic-net/sonic-swss#2009)
24a64d6 Orchagent: Integrate P4Orch (sonic-net/sonic-swss#2029)
15a3b6c Delete the IPv6 link-local Neighbor when ipv6 link-local mode is disabled (sonic-net/sonic-swss#1897)
ed783e1 [orchagent] Add trap flow counter support (sonic-net/sonic-swss#1951)
e9b05a3 [vnetorch] ECMP for vnet tunnel routes with endpoint health monitor (sonic-net/sonic-swss#1955)
bcb7d61 P4Orch: inital add of source (sonic-net/sonic-swss#1997)
f6f6f86 [mclaglink] fix acl out ports (sonic-net/sonic-swss#2026)
fd887bf [Reclaim buffer] Reclaim unused buffer for dynamic buffer model (sonic-net/sonic-swss#1910)
9258978 [orchagent, cfgmgr] Add response publisher and state recording (sonic-net/sonic-swss#1992)
3d862a7 Fixing subport vs test script for subport under VNET (sonic-net/sonic-swss#2048)
fb0a5fd Don't handle buffer pool watermark during warm reboot reconciling (sonic-net/sonic-swss#1987)
16d4bcd Routed subinterface enhancements (sonic-net/sonic-swss#1907)
9639db7 [vstest/subintf] Add vs test to validate sub interface ingress to a vnet (sonic-net/sonic-swss#1642)

Signed-off-by: Stephen Sun stephens@nvidia.com
judyjoseph pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Jan 6, 2022
691c37b [Route bulk] Fix bugs in case a SET operation follows a DEL operation in the same bulk (sonic-net/sonic-swss#2086)
a4c80c3 patch for issue sonic-net/sonic-swss#1971 - enable Rx Drop handling for cisco-8000 (sonic-net/sonic-swss#2041)
71751d1 [macsec] Support setting IPG by gearbox_config.json (sonic-net/sonic-swss#2051)
5d5c169 [bulk mode] Fix bulk conflict when in case there are both remove and set operations (sonic-net/sonic-swss#2071)
8bbdbd2 Fix SRV6 NHOP CRM object type (sonic-net/sonic-swss#2072)
ef5b35f [vstest] VS test failure fix after fabric port orch PR merge (sonic-net/sonic-swss#1811)
89ea538 Supply the missing ingress/egress port profile list in document (sonic-net/sonic-swss#2064)
8123437 [pfc_detect] fix RedisReply errors (sonic-net/sonic-swss#2040)
b38f527 [swss][CRM][MPLS] MPLS CRM Nexthop - switch back to using SAI OBJECT rather than SWITCH OBJECT
ae061e5 create debug_shell_enable config to enable debug shell (sonic-net/sonic-swss#2060)
45e446d [cbf] Fix max FC value (sonic-net/sonic-swss#2049)
b1b5b29 Initial p4orch pytest code. (sonic-net/sonic-swss#2054)
d352d5a Update default route status to state DB (sonic-net/sonic-swss#2009)
24a64d6 Orchagent: Integrate P4Orch (sonic-net/sonic-swss#2029)
15a3b6c Delete the IPv6 link-local Neighbor when ipv6 link-local mode is disabled (sonic-net/sonic-swss#1897)
ed783e1 [orchagent] Add trap flow counter support (sonic-net/sonic-swss#1951)
e9b05a3 [vnetorch] ECMP for vnet tunnel routes with endpoint health monitor (sonic-net/sonic-swss#1955)
bcb7d61 P4Orch: inital add of source (sonic-net/sonic-swss#1997)
f6f6f86 [mclaglink] fix acl out ports (sonic-net/sonic-swss#2026)
fd887bf [Reclaim buffer] Reclaim unused buffer for dynamic buffer model (sonic-net/sonic-swss#1910)
9258978 [orchagent, cfgmgr] Add response publisher and state recording (sonic-net/sonic-swss#1992)
3d862a7 Fixing subport vs test script for subport under VNET (sonic-net/sonic-swss#2048)
fb0a5fd Don't handle buffer pool watermark during warm reboot reconciling (sonic-net/sonic-swss#1987)
16d4bcd Routed subinterface enhancements (sonic-net/sonic-swss#1907)
9639db7 [vstest/subintf] Add vs test to validate sub interface ingress to a vnet (sonic-net/sonic-swss#1642)

Signed-off-by: Stephen Sun stephens@nvidia.com
preetham-singh pushed a commit to preetham-singh/sonic-swss that referenced this pull request Aug 6, 2022
… in the same bulk (sonic-net#2086)

**What I did**

Fix 2 issues in route bulk operation in the scenario where a SET operation follows a DEL operation on the same prefix.
1. The route entry SET operation can be ignored if it follows a DEL operation with the same prefix and next-hop.
2. DEL and then SET default route results in orchagent aborted

Added test case to cover the scenarios

**How I verified it**

Manually test
Unit test

**Details if related**
***The route entry SET operation can be ignored if it follows a DEL operation with the same prefix and next-hop.***
For example, the following operations
```
2021-12-18.06:17:54.793234|ROUTE_TABLE:1.1.1.0/24|SET|nexthop:10.0.0.33|ifname:Ethernet64
2021-12-18.06:19:53.187294|ROUTE_TABLE:1.1.1.0/24|DEL
2021-12-18.06:20:10.279850|ROUTE_TABLE:1.1.1.0/24|SET|nexthop:10.0.0.33|ifname:Ethernet64
```
result in the next sai operations
```
2021-12-18.06:19:31.203092|C|SAI_OBJECT_TYPE_ROUTE_ENTRY||{"dest":"1.1.1.0/24","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000010"}|SAI_ROUTE_ENTRY_ATTR_NEXT_HOP_ID=oid:0x4000000000aae
2021-12-18.06:20:23.813475|R|SAI_OBJECT_TYPE_ROUTE_ENTRY||{"dest":"1.1.1.0/24","switch_id":"oid:0x21000000000000","vr":"oid:0x3000000000010"}
```
and the last SET is lost.

This is because:
Both DEL and the 2nd SET operations were bundled into bulk operations.
There is a logic that is not to program ASIC if the next-hop information in the route entry update isn't changed,
which is done by comparing the next-hop information between route entry update and m_syncdRoutes.
However, in bulk mode, a route entry that will be removed will not be removed from m_syncdRoutes until the bulk is flushed,
The bulk has not been flushed when the 2nd SET is handled,
which means the comparison returns "not changed" and the 2nd SET operation is ignored.

The fix is to check whether the prefix is in gRouteBulker.removing_entries and treat it as non-exist if yes.

***DEL and then SET default route results in orchagent aborted***

This is because:
1. packet forward action is initialized when route entry is created
2. the default route will be set to DROP if the application asks for removing it
3. if the default route is readded later in the same bulk session,
   it doesn't update packet forward action but provides a valid next-hop
4. SAI complains if both DROP and a valid next hop are provided

Solution:
Always set packet action for the default route
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants