[EVPN]When EVPN NVO config arrives later than remote VNI entries, the remote entries don't get added #14949

Open
dgsudharsan opened this issue May 4, 2023 · 8 comments · Fixed by sonic-net/sonic-swss#2756
Labels: BRCM, Issue for 202305, Triaged

Comments

@dgsudharsan
Collaborator

Description

Sometimes during config reload, the EVPN NVO table arrives later than the remote VNI table entries. In such scenarios, the remote VNI entries are ignored, which leads to traffic loss.

2023-04-30.14:23:40.528130|VXLAN_TUNNEL_MAP_TABLE:vtep1:map_98_Vlan98|SET|vlan:Vlan98|vni:98
2023-04-30.14:23:40.559594|VXLAN_TUNNEL_MAP_TABLE:vtep1:map_99_Vlan99|SET|vlan:Vlan99|vni:99
2023-04-30.14:23:40.572133|VXLAN_REMOTE_VNI_TABLE:Vlan98:1.1.1.1|SET|vni:98
2023-04-30.14:23:40.572180|VXLAN_REMOTE_VNI_TABLE:Vlan98:1.1.1.2|SET|vni:98
2023-04-30.14:23:40.575208|VXLAN_FDB_TABLE:Vlan98:04:3f:72:f7:2d:52|SET|remote_vtep:1.1.1.2|type:dynamic|vni:98
2023-04-30.14:23:40.575240|VXLAN_FDB_TABLE:Vlan98:0c:42:a1:6d:5b:94|SET|remote_vtep:1.1.1.1|type:dynamic|vni:98
2023-04-30.14:23:40.575249|VXLAN_FDB_TABLE:Vlan98:1c:34:da:2c:be:00|SET|remote_vtep:1.1.1.1|type:dynamic|vni:98
2023-04-30.14:23:40.575257|VXLAN_FDB_TABLE:Vlan98:1c:34:da:2c:ca:00|SET|remote_vtep:1.1.1.2|type:dynamic|vni:98
2023-04-30.14:23:40.589995|VXLAN_EVPN_NVO_TABLE:nvo1|SET|source_vtep:vtep1
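
The ordering in the log above can be checked mechanically. The following is a minimal sketch, assuming each record has the layout `timestamp|TABLE:key|OP|field:value|...` as seen in the lines above; the function name `remoteVniBeforeNvo` is hypothetical and is not part of sonic-swss. It lists the remote VNI keys that were SET before the first EVPN NVO SET record.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Returns the keys of VXLAN_REMOTE_VNI_TABLE SET records that appear before
// the first VXLAN_EVPN_NVO_TABLE SET record in the given log lines.
// Assumed record layout: timestamp|TABLE:key|OP|field:value|...
std::vector<std::string> remoteVniBeforeNvo(const std::vector<std::string>& lines)
{
    const std::string nvoPrefix = "VXLAN_EVPN_NVO_TABLE:";
    const std::string remotePrefix = "VXLAN_REMOTE_VNI_TABLE:";
    std::vector<std::string> early;
    bool nvoSeen = false;

    for (const auto& line : lines)
    {
        std::istringstream in(line);
        std::string timestamp, tableKey, op;
        if (!std::getline(in, timestamp, '|') ||
            !std::getline(in, tableKey, '|') ||
            !std::getline(in, op, '|'))
        {
            continue; // not a record in the assumed layout
        }
        if (op != "SET")
        {
            continue; // only SET records matter for this check
        }

        if (tableKey.compare(0, nvoPrefix.size(), nvoPrefix) == 0)
        {
            nvoSeen = true;
        }
        else if (!nvoSeen &&
                 tableKey.compare(0, remotePrefix.size(), remotePrefix) == 0)
        {
            // Keep only the "VlanX:remote_vtep" part of the key.
            early.push_back(tableKey.substr(remotePrefix.size()));
        }
    }
    return early;
}
```

Fed the nine records above, the sketch reports the two `Vlan98` remote VNI entries, since both precede the `VXLAN_EVPN_NVO_TABLE:nvo1` record.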

Steps to reproduce the issue:

  1. Configure EVPN
  2. Perform config reload

Describe the results you received:

Remote entries are not added, leading to traffic loss.

Describe the results you expected:

No issues


@dgsudharsan
Collaborator Author

The fix (sonic-net/sonic-swss#2756) breaks the previously added workaround sonic-net/sonic-swss#2626, so I am requesting that the fix be reverted.
Once we find a proper solution for #12361, we need to reintegrate sonic-net/sonic-swss#2756.

@arlakshm arlakshm added Triaged this issue has been triaged BRCM labels May 24, 2023
@adyeung
Collaborator

adyeung commented May 25, 2023

@srj102 pls help take a look and share your analysis

@adyeung adyeung assigned srj102 and unassigned adyeung May 25, 2023
@srj102
Contributor

srj102 commented Jun 5, 2023

From the techsupport attached to #12361, it looks like VXLAN_EVPN_NVO was not configured, which led to the OA not processing the VXLAN_REMOTE_VNI table APP DB entries.

Before the workaround for swss#2626, the case of EVPN_NVO coming later would have been handled via the following check:

    if (!tunnel_orch->getTunnelPort(remote_vtep, tunnelPort))
    {
        SWSS_LOG_WARN("Vxlan tunnelPort doesn't exist: %s", remote_vtep.c_str());
        return false;
    }

However, with the workaround in place we are seeing this issue.

@dgsudharsan, can you please confirm this by removing the workaround made for swss#2626? It was agreed at the time that this was a temporary workaround for that specific branch.

@dgsudharsan
Collaborator Author

> From the Techsupport added in #12361 it looks like VXLAN_EVPN_NVO was not configured leading to the OA not processing the VXLAN_REMOTE_VNI table APP DB entries.
>
> Before the workaround for swss#2626, the case of EVPN_NVO coming later would have been handled via the following check..
>
>     if (!tunnel_orch->getTunnelPort(remote_vtep,tunnelPort))
>     {
>         SWSS_LOG_WARN("Vxlan tunnelPort doesn't exist: %s", remote_vtep.c_str());
>         return false;
>     }
>
> However with the workaround we are seeing this issue.
>
> @dgsudharsan can you please confirm this by removing the workaround made for swss#2626 ? It was agreed that this was a temporary workaround at that time for that specific branch.

@srj102 I don't think removing that workaround alone helps. That workaround is not present in the P2MP orch. When the EVPN NVO is not present, we need to retry instead of returning success. My change sonic-net/sonic-swss#2756 did that, but it undid swss#2626.

We have to find a proper solution for #12361, and then we need to reintegrate sonic-net/sonic-swss#2756.
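
The "retry instead of returning success" behaviour described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: `PendingQueue`, `VtepState`, `addRemoteVni` and `drainPending` are hypothetical names and do not reproduce the sonic-swss code or the change in sonic-net/sonic-swss#2756. The idea shown is only that a handler returning `false` leaves the entry queued, so it is processed again once the NVO config has arrived.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for pending remote VNI entries: "Vlan:remote_vtep" -> vni.
using PendingQueue = std::map<std::string, std::string>;

// Hypothetical state: whether the EVPN NVO config has been received yet.
struct VtepState
{
    bool nvoConfigured = false;
};

// Returns false when the NVO is missing, so the caller keeps the entry and
// retries later, rather than reporting success and dropping the entry.
bool addRemoteVni(const VtepState& state, const std::string& key, const std::string& vni)
{
    (void)key;
    (void)vni;
    if (!state.nvoConfigured)
    {
        return false; // dependency missing: ask for a retry
    }
    // Programming of the remote entry would happen here.
    return true;
}

// Runs the handler over every pending entry. Entries are erased only when the
// handler returns true; the rest stay queued for the next pass.
// Returns the number of entries handled successfully in this pass.
int drainPending(PendingQueue& pending,
                 const std::function<bool(const std::string&, const std::string&)>& handler)
{
    int done = 0;
    for (auto it = pending.begin(); it != pending.end();)
    {
        if (handler(it->first, it->second))
        {
            it = pending.erase(it);
            ++done;
        }
        else
        {
            ++it;
        }
    }
    return done;
}
```

With this shape, a first pass before the NVO is configured leaves both remote VNI entries in the queue, and a second pass after the NVO arrives handles them.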

@srj102
Contributor

srj102 commented Jun 6, 2023

Yes, for the P2MP case the changes made as part of 2756 will be required.
P2P works without 2756 as well.

Since 2626 is a workaround with an incomplete root cause analysis, I believe it has to be removed from master.
The changes made in 2756 are as expected; they need to be in master and should not be reverted.

@dgsudharsan
Collaborator Author

dgsudharsan commented Jun 6, 2023

> yes for p2mp case the changes made as part of 2756 will be required. p2p works without 2756 as well.
>
> Since 2626 is a workaround with incomplete root causing. I believe it has to be removed from master. Changes made in 2756 is as expected and needs to be in the master and should not be reverted.

@prsunny What is your feedback here? Should we remove the workaround sonic-net/sonic-swss#2626 and reintroduce sonic-net/sonic-swss#2756 in master? Is anyone debugging the root cause of #12361 ?

@prsunny
Contributor

prsunny commented Jun 15, 2023

if we revert 2626, we will still have warmboot issue, right?

@dgsudharsan
Collaborator Author

@srj102 Can you please provide ETA for fixing this?
