Warmboot Vlan neigh restore fix #1040

Merged
3 commits merged on Sep 9, 2019

prsunny (Collaborator) commented Aug 28, 2019

What I did

  1. During warm boot, the restore_neighbors script sends ARP/NS requests on Vlan interfaces based on their oper status. Because a Vlan interface is bound to the bridge, it reports up by default, even before any member port has been added. The script is modified to wait until Vlan members have been added.

  2. The logger is changed to syslog so that events carry correct timestamps.

  3. nbrmgrd must push entries to the kernel only after warm-boot neighbor restoration is complete.
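Item 1 can be sketched as a readiness check. This is a minimal illustration, not the code merged in this PR: it assumes member ports are visible as STATE_DB keys of the form VLAN_MEMBER_TABLE|<vlan>|<port>, and the function name and callable parameters are hypothetical, injected so the check runs without a live database or sysfs.

```python
from typing import Callable, List


def is_intf_up(intf: str,
               carrier_up: Callable[[str], bool],
               vlan_member_keys: Callable[[str], List[str]]) -> bool:
    """Return True when `intf` is ready for ARP/NS probes (illustrative sketch).

    carrier_up:       hypothetical callable reporting the kernel oper state.
    vlan_member_keys: hypothetical callable returning STATE_DB keys such as
                      "VLAN_MEMBER_TABLE|Vlan1000|Ethernet4" (assumed layout).
    """
    if not carrier_up(intf):
        return False
    if intf.startswith("Vlan"):
        # A Vlan interface is bound to the bridge and reports up as soon as
        # it exists, so oper state alone is not enough: also require at
        # least one member port before sending ARP/NS on it.
        if not vlan_member_keys(intf):
            return False
    return True


if __name__ == "__main__":
    members = {"Vlan1000": []}
    always_up = lambda intf: True
    print(is_intf_up("Vlan1000", always_up, lambda v: members[v]))  # False
    members["Vlan1000"].append("VLAN_MEMBER_TABLE|Vlan1000|Ethernet4")
    print(is_intf_up("Vlan1000", always_up, lambda v: members[v]))  # True
```

A caller would poll this check for each interface in the saved neighbor table and send probes only once it returns True; on a real switch the member lookup would query STATE_DB instead of a dict.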

Why I did it
To fix the neighbor restore issue on Vlan interfaces.

How I verified it

Details if related

Aug 28 02:29:09.296187 str-msn2700-04 INFO swss#supervisord: start.sh restore_neighbors: started
Aug 28 02:29:22.277583 str-msn2700-04 INFO swss#restore_neighbor: Error: [Errno 2] No such file or directory: '/sys/class/net/Vlan1000/carrier'
Aug 28 02:29:33.573648 str-msn2700-04 NOTICE swss#vlanmgrd: :- doVlanTask: Add Vlan  1000
Aug 28 02:29:51.640572 str-msn2700-04 NOTICE swss#vlanmgrd: :- doVlanMemberTask: Add Vlan member:Ethernet4 to Vlan 1000
Aug 28 02:29:53.343235 str-msn2700-04 INFO swss#restore_neighbor: intf Vlan1000 is up
Aug 28 02:29:55.155619 str-msn2700-04 INFO swss#restore_neighbor: Add neighbor entries: family: IPv4, intf_idx: 45, ip: 192.168.0.2, mac: 00:00:00:11:22:33
Aug 28 02:29:55.155863 str-msn2700-04 INFO swss#restore_neighbor: Sending Neigh with family: IPv4, intf_idx: 45, ip: 192.168.0.2, mac: 00:00:00:11:22:33
Aug 28 02:30:01.209086 str-msn2700-04 NOTICE swss#orchagent: :- setHostIntfsOperStatus: Set operation status UP to host interface Ethernet96
Aug 28 02:30:02.681734 str-msn2700-04 NOTICE swss#vlanmgrd: :- doVlanMemberTask: Add Vlan member:Ethernet96 to Vlan 1000
Aug 28 02:30:03.952226 str-msn2700-04 NOTICE swss#portsyncd: :- main: PortInitDone
Aug 28 02:30:03.961584 str-msn2700-04 NOTICE swss#orchagent: :- addVlan: Create an empty VLAN Vlan1000 vid:1000
Aug 28 02:30:03.963923 str-msn2700-04 NOTICE swss#orchagent: :- addNeighbor: Created neighbor 00:00:00:11:22:33 on Vlan1000
Aug 28 02:30:03.963923 str-msn2700-04 NOTICE swss#orchagent: :- addNextHop: Created next hop 192.168.0.2 on Vlan1000
Aug 28 02:30:08.902206 str-msn2700-04 NOTICE swss#orchagent: :- addVlanMember: Add member Ethernet4 to VLAN Vlan1000 vid:1000 pid1000000000549
Aug 28 02:30:08.916187 str-msn2700-04 NOTICE swss#orchagent: :- addVlanMember: Add member Ethernet96 to VLAN Vlan1000 vid:1000 pid1000000000252
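The Errno 2 line above shows the script reading /sys/class/net/Vlan1000/carrier before vlanmgrd had created Vlan1000. A minimal sketch of an oper-state read that treats a missing or unreadable carrier file as "not up" rather than as an error might look like this; the function name and the `sysfs_root` parameter are hypothetical, added so the check can be exercised against a temporary directory.

```python
import os


def carrier_up(intf: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Return True if the kernel reports carrier on `intf` (illustrative sketch)."""
    path = os.path.join(sysfs_root, intf, "carrier")
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        # ENOENT: the netdev does not exist yet (the Errno 2 case in the log).
        # EINVAL: Linux refuses to report carrier on an admin-down interface.
        # Either way, the interface is not ready for ARP/NS yet.
        return False
```

On a live system carrier_up("Vlan1000") reads the real sysfs entry; catching OSError keeps a retry loop quiet until the Vlan device appears.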

prsunny requested a review from zhenggen-xu on August 28, 2019 18:12
Resolved review comments on neighsyncd/restore_neighbors.py (5, one on an outdated diff) and cfgmgr/nbrmgrd.cpp (1)
prsunny (Collaborator, Author) commented Sep 6, 2019

retest this please

prsunny merged commit 313ef5c into sonic-net:master on Sep 9, 2019
yxieca pushed a commit that referenced this pull request Sep 9, 2019
* Send arp request after first Vlan member port is added

* Add wait logic after Vlan member add, nbrmgr to wait for restore complete

* Address comment to pass db as a parameter and open only once
tylerlinp (Contributor) commented:

Does nbrmgrd wait 120 s for neighbor restore in a normal startup? isNeighRestoreDone is true only if a restore actually runs. I found that the VS tests for neighbors and next hops (the new VRF cases) fail because nbrmgrd does not work.

prsunny (Collaborator, Author) commented Sep 12, 2019

In a normal startup there is no wait: the warm-boot flag is disabled, so the isNeighRestoreDone flag is set without waiting. Can you provide logs showing that nbrmgrd is stuck in VS?
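The behavior discussed in this exchange can be illustrated with a small polling helper. nbrmgrd itself is C++ (cfgmgr/nbrmgrd.cpp); this Python sketch only mirrors the control flow under stated assumptions: the helper name and parameters are hypothetical, the 120 s default comes from the question above, and `restore_done` stands in for whatever reads the restore-done flag that restore_neighbors.py writes to STATE_DB.

```python
import time
from typing import Callable


def wait_neigh_restore_done(warm_start: bool,
                            restore_done: Callable[[], bool],
                            timeout: float = 120.0,
                            interval: float = 1.0,
                            clock: Callable[[], float] = time.monotonic,
                            sleep: Callable[[float], None] = time.sleep) -> bool:
    """Gate kernel programming on warm-boot neighbor restore (sketch).

    Returns True when it is safe to push entries to the kernel, or False
    if the restore flag was not seen before `timeout` seconds elapsed.
    """
    if not warm_start:
        # Normal startup: nothing is being restored, so do not wait at all.
        return True
    deadline = clock() + timeout
    while not restore_done():
        if clock() >= deadline:
            return False
        sleep(interval)
    return True
```

The clock and sleep hooks exist so the timeout path can be tested without real waiting. They also make the failure mode in this thread easy to see: if the flag is never written, a warm-start caller waits for the full timeout.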

prsunny deleted the ip_retry branch on September 12, 2019 03:51
tylerlinp (Contributor) commented:

nbrmgrd is stuck in VS because:

  1. The VS startup.sh currently does not start restore_neighbors.
  2. In restore_neighbors.py, the branch taken when warmstart.isWarmStart() is false does not call set_statedb_neigh_restore_done().
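The second point suggests a script-side fix: restore_neighbors.py should report completion on every path, including a cold start where there is nothing to restore. A minimal sketch, with warmstart.isWarmStart() and set_statedb_neigh_restore_done() from the comment passed in as callables and the other names hypothetical:

```python
from typing import Callable


def run_restore_neighbors(is_warm_start: Callable[[], bool],
                          set_restore_done: Callable[[], None],
                          restore: Callable[[], None]) -> None:
    """Entry-point sketch: always publish the restore-done flag."""
    if is_warm_start():
        # Warm boot: replay saved neighbors (probing ready interfaces) first.
        restore()
    # A cold start has nothing to restore, but any consumer gated on the flag
    # (nbrmgrd here) must still be released, so set it on both paths.
    set_restore_done()
```

This addresses only the second point; if VS never launches restore_neighbors (the first point), the flag is still never written.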

oleksandrivantsiv pushed a commit to oleksandrivantsiv/sonic-swss that referenced this pull request Mar 1, 2023
1. Setup pipeline without manual effort when checkout new release branch.
2. Use correct branch when downloading artifacts or checkout relative repos.
3. Clear downloaded artifacts to avoid using outdated dependencies.
4. Use commonlib pipeline to download libnl3 and libyang instead of vs image build, to increase success rate.
5. Add weekly build to keep artifacts remaining.