Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[teamsyncd]: Fix LAG add: write MTU configuration #1423

Merged
merged 1 commit into from
Sep 11, 2020

Conversation

nazariig
Copy link
Collaborator

@nazariig nazariig commented Aug 31, 2020

Signed-off-by: Nazarii Hnydyn nazariig@nvidia.com

The current apply_temp_view logic doesn't handle MTU configuration during warm-reboot:

Before warm-reboot:

root@sonic:/home/admin# redis-cli -n 0 HGETALL 'LAG_TABLE:PortChannel0001'
1) "admin_status"
2) "up"
3) "oper_status"
4) "up"
5) "mtu"
6) "9100"

After warm-reboot:

root@sonic:/home/admin# redis-cli -n 0 HGETALL 'LAG_TABLE:PortChannel0001'
1) "admin_status"
2) "up"
3) "oper_status"
4) "up"

Apply state logs:

Aug 31 09:46:37.787959 sonic NOTICE teamd#teamsyncd: :- applyState: Applying state
Aug 31 09:46:37.789065 sonic NOTICE teamd#teamsyncd: :- dump: getting took 0.000322 sec
Aug 31 09:46:37.789065 sonic INFO teamd#teamsyncd: :- apply_temp_view: View switch of table LAG_TABLE required.
Aug 31 09:46:37.789065 sonic INFO teamd#teamsyncd: :- apply_temp_view: Objects in current view:
Aug 31 09:46:37.789065 sonic INFO teamd#teamsyncd: :- apply_temp_view:     PortChannel0001: 3 fields;
Aug 31 09:46:37.789065 sonic INFO teamd#teamsyncd: :- apply_temp_view: Objects in target view:
Aug 31 09:46:37.789065 sonic INFO teamd#teamsyncd: :- apply_temp_view:     PortChannel0001: 2 fields;
Aug 31 09:46:37.790676 sonic NOTICE teamd#teamsyncd: :- dump: getting took 0.000214 sec
Aug 31 09:46:37.790676 sonic INFO teamd#teamsyncd: :- apply_temp_view: View switch of table LAG_MEMBER_TABLE required.
Aug 31 09:46:37.790676 sonic INFO teamd#teamsyncd: :- apply_temp_view: Objects in current view:
Aug 31 09:46:37.790676 sonic INFO teamd#teamsyncd: :- apply_temp_view:     PortChannel0001:Ethernet16: 1 fields;
Aug 31 09:46:37.790676 sonic INFO teamd#teamsyncd: :- apply_temp_view: Objects in target view:
Aug 31 09:46:37.790722 sonic INFO teamd#teamsyncd: :- apply_temp_view:     PortChannel0001:Ethernet16: 1 fields;
Aug 31 09:46:37.791751 sonic NOTICE teamd#teamsyncd: :- setWarmStartState: teamsyncd warm start state changed to reconciled

This ends up in traffic disruption after ISSU end:

Aug 31 07:57:07.043050 sonic NOTICE swss#orchagent: :- addRouterIntfs: Create router interface Ethernet116 MTU 9100
Aug 31 07:57:07.052926 sonic NOTICE swss#orchagent: :- addRouterIntfs: Create router interface PortChannel0001 MTU 1492
Aug 31 07:57:07.204451 sonic NOTICE swss#orchagent: :- addRouterIntfs: Create router interface Vlan23 MTU 9100

Aug 31 07:57:07.554444 sonic INFO syncd#supervisord: syncd Aug 31 07:57:07 NOTICE  SAI_RIF: mlnx_sai_rif.c[566]- mlnx_create_router_interface: Create rif, #0 VIRTUAL_ROUTER_ID=VIRTUAL_ROUTER,(0:0),0,0000,0 #1 SRC_MAC_ADDRESS=[50:6b:4b:96:5e:00] #2 TYPE=PORT #3 PORT_ID=PORT,(0:0),11b00,0000,0 #4 MTU=9100
Aug 31 07:57:07.573953 sonic INFO syncd#supervisord: syncd Aug 31 07:57:07 NOTICE  SAI_RIF: mlnx_sai_rif.c[566]- mlnx_create_router_interface: Create rif, #0 VIRTUAL_ROUTER_ID=VIRTUAL_ROUTER,(0:0),0,0000,0 #1 SRC_MAC_ADDRESS=[50:6b:4b:96:5e:00] #2 TYPE=PORT #3 PORT_ID=LAG,(0:0),40,0000,0 #4 MTU=1492
Aug 31 07:57:07.599988 sonic INFO syncd#supervisord: syncd Aug 31 07:57:07 NOTICE  SAI_RIF: mlnx_sai_rif.c[566]- mlnx_create_router_interface: Create rif, #0 VIRTUAL_ROUTER_ID=VIRTUAL_ROUTER,(0:0),0,0000,0 #1 SRC_MAC_ADDRESS=[50:6b:4b:96:5e:00] #2 TYPE=VLAN #3 VLAN_ID=VLAN,(0:0),17,0000,0 #4 MTU=9100

Aug 31 07:57:11.967761 sonic INFO syncd#supervisord: syncd Aug 31 07:57:11 NOTICE  SAI_UTILS: mlnx_sai_utils.c[2397]- set_dispatch_attrib_handler: Set FAST_API_ENABLE, key:Switch ID 1, val:false

Aug 31 07:57:22.013521 sonic NOTICE swss#orchagent: :- setRouterIntfsMtu: Set router interface Vlan23 MTU to 9100
Aug 31 07:57:22.016550 sonic INFO syncd#supervisord: syncd Aug 31 07:57:22 NOTICE  SAI_UTILS: mlnx_sai_utils.c[2397]- set_dispatch_attrib_handler: Set MTU, key:rif idx 3, val:9100
Aug 31 07:57:22.017188 sonic NOTICE swss#orchagent: :- setRouterIntfsMtu: Set router interface PortChannel0001 MTU to 9100
Aug 31 07:57:22.021689 sonic INFO syncd#supervisord: syncd Aug 31 07:57:22 NOTICE  SAI_UTILS: mlnx_sai_utils.c[2397]- set_dispatch_attrib_handler: Set MTU, key:rif idx 2, val:9100

What I did

  • Added missing MTU filed

Why I did it

  • To fix warm-reboot apply_temp_view logic

How I verified it

  • Did warm-reboot several times with frame size equal to 9100 - no traffic disruption is observed

Details if related

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
@pavel-shirshov pavel-shirshov requested a review from prsunny August 31, 2020 18:51
@prsunny
Copy link
Collaborator

prsunny commented Aug 31, 2020

This was done on purpose for a use-case where kernel mtu is different from config_db mtu - #1053.

Copy link
Collaborator

@prsunny prsunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As comments

@liat-grozovik
Copy link
Collaborator

This was done on purpose for a use-case where kernel mtu is different from config_db mtu - #1053.

So how do you suggest to overcome this flow?
This is legal flow were warm boot is not working.

BTW why should we have MTU mismatch between kernel and config db. At some point the Linux direct command was not recommended and all should go via the config db.

@yxieca
Copy link
Contributor

yxieca commented Sep 9, 2020

Alibaba is ok with moving forward with closing this PR and tracking this change internally.

After merging this change, sonic will NOT support different MTU size between Kernel and ASIC.

@yxieca
Copy link
Contributor

yxieca commented Sep 9, 2020

Alibaba is ok with moving forward with closing this PR and tracking this change internally.

After merging this change, sonic will NOT support different MTU size between Kernel and ASIC.

This change will need to pass regular and warm reboot test.

Unit test can come with the master change. Please provide following logs with this change:

  1. config reload configuration flow
  2. swss restart configuration flow
  3. first warm reboot configuration flow
  4. subsequent warm reboot configuration flow

Copy link
Collaborator

@prsunny prsunny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waiting for test result

@nazariig
Copy link
Collaborator Author

Issue: #1432

@nazariig
Copy link
Collaborator Author

@prsunny done. Please check the issue.

@abdosi
Copy link
Contributor

abdosi commented Sep 11, 2020

we are able to hit this issue on one of our multi-asic platforms for some of port-channels after cold-boot.
So this issue is not during warm-boot only and possible race condition can happen between teammgr and teamsync in updating LAG Table in APP DB.

@judyjoseph @prsunny

@abdosi abdosi merged commit 4aa48d6 into sonic-net:201911 Sep 11, 2020
EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
#### What I did

Fix Python deprecation warnings as reported by pytest

#### How I did it

Convert strings which include escape sequences into raw strings
oleksandrivantsiv pushed a commit to oleksandrivantsiv/sonic-swss that referenced this pull request Mar 1, 2023
Update SAI submodule v1.9 with the following fixes

7594e53 (HEAD, origin/v1.9) Skip brcm teardown assertion (sonic-net#1423) (sonic-net#1428)
0c33f4a [FIX]Fix the circular reference issue when build sai header py (sonic-net#1427)
7e0fc24 Add support for building under Doxygen 1.9.1 (sonic-net#1414) (sonic-net#1424)
8ecf3ef [Fix]Correct enum check on branch 1.9 (sonic-net#1418)
e2b2f39 Add Thrift 0.14.1 compatibility (sonic-net#1403) (sonic-net#1416)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants