-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[teamd] Add Warm-reboot startup and shutdown mode for teamd #2173
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the change able to support teamd docker warm restart too?
In case of teamd docker warm restart , the port channel in Linux kernel should not be touched to avoid affecting all the configuration on LAG.
+ /* Read data from file and process it */ | ||
+ if (ctx->warm_reboot && ctx->lacp_directory) { | ||
+ (void)lacpdu_read(lacp_port); | ||
+ } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When will the dumped data file for each port be removed? erase after read or upon port remove, or outside of the teamd process? Leaving the file permanently there might cause confusion if there are multiple reboot/restart (Mixed warm/cold).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be outside program to remove the file, teamd already close it and read it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My design is to remove the files right after the read. The information in the file is outdated right after the read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in the latest patch
struct lacp_port *agg_lead; /* leading port of aggregator. | ||
* NULL in case this port is not selected */ | ||
enum lacp_port_state state; | ||
+ bool lacpdu_received; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't see it being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. You're absolutely right. I've found this issue too, but didn't have a chance to fix it. I'm going to fix it on the next iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
+ struct lacp_port *lacp_port = lacp_port_get(lacp, tdport); | ||
+ if(lacp_port->lacpdu_saved && lacp_port->ctx->lacp_directory) { | ||
+ char filename[PATH_MAX]; | ||
+ strcpy(filename, lacp_port->ctx->lacp_directory); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in case the directory is not ended with '/', you may want to add here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
+ ctx->warm_reboot = true; | ||
+ break; | ||
+ case 'L': | ||
+ ctx->lacp_directory = strdup(optarg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make sure the directory exist here? we do not want to get error msg when we try to dump the lacp pdu to disk, if directory does not exist, we'd like to know the error msg early.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What to do after the error message? exit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Just error message, no exit
@jipanyang , teamd docker warm restart is not planned. the scenario for teamd docker warm restart is not clearly defined, should we remove teamd netdev in this case or not? are we assuming bgp connection not affected at all in this case, or are we assuming bgp docker will be warm restart as well? |
we need to add some test in the vs test to test teamd warm restart. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as comments.
@lguohan I think the ideal teamd docker warm restart scenario is clear: Teamd software upgrade(teamd, teamsyncd, teammgrd), self-contained, no impact on data plane or other dockers including bgp. It may not be in current phase of development, while doing the implementation it will be good to keep that in mind whenever possible. |
char ifname[IFNAMSIZ]; | ||
uint32_t master_ifindex; | ||
bool admin_state; | ||
+ bool orig_addr_updated; /* FIXME: Check this. I think we don't need this flag */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[](start = 1, length = 8)
Better follow existing code's indentation style. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this chunk was removed. It was needed only if netdev wasn't removed
" -U --usock-enable Enable UNIX domain socket interface\n" | ||
- " -u --usock-disable Disable UNIX domain socket interface\n", | ||
+ " -u --usock-disable Disable UNIX domain socket interface\n" | ||
+ " -w --warm-reboot Warm-reboot startup mode\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warm-reboot [](start = 23, length = 11)
I believe these was discussion on the terminology. Use 'warm-start'? @lguohan #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree, warm-start is better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
{ | ||
teamd_run_loop_sent_ctrl_byte(ctx, 'r'); | ||
@@ -694,6 +716,10 @@ static int callback_daemon_signal(struct teamd_context *ctx, int events, | ||
teamd_log_warn("Got SIGINT, SIGQUIT or SIGTERM."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SIGQUIT [](start = 31, length = 7)
We may reuse sigterm instead of sigusr1.
If lacp-directory is provided, we can save state, otherwise just normal quit.
It's no harm to save for cold system reboot or cold docker restart. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The warm stop is not about file saving only. It includes other activities, like remove portchannel interface, put down physical ports. Also I think explicit signal is better than changed behavior as side effect of some setting.
- " -u --usock-disable Disable UNIX domain socket interface\n", | ||
+ " -u --usock-disable Disable UNIX domain socket interface\n" | ||
+ " -w --warm-reboot Warm-reboot startup mode\n" | ||
+ " -L --lacp-directory Directory for saving lacp pdu dumps\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lacp-directory [](start = 23, length = 14)
Suggest users specify the lacp-filename directly. #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not that easy. We have not just one name, but list of names, one filename per interface from LAG.
+ return err; | ||
+ } | ||
+ | ||
+ return lacpdu_process(lacp_port, &lacpdu); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[](start = 1, length = 8)
less indentation? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually you use 8 blanks, and original style is tab.
In reply to: 229898243 [](ancestors = 229898243)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be fixed now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest upstream official teamd when the feature ready.
Update sonic-utilities submodule pointer to include the following: * Convert IPv6 addresses to lowercase in apply-patch ([sonic-net#2299](sonic-net/sonic-utilities#2299)) * [CLI] Move hostname, mgmt interface/vrf config to hostcfgd ([sonic-net#2173](sonic-net/sonic-utilities#2173)) * [config][muxcable] add support to enable/disable ycable telemetry ([sonic-net#2297](sonic-net/sonic-utilities#2297)) Signed-off-by: dprital <drorp@nvidia.com>
Update sonic-utilities submodule pointer to include the following: * Convert IPv6 addresses to lowercase in apply-patch ([#2299](sonic-net/sonic-utilities#2299)) * [CLI] Move hostname, mgmt interface/vrf config to hostcfgd ([#2173](sonic-net/sonic-utilities#2173)) * [config][muxcable] add support to enable/disable ycable telemetry ([#2297](sonic-net/sonic-utilities#2297)) Signed-off-by: dprital <drorp@nvidia.com>
I'll add teamd WB mode integrations into this PR and sonic-utilities as soon as I've tested this patch with the read testbed.
To implement WB make sure that.
- What I did
Added:
- How I did it
I've added SIGUSR1 signal to request teamd stopping in WB mode. When teamd receive this signal it will:
When teamd starts in WB mode it will read saved state from the disk and start from the same state it was before the WR.
For that I've added two extra parameters to teamd:
w - WR mode
L - directory where lacp frames will be saved.
- How to verify it
pkill -USR1 teamd
- Description for the changelog
- A picture of a cute animal (not mandatory but encouraged)