[teamd] Add Warm-reboot startup and shutdown mode for teamd #2173

Merged
merged 3 commits into from
Nov 6, 2018

Contributor

@pavel-shirshov pavel-shirshov commented Oct 20, 2018

I'll add teamd WB mode integrations into this PR and sonic-utilities as soon as I've tested this patch on the real testbed.
To use WB mode, make sure that:

  1. You start teamd with the -L option and a valid directory.
  2. You stop it with the USR1 signal.
  3. You start teamd with the -w flag for WB mode.
  4. You bring the portchannel interface admin up after teamd starts.

- What I did
Added:

  1. Warm-boot shutdown mode, which lets teamd save the information it needs to restore its state on warm-boot startup.
  2. Warm-boot startup mode, which lets teamd start without resynchronizing with the partner.

- How I did it

  1. I've added the SIGUSR1 signal to request that teamd stop in WB mode. When teamd receives this signal it will:

    1. save the last received LACP frame for every LAG member port
    2. send the last LACP update to its neighbors
    3. exit, keeping both the portchannel and the LAG member ports active
  2. When teamd starts in WB mode it will read the saved state from disk and resume from the state it was in before the WR.
    For that I've added two extra parameters to teamd:

    1. -w: WR mode
    2. -L: directory where LACP frames will be saved
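
The save step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `save_last_frame` and its parameters are hypothetical names, and the frame is treated as an opaque byte buffer written to one file per port.

```c
#include <stdio.h>

/* Hypothetical helper (not from the patch): write one port's last
 * received frame to `path`. Returns 0 on success, -1 on any I/O error. */
int save_last_frame(const char *path, const void *frame, size_t len)
{
	FILE *fp = fopen(path, "wb");
	if (!fp)
		return -1;

	size_t written = fwrite(frame, 1, len, fp);

	/* fclose() flushes buffered data; if it fails the dump is unusable. */
	if (fclose(fp) != 0 || written != len)
		return -1;
	return 0;
}
```

A caller would invoke this once per LAG member port before exiting, with `path` pointing inside the directory given by -L.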

- How to verify it

  1. Build and install everything on your DUT. Make sure that teamd is started with the -L option and a valid directory.
  2. Restart your DUT and check that the LAG interface is up and running.
  3. Send the USR1 signal to your teamd: pkill -USR1 teamd
  4. Check that teamd sends the last LACP frame to its neighbors.
  5. Check that both the LAG and the LAG member interfaces are still running.
  6. Check that the LACP frames were saved under the -L directory.
  7. Restart the box using kexec.
  8. Start your teamd with the -w and -L flags. Ensure that teamd starts in the "current" state, not the "expired" state.

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

Collaborator

@jipanyang jipanyang left a comment

Can this change support teamd docker warm restart too?
In the case of a teamd docker warm restart, the port channel in the Linux kernel should not be touched, to avoid affecting all the configuration on the LAG.

+ /* Read data from file and process it */
+ if (ctx->warm_reboot && ctx->lacp_directory) {
+ (void)lacpdu_read(lacp_port);
+ }
Collaborator

When will the dumped data file for each port be removed: erased after the read, upon port removal, or outside of the teamd process? Leaving the file there permanently might cause confusion if there are multiple reboots/restarts (mixed warm/cold).

Collaborator

Removing the file should happen outside the program; teamd has already read it and closed it.

Contributor Author

My design is to remove the files right after the read. The information in the file is outdated right after the read.

Contributor Author

Addressed in the latest patch
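
A minimal sketch of the read-then-remove design discussed in this thread, assuming the dump is a plain file of raw bytes. `load_and_remove` is a hypothetical helper, not a function from this patch.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not from the patch): read up to `cap` bytes from
 * `path` into `buf`, then delete the file so that a later cold start
 * cannot pick up stale state.
 * Returns the number of bytes read, or -1 if the file cannot be opened. */
long load_and_remove(const char *path, void *buf, size_t cap)
{
	FILE *fp = fopen(path, "rb");
	if (!fp)
		return -1;

	size_t n = fread(buf, 1, cap, fp);
	fclose(fp);

	/* The saved data is outdated as soon as it has been consumed. */
	unlink(path);
	return (long)n;
}
```

Deleting inside the reader keeps the lifetime of the dump tied to a single warm start, which addresses the mixed warm/cold restart concern raised above.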

struct lacp_port *agg_lead; /* leading port of aggregator.
* NULL in case this port is not selected */
enum lacp_port_state state;
+ bool lacpdu_received;
Collaborator

Didn't see it being used.

Contributor Author

Yes. You're absolutely right. I've found this issue too, but didn't have a chance to fix it. I'm going to fix it on the next iteration.

Contributor Author

Fixed

+ struct lacp_port *lacp_port = lacp_port_get(lacp, tdport);
+ if(lacp_port->lacpdu_saved && lacp_port->ctx->lacp_directory) {
+ char filename[PATH_MAX];
+ strcpy(filename, lacp_port->ctx->lacp_directory);
Collaborator

In case the directory does not end with '/', you may want to add one here.

Contributor Author

Addressed
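
One way to handle the trailing-slash case raised above, shown as a hedged sketch rather than the fix that was actually committed. `build_dump_path` is a hypothetical name; it uses `snprintf` so that an over-long directory is reported instead of overflowing the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the patch): build "<dir>/<ifname>" in
 * `out`, adding the '/' only when `dir` does not already end with one.
 * Returns 0 on success, -1 if the result does not fit in `outlen` bytes. */
int build_dump_path(char *out, size_t outlen,
		    const char *dir, const char *ifname)
{
	size_t dlen = strlen(dir);
	const char *sep = (dlen > 0 && dir[dlen - 1] == '/') ? "" : "/";

	int n = snprintf(out, outlen, "%s%s%s", dir, sep, ifname);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return 0;
}
```

Because each LAG member gets its own file, the interface name is a natural choice for the last path component here; that choice is an assumption of this sketch.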

+ ctx->warm_reboot = true;
+ break;
+ case 'L':
+ ctx->lacp_directory = strdup(optarg);
Collaborator

Can you make sure the directory exists here? We do not want to hit an error only when we try to dump the LACP PDU to disk; if the directory does not exist, we'd like to see the error message early.

Contributor Author

What should it do after the error message? Exit?

Contributor Author

Fixed. Just error message, no exit
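
The early directory check requested above could look roughly like this. It is a sketch under the assumption that a plain `stat()` test is sufficient; `is_existing_dir` is a hypothetical name, not a function from the patch.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Hypothetical helper (not from the patch): true only if `path` exists
 * and is a directory. Any stat() failure counts as "not usable". */
bool is_existing_dir(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

In line with the outcome of this thread, a caller would report the problem at option-parsing time and continue rather than exit.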

Collaborator

lguohan commented Oct 30, 2018

@jipanyang, teamd docker warm restart is not planned. The scenario for teamd docker warm restart is not clearly defined: should we remove the teamd netdev in this case or not? Are we assuming the BGP connection is not affected at all in this case, or are we assuming the bgp docker will be warm restarted as well?

Collaborator

lguohan commented Oct 30, 2018

We need to add a test to the vs tests to cover teamd warm restart.

Collaborator

@lguohan lguohan left a comment

as comments.

@jipanyang
Collaborator

@lguohan I think the ideal teamd docker warm restart scenario is clear: a teamd software upgrade (teamd, teamsyncd, teammgrd) that is self-contained, with no impact on the data plane or other dockers, including bgp.

It may not be in the current phase of development, but while doing the implementation it would be good to keep it in mind whenever possible.

char ifname[IFNAMSIZ];
uint32_t master_ifindex;
bool admin_state;
+ bool orig_addr_updated; /* FIXME: Check this. I think we don't need this flag */
Collaborator

@qiluo-msft qiluo-msft Oct 31, 2018

Better to follow the existing code's indentation style.

Contributor Author

This chunk was removed. It was needed only if the netdev wasn't removed.

" -U --usock-enable Enable UNIX domain socket interface\n"
- " -u --usock-disable Disable UNIX domain socket interface\n",
+ " -u --usock-disable Disable UNIX domain socket interface\n"
+ " -w --warm-reboot Warm-reboot startup mode\n"
Collaborator

@qiluo-msft qiluo-msft Oct 31, 2018

On "warm-reboot": I believe there was a discussion on the terminology. Use 'warm-start'? @lguohan

Collaborator

agree, warm-start is better

Contributor Author

fixed

{
teamd_run_loop_sent_ctrl_byte(ctx, 'r');
@@ -694,6 +716,10 @@ static int callback_daemon_signal(struct teamd_context *ctx, int events,
teamd_log_warn("Got SIGINT, SIGQUIT or SIGTERM.");
Collaborator

@qiluo-msft qiluo-msft Oct 31, 2018

On "SIGQUIT": we could reuse SIGTERM instead of SIGUSR1.
If lacp-directory is provided, we can save state; otherwise just quit normally.

There is no harm in saving state for a cold system reboot or a cold docker restart.

Contributor Author

@pavel-shirshov pavel-shirshov Nov 1, 2018

The warm stop is not only about saving files. It involves other activities, such as removing the portchannel interface and bringing down the physical ports. Also, I think an explicit signal is better than behavior that changes as a side effect of some setting.
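
To illustrate the "explicit signal" argument, here is a small stand-alone sketch in which SIGUSR1 selects a warm stop and SIGTERM a normal stop. It is not teamd's actual signal path: the names `stop_mode`, `install_stop_handlers` and `pending_stop` are hypothetical, and a plain `sigaction` handler is assumed for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical stop modes, chosen by which signal arrived. */
enum stop_mode { STOP_NONE = 0, STOP_NORMAL, STOP_WARM };

static volatile sig_atomic_t requested_stop = STOP_NONE;

/* Async-signal-safe: only records which kind of stop was requested. */
static void on_signal(int signo)
{
	requested_stop = (signo == SIGUSR1) ? STOP_WARM : STOP_NORMAL;
}

/* Install the same handler for SIGTERM and SIGUSR1. Returns 0 or -1. */
int install_stop_handlers(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_signal;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGTERM, &sa, NULL) != 0)
		return -1;
	return sigaction(SIGUSR1, &sa, NULL);
}

/* The main loop would poll this and pick the matching shutdown path. */
enum stop_mode pending_stop(void)
{
	return (enum stop_mode)requested_stop;
}
```

With this split, the presence of a dump directory never changes what SIGTERM does; the operator chooses the shutdown path by choosing the signal.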

- " -u --usock-disable Disable UNIX domain socket interface\n",
+ " -u --usock-disable Disable UNIX domain socket interface\n"
+ " -w --warm-reboot Warm-reboot startup mode\n"
+ " -L --lacp-directory Directory for saving lacp pdu dumps\n",
Collaborator

@qiluo-msft qiluo-msft Oct 31, 2018

On "lacp-directory": suggest letting users specify the lacp-filename directly.

Contributor Author

This is not that easy. There is not just one name but a list of names: one filename per LAG member interface.

+ return err;
+ }
+
+ return lacpdu_process(lacp_port, &lacpdu);
Collaborator

@qiluo-msft qiluo-msft Oct 31, 2018

Less indentation?

Collaborator

Actually, you use 8 spaces, and the original style is tabs.

Contributor Author

Should be fixed now

Collaborator

@qiluo-msft qiluo-msft left a comment

Suggest upstreaming this to official teamd when the feature is ready.

@pavel-shirshov pavel-shirshov merged commit 4157f1d into master Nov 6, 2018
@pavel-shirshov pavel-shirshov deleted the pavelsh/wr_teamd branch November 6, 2018 00:38
dprital added a commit to dprital/sonic-buildimage that referenced this pull request Aug 11, 2022
Update sonic-utilities submodule pointer to include the following:
* Convert IPv6 addresses to lowercase in apply-patch ([sonic-net#2299](sonic-net/sonic-utilities#2299))
* [CLI] Move hostname, mgmt interface/vrf config to hostcfgd ([sonic-net#2173](sonic-net/sonic-utilities#2173))
* [config][muxcable] add support to enable/disable ycable telemetry ([sonic-net#2297](sonic-net/sonic-utilities#2297))

Signed-off-by: dprital <drorp@nvidia.com>
lguohan pushed a commit that referenced this pull request Aug 11, 2022
Update sonic-utilities submodule pointer to include the following:
* Convert IPv6 addresses to lowercase in apply-patch ([#2299](sonic-net/sonic-utilities#2299))
* [CLI] Move hostname, mgmt interface/vrf config to hostcfgd ([#2173](sonic-net/sonic-utilities#2173))
* [config][muxcable] add support to enable/disable ycable telemetry ([#2297](sonic-net/sonic-utilities#2297))

Signed-off-by: dprital <drorp@nvidia.com>
4 participants