helm: handle Orchestrator during preStop shutdown hook #4493

Merged: 3 commits, Jan 1, 2019

@derekperkins (Member) commented Dec 31, 2018

Per my conversation with @acharis on Slack (https://vitess.slack.com/archives/C0PQY0PTK/p1545344028308100), I've added some API calls to the k8s preStop shutdown hook.

The ordering that we discussed was:

  1. vitess: PlannedReparentShard
  2. orc: refresh/:host/:port
  3. vitess: DeleteTablet
  4. orc: forget/:host/:port

In addition to adding those calls, I also manually downtimed the tablet in Orchestrator for 10 seconds, though I'm curious to hear feedback on that. The final setup looks like:

  1. orc: begin-downtime/$hostname.vttablet/3306/preStopHook/VitessPlannedReparent/10s
  2. vitess: PlannedReparentShard
  3. orc: end-downtime/$hostname.vttablet/3306
  4. orc: refresh/$hostname.vttablet/3306
  5. vitess: DeleteTablet
  6. orc: forget/$hostname.vttablet/3306
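
For readers who want the final ordering in one place, here is a minimal sketch that only builds the six steps as data. It is not the hook script from this PR: the function name `prestop_plan`, the example hostname, and the choice to return `(system, call)` pairs are assumptions for illustration. The Orchestrator paths are copied from the list above, and the actual HTTP and vtctl invocations are deliberately left out.

```python
from typing import List, Tuple


def prestop_plan(hostname: str, port: int = 3306,
                 downtime: str = "10s") -> List[Tuple[str, str]]:
    """Return the ordered (system, call) pairs from the list above.

    Hypothetical helper: it only formats the calls, it does not issue them.
    """
    # The tablet is addressed as "<hostname>.vttablet/<port>" in every orc call.
    target = f"{hostname}.vttablet/{port}"
    return [
        ("orc", f"begin-downtime/{target}/preStopHook/VitessPlannedReparent/{downtime}"),
        ("vitess", "PlannedReparentShard"),
        ("orc", f"end-downtime/{target}"),
        ("orc", f"refresh/{target}"),
        ("vitess", "DeleteTablet"),
        ("orc", f"forget/{target}"),
    ]


if __name__ == "__main__":
    # "example-tablet-0" is a made-up hostname.
    for number, (system, call) in enumerate(prestop_plan("example-tablet-0"), 1):
        print(f"{number}. {system}: {call}")
```

Keeping the plan as plain data makes the ordering easy to review or assert on separately from whatever actually sends the requests.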

edit: changed Orchestrator downtime from 30 seconds to 10 seconds after discussing with @acharis on Slack

cc @hmcgonig @shlomi-noach

Signed-off-by: Derek Perkins <derek@derekperkins.com>
@derekperkins force-pushed the orc-reparent-api branch 2 times, most recently from b613313 to 308f3ec, on January 1, 2019 04:00
@shlomi-noach (Contributor)

Sorry for not jumping in on the Slack discussion. I think the refresh might be redundant: if you're going to forget the instance immediately afterward, is there any need to refresh? In any case, I don't think it hurts.

@derekperkins (Member, Author)

The refresh before forgetting comes from HubSpot's production setup. I'm not sure what the reasoning behind it was.

Do you think beginning and ending downtime around the planned reparent is a good idea?

@shlomi-noach (Contributor)

> Do you think beginning and ending downtime around the planned reparent is a good idea?

Definitely.
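
One way to picture "downtime around the planned reparent" is as a wrapper that always ends the downtime, even if the reparent step fails. The sketch below assumes this framing; it is not how the hook in this PR is written. The context manager `orc_downtime` and the injected `call` function are hypothetical, and `call` stands in for whatever actually sends the Orchestrator request.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def orc_downtime(call: Callable[[str], None], target: str,
                 duration: str = "10s") -> Iterator[None]:
    """Wrap a block of work in Orchestrator begin/end downtime calls.

    Hypothetical helper: `call` is injected so no real request is made here.
    """
    call(f"begin-downtime/{target}/preStopHook/VitessPlannedReparent/{duration}")
    try:
        yield
    finally:
        # Runs whether or not the wrapped step raised.
        call(f"end-downtime/{target}")


if __name__ == "__main__":
    issued = []
    # "example-tablet-0" is a made-up hostname; the list stands in for real calls.
    with orc_downtime(issued.append, "example-tablet-0.vttablet/3306"):
        issued.append("PlannedReparentShard")
    for step in issued:
        print(step)
```

Because the downtime is only 10 seconds it would expire on its own anyway; the explicit end call just shortens the window once the reparent is done.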

@derekperkins (Member, Author)

Great. This is ready to be merged. Since this contains a change to the Dockerfile, once this PR is merged, I'll wait for the auto-build to apply the change, then tag a new helm release.

@dkhenry (Contributor) left a comment

👍

@dkhenry merged commit d294ebd into vitessio:master on Jan 1, 2019