Allow Hostname and Subdomain to be set if empty #51199

Merged 1 commit into kubernetes:release-1.7 on Sep 27, 2017

@kow3ns kow3ns commented Aug 23, 2017

What this PR does / why we need it:
This PR allows the Hostname and Subdomain fields of v1.PodSpec to be set when empty, and modifies the StatefulSet controller to set them when empty.

For #48327:
We have merged #50942 to ensure that the Hostname and Subdomain fields are set when a new Pod is created. Users should upgrade to 1.6.9 and perform a rolling restart of all Pods in their StatefulSets to ensure that these fields are set before upgrading to 1.7.5.
We have merged #51149 and #51044 to roll back the attempted mutation introduced in #44137.
This PR allows the Hostname and Subdomain fields to be set exactly once, so that when users miss the release notes and encounter this issue, their clusters self-heal (although Pods in their StatefulSets will experience a temporary network disruption).

StatefulSet will now fill the `hostname` and `subdomain` fields if they're empty on existing Pods it owns. This allows it to self-correct the issue where StatefulSet Pod DNS entries disappear after upgrading to v1.7.x (#48327).

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 23, 2017
@kow3ns kow3ns requested a review from enisoc August 23, 2017 15:30
@k8s-github-robot k8s-github-robot added do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. labels Aug 23, 2017
@kow3ns kow3ns requested a review from lavalamp August 23, 2017 15:30
@k8s-cherrypick-bot

Removing label cherrypick-candidate because no release milestone was set. This is an invalid state and thus this PR is not being considered for cherry-pick to any release branch. Please add an appropriate release milestone and then re-add the label.

Spec: api.PodSpec{Hostname: "bar"},
},
api.Pod{
ObjectMeta: metav1.ObjectMeta{Name: "foo", DeletionTimestamp: &now},

Why the DeletionTimestamp?

enisoc commented Aug 23, 2017

You verified that endpoints will react to the Pod update and bring back the DNS entry?

enisoc commented Aug 23, 2017

It looks like we will never reach updateIdentity() unless the Pod is Ready:

    // If we have a Pod that has been created but is not running and ready we can not make progress.
    // We must ensure that all for each Pod, when we create it, all of its predecessors, with respect to its
    // ordinal, are Running and Ready.
    if !isRunningAndReady(replicas[i]) && monotonic {
        glog.V(4).Infof(
            "StatefulSet %s/%s is waiting for Pod %s to be Running and Ready",
            set.Namespace,
            set.Name,
            replicas[i].Name)
        return &status, nil
    }
    // Enforce the StatefulSet invariants
    if identityMatches(set, replicas[i]) && storageMatches(set, replicas[i]) {
        continue
    }
    // Make a deep copy so we don't mutate the shared cache
    copy, err := scheme.Scheme.DeepCopy(replicas[i])
    if err != nil {
        return &status, err
    }
    replica := copy.(*v1.Pod)
    if err := ssc.podControl.UpdateStatefulPod(updateSet, replica); err != nil {
        return &status, err
    }

Is it worth handling the case where the Pod is Unready precisely because its hostname/subdomain is missing?
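
To make the ordering question concrete, here is a minimal, self-contained sketch of a loop that repairs identity before it checks readiness. The `Pod` type, `reconcile`, and the sample names are hypothetical stand-ins for illustration, not the controller's actual types or code.

```go
package main

import "fmt"

// Pod is a hypothetical, stripped-down stand-in for v1.Pod.
type Pod struct {
	Name      string
	Hostname  string
	Subdomain string
	Ready     bool
}

// identityMatches reports whether the Pod already carries both DNS fields.
func identityMatches(p Pod) bool {
	return p.Hostname != "" && p.Subdomain != ""
}

// reconcile walks the replicas in ordinal order. It fills in missing identity
// fields first and only then stops at the first Pod that is not Ready, so an
// Unready Pod still gets repaired. It returns the names of the Pods it updated.
func reconcile(replicas []Pod, serviceName string) []string {
	var updated []string
	for i := range replicas {
		p := &replicas[i]
		if !identityMatches(*p) {
			if p.Hostname == "" {
				p.Hostname = p.Name
			}
			if p.Subdomain == "" {
				p.Subdomain = serviceName
			}
			updated = append(updated, p.Name)
		}
		// Monotonic ordering: do not move past a Pod that is not Ready.
		if !p.Ready {
			break
		}
	}
	return updated
}

func main() {
	replicas := []Pod{
		{Name: "web-0", Hostname: "web-0", Subdomain: "nginx", Ready: true},
		{Name: "web-1", Ready: false},
		{Name: "web-2", Ready: false},
	}
	fmt.Println(reconcile(replicas, "nginx")) // [web-1]
}
```

With the readiness check first, as in the snippet quoted above, `web-1` would never be updated, because the loop returns before reaching the identity check.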

kow3ns commented Aug 24, 2017

It may be worth doing the update; that is the way the control loop was originally written. If we do this, it should go into master and be cherry-picked back. I'd like to minimize drift in the main control loop.

@kow3ns kow3ns added this to the v1.7 milestone Aug 24, 2017

enisoc commented Aug 25, 2017

/lgtm

@kow3ns Are you planning to address Unready Pods before 1.7.5 as well?

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 25, 2017
@k8s-github-robot k8s-github-robot added the do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. label Aug 30, 2017
@@ -195,7 +195,12 @@ func initIdentity(set *apps.StatefulSet, pod *v1.Pod) {
func updateIdentity(set *apps.StatefulSet, pod *v1.Pod) {
pod.Name = getPodName(set, getOrdinal(pod))
pod.Namespace = set.Namespace

if pod.Spec.Hostname == "" {
pod.Spec.Hostname = pod.Name

Why isn't this reading the annotation as well?


The annotations and the fields are both constructed from the name and serviceName as below. What information would I want from the existing annotations?
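
For reference, a rough sketch of that derivation, assuming the Pod name has the form `<set name>-<ordinal>` and the subdomain comes from the set's service name. The flattened `StatefulSet` and `Pod` structs and the `podName` helper are hypothetical simplifications, not the real API types.

```go
package main

import "fmt"

// StatefulSet and Pod are hypothetical, flattened stand-ins for the API types.
type StatefulSet struct {
	Name, Namespace, ServiceName string
}

type Pod struct {
	Name, Namespace, Hostname, Subdomain string
}

// podName builds the stable Pod name from the set name and the ordinal.
func podName(set StatefulSet, ordinal int) string {
	return fmt.Sprintf("%s-%d", set.Name, ordinal)
}

// updateIdentity rewrites name and namespace and fills hostname and subdomain
// only when they are empty, so an already-set value is never overwritten.
func updateIdentity(set StatefulSet, pod *Pod, ordinal int) {
	pod.Name = podName(set, ordinal)
	pod.Namespace = set.Namespace
	if pod.Hostname == "" {
		pod.Hostname = pod.Name
	}
	if pod.Subdomain == "" {
		pod.Subdomain = set.ServiceName
	}
}

func main() {
	set := StatefulSet{Name: "web", Namespace: "default", ServiceName: "nginx"}
	pod := Pod{}
	updateIdentity(set, &pod, 1)
	fmt.Printf("%s.%s\n", pod.Hostname, pod.Subdomain) // web-1.nginx
}
```

Nothing in this derivation depends on a value stored in an annotation, which is the point of the question above.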


// allow hostname and subdomain to be updated if they are empty. This allows for migration between the beta
// annotations and the GA field
if oldPod.Spec.Hostname == "" && oldPod.Spec.Hostname != newPod.Spec.Hostname {

second clause on both of these if statements is not actually necessary...
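
A small sketch of why the inequality check is redundant, assuming the usual "munge, then compare" pattern in which updatable fields are reset to their old values on a copy before an equality check. `PodSpec` and `updateAllowed` here are hypothetical stand-ins, not the real validation code; `NodeName` only stands in for any field that must stay immutable.

```go
package main

import "fmt"

// PodSpec is a hypothetical stand-in holding only the fields needed here.
type PodSpec struct {
	Hostname  string
	Subdomain string
	NodeName  string // stands in for any field that must stay immutable
}

// updateAllowed resets the fields that are allowed to change on a copy of the
// new spec, then requires the copy to equal the old spec.
func updateAllowed(oldSpec, newSpec PodSpec) bool {
	munged := newSpec
	// Only the emptiness of the old value matters. If old and new are both
	// empty, the assignment is a no-op, so no inequality check is needed.
	if oldSpec.Hostname == "" {
		munged.Hostname = oldSpec.Hostname
	}
	if oldSpec.Subdomain == "" {
		munged.Subdomain = oldSpec.Subdomain
	}
	return munged == oldSpec
}

func main() {
	empty := PodSpec{NodeName: "node-a"}
	filled := PodSpec{Hostname: "web-0", Subdomain: "nginx", NodeName: "node-a"}
	renamed := PodSpec{Hostname: "web-9", Subdomain: "nginx", NodeName: "node-a"}

	fmt.Println(updateAllowed(empty, filled))   // true: empty fields may be set
	fmt.Println(updateAllowed(filled, renamed)) // false: set fields stay immutable
	fmt.Println(updateAllowed(empty, empty))    // true: nothing changed
}
```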

@@ -2703,6 +2703,16 @@ func ValidatePodUpdate(newPod, oldPod *api.Pod) field.ErrorList {

// handle updateable fields by munging those fields prior to deep equal comparison.
mungedPod := *newPod

// allow hostname and subdomain to be updated if they are empty. This allows for migration between the beta
// annotations and the GA field

Please expand this comment. Is it supposed to stay this way forever?

@lavalamp

/approve

Need the branch manager to sign off, too.

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 14, 2017
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 14, 2017
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 14, 2017
@enisoc enisoc removed do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Sep 22, 2017
// Enforce the StatefulSet invariants - we do this without respect to the Pod's readiness so that the endpoints
// controller can be notified of identity changes if a Pod becomes unready due to a DNS inconsistency with respect
// to the Pods identity.
if !identityMatches(set, replicas[i]) || !storageMatches(set, replicas[i]) {

I tried a manual test of this and found a bug. We also need to put the check for hostname/subdomain back into identityMatches(), or else this doesn't trigger.

k8s-github-robot pushed a commit that referenced this pull request Sep 22, 2017
Automatic merge from submit-queue.

StatefulSet: Fix auto-heal of Pod hostname/subdomain fields.

This is a followup to #52557, which was a backport of #51199.

The original fix was incomplete. It doesn't trigger unless we change the consistency check to look at the fields.
- pod.Namespace == set.Namespace
+ pod.Namespace == set.Namespace &&
+ pod.Spec.Hostname != "" &&
+ pod.Spec.Subdomain != ""

I just had a thought:

ServiceName is allowed to be empty in 1.7+ [1], so we should do something like:

(pod.Spec.Subdomain != "" || set.ServiceName == "")

Otherwise we will continually send Pod updates for any StatefulSet with an empty ServiceName.

[1] The reason ServiceName can't be empty in <1.7 is because the annotation gets added anyway, and then the Pod fails validation. In 1.7+ we no longer set the annotation, and the field has no distinction between "" and unset.
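
As a rough illustration of the suggested check, using hypothetical flattened types (the real function would read these values from the API objects and derive the ordinal from the Pod name):

```go
package main

import "fmt"

// StatefulSet and Pod are hypothetical, flattened stand-ins for the API types.
type StatefulSet struct {
	Name, Namespace, ServiceName string
}

type Pod struct {
	Name, Namespace, Hostname, Subdomain string
}

// identityMatches treats an empty Subdomain as correct when the set has no
// ServiceName, so such Pods are not flagged for an update on every sync.
func identityMatches(set StatefulSet, pod Pod, ordinal int) bool {
	return pod.Name == fmt.Sprintf("%s-%d", set.Name, ordinal) &&
		pod.Namespace == set.Namespace &&
		pod.Hostname != "" &&
		(pod.Subdomain != "" || set.ServiceName == "")
}

func main() {
	withService := StatefulSet{Name: "web", Namespace: "default", ServiceName: "nginx"}
	noService := StatefulSet{Name: "web", Namespace: "default"}
	pod := Pod{Name: "web-0", Namespace: "default", Hostname: "web-0"}

	fmt.Println(identityMatches(withService, pod, 0)) // false: subdomain still needs to be set
	fmt.Println(identityMatches(noService, pod, 0))   // true: nothing to set
}
```

Without the `set.ServiceName == ""` clause, the second case would report a mismatch on every pass even though there is no value to write.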


enisoc commented Sep 25, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 25, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enisoc, kow3ns, lavalamp

Associated issue: 48327

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

kow3ns commented Sep 25, 2017

/retest

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue.

@k8s-github-robot k8s-github-robot merged commit 9790389 into kubernetes:release-1.7 Sep 27, 2017
@k8s-ci-robot

@kow3ns: The following test failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-e2e-gce-bazel | 4373f33 | link | /test pull-kubernetes-e2e-gce-bazel |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-cherrypick-bot

Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

wojtek-t commented Oct 2, 2017

@enisoc - thanks. That LGTM.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.