
Add Stateful Set Resource #140

Merged
merged 1 commit into from Nov 9, 2017

Conversation

karanthukral
Contributor

@karanthukral karanthukral commented Jul 19, 2017

What?

  • Adds StatefulSet as a Kubernetes resource
  • Replicates the logic built into the ReplicaSet resource here
  • The timeout is currently the same as for a ReplicaSet
  • As mentioned in a comment below, StatefulSet pods currently (1.6) only support the OnDelete update strategy, which means simply using apply or replace won't actually update the pods. Kubernetes 1.7 is adding support for RollingUpdate.
    • Since we want to move to 1.7 in the short term, it might not be worth implementing custom deploy logic to delete each pod, as 1.7 would allow us to simply use apply
    • If we feel this is needed before we move to 1.7, then we can look into implementing custom logic for deploys

@karanthukral karanthukral force-pushed the statefulset branch 2 times, most recently from 7e03580 to 48361a6 (July 19, 2017 14:19)
@airhorns
Contributor

This would be super awesome to get out <3

@karanthukral karanthukral force-pushed the statefulset branch 2 times, most recently from c790e39 to bee8da1 (July 24, 2017 16:35)
end

def find_pods(ss_data)
label_string = ss_data["spec"]["selector"]["matchLabels"].map { |k, v| "#{k}=#{v}" }.join(",")
Contributor Author

@KnVerey Currently (in k8s 1.6), updates to StatefulSet pods are only applied OnDelete, which means kubectl apply or kubectl replace only updates the StatefulSet definition. In k8s 1.7, they are adding RollingUpdate for StatefulSets and will add the revision tag to match the pods.

Do you think it's worth implementing custom apply logic to delete each pod during the deploy, or do we want to wait for 1.7, where apply would be sufficient?

Contributor

Do you think it's worth implementing custom apply logic to delete each pod during the deploy, or do we want to wait for 1.7, where apply would be sufficient?

Definitely not worth spending time on custom logic. Arguably it would still be more correct to make the deploy time out when the rollout doesn't happen automatically under 1.6, though. Let's chat tomorrow about the 1.7 timeline and what would be best for our (very few) users of StatefulSets if it isn't imminent.

@karanthukral karanthukral changed the title from "[WIP] Add Stateful Set Resource" to "Add Stateful Set Resource" on Jul 24, 2017
@KnVerey
Contributor

KnVerey commented Oct 12, 2017

@karanthukral let's pick this up again now that we're supporting 1.7

@karanthukral karanthukral force-pushed the statefulset branch 2 times, most recently from 0d8cd38 to 847e9df (October 12, 2017 19:12)
@karanthukral
Contributor Author

@KnVerey Tests are running to make sure nothing broke. Added a new test for failure to make sure pods are selected. Should be good for review 👀

@karanthukral
Contributor Author

@stefanmb Do the 1.6.4 and 1.7.5 CIs use different kubectl versions? Trying to fix the tests; they are broken on master as well

@stefanmb
Contributor

Yes, each one uses the stated version of kubectl.

@karanthukral
Contributor Author

Yes, each one uses the stated version of kubectl.

I might be wrong, but it looks like the 1.7.5 CI is using kubectl 1.8, since I can replicate the failure locally with 1.8 but not with 1.7.5

@stefanmb
Contributor

Hmm, seems you're right. The minikube installation doesn't seem to install kubectl.

@stefanmb
Contributor

stefanmb commented Oct 13, 2017

So the CI situation right now is that each cluster has the correct server version, but the client used is 1.8.1 everywhere.

@KnVerey does that make sense or should we test against the same client version as well?

Edit #2: I don't think mismatching the client/server makes sense; I'll see about making the requisite changes.

@karanthukral
Contributor Author

@KnVerey whenever you get a chance

Contributor

@stefanmb stefanmb left a comment

There is a ton of code shared with ReplicaSet - could we extract some of it into a base class?

end

def deploy_failed?
@pods.present? && @pods.all?(&:deploy_failed?)
Contributor

all? or any?

I realized this is the same logic as ReplicaSet but I'd like to understand why.

Contributor Author

It should be any?; that was my fault. Will fix

Contributor

But the same logic error exists in ReplicaSet then?

Contributor Author

But the same logic error exists in ReplicaSet then?

Not sure why 😕. I use any? in DaemonSets. @KnVerey might know more about ReplicaSets

@karanthukral
Contributor Author

could we extract some of it into a base class?

I'll give that a shot :)

@karanthukral
Contributor Author

@stefanmb Hopefully this cleans up the repeated code

@karanthukral
Contributor Author

This should be good for 👀 too

@@ -0,0 +1,73 @@
# frozen_string_literal: true
module KubernetesDeploy
  class BaseSet < KubernetesResource
Contributor

nit: could we call it PodController or PodSetBase or something like that? It took me a minute to realize that "BaseSet" refers to (Replica|Daemon|Stateful)Set, rather than being a generic name for an alternative base class any resource type might be free to use.

module KubernetesDeploy
  class BaseSet < KubernetesResource
    def deploy_failed?
      @pods.present? && @pods.any?(&:deploy_failed?)
Contributor

As a rule, I like to avoid having classes or modules implicitly depend on something neither they nor their superclasses define. My suggestion here would be to have this use a pods accessor instead of the ivar, and define it as follows:

def pods
  raise NotImplementedError, "Subclasses must define a `pods` accessor"
end

Each subclass then has to deliberately override it with a reference to wherever it has put the required pod data, which it is responsible for getting.
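
For example, a subclass could satisfy that contract like this (a minimal sketch using the class names from this diff hunk, and assuming @pods is populated by the subclass's sync method):

module KubernetesDeploy
  class StatefulSet < BaseSet
    # Deliberately provides the `pods` accessor the base class requires;
    # @pods is assumed to be set during sync (e.g. via find_pods).
    attr_reader :pods
  end
end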

Contributor

To make a point related to this specific method: I'd rather stick with .all? for ReplicaSet. For DaemonSet, .any? makes sense because the pods are more unique, in that a bad node can cause a single one to fail, and that one will not be able to be rescheduled. One bad pod on a RS, where they're truly clones, is much more likely to be recovered automatically. For StatefulSet I could go either way but would lean toward any?, because the nature of SS suggests the pods could fail for independent, unrecoverable reasons. IMO this decision needs to be made class by class, balancing deploy fragility/speed concerns, and the base class shouldn't define a default as a result.
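
Concretely, that class-by-class decision could look something like the following sketch (class names and hierarchy are only illustrative; the base class would define no default, as suggested above):

module KubernetesDeploy
  class ReplicaSet < BaseSet
    # Pods are interchangeable clones; a single bad pod is likely to be
    # replaced automatically, so only fail when every pod has failed.
    def deploy_failed?
      pods.present? && pods.all?(&:deploy_failed?)
    end
  end

  class StatefulSet < BaseSet
    # Pods can fail for independent, unrecoverable reasons (same reasoning
    # as DaemonSet), so any single failed pod fails the deploy.
    def deploy_failed?
      pods.present? && pods.any?(&:deploy_failed?)
    end
  end
end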

@pods.map(&:timeout_message).compact.uniq.join("\n")
end

def deploy_timed_out?
Contributor

I removed this from ReplicaSet. Didn't think to remove it from DaemonSet too, but I think the same reasoning applies.

super || @pods.present? && @pods.any?(&:deploy_timed_out?)
end

def exists?
Contributor

I know it's repetitive, but I'd leave this to the subclasses as it depends entirely on a piece of data they must know to set.

regular_containers + init_containers
end

def find_pods(ss_data)
Contributor

nit: pod_controller_data or something, matching the class name (I assume ss here was because this is copied from statefulset)

context: context,
definition: pod_data,
logger: @logger,
parent: "#{@name.capitalize} set",
Contributor

self.class.name instead of "set"


all_pods = JSON.parse(raw_json)["items"]
all_pods.each_with_object([]) do |pod_data, relevant_pods|
  next unless pod_data["metadata"]["ownerReferences"].any? { |ref| ref["uid"] == ss_data["metadata"]["uid"] }
Contributor

I see that DS can't use this because it needs to check additional conditions to make sure it is only selecting pods in the current generation. Shouldn't this be the case for SS too? RS is the only one of them that only ever selects pods in the latest generation, because of the additional Deployment layer above it.

To make this usable by all of them, perhaps we could use the NotImplementedError pattern mentioned above to require a parent_of_pod?(own_data, pod_data) method that each subclass has to implement for this method to use.
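
That could look roughly like this (a sketch; method placement and argument names are illustrative):

# In the base class: force each subclass to define what "parent of this pod" means.
def parent_of_pod?(_own_data, _pod_data)
  raise NotImplementedError, "Subclasses must define `parent_of_pod?`"
end

# A ReplicaSet-style implementation, where an ownerReferences match is enough
# because the Deployment layer guarantees only latest-generation pods exist:
def parent_of_pod?(own_data, pod_data)
  pod_data["metadata"]["ownerReferences"].any? { |ref| ref["uid"] == own_data["metadata"]["uid"] }
end

# DaemonSet (and possibly StatefulSet, per the question above) would layer a
# generation/revision check on top of the ownerReferences match.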

end

def fetch_logs
container_names.each_with_object({}) do |container_name, container_logs|
Contributor

Add the guard from RS: return {} unless @pods.present? # the kubectl command times out if no pods exist
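
i.e. something along these lines (a sketch mirroring the ReplicaSet version):

def fetch_logs
  return {} unless @pods.present? # the kubectl command times out if no pods exist
  container_names.each_with_object({}) do |container_name, container_logs|
    # ... existing log-fetching logic, unchanged ...
  end
end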

@current_generation = stateful_data["metadata"]["generation"]
@observed_generation = stateful_data["status"]["observedGeneration"]
@desired_replicas = stateful_data["spec"]["replicas"].to_i
@rollout_data = stateful_data["status"].slice("replicas")
Contributor

This is slicing out the exact same thing you're using for desired_replicas, so if I'm not mistaken, your current success condition will always pass as soon as the first new replica is created. I see in the docs that SS have a distinctive set of fields, two of which reference Revision (that's probably referring to controllerRevision?). Please look into and document which combinations of these fields we should look at to most reliably detect rollout completion.

@karanthukral karanthukral force-pushed the statefulset branch 4 times, most recently from f023180 to 31a3a02 (November 6, 2017 21:00)
@karanthukral
Contributor Author

Skipped monitoring onDelete statefulsets

pod.sync(pod_data)
relevant_pods << pod
end
def parent_of_pod(set_data, pod_data)
Contributor

Nit: These methods should have ? on the end of their names since they return a boolean

context: context,
definition: pod_data,
logger: @logger,
parent: "#{self.class.name} set",
Contributor

Isn't what we want, for example, "Web Deployment"? If so, this should be "#{name.capitalize} #{self.class.name}"

if @found
  stateful_data = JSON.parse(raw_json)
  @status_data = stateful_data["status"]
  @rollout_data = stateful_data["status"].slice("replicas")
Contributor

This is only used by @status directly below, so let's not make it an ivar.

@status_data = stateful_data["status"]
@rollout_data = stateful_data["status"].slice("replicas")
@update_strategy = stateful_data["spec"]["updateStrategy"]["type"]
@status = @rollout_data.map { |state_replicas, num| "#{num} #{state_replicas.chop.pluralize(num)}" }.join(", ")
Contributor

This should minimally include readyReplicas too... maybe even the other xReplicas fields as well, wdyt?

Contributor Author

I'll add currentReplicas and readyReplicas

  @status = @rollout_data.map { |state_replicas, num| "#{num} #{state_replicas.chop.pluralize(num)}" }.join(", ")
  @pods = find_pods(stateful_data)
else # reset
  @rollout_data = { "replicas" => 0 }
Contributor

This should reset @status_data instead in a way that will not pass deploy_succeeded?

if @update_strategy == ONDELETE
  # Gem cannot monitor update since it doesn't occur until delete
  unless @success_assumption_warning_shown
    @logger.warn("WARNING: Your SS's updateStrategy is set to OnDelete, "\
Contributor

  • Spell out "StatefulSet"
  • "its deleted" -> "its pods are deleted"
  • RollingUpdates -> rollingUpdate (nit: actual field name)

  end
  true
else
  @status_data['current_generation'] == @status_data['observed_generation'] &&
Contributor

these field names seem wrong... I'm guessing the test passes because they are both nil.

The real field I see is .status.observedGeneration. The current generation is in .metadata.generation, not the status.

Contributor

The fact that SS has so many different status fields is a bit confusing... I went and found an example of a SS that is having trouble rolling out an update to help clarify what we need. The following status belongs to a SS that is in a pretty messed up state:

status:
  currentReplicas: 1
  currentRevision: statefulset-3627988389 # there is one pod in this version
  observedGeneration: 4 # this is equal to .metadata.generation... useless?
  readyReplicas: 2 # these are from a revision not mentioned in the status
  replicas: 3
  updateRevision: statefulset-1264314452 # there are no pods in this version

And of a healthy SS:

status:
  currentReplicas: 2
  currentRevision: other-statefulset-335438845
  observedGeneration: 64
  readyReplicas: 2 
  replicas: 2
  updateRevision: other-statefulset-335438845

Based on that, I think we need:

  • .status.currentRevision == .status.updateRevision
  • .spec.replicas == .status.currentReplicas == .status.readyReplicas
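
Translated into code, the rollingUpdate success check would look roughly like this (a sketch; it assumes sync stores .spec.replicas in @desired_replicas and the raw .status hash in @status_data, as elsewhere in this diff):

def deploy_succeeded?
  # .status.currentRevision == .status.updateRevision
  # .spec.replicas == .status.currentReplicas == .status.readyReplicas
  @status_data["currentRevision"].present? &&
    @status_data["currentRevision"] == @status_data["updateRevision"] &&
    @desired_replicas == @status_data["currentReplicas"].to_i &&
    @desired_replicas == @status_data["readyReplicas"].to_i
end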

Contributor Author

Good catch. The SS fields are confusing in how they are laid out

else
  @status_data['current_generation'] == @status_data['observed_generation'] &&
    @status_data['currentRevision'] == @status_data['updateRevision'] &&
    @status_data['replicas'].to_i == @status_data['readyReplicas'].to_i
Contributor

According to the docs, replicas is the number created by the controller, which doesn't sound like the same thing as the number we ultimately want. I think we need to compare to .spec.replicas instead. (see my other comment though)

require 'kubernetes-deploy/kubernetes_resource/pod_set_base'
module KubernetesDeploy
  class StatefulSet < PodSetBase
    TIMEOUT = 5.minutes
Contributor

Do you think this is long enough for statefulsets? Since they roll out one by one, I'd tend to think they'd need something more generous. Maybe check with DataDelivery?

Contributor Author

Good point. I'll check with them

Contributor

What did they say?

@karanthukral karanthukral force-pushed the statefulset branch 3 times, most recently from 46abc51 to fc55d37 (November 8, 2017 21:19)
require 'kubernetes-deploy/kubernetes_resource/pod_set_base'
module KubernetesDeploy
  class StatefulSet < PodSetBase
    TIMEOUT = 5.minutes
Contributor

What did they say?

@desired_replicas = stateful_data["spec"]["replicas"].to_i
@status_data = stateful_data["status"]
rollout_data = stateful_data["status"].slice("replicas", "readyReplicas", "currentReplicas")
@update_strategy = stateful_data.dig('spec', 'updateStrategy', 'type') || ONDELETE
Contributor

Hm, why did you have to add a default here? I would have expected the server to always fill this in if it isn't present. The actual default will vary by groupversion; in apps/v1beta1 it is indeed OnDelete, but it is switching to RollingUpdate in v1beta2: https://kubernetes.io/docs/api-reference/v1.8/#statefulsetupdatestrategy-v1beta2-apps

# Gem cannot monitor update since it doesn't occur until delete
unless @success_assumption_warning_shown
@logger.warn("WARNING: Your StatefulSet's updateStrategy is set to OnDelete, "\
"which means updates will not be applied until its pods deleted. "\
Contributor

the word "are" is missing in "until its pods deleted"

  @status = rollout_data.map { |state_replicas, num| "#{num} #{state_replicas.chop.pluralize(num)}" }.join(", ")
  @pods = find_pods(stateful_data)
else # reset
  @status_data = { "replicas" => 0 }
Contributor

We don't even look at replicas, so resetting it isn't particularly useful. The important part is to make sure that deploy_succeeded? won't pass based on this reset. One way to do this is to set mismatched defaults (e.g. current -1 / ready -2); another is to add a guard to deploy_succeeded? checking that the keys we need are actually present.
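
Either option is only a couple of lines (sketches, assuming deploy_succeeded? compares the fields discussed above):

# Option 1: reset with mismatched defaults so the comparisons can never pass
@status_data = { "currentReplicas" => -1, "readyReplicas" => -2 }

# Option 2: guard deploy_succeeded? on the presence of the fields it compares
def deploy_succeeded?
  return false unless @status_data.key?("currentRevision") && @status_data.key?("readyReplicas")
  # ... existing revision and replica-count comparisons ...
end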

@@ -0,0 +1,36 @@
---
apiVersion: v1
kind: Service
Contributor

What's the use of having a service in this fixture?

Contributor Author

Good point. I'll remove it

skip if KUBE_CLIENT_VERSION < Gem::Version.new("1.7.0")
result = deploy_fixtures("hello-cloud", subset: ["stateful_set.yml"]) do |fixtures|
  stateful_set = fixtures['stateful_set.yml']['StatefulSet'].first
  stateful_set['spec']['updateStrategy'] = {}
Contributor

nit: set { 'type' => 'RollingUpdate' } and save a line

skip if KUBE_CLIENT_VERSION < Gem::Version.new("1.7.0")
result = deploy_fixtures("hello-cloud", subset: ["stateful_set.yml"]) do |fixtures|
  stateful_set = fixtures['stateful_set.yml']['StatefulSet'].first
  stateful_set['spec']['updateStrategy'] = {}
Contributor

same nit here

"Successfully deployed",
"Service/nginx-ss",
"Successful resources",
"StatefulSet/nginx-ss"
Contributor

Let's assert on the 2 replicas, 2 readyReplicas, 2 currentReplicas output here, since we'd normally do that in the "hello cloud" test but can't because it uses OnDelete.
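
For example (a sketch; assert_logs_match_all is assumed to be the suite's log-assertion helper and may differ from what this test file actually uses):

assert_logs_match_all([
  "Successfully deployed",
  "StatefulSet/nginx-ss",
  "2 replicas, 2 readyReplicas, 2 currentReplicas"
])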

@karanthukral karanthukral force-pushed the statefulset branch 2 times, most recently from 48a4d9e to f23ba92 (November 9, 2017 19:24)
Contributor

@KnVerey KnVerey left a comment

🎉

Please:

  • Uncommit the test/fixtures/.DS_Store file
  • Squash before merging
  • Check in on deploys that have statefulsets a few times in the days after shipping this internally
