Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic Custom Resource Deployments #376

Merged
merged 47 commits into from
Jan 21, 2019
Merged
Show file tree
Hide file tree
Changes from 44 commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
154d6c0
initial dynamic discovery
timothysmith0609 Nov 8, 2018
bac0364
remove old @sync_mediator code
timothysmith0609 Dec 12, 2018
0371a7c
raise FatalDeploymentError, not RuntimeError in custom_resource_defin…
timothysmith0609 Dec 19, 2018
1ec320c
use group_by(&:kind) to aggregate crd defs
timothysmith0609 Dec 19, 2018
fdaff1e
Create JsonPath objects once on init
timothysmith0609 Dec 19, 2018
c48c43b
more jsonpath smoothing
timothysmith0609 Dec 19, 2018
c490593
use observedGeneration == generation in custom resource rollouts
timothysmith0609 Dec 20, 2018
9063569
fix raise FatalDeploymentError syntax when parsing JsonPaths
timothysmith0609 Dec 20, 2018
5f68443
Guard against kubernetes version
timothysmith0609 Jan 3, 2019
cc79f40
tests, cleanup, edits
timothysmith0609 Jan 4, 2019
3f3ac43
remove bad test
timothysmith0609 Jan 4, 2019
c8c0ff8
remove child_timeout_annotation. rename rollout-params annotation
timothysmith0609 Jan 4, 2019
8f69260
missed search/replace
timothysmith0609 Jan 4, 2019
add4b4b
refactor rollout configs to own class
timothysmith0609 Jan 7, 2019
27f61cb
rename tests (parmams->config)
timothysmith0609 Jan 7, 2019
67ca171
no more threaded test
timothysmith0609 Jan 7, 2019
de4510c
Fix unit tests (expects actual CRD object, not hash
timothysmith0609 Jan 7, 2019
bf032e1
Refactor rollout_config into separate file. More robust validation
Jan 7, 2019
03cb183
Add timeouts for CR instance. More test. Cleanup. More validation of
Jan 9, 2019
8dc350a
timeouts, README, renames
timothysmith0609 Jan 11, 2019
87de50b
README, tests, renaming + small edits
timothysmith0609 Jan 11, 2019
26c873c
typo
timothysmith0609 Jan 14, 2019
1f1b29a
README explain timeout syntax
timothysmith0609 Jan 15, 2019
4982c76
[ci skip] wip
KnVerey Jan 16, 2019
85d70cc
tweaks for test, rollout_conditions, cr, crd
timothysmith0609 Jan 16, 2019
85e8ab6
integration test when no failure_conditions present
timothysmith0609 Jan 16, 2019
ce6b919
out-of-band invalid CRD fails CR deploy. Annotation change (remove cr…
timothysmith0609 Jan 16, 2019
4628ffa
rubocop
timothysmith0609 Jan 16, 2019
4033ae1
Apply suggestions from code review
KnVerey Jan 16, 2019
6eea88e
merge PR suggestions
timothysmith0609 Jan 16, 2019
e57de3b
README updates
timothysmith0609 Jan 16, 2019
de7248b
small fixes
timothysmith0609 Jan 16, 2019
6c86654
fix test from moving CRD to predeploy sequence
timothysmith0609 Jan 17, 2019
5919593
No global resources (CRD) in predeploy sequence
timothysmith0609 Jan 17, 2019
9e2f1d2
pr review
timothysmith0609 Jan 18, 2019
b401f71
partial pr review
timothysmith0609 Jan 18, 2019
2e7e5bf
more pr review
timothysmith0609 Jan 18, 2019
44f890a
pr review
timothysmith0609 Jan 18, 2019
49d7d8e
Better timeout message, replace @statsd_tags with [] in relevant tests
timothysmith0609 Jan 18, 2019
a1e99b2
Final status handling
timothysmith0609 Jan 18, 2019
00b45cf
typo
timothysmith0609 Jan 18, 2019
afde0fd
remove old code
timothysmith0609 Jan 18, 2019
68a8dea
errors per CR test fix
timothysmith0609 Jan 18, 2019
3a43c06
RuntimeError -> StandardError
timothysmith0609 Jan 18, 2019
f1b8fed
pr review
timothysmith0609 Jan 18, 2019
81fab4f
police
timothysmith0609 Jan 18, 2019
d217b15
update changelog
timothysmith0609 Jan 21, 2019
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
89 changes: 89 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@ This repo also includes related tools for [running tasks](#kubernetes-run) and [
* [Customizing behaviour with annotations](#customizing-behaviour-with-annotations)
* [Running tasks at the beginning of a deploy](#running-tasks-at-the-beginning-of-a-deploy)
* [Deploying Kubernetes secrets (from EJSON)](#deploying-kubernetes-secrets-from-ejson)
* [Deploying custom resources](#deploying-custom-resources)

**KUBERNETES-RESTART**
* [Usage](#usage-1)
Expand Down Expand Up @@ -312,7 +313,95 @@ Since their data is only base64 encoded, Kubernetes secrets should not be commit
}
```

### Deploying custom resources

By default, kubernetes-deploy does not check the status of custom resources; it simply assumes that they deployed successfully. In order to meaningfully monitor the rollout of custom resources, kubernetes-deploy supports configuring pass/fail conditions using annotations on CustomResourceDefinitions (CRDs).

>Note:
This feature is only available on clusters running Kubernetes 1.11+ since it relies on the `metadata.generation` field being updated when custom resource specs are changed.

*Requirements:*

* The custom resource must expose a `status` subresource with an `observedGeneration` field.
* The `kubernetes-deploy.shopify.io/instance-rollout-conditions` annotation must be present on the CRD that defines the custom resource.
* (optional) The `kubernetes-deploy.shopify.io/instance-timeout` annotation can be added to the CRD that defines the custom resource to override the global default timeout for all instances of that resource. This annotation can use ISO8601 format or unprefixed ISO8601 time components (e.g. '1H', '60S').

#### Specifying pass/fail conditions

The presence of a valid `kubernetes-deploy.shopify.io/instance-rollout-conditions` annotation on a CRD will cause kubernetes-deploy to monitor the rollout of all instances of that custom resource. Its value can either be `"true"` (giving you the defaults described in the next section) or a valid JSON string with the following format:
```
'{
"success_conditions": [
{ "path": <JsonPath expression>, "value": <target value> }
... more success conditions
],
"failure_conditions": [
{ "path": <JsonPath expression>, "value": <target value> }
... more failure conditions
]
}'
```

For all conditions, `path` must be a valid JsonPath expression that points to a field in the custom resource's status. `value` is the value that must be present at `path` in order to fulfill a condition. For a deployment to be successful, _all_ `success_conditions` must be fulfilled. Conversely, the deploy will be marked as failed if _any one of_ `failure_conditions` is fulfilled. `success_conditions` are mandatory, but `failure_conditions` can be omitted (the resource will simply time out if it never reaches a successful state).

In addition to `path` and `value`, a failure condition can also contain `error_msg_path` or `custom_error_msg`. `error_msg_path` is a JsonPath expression that points to a field you want to surface when a failure condition is fulfilled. For example, a status condition may expose a `message` field that contains a description of the problem it encountered. `custom_error_msg` is a string that can be used if your custom resource doesn't contain sufficient information to warrant using `error_msg_path`. Note that `custom_error_msg` has higher precedence than `error_msg_path` so it will be used in favor of `error_msg_path` when both fields are present.

**Warning:**

You **must** ensure that your custom resource controller sets `.status.observedGeneration` to match the observed `.metadata.generation` of the monitored resource once its sync is complete. If this does not happen, kubernetes-deploy will not check success or failure conditions and the deploy will time out.

#### Example

As an example, the following is the default configuration that will be used if you set `kubernetes-deploy.shopify.io/instance-rollout-conditions: "true"` on the CRD that defines the custom resources you wish to monitor:
```
'{
"success_conditions": [
{
"path": "$.status.conditions[?(@.type == \"Ready\")].status",
"value": "True",
},
],
"failure_conditions": [
{
"path": '$.status.conditions[?(@.type == \"Failed\")].status',
"value": "True",
"error_msg_path": '$.status.conditions[?(@.type == \"Failed\")].message',
},
],
}'
```

The paths defined here are based on the [typical status properties](https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#typical-status-properties) as defined by the Kubernetes community. It expects the `status` subresource to contain a `conditions` array whose entries minimally specify `type`, `status`, and `message` fields.

You can see how these conditions relate to the following resource:

```
apiVersion: stable.shopify.io/v1
kind: Example
metadata:
generation: 2
name: example
namespace: namespace
spec:
...
status:
observedGeneration: 2
conditions:
- type: "Ready"
status: "False"
reason: "exampleNotReady"
message: "resource is not ready"
- type: "Failed"
status: "True"
reason: "exampleFailed"
message: "resource is failed"
```

- `observedGeneration == metadata.generation`, so kubernetes-deploy will check this resource's success and failure conditions.
- Since `$.status.conditions[?(@.type == "Ready")].status == "False"`, the resource is not considered successful yet.
- `$.status.conditions[?(@.type == "Failed")].status == "True"` means that a failure condition has been fulfilled and the resource is considered failed.
- Since `error_msg_path` is specified, kubernetes-deploy will log the contents of `$.status.conditions[?(@.type == "Failed")].message`, which in this case is: `resource is failed`.

# kubernetes-restart

Expand Down
1 change: 1 addition & 0 deletions kubernetes-deploy.gemspec
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,7 @@ Gem::Specification.new do |spec|
spec.add_dependency("statsd-instrument", '~> 2.3', '>= 2.3.2')
spec.add_dependency("oj", "~> 3.7")
spec.add_dependency("concurrent-ruby", "~> 1.1")
spec.add_dependency("jsonpath", "~> 0.9.6")

spec.add_development_dependency("bundler")
spec.add_development_dependency("rake", "~> 10.0")
Expand Down
41 changes: 23 additions & 18 deletions lib/kubernetes-deploy/deploy_task.rb
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
require 'fileutils'
require 'kubernetes-deploy/kubernetes_resource'
%w(
custom_resource
cloudsql
config_map
deployment
Expand All @@ -24,7 +25,6 @@
elasticsearch
statefulservice
topic
bucket
timothysmith0609 marked this conversation as resolved.
Show resolved Hide resolved
stateful_set
cron_job
job
Expand All @@ -46,19 +46,6 @@ class DeployTask
include KubeclientBuilder
extend KubernetesDeploy::StatsD::MeasureMethods

PREDEPLOY_SEQUENCE = %w(
ResourceQuota
timothysmith0609 marked this conversation as resolved.
Show resolved Hide resolved
Cloudsql
Redis
Memcached
ConfigMap
PersistentVolumeClaim
ServiceAccount
Role
RoleBinding
Pod
)

PROTECTED_NAMESPACES = %w(
default
kube-system
Expand All @@ -73,6 +60,22 @@ class DeployTask
# extensions/v1beta1/ReplicaSet -- managed by deployments
# core/v1/Secret -- should not committed / managed by shipit

def predeploy_sequence
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still have the issue open to implement predeploy by dependency. Might be time to action that before this gets more complicated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we have an issue but feel free to open one if we don't. Originally that idea surfaced in the predecessor PR to this one. We decided it wasn't strictly required to implement this feature, which is why it is not part of this PR. I think it is a reasonable feature, but I'm not sure we actually need it in practice.

before_crs = %w(
ResourceQuota
)
after_crs = %w(
ConfigMap
PersistentVolumeClaim
ServiceAccount
Role
RoleBinding
Pod
)

before_crs + cluster_resource_discoverer.crds.map(&:kind) + after_crs
end

def prune_whitelist
wl = %w(
core/v1/ConfigMap
Expand Down Expand Up @@ -194,11 +197,11 @@ def cluster_resource_discoverer
end

def deploy_has_priority_resources?(resources)
resources.any? { |r| PREDEPLOY_SEQUENCE.include?(r.type) }
resources.any? { |r| predeploy_sequence.include?(r.type) }
end

def predeploy_priority_resources(resource_list)
PREDEPLOY_SEQUENCE.each do |resource_type|
predeploy_sequence.each do |resource_type|
matching_resources = resource_list.select { |r| r.type == resource_type }
next if matching_resources.empty?
deploy_resources(matching_resources, verify: true, record_summary: false)
Expand Down Expand Up @@ -254,12 +257,14 @@ def create_ejson_secrets(prune)

def discover_resources
resources = []
crds = cluster_resource_discoverer.crds.group_by(&:kind)
@logger.info("Discovering templates:")

TemplateDiscovery.new(@template_dir).templates.each do |filename|
split_templates(filename) do |r_def|
r = KubernetesResource.build(namespace: @namespace, context: @context, logger: @logger,
definition: r_def, statsd_tags: @namespace_tags)
crd = crds[r_def["kind"]]&.first
r = KubernetesResource.build(namespace: @namespace, context: @context, logger: @logger, definition: r_def,
statsd_tags: @namespace_tags, crd: crd)
resources << r
@logger.info(" - #{r.id}")
end
Expand Down
18 changes: 10 additions & 8 deletions lib/kubernetes-deploy/kubernetes_resource.rb
Original file line number Diff line number Diff line change
Expand Up @@ -31,24 +31,26 @@ class KubernetesResource
TIMEOUT_OVERRIDE_ANNOTATION = "kubernetes-deploy.shopify.io/timeout-override"

class << self
def build(namespace:, context:, definition:, logger:, statsd_tags:)
def build(namespace:, context:, definition:, logger:, statsd_tags:, crd: nil)
opts = { namespace: namespace, context: context, definition: definition, logger: logger,
statsd_tags: statsd_tags }
if definition["kind"].blank?
raise InvalidTemplateError.new("Template missing 'Kind'", content: definition.to_yaml)
end

begin
if KubernetesDeploy.const_defined?(definition["kind"])
klass = KubernetesDeploy.const_get(definition["kind"])
return klass.new(**opts)
end
rescue NameError
end

inst = new(**opts)
inst.type = definition["kind"]
inst
if crd
CustomResource.new(crd: crd, **opts)
timothysmith0609 marked this conversation as resolved.
Show resolved Hide resolved
else
inst = new(**opts)
inst.type = definition["kind"]
inst
end
end

def timeout
Expand Down Expand Up @@ -167,13 +169,13 @@ def exists?

def current_generation
return -1 unless exists? # must be different default than observed_generation
@instance_data["metadata"]["generation"]
@instance_data.dig("metadata", "generation")
end

def observed_generation
return -2 unless exists?
# populating this is a best practice, but not all controllers actually do it
@instance_data["status"]["observedGeneration"]
@instance_data.dig('status', 'observedGeneration')
end

def status
Expand Down
22 changes: 0 additions & 22 deletions lib/kubernetes-deploy/kubernetes_resource/bucket.rb

This file was deleted.

87 changes: 87 additions & 0 deletions lib/kubernetes-deploy/kubernetes_resource/custom_resource.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
# frozen_string_literal: true
require 'jsonpath'

module KubernetesDeploy
class CustomResource < KubernetesResource
TIMEOUT_MESSAGE_DIFFERENT_GENERATIONS = <<~MSG
This resource's status could not be used to determine rollout success because it is not up-to-date
(.metadata.generation != .status.observedGeneration).
MSG
timothysmith0609 marked this conversation as resolved.
Show resolved Hide resolved

def initialize(namespace:, context:, definition:, logger:, statsd_tags: [], crd:)
super(namespace: namespace, context: context, definition: definition,
logger: logger, statsd_tags: statsd_tags)
@crd = crd
end

def timeout
timothysmith0609 marked this conversation as resolved.
Show resolved Hide resolved
timeout_override || @crd.timeout_for_instance || TIMEOUT
end

def deploy_succeeded?
return super unless rollout_conditions
return false unless observed_generation == current_generation

rollout_conditions.rollout_successful?(@instance_data)
end

def deploy_failed?
return super unless rollout_conditions
return false unless observed_generation == current_generation
KnVerey marked this conversation as resolved.
Show resolved Hide resolved

rollout_conditions.rollout_failed?(@instance_data)
end

def failure_message
return super unless rollout_conditions
messages = rollout_conditions.failure_messages(@instance_data)
messages.join("\n") if messages.present?
end

def timeout_message
if rollout_conditions && current_generation != observed_generation
TIMEOUT_MESSAGE_DIFFERENT_GENERATIONS
else
super
end
end

def status
if !exists? || rollout_conditions.nil?
super
elsif deploy_succeeded?
"Healthy"
elsif deploy_failed?
"Unhealthy"
else
"Unknown"
end
end

def type
kind
end

def validate_definition(kubectl)
super

@crd.validate_rollout_conditions
rescue RolloutConditionsError => e
@validation_errors << "The CRD that specifies this resource is using invalid rollout conditions. " \
"Kubernetes-deploy will not be able to continue until those rollout conditions are fixed.\n" \
"Rollout conditions can be found on the CRD that defines this resource (#{@crd.name}), " \
"under the annotation #{CustomResourceDefinition::ROLLOUT_CONDITIONS_ANNOTATION}.\n" \
"Validation failed with: #{e}"
end

private

def kind
@definition["kind"]
end

def rollout_conditions
@crd.rollout_conditions
end
end
end
Loading