-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamic Custom Resource Deployments #376
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method seems valid to me. I like having the generic CR class instead of attempting to dynamically define classes
Example rollout.
|
bc6bf72
to
b8d249a
Compare
@@ -10,7 +10,7 @@ def initialize(namespace:, context:, logger:, namespace_tags:) | |||
end | |||
|
|||
def crds(sync_mediator) | |||
sync_mediator.get_all(CustomResourceDefinition.kind).map do |r_def| | |||
@crds ||= sync_mediator.get_all(CustomResourceDefinition.kind).map do |r_def| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if you deploy a CRD and a CR at same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's one shortcoming of the current approach, unfortunately. There's a few solutions available to us here:
- Add discovered CRD specs to the top of the priority list. This seems risk-free since they should have no external dependencies
- In the future, think about using a dependency graph to produce the priority list (e.g. a CloudSQL has something that says
iDependOn: cloudsqls.stable.shopify.io
. I'd say this is out of scope for this PR.
It's unclear to me whether this issue is handled right now, anyway. Not saying we shouldn't fix it here, just a note
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's one shortcoming of the current approach, unfortunately.
What is the exact behaviour though? It still falls back on a KubernetesResource with the "dunno what to do here" message, right?
I'm on the fence about introducing an annotation-driven dependency graph, but FWIW we did have someone request customized sequencing for another reason earlier this week. Regardless, I agree it doesn't need to be handled in this PR.
@@ -2,6 +2,8 @@ | |||
module KubernetesDeploy | |||
class CustomResourceDefinition < KubernetesResource | |||
TIMEOUT = 2.minutes | |||
CHILD_CR_TIMEOUT_ANNOTATION = "kubernetes-deploy.shopify.io/cr-timeout-override" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to update the README.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why CHILD_
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open to suggestions. I'm mainly trying to avoid the perennial issue of confusing CRDs with CRs by being more explicit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this part of the other child rollout annotation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this annotation for now as I'm more concerned with the status aspect of this PR. If necessary, the base timeout-override annotation can be used in the meantime.
f47b1aa
to
3454521
Compare
Configurable success/failure conditionsI've decided to add configurable success/failure statuses as part of this PR. Users can supply a JSON string to the
For convenience, default values (which conform to our buddies Status implementation) are used if such fields are missing. LimitationsCurrently, resources, such as CloudSQL, reference other Kubernetes objects to discern their readiness (in CloudSQLs case, its deployment + service). In the new implementation, deploying resources are only able to observe themselves. |
ffd511e
to
ad1d6ec
Compare
lib/kubernetes-deploy/kubernetes_resource/custom_resource_definition.rb
Outdated
Show resolved
Hide resolved
@definition["kind"] | ||
end | ||
|
||
def rollout_params |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This concept of rollout_params
and its structure isn't super clear to me in my first read of this class, and it seems like we always use it one piece at a time. Is there a better abstraction we can come up with here? Maybe CRD has a private ChildConfiguration
object that it exposes, and that has methods like error_message_path
and failure_status_path
and such? (I dunno--don't take that specific suggestion too seriously)
lib/kubernetes-deploy/kubernetes_resource/custom_resource_definition.rb
Outdated
Show resolved
Hide resolved
@@ -2,6 +2,8 @@ | |||
module KubernetesDeploy | |||
class CustomResourceDefinition < KubernetesResource | |||
TIMEOUT = 2.minutes | |||
CHILD_CR_TIMEOUT_ANNOTATION = "kubernetes-deploy.shopify.io/cr-timeout-override" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this part of the other child rollout annotation?
@@ -10,7 +10,7 @@ def initialize(namespace:, context:, logger:, namespace_tags:) | |||
end | |||
|
|||
def crds(sync_mediator) | |||
sync_mediator.get_all(CustomResourceDefinition.kind).map do |r_def| | |||
@crds ||= sync_mediator.get_all(CustomResourceDefinition.kind).map do |r_def| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's one shortcoming of the current approach, unfortunately.
What is the exact behaviour though? It still falls back on a KubernetesResource with the "dunno what to do here" message, right?
I'm on the fence about introducing an annotation-driven dependency graph, but FWIW we did have someone request customized sequencing for another reason earlier this week. Regardless, I agree it doesn't need to be handled in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's room for improvement in the way we model the rollout configuration data, but the overall class CustomResource
approach seems good to me. When you start the tests, please make sure to include one that proves the correct classes get instantiated (KubernetesResource vs CustomResource vs hardcoded CR class).
lib/kubernetes-deploy/kubernetes_resource/custom_resource_definition.rb
Outdated
Show resolved
Hide resolved
params if validate_params(params) | ||
|
||
rescue JSON::ParserError | ||
raise FatalDeploymentError, "custom rollout params are not valid JSON: '#{rollout_params_string}'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we actually fail the whole deploy on this, or is there a more graceful fallback behaviour we could adopt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a value-judgement we'll need to make. My opinion is that, if users are opting-in to use this feature, we should consider it critical and fail fast if something goes wrong.
On the other hand, aborting a deploy because of some bad JSON might cause too much friction.
lib/kubernetes-deploy/kubernetes_resource/custom_resource_definition.rb
Outdated
Show resolved
Hide resolved
3c9cae8
to
d044ef7
Compare
Example of custom query parameters:
|
Two questions about the example:
|
84e997c
to
2355fa3
Compare
add cr- prefix to rollout-conditions annotation README edit rename config ->conditions logs for serial deploy test police
serial integration test tweak" test failure_conditions optional policial
Co-Authored-By: timothysmith0609 <31742287+timothysmith0609@users.noreply.github.com>
e4dbe90
to
44f890a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent work. Thanks for sticking with this long-running but very impactful feature!
Motivation and Goals
required for #229
see #128
This PR represents the kubernetes-deploy side of our custom resource status implementation. The goal of this PR is to create a backwards compatible means of meaningfully monitoring the rollouts of custom resources. It has the following goals:
deploy_succeeded? == true
if theReady
condition on the CR status is truedeploy_failed? == true
if theFailed
condition on the CR status is trueReady
andFailed
are falseImplementation details
(OUTDATED) Changes to
PREDEPLOY_SEQUENCE
Since we cannot know, a priori, the types of custom resources in a cluster, we must dynamically find them during the discovery phase. As an additional concern, I argue that the common case is that custom resources must be deployed before other kubernetes objects. That is, we need a working
cloudsql
before we can think of running adb-migrate
pod, e.g. As it stands, we hardcode this priority inside thePREDEPLOY_SEQUENCE
constant in deploy_task.rb. In order to maintain the rough ordering of thePREDEPLOY_SEQUENCE
const while also handling the case of dynamic custom resource discovery, I have moved the creation ofPREDEPLOY_SEQUENCE
into 2 separate phases.BASE_PREDEPLOY_SEQUENCE
.BASE_PREDEPLOY_SEQUENCE
. Using the result of this union, we then set thePREDEPLOY_SEQUENCE
constant. See here(OUTDATED) Caching discovered CRDs
As an implementation detail, I have opted to cache the value of
ResourceDiscovery.crds
in an instance variable. Linking together CRs with their parent CRDs requires passing around the list of CRDs in a number of places, and it doesn't seem risky to avoid the extra work of rediscovering them for every call.TODO
Ready
andFailed
?cc @Shopify/cloudx