
K8s 110 ci with fixes #308

Merged: 9 commits into master from k8s-110-ci-with-fixes on Jul 20, 2018
Conversation

@dturn (Contributor) commented Jun 29, 2018:

In 1.10, progressDeadlineSeconds appears in the instance data even if it's not in the spec. As a result, our progressDeadlineSeconds logic is triggered instead of the hard-timeout logic.

This is #299 with a commit to fix the issue.
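For reference, a minimal sketch of the intended behaviour (the instance-variable and helper names below are assumptions for illustration, not necessarily the exact kubernetes-deploy internals):

# Sketch only: gate the progress-deadline logic on the template we deployed rather than
# on what the apiserver reports back, so a server-defaulted progressDeadlineSeconds
# no longer suppresses the hard-timeout path.
def progress_deadline
  @definition.dig("spec", "progressDeadlineSeconds") # nil unless the deployed spec sets it
end

def progress_condition
  return unless exists? && progress_deadline
  @instance_data.dig("status", "conditions")&.find { |c| c["type"] == "Progressing" }
end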

@stefanmb (Contributor):

👀 Thanks for looking at this, I dropped the ball on following up.

@@ -119,14 +119,14 @@ def rollout_data
end

def progress_condition
return unless exists?
return unless exists? && progress_deadline
Contributor:

I'm a bit rusty on this code - I understand why the progress_deadline body had to be changed, but why/how was progress_condition working with just the exists? check before this PR?

Contributor Author:

In 1.10 the Progressing condition (added by a k8s controller) is now being added to deployments even when progressDeadlineSeconds isn't in the spec that was deployed. This isn't happening in 1.9. (I'm not sure why it's happening. One guess, which may not be valid since I'm not sure where in the k8s diff between 1.9 and 1.10 to look, is that the default value of progressDeadlineSeconds is getting merged in sooner than before.)

Contributor:

The test in question uses an extensions/v1beta1 Deployment. PDS is not supposed to get defaulted at all in that GV... maybe 1.10 added a bug? Can you confirm whether PDS itself is being defaulted now, or whether it's just the Progressing condition being added (and never expiring, I guess?)?

If we change our progress deadline to depend on whether @definition['spec']['progressDeadlineSeconds'].present? as in this PR, we'll have a behaviour change: previously, if your Deployment got a PDS via defaulting (e.g. an apps/v1 Deployment with no PDS specified), we'd use a PDS-based timeout. After this change, cases like that would use the hard timeout.
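To make the behaviour change concrete, here is a rough illustration (the hashes below are invented for the example, not taken from the codebase):

# apps/v1 Deployment with no PDS in the template: the apiserver defaults it to 600.
template = { "spec" => { "replicas" => 3 } }
live     = { "spec" => { "replicas" => 3, "progressDeadlineSeconds" => 600 } }

live.dig("spec", "progressDeadlineSeconds")      # => 600 (old check: PDS-based timeout)
template.dig("spec", "progressDeadlineSeconds")  # => nil (new check: hard timeout)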

Contributor:

Note that the 1.10 docs still claim there is no default value in extensions/v1beta1. And the controller code removes the Progressing condition if no PDS is set: https://github.com/kubernetes/kubernetes/blob/74bcefc8b2bf88a2f5816336999b524cc48cf6c0/pkg/controller/deployment/progress.go#L38-L41 😕

@timothysmith0609 (Contributor) commented Jul 10, 2018:

I took a look at an older deployment that didn't have PDS set and was also using extensions/v1beta1. It does indeed get its PDS set to the default 600 seconds in its spec. Could it be that the apiserver is using apps/v1 under the hood and the default is being injected into the extensions/v1beta1 spec?

The spec definition: https://github.com/Shopify/u2/blob/73c0ed1f461b319bd5adb35a5865f9beacf82a9e/config/deploy/staging/web.yml.erb#L62

Inside the cluster (PDS has been set automatically somewhere):

spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 2
  selector:
     ...

Viewed in this light, the controller code Katrina linked seems to suggest that you need to explicitly set PDS to nil to avoid the Progressing condition, rather than relying on there being no default.

Contributor:

👍 that looks like good evidence. Can you open the issue upstream?

Contributor Author:

@KnVerey

1 - If the resource has the kubernetes-deploy.shopify.io/timeout-override annotation on it, use that. This is currently supported for all resources except deployment.

I agree we should add this, but I'd like to keep the scope of this PR to just what's needed to make CI work with 1.10.
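For context, a rough sketch of what an annotation-based override could look like (the method name and parsing are assumptions, not the eventual implementation):

TIMEOUT_OVERRIDE_ANNOTATION = "kubernetes-deploy.shopify.io/timeout-override"

def timeout_override
  # Assumed helper: read the override (in seconds) from the deployed template's annotations.
  value = @definition.dig("metadata", "annotations", TIMEOUT_OVERRIDE_ANNOTATION)
  Integer(value) if value
end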

Contributor:

SGTM. We can just open an issue.

Contributor Author:

Issue created #315

@dturn force-pushed the k8s-110-ci-with-fixes branch 3 times, most recently from dc3f729 to 2dd27a2 on July 12, 2018 17:31
%r{Deployment/bad-probe: TIMED OUT \(timeout: \d+s\)},
"Timeout reason: hard deadline for Deployment"
]
end_bad_probe_logs = [%r{Unhealthy: Readiness probe failed: .* \(8 events \)}] # event
Contributor Author:

I think it's ok to ignore Policial here since we've been using the %r{} syntax for regexes and I don't think we want to start using the // syntax.

Contributor:

The rule it is enforcing is the "mixed" style if you look at the rubocop cache file. If we don't want that, we need to change our rubocop rules. But in general I'm inclined to stick with the standard style guide.
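For reference, the cop involved here is presumably Style/RegexpLiteral; a hypothetical .rubocop.yml override, if we ever wanted to always allow %r{}, would look roughly like:

Style/RegexpLiteral:
  EnforcedStyle: percent_r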

@dturn (Contributor Author) commented Jul 12, 2018:

CI is passing, can I get 👀 again?

@timothysmith0609 (Contributor) left a review comment:

Seems straightforward now

dev.yml Outdated
@@ -8,7 +8,7 @@ up:
- custom:
name: Minikube Cluster
met?: test $(minikube status | grep Running | wc -l) -eq 2 && $(minikube status | grep -q 'Correctly Configured')
meet: minikube start --kubernetes-version=v1.9.4 --vm-driver=hyperkit
meet: minikube start --kubernetes-version=v1.10.4 --vm-driver=hyperkit
Contributor:

1.10.0 is still the latest available in minikube afaik

@KnVerey (Contributor) left a review comment:

CI is passing

It is, but it is waiting 600s for that deployment. This should not merge until it no longer causes a regression in our CI times. How to fix that will have to be decided case by case, depending on what the purpose of using that fixture was for a given test.

bad_probe_timeout = if KUBE_SERVER_VERSION < Gem::Version.new("1.10.0")
"Deployment/bad-probe: TIMED OUT (timeout: 5s)"
else
"Deployment/bad-probe: GLOBAL WATCH TIMEOUT (20 seconds)"
Contributor:

In this version, bad-probe has no purpose since missing-volumes is already covering the global watch timeout.

@@ -534,7 +534,11 @@ def test_pruning_of_existing_managed_secrets_when_ejson_file_has_been_deleted

def test_deploy_result_logging_for_mixed_result_deploy
subset = ["bad_probe.yml", "init_crash.yml", "missing_volumes.yml", "config_map.yml"]
result = deploy_fixtures("invalid", subset: subset)
result = deploy_fixtures("invalid", subset: subset) do |f|
if KUBE_SERVER_VERSION >= Gem::Version.new("1.10.0")
Contributor:

Can you please add an inline comment by the first of these changes in each test, linking to the issue you opened?
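For illustration, the kind of per-test tweak plus inline comment being asked for might look roughly like this (the fixture structure and the chosen value are assumptions, not the PR's actual code):

result = deploy_fixtures("invalid", subset: subset) do |fixtures|
  if KUBE_SERVER_VERSION >= Gem::Version.new("1.10.0")
    # 1.10 defaults progressDeadlineSeconds even on extensions/v1beta1 Deployments,
    # which changes which timeout path bad-probe exercises (see the upstream issue).
    deployment = fixtures["bad_probe.yml"]["Deployment"].first
    deployment["spec"]["progressDeadlineSeconds"] = 10
  end
end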

@dturn merged commit e63a1a3 into master on Jul 20, 2018
@dturn deleted the k8s-110-ci-with-fixes branch on July 20, 2018 00:09