Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete and redeploy object upon error 'field is immutable' #940

Merged
merged 10 commits into from
Oct 10, 2018

Conversation

priyawadhwa
Copy link
Contributor

@priyawadhwa priyawadhwa commented Aug 29, 2018

As discussed in #891, when running skaffold dev certain immutable Kubernetes
objects (like Jobs) can't be redeployed. A 'field is immutable' error is
returned when this happens.

To fix this issue, we can check the error from kubectl apply for 'field
is immutable'. If we find it, we can delete the object and try to deploy
it again.

Adds an integration test for skaffold dev (#441)

As discussed in GoogleContainerTools#891, when running skaffold dev certain immutable Kubernetes
objects (like Jobs) can't be redeployed. A 'field is immutable' error is
returned when this happens.

To fix this issue, we can check the error from kubectl apply for 'field
is immutable'. If we find it, we can delete the object and try to deploy
it again.
As discussed in GoogleContainerTools#891, when running skaffold dev certain immutable Kubernetes
objects (like Jobs) can't be redeployed. A 'field is immutable' error is
returned when this happens.

To fix this issue, we can check the error from kubectl apply for 'field
is immutable'. If we find it, we can delete the object and try to deploy
it again.
@codecov-io
Copy link

codecov-io commented Aug 29, 2018

Codecov Report

Merging #940 into master will decrease coverage by 0.13%.
The diff coverage is 12.5%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #940      +/-   ##
==========================================
- Coverage   42.57%   42.44%   -0.14%     
==========================================
  Files          71       71              
  Lines        3239     3254      +15     
==========================================
+ Hits         1379     1381       +2     
- Misses       1727     1740      +13     
  Partials      133      133
Impacted Files Coverage Δ
pkg/skaffold/deploy/kubectl/cli.go 0% <0%> (ø) ⬆️
pkg/skaffold/deploy/kubectl/version.go 0% <0%> (ø) ⬆️
pkg/skaffold/kubernetes/wait.go 27.02% <0%> (-2.39%) ⬇️
pkg/skaffold/deploy/kubectl.go 52.38% <100%> (+1.16%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 95df153...eb06f39. Read the comment docs.

@bhack
Copy link

bhack commented Aug 30, 2018

/cc @lenlen

}
// If the output contains the string 'field is immutable', we want to delete the object and recreate it
// See Issue #891 for more information
if err := c.Detete(ctx, out, manifests); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will need a rebase on balint's PR

Copy link
Contributor

@balopat balopat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we please write an integration test for this?

@priyawadhwa priyawadhwa force-pushed the job branch 2 times, most recently from 9744858 to 2e5ac25 Compare August 30, 2018 22:54
I added an integration test to make sure a Job is deleted and redeployed
upon changes when running via skaffold dev. The test sets up by creating
a file foo. It runs skaffold dev and make sure the Job is created. It
then changes foo so that skaffold redeploys, and makes sure the UID of
the new Job is different from the UID of the old job.
@priyawadhwa
Copy link
Contributor Author

@balopat for sure, I added one!

@dgageot
Copy link
Contributor

dgageot commented Aug 31, 2018

@priyawadhwa should we switch to applying object one by one so that if only one can't be recreated, we don't force delete the others?

for _, mfst := range manifests {
buf := bytes.NewBuffer([]byte{})
writer := bufio.NewWriter(buf)
ml := ManifestList{mfst}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to convert this back to a manifest list, mfst is just a []byte at this point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I had to convert it because the Delete function takes in a ManifestList

func (c *CLI) Delete(ctx context.Context, out io.Writer, manifests ManifestList) error

@dgageot
Copy link
Contributor

dgageot commented Aug 31, 2018

Is applying one by one slower? How much?

@priyawadhwa
Copy link
Contributor Author

@dgageot I'm not sure what the best way to figure this out would be, is there benchmarking for skaffold set up?

I would guess that it's probably faster to go one by one than it would be to (potentially) delete and recreate all objects.

@r2d4
Copy link
Contributor

r2d4 commented Aug 31, 2018

We can compare the integration test timings to the previous runs to get a rough idea

@priyawadhwa
Copy link
Contributor Author

Timings with this PR:

--- PASS: TestRun (180.37s)
    --- PASS: TestRun/getting-started_example (26.92s)
    --- PASS: TestRun/annotated_getting-started_example (8.94s)
    --- PASS: TestRun/getting-started_envTagger (14.37s)
    --- PASS: TestRun/gcb_builder_example (31.13s)
    --- PASS: TestRun/deploy_kustomize (4.46s)
    --- PASS: TestRun/bazel_example (47.35s)
    --- PASS: TestRun/kaniko_example (35.48s)
    --- PASS: TestRun/helm_example (11.73s)
--- PASS: TestDev (13.37s)
    --- PASS: TestDev/delete_and_redeploy_job (13.37s)
--- PASS: TestFix (14.83s)
PASS

A previous run:

--- PASS: TestRun (156.64s)
    --- PASS: TestRun/getting-started_example (18.55s)
    --- PASS: TestRun/annotated_getting-started_example (7.16s)
    --- PASS: TestRun/getting-started_envTagger (7.25s)
    --- PASS: TestRun/gcb_builder_example (29.19s)
    --- PASS: TestRun/deploy_kustomize (3.27s)
    --- PASS: TestRun/bazel_example (48.78s)
    --- PASS: TestRun/kaniko_example (33.81s)
    --- PASS: TestRun/helm_example (8.62s)
--- PASS: TestFix (16.07s)
PASS

So it does take a few seconds more for all of the tests except for the bazel test. Since this change only applies to developing with certain objects (Jobs, CronJobs) maybe it would be better to delete all objects and recreate instead?

@priyawadhwa
Copy link
Contributor Author

Update: @r2d4 suggested we use the --force flag with kubectl apply which will delete and recreate an object if patching fails:

  --force=false: Delete and re-create the specified resource, when PATCH encounters conflict and has retried for 5 times.

Support for using this flag with the immutable error was added in this PR, which hasn't been released officially. As discussed offline, we'll wait for kubectl v1.12.0 to come out so that we can use this flag.

@balopat
Copy link
Contributor

balopat commented Sep 11, 2018

One more comment: we should be careful about version support for kubectl and make it clear in the docs what works. Also we could print a warning for lower than supported kubectl versions.

@dgageot dgageot added the wip label Sep 29, 2018
The --force flag will delete and redeploy a deployment if 'kubectl
apply' doesn't work because a field is immutable. Updated the skaffold
deploy Dockerfile to reflect this change, added a note in the docs that
kubectl > 1.12.0 is recommended, and added a check in the kubectl
deployer for the version.
@priyawadhwa
Copy link
Contributor Author

kubectl v1.12.0 is out, this should be RFAL :)

@priyawadhwa priyawadhwa removed the wip label Oct 1, 2018
Copy link
Contributor

@nkubala nkubala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple nits, but otherwise this seems fine

@@ -28,6 +28,8 @@ tools for deployment, for example `kubectl` or `helm`.
Each deployment type has parameters that allow you to
define how you want your app to be installed and updated.

_Note: kubectl version 1.12.0 or greater is recommended for use with skaffold._
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth documenting the known issues for earlier kubectl versions for users

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for sure, would this be the right spot for that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, sorry missed your reply on this. FWIW this is a fine place to do it :)

func (c *CLI) CheckVersion() error {
m, err := strconv.Atoi(c.Version().Minor)
if err != nil {
return fmt.Errorf("couldn't get kubectl minor version: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: errors.Wrapf(err, "retrieving kubectl minor version")

return fmt.Errorf("couldn't get kubectl minor version: %v", err)
}
if m < 12 {
return fmt.Errorf("kubectl version 1.12.0 or greater is recommended for use with skaffold")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: errors.New("...")

@dgageot
Copy link
Contributor

dgageot commented Oct 2, 2018

@priyawadhwa I wonder if we could use that feature only when kubectl is 1.12+

@priyawadhwa
Copy link
Contributor Author

@dgageot so the flag exists in lower versions it just doesn't work as expected, so if a user has a lower version then the same bug from the issue will occur.

@albertkang
Copy link

Any ideas when this will get merged?

@dgageot dgageot dismissed balopat’s stale review October 10, 2018 16:07

I think your feedback was taken into account

@dgageot dgageot merged commit ce01bd7 into GoogleContainerTools:master Oct 10, 2018
@balopat balopat mentioned this pull request Nov 13, 2018
@haf-afa
Copy link

haf-afa commented Oct 29, 2019

Shouldn't it be the same for skaffold run? Because for me that errors.

@haf
Copy link

haf commented Feb 2, 2020

Ping, @priyawadhwa @dgageot , see the comment from last year: this is still broken

@priyawadhwa priyawadhwa deleted the job branch February 4, 2020 00:27
@priyawadhwa
Copy link
Contributor Author

hey @haf -- what version of kubectl are you using? This feature only works with kubectl >1.12; if you're already on that, would you mind opening an issue so this bug can be tracked?

@balopat
Copy link
Contributor

balopat commented Mar 24, 2020

hi @haf - you can override the behavior for skaffold run with --force - does that work for you?

@haf
Copy link

haf commented Mar 24, 2020

I’m using latest of everything.

I never do force on almost anything. Only interested in happy path for the team.

@nkubala nkubala added the triage/discuss Items for discussion label May 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triage/discuss Items for discussion
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants