Skip to content
This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

Disable coreOS auto-updates #1241

Merged
merged 1 commit into from
May 1, 2018
Merged

Disable coreOS auto-updates #1241

merged 1 commit into from
May 1, 2018

Conversation

jorge07
Copy link
Contributor

@jorge07 jorge07 commented Apr 16, 2018

Fixes #1240

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 16, 2018
@jorge07
Copy link
Contributor Author

jorge07 commented Apr 16, 2018

Not tested yet

@codecov-io
Copy link

codecov-io commented Apr 16, 2018

Codecov Report

Merging #1241 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1241   +/-   ##
=======================================
  Coverage   36.63%   36.63%           
=======================================
  Files          63       63           
  Lines        3882     3882           
=======================================
  Hits         1422     1422           
  Misses       2242     2242           
  Partials      218      218
Impacted Files Coverage Δ
core/controlplane/config/config.go 62.81% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5bb948d...1e8c5eb. Read the comment docs.

Copy link
Contributor

@mumoshu mumoshu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for this. I'm trying to understand... Perhaps you manually rebooted your node so that the update applied to your node?

I'm saying this because we already disable automatic "reboot" normally scheduled after the update has downloaded. And it practically allowed us to prevent automatic updates being actually applied:

#cloud-config
coreos:
  update:
    reboot-strategy: "off"

Anyway, your change LGTM overall. Please notify me once you've finished testing on your side. Thanks!

@@ -13,6 +13,9 @@ s3URI: {{.S3URI}}
# The AMI ID of CoreOS.
amiId: "{{.AmiId}}"

# CoreOS has automatic updates https://coreos.com/os/docs/latest/update-strategies.html. This can be a risk in certain situations. Set this param to true to disable it in all instances.
disableCoreOSUpdates: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer disableContainerLinuxAutomaticUpdates as it has officially renamed to Container Linux a year or so ago 😃

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 19, 2018
@jorge07
Copy link
Contributor Author

jorge07 commented Apr 19, 2018

@mumoshu I updated the PR renaming this. I did not have time to test it yet, but I'll try to do it asap.

@davidmccormick
Copy link
Contributor

I think that, once tested, this should be on by default - not behind a feature gate. :)

@mumoshu
Copy link
Contributor

mumoshu commented Apr 23, 2018

@davidmccormick Ah! That's definitely a good point. I agree.

@jorge07
Copy link
Contributor Author

jorge07 commented Apr 24, 2018

Sorry for this waiting for a week. I've the remote team in the office and it's a super busy week. I hope find some time for rebase the branch and launch the cluster next week as I think it would be great have it ready for 0.9.10 final release

@mumoshu
Copy link
Contributor

mumoshu commented Apr 25, 2018

@jorge07 No worry but I just want to sync up with you that I am about to cut v0.9.10 this week! Do you have any reason you want this before #1233 which is I'm going to merge soon after cutting v0.9.10?

@jorge07
Copy link
Contributor Author

jorge07 commented Apr 26, 2018

I did the QA today and this is not working at all.

The update-engine.service is running event is sysctemctl status show it as masked.

If I restart the service it fails, obviously because is masked. Any suggestions?

I followed this: https://coreos.com/os/docs/latest/update-strategies.html#disable-automatic-updates-daemon

@jorge07
Copy link
Contributor Author

jorge07 commented Apr 27, 2018

With this last change I've it stopped... but:

Update Strategy: No Reboots
Failed Units: 1
  update-engine.service
core@xxx ~ $ systemctl status update-engine
● update-engine.service - Update Engine
   Loaded: loaded (/usr/lib/systemd/system/update-engine.service; disabled; vend
   Active: failed (Result: exit-code) since Fri 2018-04-27 12:14:01 UTC; 4min 27
 Main PID: 789 (code=exited, status=1/FAILURE)

Apr 27 12:13:44 localhost systemd[1]: Starting Update Engine...
Apr 27 12:13:44 localhost update_engine[789]: I0427 12:13:44.362311   789 main.cc:89] CoreOS Update Engine starting
Apr 27 12:13:44 localhost systemd[1]: Started Update Engine.
Apr 27 12:13:44 localhost update_engine[789]: I0427 12:13:44.378746   789 update_check_scheduler.cc:74] Next update check in 11m58s
Apr 27 12:14:01 ip-xxx.eu-west-1.compute.internal systemd[1]: Stopping Update Engine...
Apr 27 12:14:01 ip-xxx.eu-west-1.compute.internal systemd[1]: update-engine.service: Main process exited, code=exited, status=1/FAILURE
Apr 27 12:14:01 ip-xxx.eu-west-1.compute.internal systemd[1]: update-engine.service: Failed with result 'exit-code'.
Apr 27 12:14:01 ip-xxx.eu-west-1.compute.internal systemd[1]: Stopped Update Engine.

Help wanted here...

@mumoshu
Copy link
Contributor

mumoshu commented Apr 27, 2018

@jorge07 Thanks for your efforts. Unfortunately, it seems like we need ignition to actually disable update-engine according to coreos/bugs#1646 and linked issues...

An alternative idea is that to automatically remove update-engine related systemd unit file somewhere in the filesystem so that update-engine would never start in case of a reboot.

@mumoshu
Copy link
Contributor

mumoshu commented Apr 27, 2018

Hm, or maybe just specifying both command: stop and mask: true would do the job?

ref: coreos/bugs#1982 (comment)

@jorge07
Copy link
Contributor Author

jorge07 commented Apr 27, 2018

I tried masking the update engine with the stop option too. Same issue, it ends as failed. The cluster its created and it works, but having a failed unit its not nice. Should I start considering refactor into ignition? Sounds like a huge epic

@mumoshu
Copy link
Contributor

mumoshu commented Apr 29, 2018

@jorge07 Thanks for your efforts. If you still want to make it in v0.9.10, I'd recommend my suggestion of removing update-engine(and possibly locksmithd related units).

It would look something like introducing disable-automatic-update.service which would basically run:

for u in update-engine locksmithd; do
  find / | grep lib64/systemd | grep $u | xargs rm
  systemctl stop ${u}.service
  systemctl disable ${u}.service
fi

You probably need this to clear failed units:

systemctl daemon-reload
systemctl reset-failed

But beware that this is completely untested!

@mumoshu
Copy link
Contributor

mumoshu commented Apr 29, 2018

Of course, I'd appreciate it if you could work on the migration to ignition 👍

@jorge07
Copy link
Contributor Author

jorge07 commented Apr 29, 2018

I thought the same, I think that create a work around unit that can stop it and clean the failures it's the best option meanwhile migrate into ignition. I'll work on this next week. Thanks for the suggestions!

@jorge07
Copy link
Contributor Author

jorge07 commented Apr 30, 2018

It works

ssh core@...

Container Linux by CoreOS stable (1688.5.3)
Update Strategy: No Reboots
core@ip-w-e-b-a ~ $ systemctl status update-engine
● update-engine.service
   Loaded: masked (/dev/null; bad)
   Active: inactive (dead) since Mon 2018-04-30 15:22:00 UTC; 42s ago

@mumoshu
Copy link
Contributor

mumoshu commented May 1, 2018

@jorge07 LGTM. Thank you very much for your effort 🍻

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants