This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Fix transient CSE exit 9 #3230

Merged
merged 4 commits into from
Jun 8, 2018

Conversation

CecileRobertMichon
Contributor

@CecileRobertMichon CecileRobertMichon commented Jun 8, 2018

What this PR does / why we need it: This makes cluster provisioning wait until cloud-init configuration has completed before updating the apt cache. Part of the cloud-init configuration adds an apt source (templating), and there appears to have been a race condition between that step and the cache update, causing transient failures. (credit: @trstringer)
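The ordering described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the `wait_for_file` helper, its argument order, and the choice of `/var/lib/cloud/instance/boot-finished` as the file to wait on are all assumptions made for the example.

```shell
#!/bin/bash
# Hypothetical helper (not from the PR): poll until a file exists.
# Arguments: number of attempts, seconds to sleep between attempts, file path.
wait_for_file() {
    local retries=$1 wait_sleep=$2 filepath=$3
    local i
    for i in $(seq 1 "$retries"); do
        if [ -f "$filepath" ]; then
            return 0
        fi
        # Skip the sleep after the final failed attempt.
        if [ "$i" -lt "$retries" ]; then
            sleep "$wait_sleep"
        fi
    done
    return 1
}

# Usage sketch (assumed sentinel path; adjust to whatever cloud-init writes last):
#   wait_for_file 60 5 /var/lib/cloud/instance/boot-finished || exit 1
#   apt-get update
```

Blocking on a file that cloud-init writes only after the apt source is in place means the cache update cannot start before the source list is final, which is the race the description refers to.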

Which issue this PR fixes: fixes #3204

Special notes for your reviewer:

If applicable:

  • documentation
  • unit tests
  • tested backward compatibility (ie. deploy with previous version, upgrade with this branch)

Release note:

@ghost ghost added the in progress label Jun 8, 2018
@acs-bot acs-bot added the size/S label Jun 8, 2018
  # See https://github.com/kubernetes/kubernetes/blob/master/build/debian-hyperkube-base/Dockerfile#L25-L44
- apt_get_install 20 30 180 apt-transport-https ca-certificates iptables iproute2 socat util-linux mount ebtables ethtool init-system-helpers nfs-common ceph-common conntrack glusterfs-client ipset jq || exit $ERR_APT_INSTALL_TIMEOUT
+ apt_get_install 20 30 300 apt-transport-https ca-certificates iptables iproute2 socat util-linux mount ebtables ethtool init-system-helpers nfs-common ceph-common conntrack glusterfs-client ipset jq || exit $ERR_APT_INSTALL_TIMEOUT
Contributor Author

@jackfrancis how do we feel about increasing the timeout here?
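For context on what the timeout change affects, here is a rough sketch of a retry wrapper of this shape. It assumes the three numeric arguments in `apt_get_install 20 30 300` mean retries, sleep between attempts, and per-attempt timeout in seconds; the PR does not show the helper's body, so the `retry_with_timeout` name and that reading of the arguments are hypothetical.

```shell
#!/bin/bash
# Hypothetical retry wrapper (not the apt_get_install from this repository).
# Arguments: attempts, seconds to sleep between attempts, per-attempt timeout
# in seconds, then the command to run.
retry_with_timeout() {
    local retries=$1 wait_sleep=$2 timeout_secs=$3
    shift 3
    local i
    for i in $(seq 1 "$retries"); do
        # timeout(1) from coreutils kills the command if it runs too long.
        if timeout "$timeout_secs" "$@"; then
            return 0
        fi
        # Skip the sleep after the final failed attempt.
        if [ "$i" -lt "$retries" ]; then
            sleep "$wait_sleep"
        fi
    done
    return 1
}

# Usage sketch:
#   retry_with_timeout 20 30 300 apt-get install -y jq || exit 1
```

Under that reading, moving from 180 to 300 gives each individual install attempt more time before it is killed and retried, without changing the number of attempts.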

@CecileRobertMichon CecileRobertMichon changed the title from "Fix transient vmss CSE exit 9" to "Fix transient CSE exit 9" Jun 8, 2018
echo `date`,`hostname`, apt-get_update_begin>>/opt/m
apt_get_update || exit $ERR_APT_INSTALL_TIMEOUT
echo `date`,`hostname`, apt-get_update_end>>/opt/m
# make sure walinuxagent doesn't get updated in the middle of running this script
Member

nit: whitespace indent variance

@jackfrancis
Member

This makes the cluster provisioning wait until cloud-init configuration has completed prior to updating the cache. A part of the cloud-init configuration includes adding apt source (templating), and it appears as though there was a race condition causing transient failures (for more information, please see #3204).

@codecov

codecov bot commented Jun 8, 2018

Codecov Report

Merging #3230 into master will decrease coverage by 0.06%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3230      +/-   ##
==========================================
- Coverage   52.37%   52.31%   -0.07%     
==========================================
  Files         103      103              
  Lines       15430    15458      +28     
==========================================
+ Hits         8082     8087       +5     
- Misses       6621     6643      +22     
- Partials      727      728       +1

@CecileRobertMichon
Contributor Author

@jackfrancis added as PR description

@ghost ghost removed the in progress label Jun 8, 2018
@ghost ghost added the in progress label Jun 8, 2018
@jackfrancis
Member

/approve
/lgtm

@acs-bot

acs-bot commented Jun 8, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon, jackfrancis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [CecileRobertMichon,jackfrancis]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@CecileRobertMichon CecileRobertMichon merged commit 01da509 into Azure:master Jun 8, 2018
@ghost ghost removed the in progress label Jun 8, 2018
jackfrancis pushed a commit that referenced this pull request Jun 12, 2018
* increase timeout for apt-get install

* add apt_get_update

* move apt_get update block

* fix indents

# Conflicts:
#	parts/k8s/kubernetescustomscript.sh
Development

Successfully merging this pull request may close these issues.

Transient CSE exit code 9 on VMSS with k8s 1.10
3 participants