Skip to content
This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

chore: don't re-run provision.sh if already called. #2843

Merged
merged 6 commits into from
Mar 10, 2020

Conversation

paulgmiller
Copy link
Member

@paulgmiller paulgmiller commented Mar 4, 2020

Because commandToExecute changes with every upgrade (since the kubernetes version env variable changes) CSE are run when AKS tried to do an instance upgrade then reimage (which vmss is recommending as preferred over creating and deleting repeatedly).

Rerunning the CSE's seems to hit a file watch timeout maybe 1:8 or 1:16 times. I have not found the file responsible just the error code.

However it seems like the CSE's were not made to be idempotent and maybe shoudl just abort if they've already run to completion. So maybe it's a good idea in general to check the provision.complete sentinal file that already gets emited at the end of the script?

Reason for Change:

Issue Fixed:

Requirements:

Notes:

Still trying to gigure out what actually changes here that causes us to rerun CSE's on model update before reimage. 
Nathan Kutcha says any change under settings/protected setting will trigger it (we don't set timestamp or forceupdate so it's not that)
Dumping templates  at create and upgrade code paths to see if any of the arm variables change  (provisionScriptParametersCommon, provisionScriptParametersMaster)
@paulgmiller
Copy link
Member Author

Still investigating here but creating a PR as it was easier and more informative than an issue maybe?

@jackfrancis jackfrancis changed the title Leave a note that changes to command to execute cause rerun. docs: code comment about changing value of commandToExecute Mar 5, 2020
@paulgmiller paulgmiller changed the title docs: code comment about changing value of commandToExecute Don't re-run provision.sh if already called. Mar 10, 2020
@jackfrancis jackfrancis changed the title Don't re-run provision.sh if already called. chore: don't re-run provision.sh if already called. Mar 10, 2020
@jackfrancis
Copy link
Member

@paulgmiller make generate and commit generated code to appease UT check

@codecov
Copy link

codecov bot commented Mar 10, 2020

Codecov Report

Merging #2843 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2843      +/-   ##
==========================================
- Coverage   72.47%   72.46%   -0.01%     
==========================================
  Files         140      140              
  Lines       25589    25675      +86     
==========================================
+ Hits        18545    18606      +61     
- Misses       5976     5993      +17     
- Partials     1068     1076       +8

@paulgmiller
Copy link
Member Author

don't suppose there's unittests for the bash script I can improve :)

@jackfrancis
Copy link
Member

@paulgmiller feel free to add some! :) j/k

We do use shellcheck, so there's something

Copy link
Member

@jackfrancis jackfrancis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@jackfrancis jackfrancis merged commit bb68c7f into master Mar 10, 2020
@jackfrancis
Copy link
Member

Thanks @paulgmiller!

@jackfrancis jackfrancis deleted the paulgmiller-CSEBadRerun branch March 10, 2020 22:41
paulgmiller added a commit to paulgmiller/aks-engine that referenced this pull request Mar 27, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants