Skip to content
This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

fix: fix ARM dependency issues with vm user-specified extensions on node pools #2398

Merged
merged 3 commits into from
Dec 19, 2019

Conversation

marosset
Copy link
Contributor

@marosset marosset commented Dec 5, 2019

Reason for Change:

Issue Fixed:

Fixes issue #2392

Requirements:

Notes:

@marosset
Copy link
Contributor Author

marosset commented Dec 5, 2019

Code changes are functional but I want to improve the unit tests and add possibly add a new config to the jenkins matrix.

@codecov
Copy link

codecov bot commented Dec 5, 2019

Codecov Report

Merging #2398 into master will increase coverage by <.01%.
The diff coverage is 88.23%.

@@            Coverage Diff             @@
##           master    #2398      +/-   ##
==========================================
+ Coverage   72.57%   72.58%   +<.01%     
==========================================
  Files         130      130              
  Lines       23913    23921       +8     
==========================================
+ Hits        17354    17362       +8     
  Misses       5530     5530              
  Partials     1029     1029

@acs-bot acs-bot added size/L and removed size/M labels Dec 5, 2019
@acs-bot acs-bot added size/M and removed size/L labels Dec 6, 2019
@marosset marosset changed the title [WIP] fix: fix ARM dependency issues with vm extensions on node pools fix: fix ARM dependency issues with vm extensions on node pools Dec 6, 2019
@marosset
Copy link
Contributor Author

marosset commented Dec 6, 2019

I'll need to follow up with a Jenkin's job in a separate PR.
I'm having issues finding extensions that are both function and don't conflict with test content in the e2e tests (most extensions muck with dns which causes the dns livenessprobe test case to fail).
Also, specifying SKIP_TEST=true in the test cluster config does not cause the tests to skip.

@marosset
Copy link
Contributor Author

/cc @adelina-t

@acs-bot
Copy link

acs-bot commented Dec 10, 2019

@marosset: GitHub didn't allow me to request PR reviews from the following users: adelina-t.

Note that only Azure members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @adelina-t

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

mboersma
mboersma previously approved these changes Dec 12, 2019
Copy link
Member

@mboersma mboersma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

if properties.MasterProfile != nil {
// The first extension needs to depend on the billing extension created for all pools
// Each proceeding extension needs to depend on the previous one to avoid ARM conflicts in the Compute RP
nextDependsOn := "[concat('Microsoft.Compute/virtualMachines/', variables('masterVMNamePrefix'), copyIndex(variables('masterOffset')), '/extensions/computeAksLinuxBilling')]"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't the billing extension change names depending on AKS/AKS Engine? Also it can be disabled. I don't think we can hardcode a dependency on computeAksLinuxBilling here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name of the extension doesn't change (which is weird)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know that the billing extension can be disabled, is there a way to detect that in the properties @CecileRobertMichon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could disable the extension though. See IsAKSBillingEnabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll take a look

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah looks like it's the type that changes, not the name. AKS did change the vmss one at some point https://github.com/Azure/aks-engine/pull/1561/files but that was for VMSS so shouldn't affect this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we only need to serialize the deployments of the 'CustomScriptExtension' extensions. The billing extensions are their own type of extension so we don't need to worry about those.

It also looks like the way aks-engine is using custom script extensions for additional extensions is a bit fragile. According to https://azure.microsoft.com/en-us/blog/automate-linux-vm-customization-tasks-using-customscript-extension/ you are supposed to refernce all of the scripts you want to run in a single ARM resource since there can only be on 'CustomScriptExtension' active on a machine/scale set at any given time.

A larger refactor should probably be considered when we have time and/or if we need to support additional extensions for scale sets.

This is working pretty reliably as is in my local testing.

@marosset
Copy link
Contributor Author

/hold

@marosset marosset changed the title fix: fix ARM dependency issues with vm extensions on node pools fix: fix ARM dependency issues with vm user-specified extensions on node pools Dec 12, 2019
@acs-bot acs-bot added size/L and removed size/M labels Dec 14, 2019
@acs-bot acs-bot added size/M and removed size/L labels Dec 17, 2019
Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@marosset
Copy link
Contributor Author

I'm going to do a little more testing before i remove the hold.

@acs-bot
Copy link

acs-bot commented Dec 17, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon, marosset

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [CecileRobertMichon,marosset]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@marosset
Copy link
Contributor Author

/hold cancel

@marosset marosset merged commit 666053e into Azure:master Dec 19, 2019
@marosset marosset deleted the bugfix/2392 branch December 19, 2019 16:49
acs-bot pushed a commit that referenced this pull request Jan 9, 2020
* Rashmi/ciprod12042019 (#3)

improvement: Update Containermonitoring addon for december release (#1)

* Fixing yaml indentation issues for omsagent (#4)

Fixing yaml indentation issues for omsagent

* Fixing more indentation issues in omsagent yaml (#5)

Fixing more indentation issues in omsagent yaml

* updating templates_generated file

* Fix: Updating generated file to fix unit tests (#6)

fix: Updating generated file to fix unit tests

* Adding test coverage for 1.16 and 1.17 (#7)

Adding test coverage for 1.16 and 1.17

* Merging changes for omsagent latest release - ciprod01072020 after syncing from remote master(#8)

* chore: use go template comments for generate proxy certs script (#2336)

* fix: fix ARM dependency issues with vm user-specified extensions on node pools (#2398)

* fix: fix ARM dependency issues if many extensions are specified for a node profile

* fix scale up case for windows vhd case. (#2483)

* refactor: make cilium addon user-configurable (#2480)

* refactor: make cilium addon user-configurable

* chore: clarify that cilium doesn't work w/ 1.16 and above, add validation

* test: addons UT

* test: go template UT

* ci: use Standard_D8_v3 for cilium test, only run NetworkPolicy tests

* fix: error message language

* chore: remove debug fmt.Println

* ci: revert back to Standard_D2_v3

* chore: upgrade cni-plugins to v0.7.6 (#2484)

* fix: hard-coding hyper-v generation when using VHD URls as a quick unblock (#2487)

* feat: Configuring docker log rotation for Windows nodes (#2478)

* feat: Antrea plugin support in AKS Engine (#2407)

* Antrea plugin support in AKS Engine

* chore: clean up

* chore: use ContainerImage

* chore: generated code

* refactor: Updating antrea yaml to 0.2.0

Co-authored-by: Jack Francis <jackfrancis@gmail.com>

* chore: lint (#2493)

* test: revert change to default kubernetes.json api model example (#2494)

* chore: update cloud-provider-azure components to v0.4.0 (#2473)

* chore: update cloud-provider-azure components to v0.4.0

See https://github.com/kubernetes-sigs/cloud-provider-azure/releases/tag/v0.4.0

* refactor: strip MCR constant to base hostname of URL

* fix: fetch Azure cloud-manager images from /oss/kubernetes/

* refactor: make audit-policy and azure-cloud-provider addons user-configurable (#2496)

* chore: pre-pull k8s v1.15.7-azs (#2490)

* fix: Fix some path handling in collect-windows-logs script (#2488)

* docs: remove mentions of old orchestrators (#2501)

* chore: Targeting dec patches for windows VHD (#2505)

* refactor: move StorageClass into azure-cloud-provider addon (#2497)

* add "Standard_DS3_v2" to "AcceleratedNetworking" supported list (#2509)

* ci: collect logs during E2E runs (#2520)

* refactor: user-configurable flannel and scheduled maintenance addons (#2517)

* chore: update Azure NPM to v1.0.31 (#2521)

* feat: add support for Kubernetes 1.18.0-alpha.1 (#2503)

* feat: add support for Kubernetes 1.18.0-alpha.1

See https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.18.md#changelog-since-v1170

* test: add 1.18 to Jenkinsfile

* ci: disable kms for 1.18

* chore: move flannel 1.18 spec to containeraddons

* chore: generated code

* fix: use new cloudprovider implementation for 1.18

Co-authored-by: Jack Francis <jackfrancis@gmail.com>

* test: don't test non-working >= 1.16 flannel + docker (#2524)

* fix: apply new master node labels for k8s v1.18+ compatibility (#2467)

* fix: apply new master node labels for k8s v1.18+ compatibility

* test: check master labels in the future for back-compat

* feat: cleaning up old kubelet/kubeproxy logs for Windows nodes (#2504)

* feat: cleaning up old kubelet/kubeproxy logs for Windows nodes

* Fixing path to look for logs

* generated files

* refactor: standardize to "addons", deprecate "containeraddons" (#2525)

* fix: configure addons before setting kubelet config (#2513)

* chore: update addon-resizer (#2527)

See https://github.com/kubernetes/autoscaler/releases/tag/addon-resizer-1.8.7

* fix: aci-connector region is ignored (#2535)

* test: use LOCATION env var for api model in E2E tests (#2542)

* fix: promote system addons to system-cluster-critical (#2533)

* test: use northeurope for byok testing (#2536)

* Changes for omsagent-version-ciprod01072020

* Committing generated file

Co-authored-by: Jack Francis <jackfrancis@gmail.com>
Co-authored-by: Mark Rossetti <marosset@microsoft.com>
Co-authored-by: Rohit <rjaini@microsoft.com>
Co-authored-by: Rahul Jain <58573065+reachjainrahul@users.noreply.github.com>
Co-authored-by: Matt Boersma <Matt.Boersma@microsoft.com>
Co-authored-by: Javier Darsie <44655727+jadarsie@users.noreply.github.com>
Co-authored-by: Patrick Lang <PatrickLang@users.noreply.github.com>
Co-authored-by: Wenjun Wu <wenjun.wu@live.com>
Co-authored-by: Jaeryn <13284103+jaer-tsun@users.noreply.github.com>
Co-authored-by: Anish Ramasekar <anish.ramasekar@gmail.com>

* deleting github merge auto-generated files

* Adding back 1.17 omsagent yaml changes

* Updating generated file to address build failures

Co-authored-by: Jack Francis <jackfrancis@gmail.com>
Co-authored-by: Mark Rossetti <marosset@microsoft.com>
Co-authored-by: Rohit <rjaini@microsoft.com>
Co-authored-by: Rahul Jain <58573065+reachjainrahul@users.noreply.github.com>
Co-authored-by: Matt Boersma <Matt.Boersma@microsoft.com>
Co-authored-by: Javier Darsie <44655727+jadarsie@users.noreply.github.com>
Co-authored-by: Patrick Lang <PatrickLang@users.noreply.github.com>
Co-authored-by: Wenjun Wu <wenjun.wu@live.com>
Co-authored-by: Jaeryn <13284103+jaer-tsun@users.noreply.github.com>
Co-authored-by: Anish Ramasekar <anish.ramasekar@gmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants