Skip to content
This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

fix: Re-ordering HNS policy removal due to 10c change in behavior and consolidating logic in Windows cleanup scripts #4002

Merged
merged 3 commits into from
Nov 6, 2020

Conversation

jsturtevant
Copy link
Contributor

Reason for Change:

First step in refactoring shared logic in startup and windows node reset. We learned in 10c update (#3956) adds a breaking change that requires Get-HnsPolicyList | Remove-HnsPolicyList to be removed before HNS-Network. We also need turn off kubeproxy while doing this.

This also adds removed containerd images during the kubelet startup.

We should be able to have a follow up with a change that shares logic between startup and windowsnodereset. This needs a little more consideration so is not included here.

Issue Fixed:

Credit Where Due:

Does this change contain code from or inspired by another project?

  • No
  • Yes

If "Yes," did you notify that project's maintainers and provide attribution?

  • No
  • Yes

Requirements:

Notes:
Requires signed scripts before kicking off tests
/hold

@codecov
Copy link

codecov bot commented Nov 4, 2020

Codecov Report

Merging #4002 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #4002   +/-   ##
=======================================
  Coverage   73.69%   73.69%           
=======================================
  Files         147      147           
  Lines       23164    23164           
=======================================
  Hits        17070    17070           
  Misses       4979     4979           
  Partials     1115     1115           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d36d436...475822a. Read the comment docs.

@@ -49,6 +49,17 @@ if ($global:EnableHostsConfigAgent) {
# Perform cleanup
#

Write-Log "Cleaning up persisted HNS policy lists"
# Workaround for https://github.com/kubernetes/kubernetes/pull/68923 in < 1.14,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove these two workaround comments or say that the October patch impact ALL versions to make sure this doesn't get accidently cleaned up in the future?

@marosset
Copy link
Contributor

marosset commented Nov 4, 2020

@jsturtevant should we figure out a way to have kubeletstart.ps1 call into a portion of the windowsnodereset script so the contents will not drift again?

I'm thinking maybe add a -CleanupOnly flag to windowsnodereset.ps1 that just stops services and performs the cleanup.
kubeletstart.ps1 can call windowsnodereset.ps1 -CleanupOnly then start kubelet.exe

Either that or create a cleanup script that both of these existing scripts calls.

@jsturtevant
Copy link
Contributor Author

Adding a -CleanupOnly flag to windowsnodereset.ps1 should work. I initially didn't do this because I wanted to keep the refactor small. I will give this a go since the I've already factored out the function

@jsturtevant jsturtevant force-pushed the fix/provisioningscripts branch 2 times, most recently from 757d491 to ce8d812 Compare November 5, 2020 00:14
@jsturtevant
Copy link
Contributor Author

added -CleanupOnly flag, Running tests against new scripts now will report back

@jsturtevant
Copy link
Contributor Author

After testing, the approach of using -CleanupOnly didn't work well because of two things:

  • log file contention between different processes
  • NodeReset starts/stops several services including kubelet. If we are in the startup of kubelet then we shouldn't stop our selves.

@jsturtevant jsturtevant force-pushed the fix/provisioningscripts branch from 843c218 to 72cbca1 Compare November 5, 2020 17:07
@jsturtevant
Copy link
Contributor Author

I've updated to use a common file that only cleans up the network and other files. It is up to the caller to turn off the services required.

@jsturtevant
Copy link
Contributor Author

These changes passed a local test run on 1.19 on a 10c image

Ran 45 of 64 Specs in 3187.352 seconds
SUCCESS! -- 45 Passed | 0 Failed | 0 Pending | 19 Skipped
PASS

@jsturtevant jsturtevant changed the title fix: provisioningscripts fix: Windows provisioning scripts Nov 5, 2020
@marosset
Copy link
Contributor

marosset commented Nov 5, 2020

/lgtm

@acs-bot acs-bot added the lgtm label Nov 5, 2020
@marosset marosset changed the title fix: Windows provisioning scripts fix: Re-ordering HNS policy removal due to 10c change in behavior and consolidating logic in Windows cleanup scripts Nov 5, 2020
@jsturtevant jsturtevant force-pushed the fix/provisioningscripts branch from 72cbca1 to 25666ab Compare November 5, 2020 21:22
@acs-bot acs-bot removed the lgtm label Nov 5, 2020
@marosset
Copy link
Contributor

marosset commented Nov 5, 2020

/lgtm

@acs-bot acs-bot added the lgtm label Nov 5, 2020
@acs-bot acs-bot removed the lgtm label Nov 5, 2020
@marosset
Copy link
Contributor

marosset commented Nov 6, 2020

Whoo -
image

@marosset
Copy link
Contributor

marosset commented Nov 6, 2020

/lgtm

@acs-bot acs-bot added the lgtm label Nov 6, 2020
@marosset marosset merged commit 1d2b73c into Azure:master Nov 6, 2020
@acs-bot
Copy link

acs-bot commented Nov 6, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsturtevant, marosset

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jsturtevant,marosset]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jsturtevant jsturtevant deleted the fix/provisioningscripts branch November 6, 2020 00:29
jackfrancis pushed a commit that referenced this pull request Nov 13, 2020
… consolidating logic in Windows cleanup scripts (#4002)

* remove policy list before network

* using common network clean up logic

* add timeout for e2e tests to run
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants