
feat: Use NSSM for containerd and collect containerd logs #4219

Merged
merged 2 commits into from
Mar 25, 2021

Conversation

jsturtevant
Contributor

Reason for Change:

We need to collect the containerd logs for analysis.

Issue Fixed:

Credit Where Due:
This WPR file was shared by @kevpar on Slack.

Does this change contain code from or inspired by another project?

  • No
  • Yes

If "Yes," did you notify that project's maintainers and provide attribution?

  • No
  • Yes

Requirements:

Notes:

@jsturtevant
Contributor Author

@kevpar any suggestions on the wprp profile?

@jsturtevant jsturtevant changed the title Add better logging for containerd chore: Add better logging for containerd Feb 3, 2021
@jsturtevant jsturtevant force-pushed the log-containerd-containers-running branch from 87c9737 to 5a888ab Compare February 3, 2021 00:22
@jsturtevant jsturtevant requested a review from marosset February 3, 2021 00:24
@codecov

codecov bot commented Feb 3, 2021

Codecov Report

Merging #4219 (0eddeb7) into master (41a716a) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #4219   +/-   ##
=======================================
  Coverage   72.07%   72.07%           
=======================================
  Files         141      141           
  Lines       21640    21640           
=======================================
  Hits        15596    15596           
  Misses       5093     5093           
  Partials      951      951           
Impacted Files Coverage Δ
pkg/engine/templates_generated.go 43.56% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 41a716a...0eddeb7. Read the comment docs.

@jsturtevant
Contributor Author

This is working (truncated output below). Tested after a node restart as well.

Compressing all logs to 2018k8s01000000-20210204-115313_logs.zip 

C:\Users\azureuser\AppData\Local\Temp\20210204-115313-containerd.txt
C:\Users\azureuser\AppData\Local\Temp\20210204-115313-containerd.etl




    Directory: C:\Users\azureuser


Mode          LastWriteTime Length Name
----          ------------- ------ ----
-a----   2/4/2021  11:53 PM 199260 2018k8s01000000-20210204-115313_logs.zip

Note: if there is an unplanned node crash, logs prior to the crash will not be retained. If required, a logging agent should be configured to listen to the ETW events and persist them longer term.
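For anyone who does need retention across a crash, one rough option (a sketch, not part of this PR) is a persistent ETW trace session that a logging agent can drain; the provider name below is a placeholder that would need to be replaced with the actual containerd ETW provider:

# Sketch only: create and start a long-running ETW session so events survive a node crash.
# "<containerd-etw-provider>" is a placeholder, not a value confirmed by this PR.
& logman.exe create trace containerd-etw -p "<containerd-etw-provider>" -o "C:\k\containerd-etw.etl" -ets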

@jsturtevant
Contributor Author

and the etl file:

[screenshot: contents of the .etl trace file]

@jsturtevant
Contributor Author

Discussed this further and we will not be turning the scripts on by default. We will provide a script to turn on the logging when needed, and if folks want to collect containerd logging full time they will need to provide a logging agent that listens to the ETW events.

Holding for updates so this is not turned on by default.
/hold

@jsturtevant jsturtevant force-pushed the log-containerd-containers-running branch from 2fa377f to 27e926a Compare March 22, 2021 23:40
{
& ctr.exe -n k8s.io c ls > "$ENV:TEMP\$timeStamp-containerd-containers.txt"
& ctr.exe -n k8s.io t ls > "$ENV:TEMP\$timeStamp-containerd-tasks.txt"
& wpr.exe -stop "$ENV:TEMP\$timeStamp-containerd.etl"
Member

New to wpr, just curious, where do we start the trace?

Contributor Author

We aren't starting the trace. Someone would need to start the trace, repro the issue, and then run this script to collect all the logs. If there is no trace running this would report that and continue on. We are going to try a different approach altogether here, so this will change.
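For context, a minimal sketch of how a trace might be started before reproducing the issue and running the collection script above (the profile path is an assumption for illustration, not something this PR deploys):

# Hypothetical: start a WPR trace from the containerd .wprp profile, reproduce the issue,
# then run the collection script, which calls `wpr.exe -stop` to write out the .etl file.
$profilePath = "C:\k\debug\containerd.wprp"  # assumed location of the profile
& wpr.exe -start $profilePath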

Contributor Author

We moved to using NSSM to enable a consistent logging story across our components.

@jsturtevant jsturtevant force-pushed the log-containerd-containers-running branch 2 times, most recently from bf9b301 to a4befe3 Compare March 24, 2021 18:49
@jsturtevant jsturtevant changed the title chore: Add better logging for containerd feature: Add better logging for containerd Mar 24, 2021
@jsturtevant jsturtevant changed the title feature: Add better logging for containerd feat: Add better logging for containerd Mar 24, 2021
@jsturtevant jsturtevant force-pushed the log-containerd-containers-running branch from a4befe3 to abd852c Compare March 24, 2021 18:50
Contributor

@marosset marosset left a comment

Mostly looks good, had a few minor comments though.

parts/k8s/containerdtemplate.toml Show resolved Hide resolved
parts/k8s/kuberneteswindowssetup.ps1 Outdated Show resolved Hide resolved
& "$KubeDir\nssm.exe" set containerd ObjectName LocalSystem | RemoveNulls
& "$KubeDir\nssm.exe" set containerd Type SERVICE_WIN32_OWN_PROCESS | RemoveNulls
& "$KubeDir\nssm.exe" set containerd AppThrottle 1500 | RemoveNulls
& "$KubeDir\nssm.exe" set containerd AppStdout C:\k\containerd.log | RemoveNulls
Contributor

Should we log to the same directory as containerd or to c:\k?
If we log to c:\k the logs will all be together, but if we log next to containerd (c:\program files\containerd) it might be less confusing for someone looking for the logs...

Contributor

Also, should we use $KubeDir instead of c:\k?

Contributor Author

I was leaning towards c:\k but don't have strong feelings. I think the hardcoded c:\k was a copy-paste from testing. I will update that.

Contributor Author

I set the working dir to c:\k as well to fix the Azure CNI logging issue.
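For reference, a rough sketch of what the NSSM settings might look like with $KubeDir substituted for the hardcoded c:\k (the AppStderr and rotation settings below are assumptions for illustration, not necessarily what this PR configures):

# Sketch only: point the containerd service's working directory and logs at $KubeDir.
& "$KubeDir\nssm.exe" set containerd AppDirectory $KubeDir | RemoveNulls
& "$KubeDir\nssm.exe" set containerd AppStdout "$KubeDir\containerd.log" | RemoveNulls
& "$KubeDir\nssm.exe" set containerd AppStderr "$KubeDir\containerd.err.log" | RemoveNulls
# Optional log rotation (assumption, not confirmed in this PR):
& "$KubeDir\nssm.exe" set containerd AppRotateFiles 1 | RemoveNulls
& "$KubeDir\nssm.exe" set containerd AppRotateOnline 1 | RemoveNulls
& "$KubeDir\nssm.exe" set containerd AppRotateBytes 10485760 | RemoveNulls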

@jsturtevant jsturtevant changed the title feat: Add better logging for containerd feat: Use NSSM for containerd and collect containerd logs Mar 24, 2021
@@ -159,19 +160,6 @@ function Install-ContainerD {
$newPath = [Environment]::GetEnvironmentVariable("Path", [EnvironmentVariableTarget]::Machine) + ";$installDir"
[Environment]::SetEnvironmentVariable("Path", $newPath, [EnvironmentVariableTarget]::Machine)
$env:Path += ";$installDir"

Write-Log "Registering containerd as a service"
& containerd.exe --register-service
Contributor

We still need to install containerd in the VHD in order to pull the nanoserver/servercore/pause images.

Contributor

or at least start it

Contributor Author

Makes sense. I was hoping we wouldn't need to duplicate the configuration setup, but I guess we will need that.

Contributor

@marosset marosset Mar 24, 2021

You can probably get by with a Start-Job and not need to register this as a service.
This won't persist across reboots though, so start it right before we try to download the VHDs.
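A minimal sketch of the Start-Job approach being suggested, assuming containerd is installed under C:\Program Files\containerd and the image list comes from elsewhere in the provisioning script (paths and image refs here are illustrative, not the PR's actual code):

# Hypothetical: run containerd in the background only long enough to pull images
# during VHD provisioning, instead of registering it as a service.
$containerdJob = Start-Job -ScriptBlock {
    & "C:\Program Files\containerd\containerd.exe"
}
Start-Sleep -Seconds 5  # give the daemon a moment to start listening

# Pull the images the VHD needs (placeholder ref shown here).
& "C:\Program Files\containerd\ctr.exe" -n k8s.io images pull "mcr.microsoft.com/oss/kubernetes/pause:3.4.1"

# The job does not survive a reboot, so stop it once the pulls are finished.
Stop-Job $containerdJob
Remove-Job $containerdJob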

Contributor

Also, I wonder why the Windows VHD CI job didn't trigger here...

Contributor Author

I've made this update. PTAL

Contributor

Looks good.
Let's make sure the containerd VHD CI job passes!

@jsturtevant jsturtevant force-pushed the log-containerd-containers-running branch from 1ebb886 to 0eddeb7 Compare March 25, 2021 16:14
@acs-bot

acs-bot commented Mar 25, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jsturtevant, marosset

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [jsturtevant,marosset]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
