feat: Use NSSM for containerd and collect containerd logs #4219
Conversation
@kevpar any suggestions on the wprp profile?
Force-pushed from 87c9737 to 5a888ab
Codecov Report
@@           Coverage Diff           @@
##           master    #4219   +/-   ##
=======================================
  Coverage   72.07%   72.07%
=======================================
  Files         141      141
  Lines       21640    21640
=======================================
  Hits        15596    15596
  Misses       5093     5093
  Partials      951      951
=======================================
Continue to review full report at Codecov.
This is working (truncated output). Tested after a node restart as well.
Note: if there is an unplanned node crash, logs prior to the crash will not be retained. If that is required, a logging agent should be configured to listen to the ETW events and persist them longer term.
Discussed this further and we will not be turning the scripts on by default. We will provide a script to turn on the logging when needed, and if folks want to collect containerd logging full time they will need to provide a logging agent that listens to the ETW events. Holding for updates to not turn this on by default.
Force-pushed from 2fa377f to 27e926a
scripts/collect-windows-logs.ps1
Outdated
{
    & ctr.exe -n k8s.io c ls > "$ENV:TEMP\$timeStamp-containerd-containers.txt"
    & ctr.exe -n k8s.io t ls > "$ENV:TEMP\$timeStamp-containerd-tasks.txt"
    & wpr.exe -stop "$ENV:TEMP\$timeStamp-containerd.etl"
New to wpr, just curious, where do we start the trace?
We aren't starting the trace. Someone would need to start the trace, repro the issue, and then run this script to collect all the logs. If there is no trace running, this would report that and continue on. We are going to try a different approach altogether here, so this will change.
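For context, starting the trace would look something like the sketch below; the profile path is an assumption, not the exact script from this PR.

# Start an ETW trace session from a custom WPR profile before reproducing the
# issue. -filemode writes events to a file rather than a memory ring buffer.
& wpr.exe -start "C:\k\containerd.wprp" -filemode
# ...reproduce the issue...
# collect-windows-logs.ps1 then calls wpr.exe -stop <file>.etl (shown above)
# to save the captured events.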
We moved to using nssm to enable a consistent logging story across our components.
Force-pushed from bf9b301 to a4befe3
Force-pushed from a4befe3 to abd852c
Mostly looks good; had a few minor comments though.
pkg/engine/templates_generated.go
Outdated
& "$KubeDir\nssm.exe" set containerd ObjectName LocalSystem | RemoveNulls | ||
& "$KubeDir\nssm.exe" set containerd Type SERVICE_WIN32_OWN_PROCESS | RemoveNulls | ||
& "$KubeDir\nssm.exe" set containerd AppThrottle 1500 | RemoveNulls | ||
& "$KubeDir\nssm.exe" set containerd AppStdout C:\k\containerd.log | RemoveNulls |
Should we log to the same directory as containerd or to c:\k?
If we log to c:\k, the logs will all be together, but if we log next to containerd (c:\program files\containerd) it might be less confusing for someone looking for the logs...
Also, should we use $KubeDir instead of c:\k?
I was leaning towards c:\k but don't have strong feelings. I think the hardcoded c:\k was copy-pasted from testing. I will update that.
I set the working dir to c:\k as well to fix the Azure CNI logging issue.
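Presumably via nssm's working-directory setting, something like the one-liner below (the exact line is an assumption):

# AppDirectory sets the service's working directory; c:\k matches the log path above.
& "$KubeDir\nssm.exe" set containerd AppDirectory C:\k | RemoveNulls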
@@ -159,19 +160,6 @@ function Install-ContainerD {
  $newPath = [Environment]::GetEnvironmentVariable("Path", [EnvironmentVariableTarget]::Machine) + ";$installDir"
  [Environment]::SetEnvironmentVariable("Path", $newPath, [EnvironmentVariableTarget]::Machine)
  $env:Path += ";$installDir"

- Write-Log "Registering containerd as a service"
- & containerd.exe --register-service
We still need to install containerd in the VHD in order to pull the nanoserver/servercore/pause images.
or at least start it
Makes sense. I was hoping we wouldn't need to duplicate the configuration setup, but I guess we will need that.
You can probably get by with a Start-Job and not need to register this as a service. This won't persist across reboots, though, so start it right before we try to download the VHDs.
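A rough sketch of that approach is below; the install path and image reference are assumptions, not code from this PR.

# Run containerd in a background job just long enough to pull the base images,
# instead of registering it as a Windows service during VHD build.
$containerd = Start-Job -ScriptBlock {
    & "C:\Program Files\containerd\containerd.exe"
}
Start-Sleep -Seconds 5   # give the daemon a moment to come up
& ctr.exe -n k8s.io images pull mcr.microsoft.com/oss/kubernetes/pause:1.4.1
Stop-Job $containerd
Remove-Job $containerd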
Also, I wonder why the Windows VHD CI job didn't trigger here...
I've made this update. PTAL
Looks good.
Let's make sure the containerd VHD CI job passes!
Force-pushed from 1ebb886 to 0eddeb7
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: jsturtevant, marosset. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Reason for Change:
We need to collect the containerd logs for analysis.
Issue Fixed:
Credit Where Due:
This WPR file was shared by @kevpar on Slack.
Does this change contain code from or inspired by another project?
If "Yes," did you notify that project's maintainers and provide attribution?
Requirements:
Notes: