Add missing file system metrics in containerd handler #2936
Conversation
Hi @qiutongs. Thanks for your PR. I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign yyrdl
/ok-to-test
Force-pushed from 19c267b to 28c1a9d
snapshotDir := "/var/lib/containerd"
// Example: upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/5001/fs
if len(mounts) > 0 {
	for _, option := range mounts[0].Options {
maybe check that types.Mount.Type == overlay or overlay2? Since only overlay will have upper/lowerdir?
Let me think about it.
That is a good call. Let me update.
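For illustration, here is a rough, self-contained sketch of that guard, assuming the snapshot mounts come back as containerd types.Mount values with Type and Options fields; the mount struct and snapshotDirOf function below are local stand-ins for this sketch, not cadvisor's actual code.

package main

import (
	"fmt"
	"strings"
)

// mount is a local stand-in for containerd's types.Mount; only the fields the
// guard needs are modeled here.
type mount struct {
	Type    string
	Options []string
}

// snapshotDirOf falls back to the containerd root directory unless the first
// mount is an overlay mount carrying an upperdir= option.
func snapshotDirOf(mounts []mount) string {
	snapshotDir := "/var/lib/containerd"
	if len(mounts) > 0 && (mounts[0].Type == "overlay" || mounts[0].Type == "overlay2") {
		for _, option := range mounts[0].Options {
			// strings.Index(...) == 0 mirrors the diff at this point; a later
			// comment in this review swaps it for strings.HasPrefix.
			if strings.Index(option, "upperdir=") == 0 {
				snapshotDir = option[len("upperdir="):]
				break
			}
		}
	}
	return snapshotDir
}

func main() {
	m := mount{
		Type:    "overlay",
		Options: []string{"index=off", "upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/5001/fs"},
	}
	fmt.Println(snapshotDirOf([]mount{m}))
}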
container/containerd/handler.go
Outdated
// Example: upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/5001/fs
if len(mounts) > 0 {
	for _, option := range mounts[0].Options {
		if strings.Index(option, "upperdir=") == 0 {
maybe prefer strings.HasPrefix(option, "upperdir=")?
Oh right. I will change.
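A minimal, runnable illustration of the suggested change; the option value is the example path from the diff above, and the two checks are equivalent for a leading "upperdir=".

package main

import (
	"fmt"
	"strings"
)

func main() {
	option := "upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/5001/fs"
	// Index == 0 and HasPrefix report the same thing; HasPrefix states the intent directly.
	fmt.Println(strings.Index(option, "upperdir=") == 0) // true
	fmt.Println(strings.HasPrefix(option, "upperdir="))  // true
}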
container/common/fsHandler.go
Outdated
type FsUsageProvider interface {
	// Usage returns the fs usage
	Usage() (*FsUsage, error)
	// Targets returns where the fs usage metric is collected,it maybe a directory ,a file or some
super nit: space after comma and ,a
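For context, a hedged sketch of what an implementation of this interface could look like; the FsUsage fields, the Targets() []string signature, and the dirUsageProvider type below are assumptions made for this sketch, not the PR's actual code.

package main

import "fmt"

// FsUsage is a stand-in for the result type referenced by the interface; the
// real field set in cadvisor may differ.
type FsUsage struct {
	TotalUsageBytes uint64
	InodeUsage      uint64
}

// FsUsageProvider decouples "how usage is measured" from the container handler:
// Usage collects the metrics, Targets describes where they are collected from
// (directories and/or files).
type FsUsageProvider interface {
	Usage() (*FsUsage, error)
	Targets() []string
}

// dirUsageProvider is a hypothetical provider over a fixed set of paths.
type dirUsageProvider struct {
	paths []string
}

func (p *dirUsageProvider) Targets() []string { return p.paths }

func (p *dirUsageProvider) Usage() (*FsUsage, error) {
	// A real implementation would walk p.paths; this sketch just returns zeros.
	return &FsUsage{}, nil
}

func main() {
	var provider FsUsageProvider = &dirUsageProvider{paths: []string{"/var/lib/containerd", "/var/log/pods"}}
	usage, err := provider.Usage()
	fmt.Println(provider.Targets(), usage, err)
}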
@@ -125,7 +113,8 @@ func (fh *realFsHandler) trackUsage() {
	// if the long duration is persistent either because of slow
	// disk or lots of containers.
	longOp = longOp + time.Second
	klog.V(2).Infof("fs: disk usage and inodes count on following dirs took %v: %v; will not log again for this container unless duration exceeds %v", duration, []string{fh.rootfs, fh.extraDir}, longOp)
	klog.V(2).Infof(`fs: disk usage and inodes count on targets took %v: %v; `+
If this is vendored in kubelet, V(2) logging verbosity is enabled by default. If this can be noisy, consider dropping to V(3) or higher.
Hmm, maybe this log is useful. After all, it was V(2) before.
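For reference, a small self-contained example of how the klog verbosity gate behaves; the message and targets here are placeholders. At kubelet's default of -v=2 the V(2) line is emitted and the V(3) line is suppressed, which is the trade-off being discussed.

package main

import (
	"flag"
	"time"

	"k8s.io/klog/v2"
)

func main() {
	klog.InitFlags(nil)
	_ = flag.Set("v", "2") // kubelet's default verbosity
	flag.Parse()

	duration := 2 * time.Second
	targets := []string{"/var/lib/containerd/...", "/var/log/pods/..."}

	// Visible at -v=2 and above.
	klog.V(2).Infof("fs: disk usage and inodes count on targets took %v: %v", duration, targets)
	// Suppressed at -v=2; only visible when verbosity is raised to 3 or higher.
	klog.V(3).Infof("fs: disk usage and inodes count on targets took %v: %v", duration, targets)
	klog.Flush()
}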
container/common/fsHandler.go
Outdated
// Combine errors into a single error to return
if rootErr != nil || extraErr != nil {
	return nil, fmt.Errorf("rootDiskErr: %v, extraDiskErr: %v", rootErr, extraErr)
maybe add a little more context to the error, in case it gets surfaced in some log somewhere:
fmt.Errorf("failed to obtain filesystem usage; rootDiskErr: %v, extraDiskErr: %v", rootErr, extraErr)
container/containerd/handler.go
Outdated
if includedMetrics.Has(container.DiskUsageMetrics) && cntr.Labels["io.cri-containerd.kind"] != "sandbox" {
	mounts, err := client.SnapshotMounts(ctx, cntr.Snapshotter, cntr.SnapshotKey)
	if err != nil {
		return nil, err
add a little context here maybe?
return nil, fmt.Errorf("failed to obtain containerd snapshot mounts for disk usage metrics: %v", err)
container/containerd/handler.go
Outdated
// Default to top directory if the specific upperdir snapshot is not found
snapshotDir := "/var/lib/containerd"
// Example: upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/5001/fs
if len(mounts) > 0 {
The behavior here is to get the first mount and then break. Is this intentional? Can we have multiple mounts? Should we sum them, or is the top upper dir enough?
Yes, after checking containerd code, it only returns one mount for overlay.
client: client,
containerID: id,
// Path of logs, e.g. /var/log/pods/XXX
logPath: status.LogPath,
is this path to the actual log file, or a directory? The cri-o handler makes it seem like the path returned there is the actual log file, xref https://github.com/google/cadvisor/blob/master/container/crio/handler.go#L121-L128
It is the actual log file.
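Since status.LogPath points at the log file itself rather than a directory, a usage target can combine a walk of the snapshot upperdir with a plain stat of the log file. The sketch below is illustrative only: the paths are the examples appearing elsewhere in this PR, and dirUsageBytes is a simplified stand-in for cadvisor's real usage collection.

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// dirUsageBytes walks dir and sums regular-file sizes (a simplified stand-in
// for cadvisor's disk usage collection).
func dirUsageBytes(dir string) (uint64, error) {
	var total uint64
	err := filepath.Walk(dir, func(_ string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if info.Mode().IsRegular() {
			total += uint64(info.Size())
		}
		return nil
	})
	return total, err
}

// fileUsageBytes stats a single file, e.g. the container's log file.
func fileUsageBytes(file string) (uint64, error) {
	info, err := os.Stat(file)
	if err != nil {
		return 0, err
	}
	return uint64(info.Size()), nil
}

func main() {
	upper, _ := dirUsageBytes("/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/5001/fs")
	logBytes, _ := fileUsageBytes("/var/log/pods/default_nginx-pod_9de12299-f7c6-4404-8483-b40636023b5e/nginx-pod/0.log")
	fmt.Printf("fs_usage_bytes candidate: %d (upperdir) + %d (log file)\n", upper, logBytes)
}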
Signed-off-by: Qiutong Song <songqt01@gmail.com>
Force-pushed from cd13162 to 6012838
Force-pushed from 6012838 to 050a3bd
LGTM, thanks
Unfortunately this only accounts for … but does not count …
Signed-off-by: Qiutong Song songqt01@gmail.com
Overview
File system metrics are not supported in cadvisor-containerd. This PR attempts to fix it.
The majority of code is copied from #2872
Fixed:
Testing
Query the Prometheus endpoint
Find the upperdir (writable layer)
94208 and 92k are close. After counting the size of logs, the reported number of fs_usage_bytes is 98304. The log is roughly 4KB. fs_limit_bytes is reported.
Open Questions
1. Should we count the log file as #2872 (comment)? e.g. /var/log/pods/default_nginx-pod_9de12299-f7c6-4404-8483-b40636023b5e/nginx-pod/0.log
2. Can we manually construct the rootfs path so that we can get the device name - /run/containerd/io.containerd.runtime.v2.task/k8s.io/7c6e000abee310fc05bab29ef9ef58cf27a51e8652197dffcfa8c84656270a38/rootfs? With the device name, we can calculate fs_limit_bytes (see the sketch below).
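On question 2, a hedged sketch of how the limit could be derived once a path on the backing filesystem is known (the rootfs path above, or the snapshot upperdir): statfs on that path exposes the total capacity that could back fs_limit_bytes. Linux-only, and the path in main is just the default containerd root used as an example.

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// fsLimitBytes returns the total size of the filesystem containing path.
func fsLimitBytes(path string) (uint64, error) {
	var st unix.Statfs_t
	if err := unix.Statfs(path, &st); err != nil {
		return 0, fmt.Errorf("statfs %q: %v", path, err)
	}
	// Total capacity = block count * block size.
	return st.Blocks * uint64(st.Bsize), nil
}

func main() {
	limit, err := fsLimitBytes("/var/lib/containerd")
	if err != nil {
		panic(err)
	}
	fmt.Printf("fs_limit_bytes candidate: %d\n", limit)
}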
Appendix