-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add mon_group support for resctrl. #2793
Conversation
/ok-to-test |
bead82b
to
1a4f1f6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some general remarks:
- We do not support any Golang version lower than 1.15 so you can safely use
%w
instead of%v
when wrapping errors.
@@ -77,6 +77,8 @@ var rawCgroupPrefixWhiteList = flag.String("raw_cgroup_prefix_whitelist", "", "A | |||
|
|||
var perfEvents = flag.String("perf_events_config", "", "Path to a JSON file containing configuration of perf events to measure. Empty value disabled perf events measuring.") | |||
|
|||
var resctrlInterval = flag.Duration("resctrl_interval", 0, "Resctrl mon groups updating interval. Zero value disables updating mon groups.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "updating mon groups"? Are you going to add and/or remove pids from them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you think that --housekeeping_interval
is not enough and that we need another flag? We still can exclude restctrl metrics using disable_metrics
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, watching mon groups and making sure that have correct pids.
Thanks to that flag we can monitor containers that changed their control group during the cAdvisor run.
Setting --resctrl_interval
to 0, disables updating mon groups.
This is good behavior for users that want to use only monitoring features of Resctrl.
We cannot depend on --housekeeping_interval
because cAdvisor will be updating containers instantly:
...
I0315 12:54:47.268474 22054 container.go:593] [/some_cgroup] Housekeeping took 331.572µs
I0315 12:54:47.268948 22054 container.go:593] [/some_cgroup] Housekeeping took 429.79µs
I0315 12:54:47.269348 22054 container.go:593] [/some_cgroup] Housekeeping took 356.188µs
I0315 12:54:47.269363 22054 container.go:593] [/machine.slice/libpod-conmon-e361822894402b35f7230ccd133725e8d52e97c8eddd3ba77cd0dc3fc9ee46ad.scope] Housekeeping took 1.506114ms
I0315 12:54:47.269385 22054 container.go:593] [/system.slice/thermos.service] Housekeeping took 1.48471ms
I0315 12:54:47.269919 22054 container.go:593] [/some_cgroup] Housekeeping took 528.066µs
I0315 12:54:47.270333 22054 container.go:593] [/some_cgroup] Housekeeping took 371.664µs
I0315 12:54:47.270611 22054 container.go:593] [/some_cgroup] Housekeeping took 220.11µs
I0315 12:54:47.270855 22054 container.go:593] [/some_cgroup] Housekeeping took 188.26µs
I0315 12:54:47.271074 22054 container.go:593] [/machine.slice/libpod-conmon-df33bd5e05ec42afd4c39cc0a1417c227553de2bd7fc510de00fb7341d59d263.scope] Housekeeping took 1.682142ms
I0315 12:54:47.271163 22054 container.go:593] [/some_cgroup] Housekeeping took 250.002µs
I0315 12:54:47.271407 22054 container.go:593] [/some_cgroup] Housekeeping took 185.349µs
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate more about
Setting --resctrl_interval to 0, disables updating mon groups.
This is good behavior for users that want to use only monitoring features of Resctrl.
I'm afraid that it won't be enough in case where other components (cri-rm, runc, manually) manage ctr/mon groups manually. At least this comment suggest that "enabling rdt" + restctrl_intreval=0 is enough, which is not true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to consider the necessity of --restrcl_interval flag, because it makes another cmdline options that is hard to understand but gives a chance to limit overhead in case that there is no other manager of resctrl hierarchy. I'm not yet full convinced it is worh. Pawel can you provide rough estimation of overhead - I mean e.g. number of syscalls (listdir,readfile, cpuusage) in typical setup (e.g. 20 pods + system cgroups).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If something will change the control group, the cAdvisor will continue measuring in the new group.
cAdvisor manipulates only mon groups. Now it's aware of existing groups, even own prepared have "cadvisor" prefix,
e.g. cadvisor-system.slice-docker-0f07bb8ed38c0f40c9424875f2cf417d19b27ca93b631640a2039b3478ee5858.scope
So will manipulate only the groups with that prefix.
@Creatone do not force-push, please. Ability to browse history of your changes will be appreciated. |
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
df6d638
to
3452cf3
Compare
Only for rebase. |
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
/retest |
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
/retest |
There is something with k8s test infrastructure.
|
/retest |
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look at my comments, please. Sorry for long delay.
/retest |
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
@iwankgb PTAL |
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
} | ||
|
||
processThreads, err := getAllProcessThreads(threadFiles) | ||
properContainerName := fmt.Sprintf("%s-%s", monGroupPrefix, strings.Replace(containerName, "/", "-", -1)) | ||
monGroupPath = filepath.Join(controlGroupPath, monitoringGroupDir, properContainerName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a risk that e.g.
- kubepods/pod-x
- kubepods/pod-x/container-y
will be created and then you have an race condition, that only of those will contain pids (other will be stealed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8c734a0 will be sufficient for you?
} | ||
for _, thread := range processThreads { | ||
err = intelrdt.WriteIntelRdtTasks(monGroupPath, thread) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if just of the pids disapperead but rest are ok? why are you removing all the group
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will be a race condition which --resctrl_interval
fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we shouldn't monitor something that cannot be set by us.
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Creatone thank you for your patience!
@Creatone one last request: squash the commits to some reasonable amount and content 🙇 |
Can be a single one? :) |
* Add mon_group support for resctrl. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Do not try to setup the root container. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Update klog version to avoid errors from golangci-lint. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Update klog version in cmd to avoid errors from golangci-lint. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Fix go.sum Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Check if container moved between control groups only if its running. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Get NUMA nodes from MachineInfo.Topology. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Make code thread safe again. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Fix typo. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Refactor resctrl collector setup. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Refactor resctrl utilies. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Better name vars. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Add missing python3 in Dockerfile. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Add missing procps in Dockerfile. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Use const instead of magic value. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Delete an unnecessary setting of c.running to false. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Do not wrap the error from cAdvisor. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Use path in error message. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Avoid goroutine looping. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Do not use fscommon package from runc/libcontainer. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Fix const ASCII names. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Use same operator in func. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Introduce const variables. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Introduce vendor_id in MachineInfo. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Extend files which should be omitted when searching control group. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Add info about possible bug when reading resctrl values on AMD. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Use empty struct map instead of boolean. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Move reading file logic. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Use scanner to read tasks file. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Change the way of searching for the control group. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Add comments. Use const value. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Comment function. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Fix typo. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Refactor getAllProcessThreads. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Refactor GetVendorID. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Rename VendorID. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Resctrl collector should be aware of existing mon groups. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Optimization for finding control/monitoring group. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Avoid having ugly errors. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Use strings.HasPrefix(). Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Add comments. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Rename variables. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Fix test. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Use string map instead of int. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Now there is no need to use procps in Dockerfile. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Update to go 1.17. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Add information about possible race condition. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Add warning when docker_only is not set. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com> * Fix typo. Signed-off-by: Paweł Szulik <pawel.szulik@intel.com>
On a system with RDT monitoring features, the root directory, and other top levels
directories contain a folder named "mon_groups" in which additional
"MON" groups can be created to monitor subsets of tasks from a "CTRL_MON" group that is their ancestor.
There is a limitation of the number of "CTRL_MON" groups.
So in case when there are so many containers and as a user, you only want to use monitoring features, it's easy to reach this limit.
That's why "MON" groups are introduced.
This PR adds monitoring RDT features by mon_groups.
Signed-off-by: Paweł Szulik pawel.szulik@intel.com