Add mon_group support for resctrl. #2793

Merged: 53 commits, Sep 21, 2021
Changes from 6 commits
Commits (53)
0f93f1a  Add mon_group support for resctrl. (Jan 28, 2021)
3452cf3  Do not try to setup the root container. (Mar 15, 2021)
18d297d  Update klog version to avoid errors from golangci-lint. (Mar 15, 2021)
3feb264  Update klog version in cmd to avoid errors from golangci-lint. (Mar 15, 2021)
7aaaff7  Fix go.sum (Mar 15, 2021)
58a2e01  Check if container moved between control groups only if its running. (Mar 15, 2021)
5561bd9  Get NUMA nodes from MachineInfo.Topology. (Mar 16, 2021)
9a1e6cb  Make code thread safe again. (Mar 16, 2021)
873cc47  Fix typo. (Mar 16, 2021)
d07084c  Refactor resctrl collector setup. (Mar 17, 2021)
801e883  Refactor resctrl utilies. (Mar 17, 2021)
75fabd7  Better name vars. (Mar 17, 2021)
a1619a7  Add missing python3 in Dockerfile. (Mar 18, 2021)
0f274e4  Add missing procps in Dockerfile. (Mar 18, 2021)
73c71b6  Merge branch 'master' of github.com:google/cadvisor into creatone/res… (May 10, 2021)
03f8571  Use const instead of magic value. (May 10, 2021)
29e1e19  Delete an unnecessary setting of c.running to false. (May 10, 2021)
fc3b8ce  Do not wrap the error from cAdvisor. (May 10, 2021)
ecb0156  Use path in error message. (May 10, 2021)
e1f5e9b  Avoid goroutine looping. (May 11, 2021)
3951953  Do not use fscommon package from runc/libcontainer. (May 11, 2021)
7d36305  Fix const ASCII names. (May 11, 2021)
ee42de2  Use same operator in func. (May 12, 2021)
7c1eeb2  Introduce const variables. (May 12, 2021)
a1ce724  Merge branch 'master' of github.com:google/cadvisor into creatone/res… (May 12, 2021)
3ffe422  Introduce vendor_id in MachineInfo. (Jun 18, 2021)
ab3ee30  Extend files which should be omitted when searching control group. (Jun 18, 2021)
a986999  Add info about possible bug when reading resctrl values on AMD. (Jun 18, 2021)
88b6b7c  Use empty struct map instead of boolean. (Aug 16, 2021)
8bf947a  Move reading file logic. (Aug 16, 2021)
c54e007  Use scanner to read tasks file. (Aug 16, 2021)
dbb54d2  Change the way of searching for the control group. (Aug 16, 2021)
731606d  Add comments. Use const value. (Aug 16, 2021)
15be02e  Comment function. (Aug 16, 2021)
d24ece6  Fix typo. (Aug 18, 2021)
c9da6fb  Refactor getAllProcessThreads. (Aug 18, 2021)
8162197  Refactor GetVendorID. (Aug 18, 2021)
9c675e4  Rename VendorID. (Aug 18, 2021)
a76478c  Resctrl collector should be aware of existing mon groups. (Aug 19, 2021)
852f755  Optimization for finding control/monitoring group. (Aug 20, 2021)
97987d8  Avoid having ugly errors. (Aug 20, 2021)
c675d56  Merge branch 'master' of github.com:google/cadvisor into creatone/res… (Aug 24, 2021)
e7628e7  Use strings.HasPrefix(). (Aug 25, 2021)
1db6ca2  Add comments. (Aug 25, 2021)
fa9b5db  Rename variables. (Aug 25, 2021)
b3f311b  Fix test. (Aug 25, 2021)
bbde636  Use string map instead of int. (Aug 25, 2021)
ea39458  Now there is no need to use procps in Dockerfile. (Sep 9, 2021)
0ec4da2  Merge branch 'master' of github.com:google/cadvisor into creatone/res… (Sep 9, 2021)
b76ee15  Update to go 1.17. (Sep 9, 2021)
8c734a0  Add information about possible race condition. (Sep 9, 2021)
fb4b9c8  Add warning when docker_only is not set. (Sep 10, 2021)
00b42cf  Fix typo. (Sep 10, 2021)
2 changes: 1 addition & 1 deletion cmd/go.mod
@@ -27,6 +27,6 @@ require (
 	golang.org/x/oauth2 v0.0.0-20200902213428-5d25da1a8d43
 	google.golang.org/api v0.34.0
 	gopkg.in/olivere/elastic.v2 v2.0.12
-	k8s.io/klog/v2 v2.2.0
+	k8s.io/klog/v2 v2.8.0
 	k8s.io/utils v0.0.0-20201110183641-67b214c5f920
 )
8 changes: 4 additions & 4 deletions cmd/go.sum
@@ -156,8 +156,8 @@ github.com/go-logfmt/logfmt v0.3.0/go.mod h1:Qt1PoO58o5twSAckw1HlFXLmHsOX5/0LbT9
 github.com/go-logfmt/logfmt v0.4.0/go.mod h1:3RMwSq7FuexP4Kalkev3ejPJsZTpXXBr9+V4qmtdjCk=
 github.com/go-logfmt/logfmt v0.5.0/go.mod h1:wCYkCAKZfumFQihp8CzCvQ3paCTfi41vtzG1KdI/P7A=
 github.com/go-logr/logr v0.1.0/go.mod h1:ixOQHD9gLJUVQQ2ZOR7zLEifBX6tGkNJF4QyIY7sIas=
-github.com/go-logr/logr v0.2.0 h1:QvGt2nLcHH0WK9orKa+ppBPAxREcH364nPUedEpK0TY=
-github.com/go-logr/logr v0.2.0/go.mod h1:z6/tIYblkpsD+a4lm/fGIIU9mZ+XfAiaFtq7xTgseGU=
+github.com/go-logr/logr v0.4.0 h1:K7/B1jt6fIBQVd4Owv2MqGQClcgf0R266+7C/QjRcLc=
+github.com/go-logr/logr v0.4.0/go.mod h1:z6/tIYblkpsD+a4lm/fGIIU9mZ+XfAiaFtq7xTgseGU=
 github.com/go-sql-driver/mysql v1.4.0/go.mod h1:zAC/RDZ24gD3HViQzih4MyKcchzm+sOG5ZlKdlhCg5w=
 github.com/go-stack/stack v1.8.0/go.mod h1:v0f6uXyyMGvRgIKkXu+yp6POWl0qKG85gN/melR3HDY=
 github.com/godbus/dbus/v5 v5.0.3 h1:ZqHaoEF7TBzh4jzPmqVhE/5A1z9of6orkAe5uHoAeME=
@@ -822,8 +822,8 @@ honnef.co/go/tools v0.0.1-2019.2.3/go.mod h1:a3bituU0lyd329TUQxRnasdCoJDkEUEAqEt
 honnef.co/go/tools v0.0.1-2020.1.3/go.mod h1:X/FiERA/W4tHapMX5mGpAtMSVEeEUOyHaw9vFzvIQ3k=
 honnef.co/go/tools v0.0.1-2020.1.4/go.mod h1:X/FiERA/W4tHapMX5mGpAtMSVEeEUOyHaw9vFzvIQ3k=
 k8s.io/klog/v2 v2.0.0/go.mod h1:PBfzABfn139FHAV07az/IF9Wp1bkk3vpT2XSJ76fSDE=
-k8s.io/klog/v2 v2.2.0 h1:XRvcwJozkgZ1UQJmfMGpvRthQHOvihEhYtDfAaxMz/A=
-k8s.io/klog/v2 v2.2.0/go.mod h1:Od+F08eJP+W3HUb4pSrPpgp9DGU4GzlpG/TmITuYh/Y=
+k8s.io/klog/v2 v2.8.0 h1:Q3gmuM9hKEjefWFFYF0Mat+YyFJvsUyYuwyNNJ5C9Ts=
+k8s.io/klog/v2 v2.8.0/go.mod h1:hy9LJ/NvuK+iVyP4Ehqva4HxZG/oXyIS3n3Jmire4Ec=
 k8s.io/utils v0.0.0-20201110183641-67b214c5f920 h1:CbnUZsM497iRC5QMVkHwyl8s2tB3g7yaSHkYPkpgelw=
 k8s.io/utils v0.0.0-20201110183641-67b214c5f920/go.mod h1:jPW/WVKK9YHAvNhRxK0md/EJ228hCsBRufyofKtW8HA=
 rsc.io/binaryregexp v0.2.0/go.mod h1:qTv7/COck+e2FymRvadv62gMdZztPaShugOCi3I+8D8=
2 changes: 1 addition & 1 deletion go.mod
@@ -46,6 +46,6 @@ require (
 	google.golang.org/grpc v1.27.1
 	google.golang.org/protobuf v1.25.0 // indirect
 	gotest.tools/v3 v3.0.3 // indirect
-	k8s.io/klog/v2 v2.2.0
+	k8s.io/klog/v2 v2.8.0
 	k8s.io/utils v0.0.0-20201110183641-67b214c5f920
 )
8 changes: 4 additions & 4 deletions go.sum
@@ -89,8 +89,8 @@ github.com/go-kit/kit v0.9.0/go.mod h1:xBxKIO96dXMWWy0MnWVtmwkA9/13aqxPnvrjFYMA2
 github.com/go-logfmt/logfmt v0.3.0/go.mod h1:Qt1PoO58o5twSAckw1HlFXLmHsOX5/0LbT9GBnD5lWE=
 github.com/go-logfmt/logfmt v0.4.0/go.mod h1:3RMwSq7FuexP4Kalkev3ejPJsZTpXXBr9+V4qmtdjCk=
 github.com/go-logr/logr v0.1.0/go.mod h1:ixOQHD9gLJUVQQ2ZOR7zLEifBX6tGkNJF4QyIY7sIas=
-github.com/go-logr/logr v0.2.0 h1:QvGt2nLcHH0WK9orKa+ppBPAxREcH364nPUedEpK0TY=
-github.com/go-logr/logr v0.2.0/go.mod h1:z6/tIYblkpsD+a4lm/fGIIU9mZ+XfAiaFtq7xTgseGU=
+github.com/go-logr/logr v0.4.0 h1:K7/B1jt6fIBQVd4Owv2MqGQClcgf0R266+7C/QjRcLc=
+github.com/go-logr/logr v0.4.0/go.mod h1:z6/tIYblkpsD+a4lm/fGIIU9mZ+XfAiaFtq7xTgseGU=
 github.com/go-stack/stack v1.8.0/go.mod h1:v0f6uXyyMGvRgIKkXu+yp6POWl0qKG85gN/melR3HDY=
 github.com/godbus/dbus/v5 v5.0.3 h1:ZqHaoEF7TBzh4jzPmqVhE/5A1z9of6orkAe5uHoAeME=
 github.com/godbus/dbus/v5 v5.0.3/go.mod h1:xhWf0FNVPg57R7Z0UbKHbJfkEywrmjJnf7w5xrFpKfA=
@@ -489,8 +489,8 @@ honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc/go.mod h1:rf3lG4BRIbNafJWh
 honnef.co/go/tools v0.0.1-2019.2.3/go.mod h1:a3bituU0lyd329TUQxRnasdCoJDkEUEAqEt0JzvZhAg=
 honnef.co/go/tools v0.0.1-2020.1.3/go.mod h1:X/FiERA/W4tHapMX5mGpAtMSVEeEUOyHaw9vFzvIQ3k=
 k8s.io/klog/v2 v2.0.0/go.mod h1:PBfzABfn139FHAV07az/IF9Wp1bkk3vpT2XSJ76fSDE=
-k8s.io/klog/v2 v2.2.0 h1:XRvcwJozkgZ1UQJmfMGpvRthQHOvihEhYtDfAaxMz/A=
-k8s.io/klog/v2 v2.2.0/go.mod h1:Od+F08eJP+W3HUb4pSrPpgp9DGU4GzlpG/TmITuYh/Y=
+k8s.io/klog/v2 v2.8.0 h1:Q3gmuM9hKEjefWFFYF0Mat+YyFJvsUyYuwyNNJ5C9Ts=
+k8s.io/klog/v2 v2.8.0/go.mod h1:hy9LJ/NvuK+iVyP4Ehqva4HxZG/oXyIS3n3Jmire4Ec=
 k8s.io/utils v0.0.0-20201110183641-67b214c5f920 h1:CbnUZsM497iRC5QMVkHwyl8s2tB3g7yaSHkYPkpgelw=
 k8s.io/utils v0.0.0-20201110183641-67b214c5f920/go.mod h1:jPW/WVKK9YHAvNhRxK0md/EJ228hCsBRufyofKtW8HA=
 rsc.io/binaryregexp v0.2.0/go.mod h1:qTv7/COck+e2FymRvadv62gMdZztPaShugOCi3I+8D8=
2 changes: 1 addition & 1 deletion manager/manager.go
@@ -959,7 +959,7 @@ func (m *manager) createContainerLocked(containerName string, watchSource watche
 	if m.includedMetrics.Has(container.ResctrlMetrics) {
 		cont.resctrlCollector, err = m.resctrlManager.GetCollector(containerName, func() ([]string, error) {
 			return cont.getContainerPids(true)
-		})
+		}, len(m.machineInfo.Topology))
 		if err != nil {
 			klog.V(4).Infof("resctrl metrics will not be available for container %s: %s", cont.info.Name, err)
 		}
91 changes: 49 additions & 42 deletions resctrl/collector.go
@@ -28,46 +28,50 @@ import (
 )

 type collector struct {
-	id               string
-	interval         time.Duration
-	getContainerPids func() ([]string, error)
-	resctrlPath      string
-	running          bool
+	id                string
+	interval          time.Duration
+	getContainerPids  func() ([]string, error)
+	resctrlPath       string
+	running           bool
+	numberOfNUMANodes int
 }

-func newCollector(id string, getContainerPids func() ([]string, error), interval time.Duration) *collector {
-	return &collector{id: id, interval: interval, getContainerPids: getContainerPids}
+func newCollector(id string, getContainerPids func() ([]string, error), interval time.Duration, numberOfNUMANodes int) *collector {
+	return &collector{id: id, interval: interval, getContainerPids: getContainerPids, numberOfNUMANodes: numberOfNUMANodes}
 }

 func (c *collector) setup() error {
-	err := c.prepareMonGroup()
+	if c.id != rootContainer {
+		// There is no need to prepare or update "/" container.
+		err := c.prepareMonGroup()

-	if c.interval != 0 && c.id != rootContainer {
-		if err != nil {
-			klog.Errorf("Failed to setup container %q resctrl collector: %v \n Trying again in next intervals!", c.id, err)
-		}
-		go func() {
-			for {
-				if c.running {
-					err = c.prepareMonGroup()
-					if err != nil {
-						klog.Errorf("checking %q resctrl collector but: %v", c.id, err)
-					}
-				} else {
-					err = c.clear()
-					if err != nil {
-						klog.Errorf("trying to end %q resctrl collector interval but: %v", c.id, err)
+		if c.interval != 0 {
+			if err != nil {
+				klog.Errorf("Failed to setup container %q resctrl collector: %v \n Trying again in next intervals!", c.id, err)
+			}
+			go func() {
+				for {
+					time.Sleep(c.interval)
+					if c.running {
+						err = c.prepareMonGroup()
+						if err != nil {
+							klog.Errorf("checking %q resctrl collector but: %v", c.id, err)
+						}
+					} else {
+						err = c.clear()
+						if err != nil {
+							klog.Errorf("trying to end %q resctrl collector interval but: %v", c.id, err)
+						}
+						break
 					}
-					break
 				}
-				time.Sleep(c.interval)
+			}()
+		} else {
+			// There is no interval set, if setup fail, stop.
+			if err != nil {
+				c.running = false
+				return err
 			}
-		}()
-	} else {
-		// There is no interval set, if setup fail, stop.
-		if err != nil {
-			c.running = false
-			return err
 		}
 	}

@@ -80,18 +84,22 @@ func (c *collector) prepareMonGroup() error {
 		return fmt.Errorf("couldn't obtain mon_group path: %v", err)
 	}

-	// Check if container moved between control groups.
-	if newPath != c.resctrlPath {
-		err = c.clear()
-		if err != nil {
-			c.running = false
-			return fmt.Errorf("couldn't clear previous mon group: %v", err)
+	if c.running {
+		// Check if container moved between control groups.
+		if newPath != c.resctrlPath {
+			err = c.clear()
+			if err != nil {
+				c.running = false
+				return fmt.Errorf("couldn't clear previous mon group: %v", err)
+			}
+			c.resctrlPath = newPath
 		}
+	} else {
+		// Mon group prepared, the collector is running correctly.
 		c.resctrlPath = newPath
+		c.running = true
 	}

-	// Mon group prepared, the collector is running correctly.
-	c.running = true
 	return nil
 }

@@ -103,10 +111,9 @@ func (c *collector) UpdateStats(stats *info.ContainerStats) error {
 	if err != nil {
 		return err
 	}
-	numberOfNUMANodes := len(*resctrlStats.MBMStats)

-	stats.Resctrl.MemoryBandwidth = make([]info.MemoryBandwidthStats, 0, numberOfNUMANodes)
-	stats.Resctrl.Cache = make([]info.CacheStats, 0, numberOfNUMANodes)
+	stats.Resctrl.MemoryBandwidth = make([]info.MemoryBandwidthStats, 0, c.numberOfNUMANodes)
+	stats.Resctrl.Cache = make([]info.CacheStats, 0, c.numberOfNUMANodes)

 	for _, numaNodeStats := range *resctrlStats.MBMStats {
 		stats.Resctrl.MemoryBandwidth = append(stats.Resctrl.MemoryBandwidth,
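
In UpdateStats() the NUMA node count is now used only as a capacity hint for `make`; the resulting slice length is still determined by what the loop appends. A small sketch of that behaviour, with a hypothetical `bandwidth` type and `collect` helper standing in for the cAdvisor stats types:

```go
package main

import "fmt"

// bandwidth is a hypothetical per-NUMA-node reading.
type bandwidth struct {
	total uint64
	local uint64
}

// collect copies per-node readings into a slice pre-sized from the
// machine's NUMA node count. The count is only a capacity hint.
func collect(perNode []bandwidth, numaNodes int) []bandwidth {
	out := make([]bandwidth, 0, numaNodes)
	for _, b := range perNode {
		out = append(out, b)
	}
	return out
}

func main() {
	readings := []bandwidth{{total: 100, local: 60}, {total: 200, local: 120}}
	out := collect(readings, 2)
	fmt.Println(len(out), cap(out)) // prints "2 2"
}
```

If the hint is smaller than the number of readings, `append` grows the slice and the result is still complete; the hint only avoids reallocation when it is accurate.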
6 changes: 3 additions & 3 deletions resctrl/collector_test.go
@@ -40,7 +40,7 @@ func TestNewCollectorWithSetup(t *testing.T) {
 	expectedID := "container"
 	expectedResctrlPath := filepath.Join(rootResctrl, monGroupsDirName, expectedID)

-	collector := newCollector(expectedID, mockGetContainerPids, 0)
+	collector := newCollector(expectedID, mockGetContainerPids, 0, 2)
 	err := collector.setup()

 	assert.NoError(t, err)
@@ -58,7 +58,7 @@ func TestUpdateStats(t *testing.T) {
 	processPath = mockProcFs()
 	defer os.RemoveAll(processPath)

-	collector := newCollector("container", mockGetContainerPids, 0)
+	collector := newCollector("container", mockGetContainerPids, 0, 2)
 	err := collector.setup()
 	assert.NoError(t, err)

@@ -96,7 +96,7 @@ func TestDestroy(t *testing.T) {
 	processPath = mockProcFs()
 	defer os.RemoveAll(processPath)

-	collector := newCollector("container", mockGetContainerPids, 0)
+	collector := newCollector("container", mockGetContainerPids, 0, 2)
 	err := collector.setup()
 	if err != nil {
 		t.Fail()
8 changes: 4 additions & 4 deletions resctrl/manager.go
@@ -26,16 +26,16 @@ import (

 type Manager interface {
 	Destroy()
-	GetCollector(containerName string, getContainerPids func() ([]string, error)) (stats.Collector, error)
+	GetCollector(containerName string, getContainerPids func() ([]string, error), numberOfNUMANodes int) (stats.Collector, error)
 }

 type manager struct {
 	stats.NoopDestroy
 	interval time.Duration
 }

-func (m *manager) GetCollector(containerName string, getContainerPids func() ([]string, error)) (stats.Collector, error) {
-	collector := newCollector(containerName, getContainerPids, m.interval)
+func (m *manager) GetCollector(containerName string, getContainerPids func() ([]string, error), numberOfNUMANodes int) (stats.Collector, error) {
+	collector := newCollector(containerName, getContainerPids, m.interval, numberOfNUMANodes)
 	err := collector.setup()
 	if err != nil {
 		return &stats.NoopCollector{}, err
@@ -64,6 +64,6 @@ type NoopManager struct {
 	stats.NoopDestroy
 }

-func (np *NoopManager) GetCollector(_ string, _ func() ([]string, error)) (stats.Collector, error) {
+func (np *NoopManager) GetCollector(_ string, _ func() ([]string, error), _ int) (stats.Collector, error) {
 	return &stats.NoopCollector{}, nil
 }
2 changes: 1 addition & 1 deletion resctrl/manager_test.go
@@ -124,6 +124,6 @@ func TestGetCollector(t *testing.T) {
 	manager, err := NewManager(0, setup)
 	assert.NoError(t, err)

-	_, err = manager.GetCollector(expectedID, mockGetContainerPids)
+	_, err = manager.GetCollector(expectedID, mockGetContainerPids, 2)
 	assert.NoError(t, err)
 }