Add mon groups for resctrl. #2523

Open
wants to merge 2 commits into
base: main
82 changes: 61 additions & 21 deletions libcontainer/SPEC.md
@@ -158,32 +158,38 @@ init process will block waiting for the parent to finish setup.
### IntelRdt

Intel platforms with new Xeon CPU support Resource Director Technology (RDT).
Cache Allocation Technology (CAT) and Memory Bandwidth Allocation (MBA) are
two sub-features of RDT.
Cache Allocation Technology (CAT), Cache Monitoring Technology (CMT),
Memory Bandwidth Allocation (MBA) and Memory Bandwidth Monitoring (MBM) are
four sub-features of RDT.

Cache Allocation Technology (CAT) provides a way for the software to restrict
cache allocation to a defined 'subset' of L3 cache which may be overlapping
with other 'subsets'. The different subsets are identified by class of
service (CLOS) and each CLOS has a capacity bitmask (CBM).

Cache Monitoring Technology (CMT) supports monitoring of the last-level cache (LLC) occupancy
for each running thread simultaneously.

Memory Bandwidth Allocation (MBA) provides indirect and approximate throttle
over memory bandwidth for the software. A user controls the resource by
indicating the percentage of maximum memory bandwidth or memory bandwidth limit
in MBps unit if MBA Software Controller is enabled.
indicating the percentage of maximum memory bandwidth or memory bandwidth
limit in MBps unit if MBA Software Controller is enabled.
Comment on lines -171 to +176
Contributor

Is there any actual change here, or is only the wrapping width adjusted?

Contributor Author

Nearly the entire SPEC.md is copied from the large comment in libcontainer/intelrdt/intelrdt.go, so I want to unify the two.
This is just the adjusted text.


Memory Bandwidth Monitoring (MBM) supports monitoring of total and local memory bandwidth
for each running thread simultaneously.

It can be used to handle L3 cache and memory bandwidth resources allocation
for containers if hardware and kernel support Intel RDT CAT and MBA features.
More details about Intel RDT CAT and MBA can be found in the section 17.18 and 17.19, Volume 3
of Intel Software Developer Manual:
https://software.intel.com/en-us/articles/intel-sdm

In Linux 4.10 kernel or newer, the interface is defined and exposed via
About Intel RDT kernel interface:
In Linux 4.14 kernel or newer, the interface is defined and exposed via
"resource control" filesystem, which is a "cgroup-like" interface.

Comparing with cgroups, it has similar process management lifecycle and
interfaces in a container. But unlike cgroups' hierarchy, it has single level
filesystem layout.

CAT and MBA features are introduced in Linux 4.10 and 4.12 kernel via
"resource control" filesystem.

Intel RDT "resource control" filesystem hierarchy:
```
mount -t resctrl resctrl /sys/fs/resctrl
@@ -194,25 +200,46 @@ tree /sys/fs/resctrl
| | |-- cbm_mask
| | |-- min_cbm_bits
| | |-- num_closids
| |-- L3_MON
| | |-- max_threshold_occupancy
| | |-- mon_features
| | |-- num_rmids
| |-- MB
| |-- bandwidth_gran
| |-- delay_linear
| |-- min_bandwidth
| |-- num_closids
|-- ...
|-- mon_groups
|-- <rmid>
|-- ...
|-- mon_data
|-- mon_L3_00
|-- llc_occupancy
|-- mbm_local_bytes
|-- mbm_total_bytes
|-- ...
|-- tasks
|-- schemata
|-- tasks
|-- <container_id>
|-- <clos>
|-- ...
|-- schemata
|-- mon_data
|-- mon_L3_00
|-- llc_occupancy
|-- mbm_local_bytes
|-- mbm_total_bytes
|-- ...
|-- tasks
|-- schemata
|-- ...
```

For runc, we can make use of `tasks` and `schemata` configuration for L3
cache and memory bandwidth resources constraints.
cache and memory bandwidth resources constraints, `mon_data` directory for
CMT and MBM statistics.
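
As an illustration of the `mon_data` layout described above, here is a minimal Go sketch of how a single counter such as `llc_occupancy` might be read. It is not runc's implementation: the helper names `parseMonValue` and `readMonValue` are hypothetical, and a temporary directory stands in for a real resctrl group so the sketch runs without RDT hardware. It also assumes the file holds a plain decimal number; on a real system a counter file may hold a non-numeric status string, which this sketch simply reports as an error.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseMonValue converts the raw contents of a counter file into a number.
// Hypothetical helper for illustration; not part of runc.
func parseMonValue(raw string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(raw), 10, 64)
}

// readMonValue reads <groupPath>/mon_data/<domain>/<name>, for example
// <groupPath>/mon_data/mon_L3_00/llc_occupancy.
func readMonValue(groupPath, domain, name string) (uint64, error) {
	data, err := os.ReadFile(filepath.Join(groupPath, "mon_data", domain, name))
	if err != nil {
		return 0, err
	}
	return parseMonValue(string(data))
}

func main() {
	// A temporary directory stands in for a group under /sys/fs/resctrl.
	group, err := os.MkdirTemp("", "resctrl-group")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(group)

	dir := filepath.Join(group, "mon_data", "mon_L3_00")
	if err := os.MkdirAll(dir, 0o755); err != nil {
		panic(err)
	}
	if err := os.WriteFile(filepath.Join(dir, "llc_occupancy"), []byte("1234\n"), 0o644); err != nil {
		panic(err)
	}

	v, err := readMonValue(group, "mon_L3_00", "llc_occupancy")
	if err != nil {
		panic(err)
	}
	fmt.Println("llc_occupancy:", v)
}
```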

The file `tasks` has a list of tasks that belongs to this group (e.g.,
<container_id>" group). Tasks can be added to a group by writing the task ID
"<clos>" group). Tasks can be added to a group by writing the task ID
to the "tasks" file (which will automatically remove them from the previous
group to which they belonged). New tasks created by fork(2) and clone(2) are
added to the same group as their parent.
@@ -224,7 +251,7 @@ L3 cache schema:
It has allocation bitmasks/values for L3 cache on each socket, which
contains L3 cache id and capacity bitmask (CBM).
```
Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
```
For example, on a two-socket machine, the schema line could be "L3:0=ff;1=c0"
which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.
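
To make the L3 schema format concrete, the following Go sketch parses a line such as "L3:0=ff;1=c0" into a map from cache id to capacity bitmask. The function name `parseL3Schema` is hypothetical and the parsing rules (hexadecimal masks, decimal cache ids) are taken only from the format and example above; runc and the kernel may apply further validation that is not shown here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseL3Schema parses "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..." into
// a map from cache id to capacity bitmask (CBM).
// Hypothetical helper for illustration; not runc's parser.
func parseL3Schema(line string) (map[int]uint64, error) {
	line = strings.TrimSpace(line)
	if !strings.HasPrefix(line, "L3:") {
		return nil, fmt.Errorf("schema %q does not start with \"L3:\"", line)
	}
	masks := make(map[int]uint64)
	for _, entry := range strings.Split(strings.TrimPrefix(line, "L3:"), ";") {
		id, cbm, ok := strings.Cut(entry, "=")
		if !ok {
			return nil, fmt.Errorf("malformed entry %q", entry)
		}
		cacheID, err := strconv.Atoi(strings.TrimSpace(id))
		if err != nil {
			return nil, fmt.Errorf("bad cache id in %q: %w", entry, err)
		}
		// CBMs are written in hexadecimal without a 0x prefix.
		mask, err := strconv.ParseUint(strings.TrimSpace(cbm), 16, 64)
		if err != nil {
			return nil, fmt.Errorf("bad CBM in %q: %w", entry, err)
		}
		masks[cacheID] = mask
	}
	return masks, nil
}

func main() {
	masks, err := parseL3Schema("L3:0=ff;1=c0")
	if err != nil {
		panic(err)
	}
	for _, id := range []int{0, 1} {
		fmt.Printf("L3 cache id %d: CBM %#x\n", id, masks[id])
	}
}
```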
@@ -240,7 +267,7 @@ Memory bandwidth schema:
It has allocation values for memory bandwidth on each socket, which contains
L3 cache id and memory bandwidth.
```
Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
```
For example, on a two-socket machine, the schema line could be "MB:0=20;1=70"

@@ -251,8 +278,10 @@ that is allocated is also dependent on the CPU model and can be looked up at
min_bw + N * bw_gran. Intermediate values are rounded to the next control
step available on the hardware.
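
The control-step rule above (`min_bw + N * bw_gran`) can be sketched as a small Go function. This is only an illustration under stated assumptions: "rounded to the next control step" is interpreted here as rounding up, requests outside the range from `minBw` to 100 percent are rejected, and the name `roundToControlStep` is hypothetical. The actual behavior is defined by the kernel and the hardware.

```go
package main

import "fmt"

// roundToControlStep maps a requested bandwidth percentage onto the next
// available step minBw + N*gran, rounding up (an assumption, see lead-in).
// Hypothetical helper for illustration only.
func roundToControlStep(requested, minBw, gran int) (int, error) {
	if gran <= 0 {
		return 0, fmt.Errorf("granularity must be positive, got %d", gran)
	}
	if requested < minBw || requested > 100 {
		return 0, fmt.Errorf("bandwidth %d%% is outside [%d, 100]", requested, minBw)
	}
	steps := (requested - minBw + gran - 1) / gran // ceiling division
	value := minBw + steps*gran
	if value > 100 {
		value = 100
	}
	return value, nil
}

func main() {
	// min_bandwidth of 10% and bandwidth_gran of 10%, as in the runc example.
	for _, req := range []int{10, 20, 25, 95} {
		got, err := roundToControlStep(req, 10, 10)
		if err != nil {
			panic(err)
		}
		fmt.Printf("requested %d%% -> %d%%\n", req, got)
	}
}
```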

If MBA Software Controller is enabled through mount option "-o mba_MBps"
If MBA Software Controller is enabled through mount option "-o mba_MBps":
```
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl
```
We could specify memory bandwidth in "MBps" (Mega Bytes per second) unit
instead of "percentages". The kernel underneath would use a software feedback
mechanism or a "Software Controller" which reads the actual bandwidth using
@@ -263,11 +292,12 @@ For example, on a two-socket machine, the schema line could be
"MB:0=5000;1=7000" which means 5000 MBps memory bandwidth limit on socket 0
and 7000 MBps memory bandwidth limit on socket 1.

For more information about Intel RDT kernel interface:
For more information about Intel RDT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

```

An example for runc:
```
Consider a two-socket machine with two L3 caches where the default CBM is
0x7ff and the max CBM length is 11 bits, and minimum memory bandwidth of 10%
with a memory bandwidth granularity of 10%.
@@ -281,7 +311,17 @@ maximum memory bandwidth of 20% on socket 0 and 70% on socket 1.
"closID": "guaranteed_group",
"l3CacheSchema": "L3:0=7f0;1=1f",
"memBwSchema": "MB:0=20;1=70"
}
}
}
```
Another example:
```
We only want to monitor memory bandwidth and llc occupancy.
"linux": {
"intelRdt": {
"enableMBM": true,
"enableCMT": true
}
}
```
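
For readers who want to see how the monitoring-only example decodes, here is a hedged Go sketch. The struct types below are local stand-ins that mirror the JSON keys used in the examples above (`closID`, `l3CacheSchema`, `memBwSchema`, `enableCMT`, `enableMBM`); they are not runc's own types, and `parseSpec` and `monitoringOnly` are hypothetical helper names. The JSON fragment from the example is wrapped in braces to form a complete document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local stand-ins mirroring the JSON keys in the SPEC examples.
type intelRdt struct {
	ClosID        string `json:"closID,omitempty"`
	L3CacheSchema string `json:"l3CacheSchema,omitempty"`
	MemBwSchema   string `json:"memBwSchema,omitempty"`
	EnableCMT     bool   `json:"enableCMT,omitempty"`
	EnableMBM     bool   `json:"enableMBM,omitempty"`
}

type linuxSection struct {
	IntelRdt *intelRdt `json:"intelRdt,omitempty"`
}

type spec struct {
	Linux *linuxSection `json:"linux,omitempty"`
}

// parseSpec decodes a JSON document into the stand-in spec type.
func parseSpec(doc string) (*spec, error) {
	var s spec
	if err := json.Unmarshal([]byte(doc), &s); err != nil {
		return nil, err
	}
	return &s, nil
}

// monitoringOnly reports whether the config requests monitoring (CMT or MBM)
// without any allocation schema.
func monitoringOnly(r intelRdt) bool {
	return r.L3CacheSchema == "" && r.MemBwSchema == "" && (r.EnableCMT || r.EnableMBM)
}

func main() {
	s, err := parseSpec(`{"linux": {"intelRdt": {"enableMBM": true, "enableCMT": true}}}`)
	if err != nil {
		panic(err)
	}
	r := s.Linux.IntelRdt
	fmt.Printf("CMT=%v MBM=%v monitoringOnly=%v\n", r.EnableCMT, r.EnableMBM, monitoringOnly(*r))
}
```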

2 changes: 1 addition & 1 deletion libcontainer/configs/config.go
@@ -197,7 +197,7 @@ type Config struct {
NoNewKeyring bool `json:"no_new_keyring"`

// IntelRdt specifies settings for Intel RDT group that the container is placed into
// to limit the resources (e.g., L3 cache, memory bandwidth) the container has available
// to limit the resources (e.g., L3 cache, memory bandwidth) the container has available.
IntelRdt *IntelRdt `json:"intel_rdt,omitempty"`

// RootlessEUID is set when the runc was launched with non-zero EUID.
8 changes: 8 additions & 0 deletions libcontainer/configs/intelrdt.go

Should an rmid field be added in the future to allow a specific monitor group name?

@@ -13,4 +13,12 @@ type IntelRdt struct {
// The unit of memory bandwidth is specified in "percentages" by
// default, and in "MBps" if MBA Software Controller is enabled.
MemBwSchema string `json:"memBwSchema,omitempty"`

// The flag to indicate if Intel RDT CMT is enabled. CMT (Cache Monitoring Technology) supports monitoring of
// the last-level cache (LLC) occupancy for the container.
EnableCMT bool `json:"enableCMT,omitempty"`

// The flag to indicate if Intel RDT MBM is enabled. MBM (Memory Bandwidth Monitoring) supports monitoring of
// total and local memory bandwidth for the container.
EnableMBM bool `json:"enableMBM,omitempty"`
}
42 changes: 42 additions & 0 deletions libcontainer/configs/intelrdt_test.go
@@ -0,0 +1,42 @@
package configs_test

import (
	"encoding/json"
	"reflect"
	"testing"

	"github.com/opencontainers/runc/libcontainer/configs"
)

func TestUnmarshalIntelRDT(t *testing.T) {
	testCases := []struct {
		JSON     string
		Expected configs.IntelRdt
	}{
		{
			"{\"enableMBM\": true}",
			configs.IntelRdt{EnableMBM: true, EnableCMT: false},
		},
		{
			"{\"enableMBM\": true,\"enableCMT\": false}",
			configs.IntelRdt{EnableMBM: true, EnableCMT: false},
		},
		{
			"{\"enableMBM\": false,\"enableCMT\": true}",
			configs.IntelRdt{EnableMBM: false, EnableCMT: true},
		},
	}

	for _, tc := range testCases {
		got := configs.IntelRdt{}

		err := json.Unmarshal([]byte(tc.JSON), &got)
		if err != nil {
			t.Fatal(err)
		}

		if !reflect.DeepEqual(tc.Expected, got) {
			t.Errorf("expected unmarshalled IntelRDT config %+v, got %+v", tc.Expected, got)
		}
	}
}
10 changes: 8 additions & 2 deletions libcontainer/configs/validate/validator.go
@@ -219,12 +219,18 @@ func intelrdtCheck(config *configs.Config) error {
return fmt.Errorf("invalid intelRdt.ClosID %q", config.IntelRdt.ClosID)
}

if !intelrdt.IsCATEnabled() && config.IntelRdt.L3CacheSchema != "" {
if config.IntelRdt.L3CacheSchema != "" && !intelrdt.IsCATEnabled() {
return errors.New("intelRdt.l3CacheSchema is specified in config, but Intel RDT/CAT is not enabled")
}
if !intelrdt.IsMBAEnabled() && config.IntelRdt.MemBwSchema != "" {
if config.IntelRdt.MemBwSchema != "" && !intelrdt.IsMBAEnabled() {
return errors.New("intelRdt.memBwSchema is specified in config, but Intel RDT/MBA is not enabled")
}
if config.IntelRdt.EnableCMT && !intelrdt.IsCMTEnabled() {
return errors.New("intelRdt.enableCMT is specified in config, but Intel RDT/CMT is not enabled")
}
if config.IntelRdt.EnableMBM && !intelrdt.IsMBMEnabled() {
return errors.New("intelRdt.enableMBM is specified in config, but Intel RDT/MBM is not enabled")
}
}

return nil
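
The hunk above puts the config field first in each condition, so the capability probe runs only when the corresponding option is actually set. The following standalone Go sketch illustrates that short-circuit behavior for the two new monitoring checks. The `rdtConfig` and `capabilities` types and the `checkMonitoring` function are hypothetical stand-ins; the injected probe functions take the place of `intelrdt.IsCMTEnabled` and `intelrdt.IsMBMEnabled` so the sketch runs without RDT hardware.

```go
package main

import (
	"errors"
	"fmt"
)

// rdtConfig is a stand-in holding only the two monitoring flags.
type rdtConfig struct {
	EnableCMT bool
	EnableMBM bool
}

// capabilities holds injectable probes standing in for the intelrdt
// package's IsCMTEnabled and IsMBMEnabled.
type capabilities struct {
	CMT func() bool
	MBM func() bool
}

// checkMonitoring mirrors the ordering in the diff: the config flag is
// tested first, so a probe is only called when its flag is set.
func checkMonitoring(cfg rdtConfig, caps capabilities) error {
	if cfg.EnableCMT && !caps.CMT() {
		return errors.New("intelRdt.enableCMT is specified in config, but Intel RDT/CMT is not enabled")
	}
	if cfg.EnableMBM && !caps.MBM() {
		return errors.New("intelRdt.enableMBM is specified in config, but Intel RDT/MBM is not enabled")
	}
	return nil
}

func main() {
	probes := 0
	caps := capabilities{
		CMT: func() bool { probes++; return false },
		MBM: func() bool { probes++; return true },
	}

	fmt.Println(checkMonitoring(rdtConfig{}, caps), "probes:", probes)
	fmt.Println(checkMonitoring(rdtConfig{EnableMBM: true}, caps), "probes:", probes)
	fmt.Println(checkMonitoring(rdtConfig{EnableCMT: true}, caps), "probes:", probes)
}
```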
1 change: 1 addition & 0 deletions libcontainer/container_linux.go
@@ -2009,6 +2009,7 @@ func (c *Container) currentState() (*State, error) {
if c.intelRdtManager != nil {
intelRdtPath = c.intelRdtManager.GetPath()
}

state := &State{
BaseState: BaseState{
ID: c.ID(),