-
Notifications
You must be signed in to change notification settings - Fork 553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove range limit which depend on kernel #780
Conversation
72d7f35
to
29caf6b
Compare
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
29caf6b
to
ce55de2
Compare
1 similar comment
@@ -352,14 +352,14 @@ For more information, see [the kernel cgroups documentation about blkio][cgroup- | |||
|
|||
The following parameters can be specified to setup the controller: | |||
|
|||
* **`blkioWeight`** *(uint16, OPTIONAL)* - specifies per-cgroup weight. This is default weight of the group on all devices until and unless overridden by per-device rules. The range is from 10 to 1000. | |||
* **`blkioWeight`** *(uint16, OPTIONAL)* - specifies per-cgroup weight. This is default weight of the group on all devices until and unless overridden by per-device rules. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Punting to the kernel is fine when it's clear which kernel parameter you're punting to. In this case, the spec-reader has to figure out that blkioWeight
is (MUST be?) implemented with blkio.weight
in the blkio
controller, which has:
Currently allowed range of weights is from 10 to 1000
The current docs make it clear that blockIO
represents the blkio
controller, but mapping between the individual properties and the controller API is left up to the reader's imagination (with no RFC 2119 language for compliance testing, #746). That leaves the semantics of these values pretty wiggly, and outside of validation and compliance testing.
Also the kernel's “Currently allowed” wording is troublesome. If we punt to the kernel for semantics, it's not clear to me how we handle evolving kernel semantics unless we pin the property to a particular kernel version or trust the kernel to maintain backwards compatibility for as long as this version of the spec is useful.
This is a bit awkward, since: * It's not a direct wrapper around mknod(2) (which, for example, does not use the c/b/u/p characters). * The runtime doesn't have to use mknod, so binding it to mknod(1)-ish invocations doesn't make much sense. Instead, I've bound it to POSIX's stat(3) to show what compliance testing (and anything else inside the container) can expect the results (however the runtime accomplishes them) to look like. The previous wording wasn't clear on whether symlinks were an allowed approach. The new wording explicitly allows them by using stat(1)-like symlink resolution. I've also clarified relative 'path' handling and explicitly declared the appropriate mount namespace (impacts 'path') and PID namespace (impacts 'uid' and 'gid'). Because we're focused on post-create stat calls, I've also added new wording about handling duplicate 'path' values. I've used POSIX reference where possible (vs. Linux man pages), because they contain sufficient detail for this section, have well-versioned URLs, and are more likely to be portable if this section ever applies to non-Linux configs (BSD? Solaris?). Related to recent discussion around punting to the kernel [1,2], although in this case we're not changing the JSON Schema because the existing local validation (valid 'type' characters and the 'fileMode' range) both feed into a single mode_t integer in the stat(3) and mknod(2) APIs. For a cleaner kernel punt, we could drop 'type', lift the range limit on 'fileMode', and map it directly to st.st_mode. But that seemed like a big backwards-compat shift for this commit. [1]: opencontainers#780 [2]: opencontainers#690 (comment) Signed-off-by: W. Trevor King <wking@tremily.us>
Since ce55de2 (Remove range limit which depend on kernel, 2017-04-26, opencontainers#780), the spec has been more aggressively punting to the kernel APIs (vs. carrying local versions of kernel limitations). For the properties touched by this commit, a pull request to reflect our old valid values (e.g. requiring 'type' to match ^[acb]$) was rejected as part of this punting approach. However, before this commit, it wasn't clear exactly what kernel interface was being punted to. With this commit, we replace the old inline docs with an explicit punt to the device whitelist controller, listing the exact actions that the runtime MUST take for given config values. This allows for compliance-testing runtimes [2] (ensuring config portability between compliant runtimes) and makes it possible to validate a given config against a given kernel (e.g. Linux 4.11.1 only accepts 'a', 'b', and 'c' as type characters [3]). [1]: opencontainers#690 (comment) [2]: opencontainers#746 [3]: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/security/device_cgroup.c?h=v4.11.1#n618 Signed-off-by: W. Trevor King <wking@tremily.us>
This is a bit awkward, since: * It's not a direct wrapper around mknod(2) (which, for example, does not use the c/b/u/p characters). * The runtime doesn't have to use mknod, so binding it to mknod(1)-ish invocations doesn't make much sense. Instead, I've bound it to POSIX's stat(3) to show what compliance testing (and anything else inside the container) can expect the results (however the runtime accomplishes them) to look like. The previous wording wasn't clear on whether symlinks were an allowed approach. The new wording explicitly allows them by using stat(1)-like symlink resolution. I've also clarified relative 'path' handling and explicitly declared the appropriate mount namespace (impacts 'path') and PID namespace (impacts 'uid' and 'gid'). Because we're focused on post-create stat calls, I've also added new wording about handling duplicate 'path' values. I've used POSIX reference where possible (vs. Linux man pages), because they contain sufficient detail for this section, have well-versioned URLs, and are more likely to be portable if this section ever applies to non-Linux configs (BSD? Solaris?). Related to recent discussion around punting to the kernel [1,2], although in this case we're not changing the JSON Schema because the existing local validation (valid 'type' characters and the 'fileMode' range) both feed into a single mode_t integer in the stat(3) and mknod(2) APIs. For a cleaner kernel punt, we could drop 'type', lift the range limit on 'fileMode', and map it directly to st.st_mode. But that seemed like a big backwards-compat shift for this commit. [1]: opencontainers#780 [2]: opencontainers#690 (comment) Signed-off-by: W. Trevor King <wking@tremily.us>
Since ce55de2 (Remove range limit which depend on kernel, 2017-04-26, opencontainers#780), the spec has been more aggressively punting to the kernel APIs (vs. carrying local versions of kernel limitations). For the properties touched by this commit, a pull request to reflect our old valid values (e.g. requiring 'type' to match ^[acb]$) was rejected as part of this punting approach. However, before this commit, it wasn't clear exactly what kernel interface was being punted to. With this commit, we replace the old inline docs with an explicit punt to the device whitelist controller, listing the exact actions that the runtime MUST take for given config values. This allows for compliance-testing runtimes [2] (ensuring config portability between compliant runtimes) and makes it possible to validate a given config against a given kernel (e.g. Linux 4.11.1 only accepts 'a', 'b', and 'c' as type characters [3]). [1]: opencontainers#690 (comment) [2]: opencontainers#746 [3]: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/tree/security/device_cgroup.c?h=v4.11.1#n618 Signed-off-by: W. Trevor King <wking@tremily.us>
Signed-off-by: Qiang Huang h.huangqiang@huawei.com