Add racct limit option #11
base: main
I'm a little bit nervous that this seems to leak the RACCT resource names directly into the config file format. This means that if FreeBSD 14 adds a new RACCT resource that isn't in FreeBSD 13, then it would still be supported by … I think it would make sense to start by defining a subset of RACCT that definitely makes sense and proposing that for inclusion in the FreeBSD-specific bit of the spec. It's easy to add other actions and resources if they make sense; it's much harder to remove things.
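As a rough illustration of the "start with a subset" suggestion, the sketch below checks each resource/action pair against an explicit allowlist before anything is handed to rctl. The `allowed` table and the `validate` helper are hypothetical names, not part of runj or the runtime spec, and the pairs listed are examples drawn from rctl(8) rather than a proposed final set.

```go
package main

import "fmt"

// allowed is a hypothetical allowlist mapping each supported rctl resource
// to the actions the config format would accept for it. Anything absent is
// rejected, so adding a resource later is an explicit, reviewable change.
var allowed = map[string]map[string]bool{
	"memoryuse": {"deny": true, "devctl": true},
	"swapuse":   {"deny": true, "devctl": true},
	"readbps":   {"throttle": true, "devctl": true},
	"writebps":  {"throttle": true, "devctl": true},
}

// validate returns an error if the resource/action pair is not allowlisted.
func validate(resource, action string) error {
	actions, ok := allowed[resource]
	if !ok {
		return fmt.Errorf("unsupported racct resource %q", resource)
	}
	if !actions[action] {
		return fmt.Errorf("action %q not supported for resource %q", action, resource)
	}
	return nil
}

func main() {
	fmt.Println(validate("memoryuse", "deny"))
	fmt.Println(validate("nthr", "deny"))
	fmt.Println(validate("memoryuse", "throttle"))
}
```

Adding a resource then means adding a row to the table, which is easy to review; removing one later would be a breaking change for existing configs.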
It doesn't look like that to me; @cyrilzhangfreebsd is proposing adding the following structs:

```go
// FreeBSD contains platform-specific configuration for FreeBSD based containers.
type FreeBSD struct {
	// RacctLimits specifies racct rules to apply to this jail.
	RacctLimits []RacctLimit `json:"racct,omitempty"`
}

// RacctLimit is a racct rule to apply to a jail.
type RacctLimit struct {
	// Resource is the resource to set a limit on.
	Resource string `json:"resource"`
	// Action is what will happen if a process exceeds the allowed amount.
	Action string `json:"action"`
	// Amount is the allowed amount of the resource.
	Amount string `json:"amount"`
	// Per defines the entity that the amount applies to.
	Per string `json:"per,omitempty"`
}
```

This doesn't encode any of the resource names; the names are represented as the … Am I reading this incorrectly?
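To make the pass-through nature of this struct concrete, here is a sketch of how a `RacctLimit` could be rendered into an rctl(8) rule, whose general form is `subject:subject-id:resource:action=amount/per`. The `Rule` method and the jail name `demo` are illustrative assumptions, not code from the PR; the point is that `Resource` and `Action` flow into the rule verbatim, so any name the host's rctl accepts would be accepted by the config.

```go
package main

import "fmt"

// RacctLimit mirrors the struct proposed in the PR.
type RacctLimit struct {
	Resource string `json:"resource"`
	Action   string `json:"action"`
	Amount   string `json:"amount"`
	Per      string `json:"per,omitempty"`
}

// Rule renders the limit as a jail-scoped rctl rule of the form
// jail:<jail>:<resource>:<action>=<amount>[/<per>].
// Resource and Action are copied verbatim; nothing here checks them.
func (l RacctLimit) Rule(jail string) string {
	rule := fmt.Sprintf("jail:%s:%s:%s=%s", jail, l.Resource, l.Action, l.Amount)
	if l.Per != "" {
		rule += "/" + l.Per
	}
	return rule
}

func main() {
	l := RacctLimit{Resource: "memoryuse", Action: "deny", Amount: "1g"}
	fmt.Println(l.Rule("demo"))
}
```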
I spent some time reading …

@davidchisnall I can see a bit of what you're saying here. On the Linux side, the spec breaks down individual limits like this in the … I'm not as familiar with Windows or Solaris, but on Linux this definitely makes sense. While the resource limits are expressed through cgroups, each cgroup controller has its own set of configuration pseudofiles and has to be treated separately. The OCI spec reflects this even though it doesn't strictly require that the resource limits be implemented with cgroups. FreeBSD does seem to be a bit different; the … An open format like this PR would allow the full set of …
Here's a potential alternative sketch:

```go
type FreeBSD struct {
	Resources *FreeBSDResources `json:"resources,omitempty"`
}

type FreeBSDResources struct {
	// Memory is the memory restriction configuration
	Memory *FreeBSDMemory `json:"memory,omitempty"`
	// FSIO is the filesystem IO restriction configuration
	FSIO *FreeBSDFSIO `json:"fsIO,omitempty"`
}

type FreeBSDMemory struct {
	// Limit is the memory limit (in bytes)
	Limit *int64 `json:"limit,omitempty"`
	// Warning is the amount of memory (in bytes) at which a warning is sent to devd(8)
	Warning *int64 `json:"warning,omitempty"`
	// Swap is the amount of swap that may be used (in bytes)
	Swap *int64 `json:"swap,omitempty"`
	// SwapWarning is the amount of swap (in bytes) at which a warning is sent to devd(8)
	SwapWarning *int64 `json:"swapWarning,omitempty"`
	// etc...
}

type FreeBSDFSIO struct {
	// ReadBPS is the filesystem read rate (in bytes per second) above which reads are throttled
	ReadBPS *int64 `json:"readbps,omitempty"`
	// WriteBPS is the filesystem write rate (in bytes per second) above which writes are throttled
	WriteBPS *int64 `json:"writebps,omitempty"`
	// etc...
}
```

I think I like this better, but I haven't tested it and reserve the right to change my mind again 😄

@davidchisnall, some questions:
For …

Why would the signal behavior be configured inside the jail? If we registered a rule like …
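The typed sketch implies a fixed translation from each field to an rctl resource/action pair. A minimal version of that mapping might look like the following. The `rctlRules` function is hypothetical, and the choice of `deny` for limits, `devctl` for warnings (rctl's action for notifying devd(8)), and `throttle` for the filesystem I/O rates is an assumption about how the fields would be interpreted.

```go
package main

import "fmt"

type FreeBSDMemory struct {
	Limit       *int64 `json:"limit,omitempty"`
	Warning     *int64 `json:"warning,omitempty"`
	Swap        *int64 `json:"swap,omitempty"`
	SwapWarning *int64 `json:"swapWarning,omitempty"`
}

type FreeBSDFSIO struct {
	ReadBPS  *int64 `json:"readbps,omitempty"`
	WriteBPS *int64 `json:"writebps,omitempty"`
}

type FreeBSDResources struct {
	Memory *FreeBSDMemory `json:"memory,omitempty"`
	FSIO   *FreeBSDFSIO   `json:"fsIO,omitempty"`
}

// rctlRules translates the typed config into rctl rules for one jail.
// Each field maps to a fixed resource:action pair; nil fields are skipped,
// so the output order follows the field order below.
func rctlRules(jail string, r *FreeBSDResources) []string {
	var rules []string
	add := func(resource, action string, v *int64) {
		if v != nil {
			rules = append(rules, fmt.Sprintf("jail:%s:%s:%s=%d", jail, resource, action, *v))
		}
	}
	if r == nil {
		return rules
	}
	if m := r.Memory; m != nil {
		add("memoryuse", "deny", m.Limit)
		add("memoryuse", "devctl", m.Warning)
		add("swapuse", "deny", m.Swap)
		add("swapuse", "devctl", m.SwapWarning)
	}
	if f := r.FSIO; f != nil {
		add("readbps", "throttle", f.ReadBPS)
		add("writebps", "throttle", f.WriteBPS)
	}
	return rules
}

func main() {
	limit, warn := int64(1<<30), int64(768<<20)
	res := &FreeBSDResources{Memory: &FreeBSDMemory{Limit: &limit, Warning: &warn}}
	for _, rule := range rctlRules("demo", res) {
		fmt.Println(rule)
	}
}
```

Because the resource names live only inside `rctlRules`, a future rename in rctl would be absorbed there without changing the config format.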
This is more what I had in mind. This way, if a future version of FreeBSD changes the names of the resources or actions, the format can stay stable, and we can guarantee that every option in the format is fully supported on every version of FreeBSD that …
I don't really know how this plugs into higher-level orchestration frameworks on Linux, so I'm not quite sure about the separation of concerns here: should this be configured per container, or would the orchestration framework want to provide a global setting of the form "this is the mechanism you should use to notify me if the resource limits (whatever they are) for this container are exhausted"? The …
I believe this will signal every process in the jail, but I'm not 100% sure. My original comment assumed that you register for signals if you want to handle them and recover: if you don't handle them, the process is killed anyway, so you may as well just kill the process or jail. If a process wants to gracefully handle hitting resource limits, then I would expect it to register for the notification at the same time it installs the signal handler. The …
Is this likely? Have the resources or actions changed names previously?
I didn't realize …
Since we're speaking in super theoretical terms: I'd anticipate it being a more flexible design to specify this per-container. That would enable multiple distinct orchestrators to function at the same time, even if one of them wants to use the same mechanism to be notified if any container exceeds its limits.
Can a non-root process request limits/notifications for itself through …
Yep, this is the kind of use case that I'd like to support.
I don't believe so, but there is no guarantee that they will remain stable between major releases.
The socket just forwards all of the events from the raw device to other processes, so the consumer needs to parse them all and ignore the ones that it doesn't want.
Sorry, I meant: what is responsible for handling the event? If it's configured in the container config, then the person writing the container configuration also needs to set up whatever manages it. If the container configuration describes only the resource limits, then the orchestration framework is responsible for handling the notification and can make global decisions about what to do.
I'm not sure, but don't forget that each jail has a root user, so processes within the jail can start as root and set the resource limits if they want to.
Thank you, a lot of this feedback is useful and I think many good questions are being asked. I initially designed the configuration to mimic the … So with this in mind, it probably makes sense to define specific resource limits in an implementation-agnostic way so that we can omit the racct limits that don't make sense. @samuelkarp, if you are still happy with your alternative sketch, I could make a new commit using that configuration style.
I am, though I'd love to hear from others as well. I've talked to @tianon (one of the runtime-spec maintainers) a bit and hopefully he doesn't mind me asking for his thoughts publicly here: Tianon, what do you think?
I'm in the process of making a new commit with the new data structure. In doing so, some things came to mind. For instance, I noticed that in the sketch you provided, there is no longer an option to send a signal to a process in a jail if it attempts to go over the memory limit. Is this something we would still like to support, or is it perhaps nonsensical? If we want to support this, one way might be to have a …

Another feature that has been lost is the "per" option, which would allow the limit to be defined per process in the jail rather than for the whole jail. For example, we could limit the number of threads per process, stack size per process, CPU usage per process, etc. I am not sure whether this would be useful, however, so perhaps it is worth omitting?
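If the "per" option were brought back, one way to keep it inside the typed format would be a separate section whose fields always render with a `/process` suffix. `FreeBSDPerProcess`, its JSON field names, and `perProcessRules` are hypothetical; `nthr` and `stacksize` are rctl(8) resources, and `deny` is assumed as the action.

```go
package main

import "fmt"

// FreeBSDPerProcess is a hypothetical section for limits applied to each
// process in the jail rather than to the jail as a whole.
type FreeBSDPerProcess struct {
	// Threads caps the number of threads in each process (rctl "nthr").
	Threads *int64 `json:"threads,omitempty"`
	// StackSize caps each process's stack, in bytes (rctl "stacksize").
	StackSize *int64 `json:"stackSize,omitempty"`
}

// perProcessRules renders each set field as a jail-scoped rctl rule whose
// amount is accounted per process via the "/process" suffix.
func perProcessRules(jail string, p FreeBSDPerProcess) []string {
	var rules []string
	if p.Threads != nil {
		rules = append(rules, fmt.Sprintf("jail:%s:nthr:deny=%d/process", jail, *p.Threads))
	}
	if p.StackSize != nil {
		rules = append(rules, fmt.Sprintf("jail:%s:stacksize:deny=%d/process", jail, *p.StackSize))
	}
	return rules
}

func main() {
	threads := int64(64)
	fmt.Println(perProcessRules("demo", FreeBSDPerProcess{Threads: &threads}))
}
```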
Hey everyone, I've added a second commit that uses the new format, so we can discuss whether we like how this works out. Going this route, we'd have to separately discuss which resource limits make sense to add. For now, I've also added shared memory objects, CPU usage, and process count.
I don't mind at all; I'm honestly very much on the fence here. Linux support in the spec takes a very intentionally "raw" approach to avoid limiting too much of what's possible without spec changes, but it really punts that complexity upstream instead. This seems like a really powerful low-level interface, but I have to imagine this structure (and the associated "FreeBSD version to correct controls" translation table) is inevitably going to live somewhere, and I guess it makes as much sense for it to live at this level as anywhere else.

The only argument I have for pushing this all the way up to the users instead is that if some future version of FreeBSD happens to change one of these, users can "self-remediate" by updating their deployments rather than updating installed components to get the new translation tables automatically. As long as the tables get updated in a timely manner during the prep for the new release such that the new release's …

So the TL;DR of that braindump is that I'm still a bit unsure, but leaning more towards the struct. If there are use cases that aren't covered by the struct, it might make sense to add an escape hatch for users to specify them manually, but that would also end up discouraging updates to the struct to make those more "officially" supported in the same way.
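The escape hatch mentioned above could sit alongside the typed fields. In this hypothetical sketch, `FreeBSDLimits.RawRules` holds `resource:action=amount[/per]` strings that are passed to rctl unchanged, but the runtime always prefixes the jail subject itself and rejects entries that try to name their own subject. The type, field names, and that validation rule are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// FreeBSDLimits is a hypothetical config combining a typed field with an
// escape hatch for rctl rules the struct does not (yet) model.
type FreeBSDLimits struct {
	MemoryLimit *int64 `json:"memoryLimit,omitempty"`
	// RawRules holds "resource:action=amount[/per]" strings passed to rctl as-is.
	RawRules []string `json:"rawRules,omitempty"`
}

// rules scopes every limit to the given jail. Raw rules may not name their
// own subject, so they can only ever affect this jail.
func (l FreeBSDLimits) rules(jail string) ([]string, error) {
	var out []string
	if l.MemoryLimit != nil {
		out = append(out, fmt.Sprintf("jail:%s:memoryuse:deny=%d", jail, *l.MemoryLimit))
	}
	for _, raw := range l.RawRules {
		// Expect exactly "resource:action" before the first '='.
		head, _, ok := strings.Cut(raw, "=")
		if !ok || strings.Count(head, ":") != 1 {
			return nil, fmt.Errorf("malformed raw rule %q", raw)
		}
		out = append(out, fmt.Sprintf("jail:%s:%s", jail, raw))
	}
	return out, nil
}

func main() {
	mem := int64(1 << 30)
	l := FreeBSDLimits{MemoryLimit: &mem, RawRules: []string{"pcpu:deny=50"}}
	rules, err := l.rules("demo")
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(rules, "\n"))
}
```

Keeping the subject under the runtime's control means the escape hatch widens which resources can be limited, but not which jail is affected.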
Issue number:
#8
Description of changes:
This adds a FreeBSD-specific struct to the configuration, populated with an array of racct limit structs that define rctl rules to be applied to the jail. It also adds a function that applies the rules with rctl when the jail is created and removes them when the jail is deleted.
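For reference, applying and removing rules around the jail's lifecycle could look roughly like the sketch below, which shells out to rctl(8): `rctl -a <rule>` adds a rule, and `rctl -r jail:<name>` removes every rule whose subject is that jail. This is not necessarily how the PR implements it; `addRulesCmds` and `removeRulesCmd` are hypothetical helpers that only build the commands, leaving the caller to `Run()` them and handle errors.

```go
package main

import (
	"fmt"
	"os/exec"
)

// addRulesCmds builds one "rctl -a <rule>" command per rule. The commands
// are only constructed here; a caller would Run() them after jail creation.
func addRulesCmds(rules []string) []*exec.Cmd {
	cmds := make([]*exec.Cmd, 0, len(rules))
	for _, r := range rules {
		cmds = append(cmds, exec.Command("rctl", "-a", r))
	}
	return cmds
}

// removeRulesCmd builds "rctl -r jail:<name>", which removes every rule
// whose subject is that jail, for use when the jail is deleted.
func removeRulesCmd(jail string) *exec.Cmd {
	return exec.Command("rctl", "-r", "jail:"+jail)
}

func main() {
	for _, c := range addRulesCmds([]string{"jail:demo:memoryuse:deny=1g"}) {
		fmt.Println(c.Args)
	}
	fmt.Println(removeRulesCmd("demo").Args)
}
```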
Testing done:
I tested by creating some jails with various racct rules, as well as with an empty "freebsd" configuration. The unit tests also pass.
Terms of contribution:
By submitting this pull request, I agree that this contribution is licensed under the terms found in the LICENSE file.