-
Notifications
You must be signed in to change notification settings - Fork 29
scx: Make exit debug dump buffer resizable #158
Conversation
Actual feature is not implemented yet.
The default dump buffer size of 32k is okay on smaller and mostly idle systems but it's not difficult to run over. Userspace can already communicate the desired size through sched_ext_ops.exit_dump_len. This commit actually implements dynamic buffer sizing. This unfortunately makes user_exit_info interface macro-fest but the usage in schedulers doesn't get more complicated at least.
Is there perhaps a middle ground between having a completely static dump size, and having to require the user to pass a maximum buffer size? For example, having a maximum amount of memory per core for the system, so that we do scale with size (or something to that effect)? My worry with this is that it's a difficult knob for a user to know how to set correctly, and it's memory that we actually are allocating when the scheduler is loaded so going with a large buffer would be wasteful. |
We can always make the default amount adaptive but I'm not sure what that the criteria should be. The problem is that the amount of dump generated is highly dependent on what's running on the system, which has some correlation with machine size but that's not very strong. Also, the amount allocated for dump buffering is likely completely wasted in most cases, especially for users, and this just isn't something that end users would really care about. The dump code should probably do a better job of printing out more pertinent information at the top tho. |
Hmm, it seems to me like we'd be better at judging that than a user though, no? E.g. being able to scale it by num cores, max # of tasks to print per rq, etc. Though I do agree that it would be sad if whatever we chose was less than someone was willing to give, and ended up losing output because of it.
Makes sense. E.g. some pathological overcommit cases could need a lot more buffer than another workload on the same host.
Agreed that it's not a knob that most people will care about. It just feels like a very leaky API though. But also, it's not very uncommon to allow people to configure log buffer sizes. Thanks for talking through it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable, thanks for adding all of the BC macros as well. One request: would you mind please adding a testcase to the sefltests dir that verifies loading a scheduler with a large and/or small buffer? Just so that we exercise the custom path a bit.
UEI_SET_SIZE(__skel, __ops_name, __uei_name); \ | ||
if (__COMPAT_struct_has_field("sched_ext_ops", "exit_dump_len")) { \ | ||
bpf_map__set_autocreate((__skel)->maps.__ops_name, true); \ | ||
bpf_map__set_autocreate((__skel)->maps.__ops_name##___no_exit_dump_len, false); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully down the road we'll be able to figure out a way to do this that doesn't require us to hard-code the BC struct-ops types like this. LGTM for now though
Schedulers can now request custom debug dump buffer size by setting
sched_ext_ops.exit_dump_len
. This will help debugging on bigger and busier systems.