Skip to content
This repository has been archived by the owner on Jun 18, 2024. It is now read-only.

scx: Make exit debug dump buffer resizable #158

Merged
merged 5 commits into from
Mar 8, 2024
Merged

scx: Make exit debug dump buffer resizable #158

merged 5 commits into from
Mar 8, 2024

Conversation

htejun
Copy link
Collaborator

@htejun htejun commented Mar 7, 2024

Schedulers can now request custom debug dump buffer size by setting sched_ext_ops.exit_dump_len. This will help debugging on bigger and busier systems.

htejun added 4 commits March 6, 2024 16:02
The default dump buffer size of 32k is okay on smaller and mostly idle
systems but it's not difficult to run over. Userspace can already
communicate the desired size through sched_ext_ops.exit_dump_len. This
commit actually implements dynamic buffer sizing. This unfortunately makes
user_exit_info interface macro-fest but the usage in schedulers doesn't get
more complicated at least.
@htejun htejun requested a review from Byte-Lab March 7, 2024 09:53
@Byte-Lab
Copy link
Collaborator

Byte-Lab commented Mar 7, 2024

Is there perhaps a middle ground between having a completely static dump size, and having to require the user to pass a maximum buffer size? For example, having a maximum amount of memory per core for the system, so that we do scale with size (or something to that effect)?

My worry with this is that it's a difficult knob for a user to know how to set correctly, and it's memory that we actually are allocating when the scheduler is loaded so going with a large buffer would be wasteful.

@htejun
Copy link
Collaborator Author

htejun commented Mar 7, 2024

We can always make the default amount adaptive but I'm not sure what that the criteria should be. The problem is that the amount of dump generated is highly dependent on what's running on the system, which has some correlation with machine size but that's not very strong. Also, the amount allocated for dump buffering is likely completely wasted in most cases, especially for users, and this just isn't something that end users would really care about. The dump code should probably do a better job of printing out more pertinent information at the top tho.

@Byte-Lab
Copy link
Collaborator

Byte-Lab commented Mar 8, 2024

We can always make the default amount adaptive but I'm not sure what that the criteria should be.

Hmm, it seems to me like we'd be better at judging that than a user though, no? E.g. being able to scale it by num cores, max # of tasks to print per rq, etc. Though I do agree that it would be sad if whatever we chose was less than someone was willing to give, and ended up losing output because of it.

The problem is that the amount of dump generated is highly dependent on what's running on the system, which has some correlation with machine size but that's not very strong.

Makes sense. E.g. some pathological overcommit cases could need a lot more buffer than another workload on the same host.

Also, the amount allocated for dump buffering is likely completely wasted in most cases, especially for users, and this just isn't something that end users would really care about. The dump code should probably do a better job of printing out more pertinent information at the top tho.

Agreed that it's not a knob that most people will care about. It just feels like a very leaky API though. But also, it's not very uncommon to allow people to configure log buffer sizes.

Thanks for talking through it.

Copy link
Collaborator

@Byte-Lab Byte-Lab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable, thanks for adding all of the BC macros as well. One request: would you mind please adding a testcase to the sefltests dir that verifies loading a scheduler with a large and/or small buffer? Just so that we exercise the custom path a bit.

UEI_SET_SIZE(__skel, __ops_name, __uei_name); \
if (__COMPAT_struct_has_field("sched_ext_ops", "exit_dump_len")) { \
bpf_map__set_autocreate((__skel)->maps.__ops_name, true); \
bpf_map__set_autocreate((__skel)->maps.__ops_name##___no_exit_dump_len, false); \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully down the road we'll be able to figure out a way to do this that doesn't require us to hard-code the BC struct-ops types like this. LGTM for now though

@htejun htejun merged commit 13f2f03 into sched_ext Mar 8, 2024
1 check failed
@htejun htejun deleted the htejun branch March 8, 2024 17:53
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants