-
Notifications
You must be signed in to change notification settings - Fork 29
Conversation
e7f9e0c
to
e6a13d0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass -- will do a complete review tomorrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks great, this will be super useful. Left a few comments
kernel/sched/ext.c
Outdated
INIT_LIST_HEAD(&kit->cursor.list); | ||
RB_CLEAR_NODE(&kit->cursor.priq); | ||
kit->cursor.flags = SCX_TASK_DSQ_CURSOR; | ||
kit->dsq_seq = kit->dsq->seq; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
READ_ONCE()
?
@@ -258,7 +260,7 @@ static void update_core_sched_head_seq(struct task_struct *p) | |||
void BPF_STRUCT_OPS(qmap_dispatch, s32 cpu, struct task_struct *prev) | |||
{ | |||
struct cpu_ctx *cpuc; | |||
u32 zero = 0; | |||
u32 zero = 0, batch = dsp_batch ?: 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it'd be slightly clearer if we just initialized dsp_batch
to 1?
tools/sched_ext/scx_qmap.bpf.c
Outdated
bpf_for(i, 0, sizeof(exp_prefix)) { | ||
if (exp_prefix[i] == '\0') | ||
break; | ||
if (exp_prefix[i] != comm[i]) { | ||
match = false; | ||
break; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bpf_strncmp()
might be useful here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't take p->comm
unfortunately. :(
kf_mask manipulations shouldn't be reordered when seen from CPU itself. Tell compiler about it.
Factor out sched_ext_entity initialization into init_scx_entity() which memsets it to zero at the beginning to avoid needing 0 inits.
- Relocate scx_task_iter definition so that it's together with iter functions. - Add find_user_dsq() instead of open-coding rhashtable_lookup_fast(). - Relocate scx_bpf_nr_cpu_ids() close to its cpumask friends. - Reorder kfunc decls in scx/common.bpf.h to match kernel def order. - Remove unnecessary 0 inits from init_task initializer. While init_task initializers have other 0 inits, it's far from being consistent and e.g. rb_node doesn't even have usable initializer. Let's just initialize what's necessary. No functional changes.
The function takes @Mask but ignores it and uses hard-coded __SCX_KF_RQ_LOCKED instead. Fix it. The only user was calling with __SCX_KF_RQ_LOCKED, so this doesn't cause any behavior changes.
It's currently marked ops-only but can actually be called from any context. Let's make it safe.
None of the functions in scx_kfunc_set_ops_only set are ops-only. scx_bpf_select_cpu_dfl() is select_cpu-only and the rest can be called from anywhere. Remove the set and move the kfuncs to scx_kfunc_set_any and the new scx_kfunc_set_select_cpu. No functional changes intended.
dsq->nr is read without locking from scx_bpf_dsq_nr_queued(). Use WRITE_ONCE() to update and READ_ONCE() from scx_bpf_dsq_nr_queued().
We'll use the list_head for both fifo and priq. Let's rename it to list. No functional changes intended.
priq will updated so that the tasks are added to the list in vtime order too. Thus, caching the first node won't be necessary. Let's switch to uncached variant.
When a DSQ is used as a PRIQ, tasks were only kept on dsq->priq. This requires all other paths to check both dsq->priq and dsq->list. A task's scx.dsq_node.list is available when the task is on a PRIQ DSQ anyway, let's always put tasks on dsq->list whether it's used as a PRIQ or FIFO by inserting into dsq->list in vtime order on PRIQs. This means that dsq->list is always in dispatch order whether the DSQ is FIFO or PRIQ and the only paths that need to worry about the DSQ type are dispatch_enqueue() and task_unlink_from_dsq().
and give p->scx.dsq_node's type the name of scx_dsq_node. This will be used to implement DSQ iterator. This wastes 8 bytes on 64bit. Oh well.
Implement dsq->list iteration helper nldsq_next_task() and nldsq_for_each_task(). As the name suggests, these are to be used only with non-local DSQs. Currently, these don't have any functional differences to raw list iteration. Future BPF DSQ iteration addition will modify the iteration behavior.
- Can iterate non-local DSQs. - Can be used from any context. - Only tasks which are already queued at the time of iterator init are visited. - Can iterate in either dispatch or reverse dispatch order. For PRIQs, vtime order is guaranteed to monotonically increase or decrease depending on the walk direction. - scx_qmap updated to demonstrate the usage.
- Factor out consume_local_task() and consume_remote_task() - s/task_can_run_on_rq/task_can_run_on_remote_rq()/ and improve UP handling. This will be used to implement targeted task consumption while iterating DSQs.
- Unfurl the conditions for readability. - Add missing task_cpu_possible() test. - Comments.
This should behave the same. This is to prepare for the next patch which will add a usage example of targeted consumption.
This allows consuming a specific task while iterating a DSQ using the BPF iterator significantly increasing DSQ's flexibility. It has to jump through some hoops to work around BPF restrictions which hopefully can be removed in the future. scx_qmap is updated with a silly priority boost mechanism to demonstrate the usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, looks fantastic. Left one optional comment but this looks good to go
This allows iterating tasks in a non-local DSQ in [reverse] dispatch order from any context. While iterating in
ops.dispatch()
, specific tasks can be consumed in any order too.