This repository has been archived by the owner on Jun 18, 2024. It is now read-only.

Bug fixes and a lot of atropos updates #21

Merged
merged 25 commits into from Jun 21, 2023

Conversation

htejun
Collaborator

@htejun htejun commented Jun 20, 2023

Fixes for a couple tricky bugs and an improvement to select_cpu_dfl().

  • All schedulers w/ vtime scheduling weren't updating p->scx.dsq_vtime on scheduler reloads or, if applicable, cgroup migrations. This could leave some unlucky tasks stuck far in the future, stalling them severely. It's likely that we haven't noticed this with web benchmarks because web machines never completely peg themselves.
  • The ttwu_queue path skips select_cpu(), which can confuse some of the SCX schedulers. Fixed by disabling ttwu_queue for tasks on SCX.
  • It turned out that going for a different CPU whose SMT siblings are all idle is usually better than sticking with a partially free @prev_cpu. select_cpu_dfl() is updated accordingly.

Atropos saw a lot of updates:

  • Fixed an execution stall bug caused by queueing a task on one domain while waking a CPU in a different one.
  • Load balancer updated so that it reliably and gradually reaches equilibrium.
  • Added a Tuner which runs every 100ms and tells the BPF portion to adapt to the current per-domain utilization level. This allows turning certain greedy execution optimizations on/off based on CPU saturation state.
  • DIRECT/KICK_GREEDY optimizations implemented. Observed to improve work conservation and thus bandwidth w/ the kcryptd benchmark. Would likely make latencies more consistent too.
  • Improved handling of possible but not online CPUs.
  • Other cleanups and improvements.

With these improvements, on an AMD zen2 CPU, atropos reliably outperforms CFS when running fio randrw workloads on a dm-crypted SSD across different concurrency levels.

htejun added 19 commits June 19, 2023 16:03
This caches whether there are any idle CPUs, which we used to test a lot more
frequently. Now, the only remaining user is the WAKE_SYNC path in
scx_pick_idle_cpu(), which is not high-frequency enough to warrant tracking
and caching the test result. Remove it.
Add @flags to scx_bpf_pick_idle_cpu() and implement SCX_PICK_IDLE_CPU_WHOLE
flag which makes the function try to pick a CPU whose SMT siblings are also
idle. This will be used to improve idle CPU selection.
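As a rough illustration (not code from this PR), a BPF scheduler's ->select_cpu() could use the new flag along the following lines; the kfunc signature and the BPF_STRUCT_OPS wrapper are assumed from the surrounding sched_ext examples:

    /* hedged sketch: prefer a wholly idle core, then any idle CPU */
    s32 BPF_STRUCT_OPS(example_select_cpu, struct task_struct *p,
                       s32 prev_cpu, u64 wake_flags)
    {
            s32 cpu;

            /* first try a CPU whose SMT siblings are also idle */
            cpu = scx_bpf_pick_idle_cpu(p->cpus_ptr, SCX_PICK_IDLE_CPU_WHOLE);
            if (cpu >= 0)
                    return cpu;

            /* otherwise settle for any idle CPU, falling back to prev_cpu */
            cpu = scx_bpf_pick_idle_cpu(p->cpus_ptr, 0);
            return cpu >= 0 ? cpu : prev_cpu;
    }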
Use the newly added SCX_PICK_IDLE_CPU_WHOLE to make scx_select_cpu_dfl()
prefer CPUs which are wholly idle over prev_cpu that's only partially idle.

While at it, add SMT conditionals to scx_pick_idle_cpu().
In simple and flatcg, p->scx.dsq_vtime wasn't initialized on load or when a
task is moved across cgroups. As vtimes start at zero, the bug is less
noticeable on the first load; however, on subsequent loads and after cgroup
migrations, some tasks may end up with vtime far into the future and stall
for an extended period of time.

Re-init dsq_vtime on load and cgroup migrations.
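The shape of the fix, as a hedged sketch rather than the actual simple/flatcg patch (callback signatures simplified; vtime_now stands in for whatever global the scheduler uses as its current virtual time):

    /* re-seed on enable so a (re)loaded scheduler starts the task at "now" */
    void BPF_STRUCT_OPS(example_enable, struct task_struct *p)
    {
            p->scx.dsq_vtime = vtime_now;
    }

    /* and on cgroup migration so a stale, far-future vtime isn't carried over */
    void BPF_STRUCT_OPS(example_cgroup_move, struct task_struct *p,
                        struct cgroup *from, struct cgroup *to)
    {
            p->scx.dsq_vtime = vtime_now;
    }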
* Drop short options for two less-used options and always explicitly specify
  the char used for short option.

* Use more compact code form for looking up a map and checking its result.

* Make error messages more consistent.

* s/dq/dsq/g

* Other cosmetic & misc changes.
Drop unused LAST_TASK and re-group the rest.
task_set_dsq() is a bit of a misnomer and inconsistent with other functions
including pick_task_domain(). The names are really confusing. Clean up.

* task_set_dsq() -> task_set_domain()
* pick_task_domain() -> task_pick_domain()
* task_set_domain() -> task_pick_and_set_domain()
to prepare for future changes. No functional changes.
The BPF scheduler may depend on select_task_rq() being invoked during
wakeups, and @p may end up executing on a different CPU regardless of what
happens in the wakeup path, making the ttwu_queue optimization ineffective.
Skip it if the task is on SCX.

Combined with an atropos bug, this caused significant execution stalls in
low to moderate load conditions.
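A hedged sketch of where such a check would live (not the exact hunk; task_on_scx() is assumed here to test whether the task is on the SCX class):

    /* in the ttwu_queue condition check in kernel/sched/core.c */
    static bool ttwu_queue_cond(struct task_struct *p, int cpu)
    {
            /*
             * SCX schedulers may rely on ->select_cpu() running on every
             * wakeup; queueing the wakeup remotely would bypass it.
             */
            if (task_on_scx(p))
                    return false;

            /* ... existing conditions unchanged ... */
            return true;
    }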
_padding's size is off by 4 bytes. This was okay in practice because struct
alignment would still put it at the same size but let's still fix it.
…reign CPU

When not dispatching directly, ->select_cpu() would always put the waking
task on one of the domestic CPUs, which guarantees that a CPU will soon try
to dispatch from the DSQ. However, a task can be enqueued outside the wakeup
path and thus without going through ->select_cpu(). If the task was
on a foreign CPU before, this could lead to the task being queued on its
domain's DSQ while none of its CPUs are woken up, leading to execution
bubbles.

Fix it by explicitly kicking a domestic CPU if the task being enqueued to a
domain DSQ is on a foreign CPU.
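Roughly, the enqueue path gains something like the following (a sketch with assumed names, not the actual atropos code; dom_cpumask stands in for the domain's cpumask):

    /* task was last on a foreign CPU; make sure a domestic CPU will dispatch */
    static void kick_domestic_cpu(const struct cpumask *dom_cpumask)
    {
            s32 cpu = scx_bpf_pick_idle_cpu(dom_cpumask, 0);

            if (cpu >= 0)
                    scx_bpf_kick_cpu(cpu, 0);
    }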
p->scx.dsq_vtime wasn't being initialized on scheduler load which could put
some tasks far into the future leading to long stalls. Fix it.
To avoid stepping over the L3-aware behavior of the new workqueue code which
sets permissive cpus_allowed and uses ->wake_cpu to steer kworkers.
W/ the fio on dm-crypt + L3-aware workqueue benchmark, it was observed that
LOAD_IMBAL_REDUCTION_MIN_RATIO can often let a significant load imbalance
persist for a very long period because it rejects a task transfer if the
load imbalance reduction from that particular transfer is less than 10%. If
there is a large load imbalance which is caused by a combination of many
smaller loads, they'll never get balanced.

However, if we remove LOAD_IMBAL_REDUCTION_MIN_RATIO, the load balancer
becomes prone to oscillations because it tries to balance fully on each
round. Instead, add LOAD_IMBAL_XFER_TARGET_RATIO which dictates the target
transfer amount for each push/pull pair and set it to 50%. i.e. the load
balancer now searches for transfer candidates targeting a 50% load imbalance
reduction, so a large imbalance is roughly halved on each round. However,
the final transfer candidate selection is still done based on the eventual
load imbalance reduction so that we can eventually converge on balanced
load.

After the above changes, we always converge to the balanced state (as
defined by LOAD_IMBAL_HIGH_RATIO) but gradually. To ensure work
conservation, let's set the default --greedy-threshold to 1. This does have
negative performance implications but this will be addressed in future
patches.
Neither procfs nor fb_procfs can determine per-CPU utilization reliably with
CPU hot[un]plugs. Roll our own.

  eminence/procfs#274
  facebookincubator/below#8190
To enable adding a mechanism which runs at a different frequency than the
load balancer:

* Refactor scheduling loop into Scheduler::run().

* Refactor topology handling into struct Topology.
Target CPU selection in select_cpu() is more challenging for atropos as
there are tradeoffs between keeping a task local and trying to exploit a
less occupied CPU in a foreign domain. atropos didn't consider CPU
utilizations when making target CPU selections which limited its
effectiveness. This patch improves the situation by:

* Implement a new high-level userland subsystem called Tuner which runs more
  frequently than LoadBalancer (100ms by default) and updates the parameters
  in struct tune_input to modify the BPF scheduler's behavior. Currently,
  Tuner only considers CPU utilization and tells the BPF scheduler whether a
  given CPU is eligible for the new DIRECT_GREEDY and KICK_GREEDY
  optimizations.

* For domestic CPUs, select_cpu() is updated to use the new
  SCX_PICK_IDLE_CPU_WHOLE to prefer wholly-idle CPUs over partially idle
  @prev_cpu.

* If @prev_cpu was in a foreign domain which has enough unused CPU cycles
  and @prev_cpu is wholly idle (all its siblings are idle too), keep
  @prev_cpu by putting the task directly into @prev_cpu's DSQ which is
  counted as DIRECT_GREEDY. Keeping @prev_cpu only when it's wholly idle
  removes the negative performance impact of this mechanism observed in fio
  on kcryptd benchmark.

* Implement similar pattern of preferring wholly idle CPUs and gating
  foreign DIRECT_GREEDY according to the domain's CPU utilization down the
  CPU preference ladder.

* If no direct dispatch path could be taken and there are any idle CPUs
  in the system, domestic or foreign, kick one after enqueueing to the
  domain. This doesn't seem to have a negative performance impact while
  improving work conservation in some cases.

With these optimizations, atropos matches or outperforms CFS at various load
levels when running fio on top of a dm-crypt device.
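For the DIRECT_GREEDY/KICK_GREEDY gating specifically, the BPF side conceptually just consults the Tuner-maintained cpumasks before taking a greedy path; a hedged sketch (helper and field names are illustrative, not the actual atropos.bpf.c code):

    /* Tuner sets bits only for CPUs in domains with spare capacity */
    static bool direct_greedy_allowed(s32 cpu)
    {
            struct bpf_cpumask *mask = direct_greedy_cpumask;

            return mask && bpf_cpumask_test_cpu(cpu, (const struct cpumask *)mask);
    }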
If a CPU's cache information can't be determined, atropos falls back to DOM0
after generating a warning. This can be jarring and confusing on many modern
AMD processors which have some cores disabled and report them as possible
but not present.

Update the rust side so that it uses Option::None in Topology::cpu_dom to
clearly distinguish the CPUs that are on DOM0 only because cache information
couldn't be determined. From the BPF side, this doesn't make any difference;
however, the userland side can now differentiate those CPUs and handle them
appropriately (e.g. not setting them in direct/kick_greedy_cpumask).

As userland can now detect CPUs that have come up after atropos was
initialized and thus don't have a proper domain assigned, it should now be
possible for it to reload itself to reflect the new hardware configuration.
This is left for the future for now.
@htejun htejun requested a review from Byte-Lab June 20, 2023 02:16
Collaborator

@Byte-Lab Byte-Lab left a comment

Awesome job with all the debugging. Overall the changes look great, just left some comments on a first read-through. Haven't looked at main.rs yet

Collaborator

@Byte-Lab Byte-Lab left a comment

Just 1 very minor nit for main.rs. Honestly kinda skimmed over it, but it looks reasonable

htejun added 6 commits June 20, 2023 16:40
…ctly

Suggested by David Vernet. This also allows removing explicit #ifdef's.
Whenever a CPU's idle state is cleared, the matching SMT mask should be
updated accordingly; however, scx_pick_idle_cpu() open-coded this logic and
other users of test_and_clear_idle_cpu() weren't updating idle_masks.smt
correctly. Fix it by moving SMT mask clearing from scx_pick_idle_cpu() to
test_and_clear_idle_cpu().

This issue was noticed by David Vernet.
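A hedged sketch of the resulting helper (mask names follow the discussion above; not the literal kernel hunk):

    static bool test_and_clear_idle_cpu(s32 cpu)
    {
            /* one busy sibling means the whole core is no longer wholly idle */
            if (sched_smt_active())
                    cpumask_andnot(idle_masks.smt, idle_masks.smt,
                                   topology_sibling_cpumask(cpu));

            return cpumask_test_and_clear_cpu(cpu, idle_masks.cpu);
    }

With the SMT clearing inside the helper, every caller that claims a CPU keeps idle_masks.smt consistent for free.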
On some error exit paths, atropos_select_cpu() was releasing idle_smtmask
before it was acquired. For some reason, the BPF verifier didn't notice
this. It didn't actually break anything because the release operation is a
no-op in this specific case.

Fix it by acquiring idle_smtmask at the beginning of function and always
releasing it before returning. While at it, make error return values more
consistent.
scx_bpf_pick_idle_cpu() returning -EBUSY doesn't mean that the caller can
assume that there will be future dispatch events because idle tracking may
race with actual scheduler state transitions. The only way to guarantee
timely future dispatch is kicking one of the target CPUs. Document it and
add scx_bpf_pick_any_cpu() which can be used to pick a CPU, idle or not.

atropos is updated to use the new function instead of using cpumask_any()
directly on the allowed CPUs.
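Hedged usage sketch (assuming scx_bpf_pick_any_cpu() mirrors scx_bpf_pick_idle_cpu()'s signature): when a queued task must not be left without a future dispatch event, kick some allowed CPU even if idle tracking says nothing is idle.

    s32 cpu = scx_bpf_pick_idle_cpu(p->cpus_ptr, 0);

    if (cpu < 0)
            /* nothing claimable as idle; any allowed CPU guarantees a dispatch */
            cpu = scx_bpf_pick_any_cpu(p->cpus_ptr, 0);
    if (cpu >= 0)
            scx_bpf_kick_cpu(cpu, 0);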
atropos_enqueue() currently kicks the previous domain if the task has
dispatch_local set, with the comment explaining that we need to do so because
we own the idle state of the task. This doesn't seem to make sense because
if select_cpu() picked prev_cpu and set dispatch_local, the CPU is already
woken up and enqueue() would be running on that CPU. Whether the task is
dispatched locally or not doesn't cause the CPU to stall. It will go check
all the queues after this and reassert idle. As there were other stall
issues, it could just be that I was confused. Preliminary testing doesn't
show any problems after removing it.

However, on the new domain side, it could miss waking up a CPU when no idle
bits are set which is racy and has some chance of causing a temporary stall.
Fix it using scx_bpf_pick_any_cpu().
@Byte-Lab Byte-Lab merged commit 6b5b8b2 into sched_ext Jun 21, 2023
std::thread::sleep(
next_sched_at
.min(next_tune_at)
.duration_since(std::time::SystemTime::now())?,
Collaborator

@htejun if I run rcutorture with atropos, the scheduler gets kicked out after a minute or so with "Error: second time provided was later than self" from, presumably, here. It looks like SystemTime.duration_since() isn't really safe to use according to https://doc.rust-lang.org/std/time/struct.SystemTime.html#method.duration_since

This function may fail because measurements taken earlier are not guaranteed to always be before later measurements (due to anomalies such as the system clock being adjusted either forwards or backwards). Instant can be used to measure elapsed time without this risk of failure.

Should we rework this to use Instant and Duration? Or to make it resilient to clock skew?


I think use of Instant seems more reasonable here, as the code doesn't actually require "full" wall-clock time to check whether the interval has elapsed.

Collaborator

@vnepogodin yep, that was merged in 52db350

Collaborator

@dschatzberg dschatzberg left a comment

Haven't finished looking through main.rs - but went through the bpf logic and it makes sense - few comments inline

* If offline, @cpu is not its own sibling and
* scx_pick_idle_cpu() can get caught in an infinite loop as
* @cpu is never cleared from idle_masks.smt. Ensure that @cpu
* is eventually cleared.
Collaborator

Can you help me understand this change a bit more? What does this test and clear have to do with smt siblings?

Collaborator Author

So, there are two levels of idle masks, cpumask and smtmask. cpumask is the real one and tracks the idle CPUs. smtmask is the hint mask that tracks whether the whole core is idle. Let's say the topology is [0, 1], [2, 3] where the bracketed pairs are SMT siblings.

  • If all are idle, both cpumask and smtmask would be 1111.
  • If then CPU0 becomes busy, cpumask=0111 smtmask=0011
  • If then CPU1 becomes busy, cpumask=0011 smtmask=0011
  • If then CPU0 becomes idle, cpumask=1011 smtmask=0011
  • If then CPU1 becomes idle, cpumask=1111 smtmask=1111

i.e. the bits for the smtmask get cleared for the whole physical core when any of its logical threads are busy. Now, this is implemented by clearing what the kernel thinks the SMT mask is - topology_sibling_cpumask() - whenever a CPU becomes busy and asserting the same mask on the idle transition if all siblings are idle.

Now, select_cpu does something like the following:

retry:
    if (there's a whole core idle) {
        clear the sibling group in the smtmask;
        try test_and_set that CPU in cpumask, if fails, goto retry;
    }
    if (there's an idle cpu)
        try test_and_set that CPU in cpumask, if fails, goto retry;

The problem is that topology_sibling_cpumask() of a CPU becomes empty when a CPU is offline. So, the "clearing the sibling group" part doesn't do anything and the test_and_set part may fail because smtmask can go out of sync with cpumask, which puts us in an infinite loop. So, we clear cpumask_of(cpu) instead in that case to guarantee forward progress.
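In code form, the fallback amounts to something like this hedged sketch (mask names as above):

    const struct cpumask *smt = topology_sibling_cpumask(cpu);

    /* offline CPUs report an empty sibling mask; clear the CPU itself instead */
    if (cpumask_empty(smt))
            smt = cpumask_of(cpu);
    cpumask_andnot(idle_masks.smt, idle_masks.smt, smt);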

* @flags: %SCX_PICK_IDLE_CPU_* flags
*
* Pick and claim an idle cpu in @cpus_allowed. If none is available, pick any
* CPU in @cpus_allowed. Guaranteed to suceed and returns the picked idle cpu
Collaborator

nit: suceed => succeed

} tune_input;

__u64 tune_params_gen;
private(A) struct bpf_cpumask __kptr *direct_greedy_cpumask;
Collaborator

I'm not familiar with this private(A) syntax - what does it do?

Collaborator Author

It hides the variable from the automatic global mmapped access from userland. Otherwise, BPF gets unhappy about putting kptrs (the allocated cpumask) there because userspace could clobber the value.

stat_add(ATROPOS_STAT_PREV_IDLE, 1);
cpu = prev_cpu;
goto local;
}
Collaborator

Just confirming - you say that removing this prev-idle check improved perf?

Collaborator Author

So, yeah, that was my initial observation. For whatever reason, removing that noticeably improved performance. However, what the new code does now is keep the foreign prev_cpu if the whole SMT pair is idle, and that actually gains performance.

// with CPU hot[un]plugs. Roll our own.
//
// https://github.com/eminence/procfs/issues/274
// https://github.com/facebookincubator/below/issues/8190
Collaborator

@brianc118 fixed this in below this week. I'll move us back onto fb_procfs unless you have any concerns.

Collaborator Author

Yeah, that's fine but let's do that next week after sending out the patchset.

dom_cpus[dom_id].set(cpu, true);
cpu_dom.push(Some(dom_id));
}
None => {
Collaborator

In what case can you lack a cpu -> cache mapping?

Collaborator Author

So, on my zen2 test machine:

root@rbw /s/d/s/cpu# cat /proc/cpuinfo | grep name | head -1
model name      : AMD Ryzen 9 3900X 12-Core Processor
root@rbw ~# cat /sys/devices/system/cpu/possible
0-31
root@rbw ~# cat /sys/devices/system/cpu/present
0-23
root@rbw ~# cat /sys/devices/system/cpu/online
0-23
root@rbw ~# cd /sys/devices/system/cpu
root@rbw /s/d/s/cpu# cat possible present online
0-31
0-23
0-23
root@rbw /s/d/s/cpu# ls -d cpu*/cache | wc -l
24

The CCXs have some CPUs disabled. They can't be brought online but the firmware still reports them as possible but not present as if they can become present. So, we end up with possible CPUs without cache information. The same thing happens with some CCXs on belgamos IIRC. No idea why firmware is reporting it like that. It's silly because we end up allocating for the impossible possible CPUs and iterating over them too.

Collaborator Author

But I mean, in general, if hotplug is possible, then the cache information may not be known until the CPU is plugged in. Even CPUs which are plugged in but brought down by echoing 0 to online don't report cache information. So, we need to be able to handle them. Eventually, I think what we should do is just reload the scheduler when hotplugged CPUs are detected.


cat /sys/devices/system/cpu/possible

@Decave

Seems to be the same issue as I have reported to you with my 5900X.

@htejun
I remember that this "issue" was not present in 6.2 kernels, but it was in 6.1 and 6.3/6.4.

But it's good there is a solution for atropos! I'll give it another round of testing next time. Thanks for working on it!

htejun pushed a commit that referenced this pull request Oct 7, 2023
commit 133466c ("net: stmmac: use per-queue 64 bit statistics
where necessary") caused one regression as found by Uwe, the backtrace
looks like:

	INFO: trying to register non-static key.
	The code is fine but needs lockdep annotation, or maybe
	you didn't initialize this object before use?
	turning off the locking correctness validator.
	CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.5.0-rc1-00449-g133466c3bbe1-dirty #21
	Hardware name: STM32 (Device Tree Support)
	 unwind_backtrace from show_stack+0x18/0x1c
	 show_stack from dump_stack_lvl+0x60/0x90
	 dump_stack_lvl from register_lock_class+0x98c/0x99c
	 register_lock_class from __lock_acquire+0x74/0x293c
	 __lock_acquire from lock_acquire+0x134/0x398
	 lock_acquire from stmmac_get_stats64+0x2ac/0x2fc
	 stmmac_get_stats64 from dev_get_stats+0x44/0x130
	 dev_get_stats from rtnl_fill_stats+0x38/0x120
	 rtnl_fill_stats from rtnl_fill_ifinfo+0x834/0x17f4
	 rtnl_fill_ifinfo from rtmsg_ifinfo_build_skb+0xc0/0x144
	 rtmsg_ifinfo_build_skb from rtmsg_ifinfo+0x50/0x88
	 rtmsg_ifinfo from __dev_notify_flags+0xc0/0xec
	 __dev_notify_flags from dev_change_flags+0x50/0x5c
	 dev_change_flags from ip_auto_config+0x2f4/0x1260
	 ip_auto_config from do_one_initcall+0x70/0x35c
	 do_one_initcall from kernel_init_freeable+0x2ac/0x308
	 kernel_init_freeable from kernel_init+0x1c/0x138
	 kernel_init from ret_from_fork+0x14/0x2c

The reason is that the rxq|txq_stats structures are not what is expected
because in stmmac_open() -> __stmmac_open() the structure is overwritten
by "memcpy(&priv->dma_conf, dma_conf, sizeof(*dma_conf));".
This causes the well-initialized syncp member of rxq|txq_stats to be
overwritten unexpectedly, as pointed out by Johannes and Uwe.

Fix this issue by moving rxq|txq_stats back to stmmac_extra_stats. For
SMP cache friendly, we also mark stmmac_txq_stats and stmmac_rxq_stats
as ____cacheline_aligned_in_smp.

Fixes: 133466c ("net: stmmac: use per-queue 64 bit statistics where necessary")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20230917165328.3403-1-jszhang@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Byte-Lab pushed a commit that referenced this pull request Jan 8, 2024
When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a
cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when
removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be
dereferenced as wrong struct in irdma_free_pending_cqp_request().

  PID: 3669   TASK: ffff88aef892c000  CPU: 28  COMMAND: "kworker/28:0"
   #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34
   #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2
   #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f
   #3 [fffffe0000549eb8] do_nmi at ffffffff81079582
   #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4
      [exception RIP: native_queued_spin_lock_slowpath+1291]
      RIP: ffffffff8127e72b  RSP: ffff88aa841ef778  RFLAGS: 00000046
      RAX: 0000000000000000  RBX: ffff88b01f849700  RCX: ffffffff8127e47e
      RDX: 0000000000000000  RSI: 0000000000000004  RDI: ffffffff83857ec0
      RBP: ffff88afe3e4efc8   R8: ffffed15fc7c9dfa   R9: ffffed15fc7c9dfa
      R10: 0000000000000001  R11: ffffed15fc7c9df9  R12: 0000000000740000
      R13: ffff88b01f849708  R14: 0000000000000003  R15: ffffed1603f092e1
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  -- <NMI exception stack> --
   #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b
   #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4
   #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363
   #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma]
   #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma]
   #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma]
   #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma]
   #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb
   #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6
   #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278
   #15 [ffff88aa841efb88] device_del at ffffffff82179d23
   #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice]
   #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice]
   #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a
   #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff
   #20 [ffff88aa841eff10] kthread at ffffffff811d87a0
   #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f

Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions")
Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn
Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com>
Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
htejun pushed a commit that referenced this pull request Jun 17, 2024
If function-like macros do not utilize a parameter, it might result in a
build warning.  In our coding style guidelines, we advocate for utilizing
static inline functions to replace such macros.  This patch verifies
compliance with the new rule.

For a macro such as the one below,

 #define test(a) do { } while (0)

The test result is as follows.

 WARNING: Argument 'a' is not used in function-like macro
 #21: FILE: mm/init-mm.c:20:
 +#define test(a) do { } while (0)

 total: 0 errors, 1 warnings, 8 lines checked

Link: https://lkml.kernel.org/r/20240507032757.146386-3-21cnbao@gmail.com
Signed-off-by: Xining Xu <mac.xxn@outlook.com>
Tested-by: Barry Song <v-songbaohua@oppo.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: Joe Perches <joe@perches.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Huacai Chen <chenhuacai@loongson.cn>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mark Brown <broonie@kernel.org>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Jeff Johnson <quic_jjohnson@quicinc.com>
Cc: Charlemagne Lasse <charlemagnelasse@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>