-
Notifications
You must be signed in to change notification settings - Fork 24
add support for Galaxy Note 20 5G #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
Elchanz3
wants to merge
1
commit into
fcuzzocrea:lineage-20
Choose a base branch
from
Elchanz3:lineage-20
base: lineage-20
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sota4Ever
pushed a commit
to Sota4Ever/linux-android-samsung-exynos990
that referenced
this pull request
Dec 14, 2024
This fixes a warning and kernel panic when connecting to a hotspot hosted by the device: [ 568.838882] arm-smmu 15000000.apps-smmu: FAR = 0x00000000a0b51228 [ 568.838908] arm-smmu 15000000.apps-smmu: PAR = 0x0000000000000000 [ 568.838917] arm-smmu 15000000.apps-smmu: FSR = 0x40000402 [TF R SS ] [ 568.838924] arm-smmu 15000000.apps-smmu: TTBR0 = 0x0000000000000000 [ 568.838930] arm-smmu 15000000.apps-smmu: TTBR1 = 0x0000000000000000 [ 568.838938] arm-smmu 15000000.apps-smmu: SCTLR = 0x00c000e7 ACTLR = 0x00000003 [ 568.838973] arm-smmu 15000000.apps-smmu: CBAR = 0x0001f300 [ 568.838980] arm-smmu 15000000.apps-smmu: MAIR0 = 0xf404ff44 MAIR1 = 0x00000000 [ 568.838999] arm-smmu 15000000.apps-smmu: Unhandled context fault: iova=0xa0b51228, cb=46, fsr=0x40000402, fsynr0=0x4f0003, fsynr1=0x0 [ 568.839009] arm-smmu 15000000.apps-smmu: soft iova-to-phys=0x0000000000000000 [ 568.839018] arm-smmu 15000000.apps-smmu: SOFTWARE TABLE WALK FAILED! Looks like 15000000.apps-smmu accessed an unmapped address! [ 568.839025] arm-smmu 15000000.apps-smmu: hard iova-to-phys (ATOS) failed [ 568.839032] arm-smmu 15000000.apps-smmu: SID=0x521 [ 568.839038] arm-smmu 15000000.apps-smmu: Unhandled arm-smmu context fault! [ 568.839068] ------------[ cut here ]------------ [ 568.839074] kernel BUG at ../drivers/iommu/arm-smmu.c:1583! [ 568.839084] Internal error: Oops - BUG: 0 [fcuzzocrea#1] PREEMPT SMP [ 568.839095] Process irq/405-arm-smm (pid: 557, stack limit = 0x0000000071ffe420) [ 568.839107] CPU: 2 PID: 557 Comm: irq/405-arm-smm Tainted: G S 4.14.138-Proton-g26234daa #38 [ 568.839114] Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 KIRIN MP (DT) [ 568.839120] task: 00000000d8caf9cc task.stack: 0000000071ffe420 [ 568.839140] pc : arm_smmu_context_fault+0x580/0x600 [ 568.839146] lr : arm_smmu_context_fault+0x580/0x600 [ 568.839151] sp : ffffff8014efbce0 pstate : 60c00145 [ 568.839157] x29: ffffffcff335ac80 x28: 0000000000028000 [ 568.839165] x27: ffffff800c4ae058 x26: 0000000000000000 [ 568.839172] x25: ffffff800c4ae000 x24: ffffffcfe6578818 [ 568.839178] x23: 0000000000000521 x22: ffffffcff335ac80 [ 568.839185] x21: 00000000ffffffda x20: ffffffcfe6578920 [ 568.839192] x19: 0000000040000402 x18: ffffff9f0521e000 [ 568.839198] x17: 0000000000000039 x16: ffffff9f045af620 [ 568.839205] x15: ffffff9f04ae26f9 x14: 0000000000003338 [ 568.839211] x13: 000a7d8b33fb1800 x12: 0000000000000000 [ 568.839218] x11: 0000000000000001 x10: 0000000000000007 [ 568.839224] x9 : 77d23af7cb54d900 x8 : 77d23af7cb54d900 [ 568.839231] x7 : 0000000000000000 x6 : ffffff9f0522142a [ 568.839237] x5 : 0000000000000036 x4 : 0000000000000008 [ 568.839244] x3 : 0000000000000000 x2 : 0000000000000000 [ 568.839251] x1 : 00000000000001c0 x0 : 000000000000003e [ 568.839259] [ 568.839259] PC: 0xffffff9f03a8af60: [ 568.839266] af60 910203e2 94045ed1 14000004 900065c1 911b2821 94045ecd f94003a0 b0006201 [ 568.839284] af80 910e4821 2a1703e2 94045ec8 376800fc f94003a0 f0006a21 91264021 94045ec3 [ 568.839297] afa0 d4210000 14000000 2a1f03fa 310042bf 54000100 b9000373 d5033e9f b9405fe8 [ 568.839310] afc0 34000088 91002328 52800029 b9000109 f940a3b3 aa1303e0 97fff232 aa1303e0 [ 568.839326] [ 568.839326] LR: 0xffffff9f03a8af60: [ 568.839331] af60 910203e2 94045ed1 14000004 900065c1 911b2821 94045ecd f94003a0 b0006201 [ 568.839344] af80 910e4821 2a1703e2 94045ec8 376800fc f94003a0 f0006a21 91264021 94045ec3 [ 568.839357] afa0 d4210000 14000000 2a1f03fa 310042bf 54000100 b9000373 d5033e9f b9405fe8 [ 568.839370] afc0 34000088 91002328 52800029 b9000109 f940a3b3 aa1303e0 97fff232 aa1303e0 [ 568.839385] [ 568.839385] SP: 0xffffff8014efbca0: [ 568.839391] bca0 03a8afa0 ffffff9f 60c00145 00000000 ffffffd0 00000000 047d1990 ffffff9f [ 568.839405] bcc0 ffffffff ffffffff cb54d900 77d23af7 f335ac80 ffffffcf 03a8afa0 ffffff9f [ 568.839418] bce0 047da794 ffffff9f 047da794 ffffff9f 047da794 ffffff9f 046be95a ffffff9f [ 568.839432] bd00 047da794 ffffff9f 14efbd10 ffffff80 00010000 00028000 0c400000 ffffff80 [ 568.839447] [ 568.839452] Call trace: [ 568.839461] arm_smmu_context_fault+0x580/0x600 [ 568.839473] Code: f94003a0 f0006a21 91264021 94045ec3 (d4210000) [ 568.839482] ---[ end trace b8f0de5036a861a3 ]--- [ 568.847582] icnss: Received force error fatal request from FW [ 568.847773] icnss: Received early crash indication from FW [ 568.848772] PD_ERR: wlan_process : EF:wlan_process:0x1:WLAN RT:0x1079:cmnos_thread.c:3968:Asserted in whal_interrupt_wifi_ip02.c:whalGetPendingInterrupts:512 Signed-off-by: Danny Lin <danny@kdrag0n.dev> Change-Id: Idd5183cce197016cb64df1faf990d6f606616ef1
Sota4Ever
pushed a commit
to Sota4Ever/linux-android-samsung-exynos990
that referenced
this pull request
Dec 27, 2024
Turns out hotplugging CPUs that are in exclusive cpusets can lead to the
cpuset code feeding empty cpumasks to the sched domain rebuild machinery.
This leads to the following splat:
Internal error: Oops: 96000004 [fcuzzocrea#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 235 Comm: kworker/5:2 Not tainted 5.4.0-rc1-00005-g8d495477d62e #23
Hardware name: ARM Juno development board (r0) (DT)
Workqueue: events cpuset_hotplug_workfn
pstate: 60000005 (nZCv daif -PAN -UAO)
pc : build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
lr : build_sched_domains (kernel/sched/topology.c:1966)
Call trace:
build_sched_domains (./include/linux/arch_topology.h:23 kernel/sched/topology.c:1898 kernel/sched/topology.c:1969)
partition_sched_domains_locked (kernel/sched/topology.c:2250)
rebuild_sched_domains_locked (./include/linux/bitmap.h:370 ./include/linux/cpumask.h:538 kernel/cgroup/cpuset.c:955 kernel/cgroup/cpuset.c:978 kernel/cgroup/cpuset.c:1019)
rebuild_sched_domains (kernel/cgroup/cpuset.c:1032)
cpuset_hotplug_workfn (kernel/cgroup/cpuset.c:3205 (discriminator 2))
process_one_work (./arch/arm64/include/asm/jump_label.h:21 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:114 kernel/workqueue.c:2274)
worker_thread (./include/linux/compiler.h:199 ./include/linux/list.h:268 kernel/workqueue.c:2416)
kthread (kernel/kthread.c:255)
ret_from_fork (arch/arm64/kernel/entry.S:1167)
Code: f860dae2 912802d6 aa1603e1 12800000 (f8616853)
The faulty line in question is:
cap = arch_scale_cpu_capacity(cpumask_first(cpu_map));
and we're not checking the return value against nr_cpu_ids (we shouldn't
have to!), which leads to the above.
Prevent generate_sched_domains() from returning empty cpumasks, and add
some assertion in build_sched_domains() to scream bloody murder if it
happens again.
The above splat was obtained on my Juno r0 with the following reproducer:
$ cgcreate -g cpuset:asym
$ cgset -r cpuset.cpus=0-3 asym
$ cgset -r cpuset.mems=0 asym
$ cgset -r cpuset.cpu_exclusive=1 asym
$ cgcreate -g cpuset:smp
$ cgset -r cpuset.cpus=4-5 smp
$ cgset -r cpuset.mems=0 smp
$ cgset -r cpuset.cpu_exclusive=1 smp
$ cgset -r cpuset.sched_load_balance=0 .
$ echo 0 > /sys/devices/system/cpu/cpu4/online
$ echo 0 > /sys/devices/system/cpu/cpu5/online
Bug: 254441685
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dietmar.Eggemann@arm.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hannes@cmpxchg.org
Cc: lizefan@huawei.com
Cc: morten.rasmussen@arm.com
Cc: qperret@google.com
Cc: tj@kernel.org
Cc: vincent.guittot@linaro.org
Fixes: 05484e098448 ("sched/topology: Add SD_ASYM_CPUCAPACITY flag detection")
Link: https://lkml.kernel.org/r/20191023153745.19515-2-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit cd1cb3350561d2bf544ddfef76fbf0b1c9c7178f)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: If93e57ff867b5d6004cc0481a1fcc198a9bcefe8
Sota4Ever
pushed a commit
to Sota4Ever/linux-android-samsung-exynos990
that referenced
this pull request
Dec 27, 2024
…roups and high uptime Jingfeng reports rare div0 crashes in psi on systems with some uptime: [58914.066423] divide error: 0000 [fcuzzocrea#1] SMP [58914.070416] Modules linked in: ipmi_poweroff ipmi_watchdog toa overlay fuse tcp_diag inet_diag binfmt_misc aisqos(O) aisqos_hotfixes(O) [58914.083158] CPU: 94 PID: 140364 Comm: kworker/94:2 Tainted: G W OE K 4.9.151-015.ali3000.alios7.x86_64 fcuzzocrea#1 [58914.093722] Hardware name: Alibaba Alibaba Cloud ECS/Alibaba Cloud ECS, BIOS 3.23.34 02/14/2019 [58914.102728] Workqueue: events psi_update_work [58914.107258] task: ffff8879da83c280 task.stack: ffffc90059dcc000 [58914.113336] RIP: 0010:[] [] psi_update_stats+0x1c1/0x330 [58914.122183] RSP: 0018:ffffc90059dcfd60 EFLAGS: 00010246 [58914.127650] RAX: 0000000000000000 RBX: ffff8858fe98be50 RCX: 000000007744d640 [58914.134947] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00003594f700648e [58914.142243] RBP: ffffc90059dcfdf8 R08: 0000359500000000 R09: 0000000000000000 [58914.149538] R10: 0000000000000000 R11: 0000000000000000 R12: 0000359500000000 [58914.156837] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8858fe98bd78 [58914.164136] FS: 0000000000000000(0000) GS:ffff887f7f380000(0000) knlGS:0000000000000000 [58914.172529] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [58914.178467] CR2: 00007f2240452090 CR3: 0000005d5d258000 CR4: 00000000007606f0 [58914.185765] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [58914.193061] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [58914.200360] PKRU: 55555554 [58914.203221] Stack: [58914.205383] ffff8858fe98bd48 00000000000002f0 0000002e81036d09 ffffc90059dcfde8 [58914.213168] ffff8858fe98bec8 0000000000000000 0000000000000000 0000000000000000 [58914.220951] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [58914.228734] Call Trace: [58914.231337] [] psi_update_work+0x22/0x60 [58914.237067] [] process_one_work+0x189/0x420 [58914.243063] [] worker_thread+0x4e/0x4b0 [58914.248701] [] ? process_one_work+0x420/0x420 [58914.254869] [] kthread+0xe6/0x100 [58914.259994] [] ? kthread_park+0x60/0x60 [58914.265640] [] ret_from_fork+0x39/0x50 [58914.271193] Code: 41 29 c3 4d 39 dc 4d 0f 42 dc <49> f7 f1 48 8b 13 48 89 c7 48 c1 [58914.279691] RIP [] psi_update_stats+0x1c1/0x330 The crashing instruction is trying to divide the observed stall time by the sampling period. The period, stored in R8, is not 0, but we are dividing by the lower 32 bits only, which are all 0 in this instance. We could switch to a 64-bit division, but the period shouldn't be that big in the first place. It's the time between the last update and the next scheduled one, and so should always be around 2s and comfortably fit into 32 bits. The bug is in the initialization of new cgroups: we schedule the first sampling event in a cgroup as an offset of sched_clock(), but fail to initialize the last_update timestamp, and it defaults to 0. That results in a bogusly large sampling period the first time we run the sampling code, and consequently we underreport pressure for the first 2s of a cgroup's life. But worse, if sched_clock() is sufficiently advanced on the system, and the user gets unlucky, the period's lower 32 bits can all be 0 and the sampling division will crash. Fix this by initializing the last update timestamp to the creation time of the cgroup, thus correctly marking the start of the first pressure sampling period in a new cgroup. Reported-by: Jingfeng Xie <xiejingfeng@linux.alibaba.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Link: https://lkml.kernel.org/r/20191203183524.41378-2-hannes@cmpxchg.org Signed-off-by: Sasha Levin <sashal@kernel.org> (cherry picked from commit 3dfbe25c27eab7c90c8a7e97b4c354a9d24dd985) Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Iaada5c2f1a03cf38cbb053adde478f762ce40843
Sota4Ever
pushed a commit
to Sota4Ever/linux-android-samsung-exynos990
that referenced
this pull request
Dec 27, 2024
[ Upstream commit b4da13aa28d4fd0071247b7b41c579ee8a86c81a ] A missing clock update is causing the following warning: rq->clock_update_flags < RQCF_ACT_SKIP WARNING: CPU: 112 PID: 2041 at kernel/sched/sched.h:1453 sub_running_bw.isra.0+0x190/0x1a0 ... CPU: 112 PID: 2041 Comm: sugov:112 Tainted: G W 5.14.0-rc1 fcuzzocrea#1 Hardware name: WIWYNN Mt.Jade Server System B81.030Z1.0007/Mt.Jade Motherboard, BIOS 1.6.20210526 (SCP: 1.06.20210526) 2021/05/26 ... Call trace: sub_running_bw.isra.0+0x190/0x1a0 migrate_task_rq_dl+0xf8/0x1e0 set_task_cpu+0xa8/0x1f0 try_to_wake_up+0x150/0x3d4 wake_up_q+0x64/0xc0 __up_write+0xd0/0x1c0 up_write+0x4c/0x2b0 cppc_set_perf+0x120/0x2d0 cppc_cpufreq_set_target+0xe0/0x1a4 [cppc_cpufreq] __cpufreq_driver_target+0x74/0x140 sugov_work+0x64/0x80 kthread_worker_fn+0xe0/0x230 kthread+0x138/0x140 ret_from_fork+0x10/0x18 The task causing this is the `cppc_fie` DL task introduced by commit 1eb5dde674f5 ("cpufreq: CPPC: Add support for frequency invariance"). With CONFIG_ACPI_CPPC_CPUFREQ_FIE=y and schedutil cpufreq governor on slow-switching system (like on this Ampere Altra WIWYNN Mt. Jade Arm Server): DL task `curr=sugov:112` lets `p=cppc_fie` migrate and since the latter is in `non_contending` state, migrate_task_rq_dl() calls sub_running_bw()->__sub_running_bw()->cpufreq_update_util()-> rq_clock()->assert_clock_updated() on p. Fix this by updating the clock for a non_contending task in migrate_task_rq_dl() before calling sub_running_bw(). Reported-by: Bruno Goncalves <bgoncalv@redhat.com> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@kernel.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20210804135925.3734605-1-dietmar.eggemann@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Sota4Ever
pushed a commit
to Sota4Ever/linux-android-samsung-exynos990
that referenced
this pull request
Dec 27, 2024
…sses commit 2279f540ea7d05f22d2f0c4224319330228586bc upstream. Glenn reported that "an application [he developed produces] a BUG in deadline.c when a SCHED_DEADLINE task contends with CFS tasks on nested PTHREAD_PRIO_INHERIT mutexes. I believe the bug is triggered when a CFS task that was boosted by a SCHED_DEADLINE task boosts another CFS task (nested priority inheritance). ------------[ cut here ]------------ kernel BUG at kernel/sched/deadline.c:1462! invalid opcode: 0000 [fcuzzocrea#1] PREEMPT SMP CPU: 12 PID: 19171 Comm: dl_boost_bug Tainted: ... Hardware name: ... RIP: 0010:enqueue_task_dl+0x335/0x910 Code: ... RSP: 0018:ffffc9000c2bbc68 EFLAGS: 00010002 RAX: 0000000000000009 RBX: ffff888c0af94c00 RCX: ffffffff81e12500 RDX: 000000000000002e RSI: ffff888c0af94c00 RDI: ffff888c10b22600 RBP: ffffc9000c2bbd08 R08: 0000000000000009 R09: 0000000000000078 R10: ffffffff81e12440 R11: ffffffff81e1236c R12: ffff888bc8932600 R13: ffff888c0af94eb8 R14: ffff888c10b22600 R15: ffff888bc8932600 FS: 00007fa58ac55700(0000) GS:ffff888c10b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa58b523230 CR3: 0000000bf44ab003 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ? intel_pstate_update_util_hwp+0x13/0x170 rt_mutex_setprio+0x1cc/0x4b0 task_blocks_on_rt_mutex+0x225/0x260 rt_spin_lock_slowlock_locked+0xab/0x2d0 rt_spin_lock_slowlock+0x50/0x80 hrtimer_grab_expiry_lock+0x20/0x30 hrtimer_cancel+0x13/0x30 do_nanosleep+0xa0/0x150 hrtimer_nanosleep+0xe1/0x230 ? __hrtimer_init_sleeper+0x60/0x60 __x64_sys_nanosleep+0x8d/0xa0 do_syscall_64+0x4a/0x100 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fa58b52330d ... ---[ end trace 0000000000000002 ]— He also provided a simple reproducer creating the situation below: So the execution order of locking steps are the following (N1 and N2 are non-deadline tasks. D1 is a deadline task. M1 and M2 are mutexes that are enabled * with priority inheritance.) Time moves forward as this timeline goes down: N1 N2 D1 | | | | | | Lock(M1) | | | | | | Lock(M2) | | | | | | Lock(M2) | | | | Lock(M1) | | (!!bug triggered!) | Daniel reported a similar situation as well, by just letting ksoftirqd run with DEADLINE (and eventually block on a mutex). Problem is that boosted entities (Priority Inheritance) use static DEADLINE parameters of the top priority waiter. However, there might be cases where top waiter could be a non-DEADLINE entity that is currently boosted by a DEADLINE entity from a different lock chain (i.e., nested priority chains involving entities of non-DEADLINE classes). In this case, top waiter static DEADLINE parameters could be null (initialized to 0 at fork()) and replenish_dl_entity() would hit a BUG(). Fix this by keeping track of the original donor and using its parameters when a task is boosted. Reported-by: Glenn Elliott <glenn@aurora.tech> Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201117061432.517340-1-juri.lelli@redhat.com [Ankit: Regenerated the patch for v4.19.y] Signed-off-by: Ankit Jain <ankitja@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sota4Ever
pushed a commit
to Sota4Ever/linux-android-samsung-exynos990
that referenced
this pull request
Dec 27, 2024
[ Upstream commit 0dd37d6dd33a9c23351e6115ae8cdac7863bc7de ] We've run into the case that the balancer tries to balance a migration disabled task and trigger the warning in set_task_cpu() like below: ------------[ cut here ]------------ WARNING: CPU: 7 PID: 0 at kernel/sched/core.c:3115 set_task_cpu+0x188/0x240 Modules linked in: hclgevf xt_CHECKSUM ipt_REJECT nf_reject_ipv4 <...snip> CPU: 7 PID: 0 Comm: swapper/7 Kdump: loaded Tainted: G O 6.1.0-rc4+ fcuzzocrea#1 Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V5.B221.01 12/09/2021 pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : set_task_cpu+0x188/0x240 lr : load_balance+0x5d0/0xc60 sp : ffff80000803bc70 x29: ffff80000803bc70 x28: ffff004089e190e8 x27: ffff004089e19040 x26: ffff007effcabc38 x25: 0000000000000000 x24: 0000000000000001 x23: ffff80000803be84 x22: 000000000000000c x21: ffffb093e79e2a78 x20: 000000000000000c x19: ffff004089e19040 x18: 0000000000000000 x17: 0000000000001fad x16: 0000000000000030 x15: 0000000000000000 x14: 0000000000000003 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000001 x10: 0000000000000400 x9 : ffffb093e4cee530 x8 : 00000000fffffffe x7 : 0000000000ce168a x6 : 000000000000013e x5 : 00000000ffffffe1 x4 : 0000000000000001 x3 : 0000000000000b2a x2 : 0000000000000b2a x1 : ffffb093e6d6c510 x0 : 0000000000000001 Call trace: set_task_cpu+0x188/0x240 load_balance+0x5d0/0xc60 rebalance_domains+0x26c/0x380 _nohz_idle_balance.isra.0+0x1e0/0x370 run_rebalance_domains+0x6c/0x80 __do_softirq+0x128/0x3d8 ____do_softirq+0x18/0x24 call_on_irq_stack+0x2c/0x38 do_softirq_own_stack+0x24/0x3c __irq_exit_rcu+0xcc/0xf4 irq_exit_rcu+0x18/0x24 el1_interrupt+0x4c/0xe4 el1h_64_irq_handler+0x18/0x2c el1h_64_irq+0x74/0x78 arch_cpu_idle+0x18/0x4c default_idle_call+0x58/0x194 do_idle+0x244/0x2b0 cpu_startup_entry+0x30/0x3c secondary_start_kernel+0x14c/0x190 __secondary_switched+0xb0/0xb4 ---[ end trace 0000000000000000 ]--- Further investigation shows that the warning is superfluous, the migration disabled task is just going to be migrated to its current running CPU. This is because that on load balance if the dst_cpu is not allowed by the task, we'll re-select a new_dst_cpu as a candidate. If no task can be balanced to dst_cpu we'll try to balance the task to the new_dst_cpu instead. In this case when the migration disabled task is not on CPU it only allows to run on its current CPU, load balance will select its current CPU as new_dst_cpu and later triggers the warning above. The new_dst_cpu is chosen from the env->dst_grpmask. Currently it contains CPUs in sched_group_span() and if we have overlapped groups it's possible to run into this case. This patch makes env->dst_grpmask of group_balance_mask() which exclude any CPUs from the busiest group and solve the issue. For balancing in a domain with no overlapped groups the behaviour keeps same as before. Suggested-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20230530082507.10444-1-yangyicong@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.