-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compiling with Docker #64
Comments
nice idea to create a docker container...isn't there a bionic base? it seems you create the docker-file 1 level above the git repo (and mounting this to /srv), right? need to setup a docker environment to test it... |
Yes it one level outside of the git repo, but you can adjust the paths easily which are behind the Upgrading to bionic should be rather easy and also worth to try 👍 |
imho the docker-file should be inside utils-directory (folder "docker" or similar) with a script to initialize it (your additional commands). if you make a patch (git format-patch) i can apply it directly...should be easier as a pull request imho gccgo-4.7-arm-linux-gnueabi and binutils-arm-linux-gnueabihf are not needed (or automaticly installed) |
I changed the stuff you mentioned and create the patch:
Download the patch |
I just noticed that the docker daemon will create the |
can you move docker folder to utils/ (utils/docker/...)? and squash the 3 Patches? you can include the fixes/upgrade in single Patch instead of creating old value and than change it default should be running docker with user-rights (user in group docker) also inside container build.sh should not be run as root |
merged your patch, but please make a patch to move docker folder into utils/ and some usage-instructions will be nice ;) as me and some other are not experienced with docker |
As wished, I ...
moved_Docker_tools_to___utils_and_added_some_documentation_for_it.patch.txt |
seems patchfile is broken
can you please make a patch with git-format-patch? |
i have added the changes manually to 4.14-main, can you please test it? |
I tried the version from |
Ok,added it also to 4.19 and set you as author |
commit b803974 upstream. This fixes the below calltrace when the CONFIG_DMA_API_DEBUG is enabled. DMA-API: thunderx_mmc 0000:01:01.4: cpu touching an active dma mapped cacheline [cln=0x000000002fdf9800] WARNING: CPU: 21 PID: 1 at kernel/dma/debug.c:596 debug_dma_assert_idle+0x1f8/0x270 Modules linked in: CPU: 21 PID: 1 Comm: init Not tainted 5.3.0-rc1-next-20190725-yocto-standard+ #64 Hardware name: Marvell OcteonTX CN96XX board (DT) pstate: 80400009 (Nzcv daif +PAN -UAO) pc : debug_dma_assert_idle+0x1f8/0x270 lr : debug_dma_assert_idle+0x1f8/0x270 sp : ffff0000113cfc10 x29: ffff0000113cfc10 x28: 0000ffff8c880000 x27: ffff800bc72a0000 x26: ffff000010ff8000 x25: ffff000010ff8940 x24: ffff000010ff8968 x23: 0000000000000000 x22: ffff000010e83700 x21: ffff000010ea2000 x20: ffff000010e835c8 x19: ffff800bc2c73300 x18: ffffffffffffffff x17: 0000000000000000 x16: 0000000000000000 x15: ffff000010e835c8 x14: 6d20616d64206576 x13: 69746361206e6120 x12: 676e696863756f74 x11: 20757063203a342e x10: 31303a31303a3030 x9 : 303020636d6d5f78 x8 : 3230303030303030 x7 : 00000000000002fd x6 : ffff000010fd57d0 x5 : 0000000000000000 x4 : ffff0000106c5210 x3 : 00000000ffffffff x2 : 0000800bee9c0000 x1 : 57d5843f4aa62800 x0 : 0000000000000000 Call trace: debug_dma_assert_idle+0x1f8/0x270 wp_page_copy+0xb0/0x688 do_wp_page+0xa8/0x5b8 __handle_mm_fault+0x600/0xd00 handle_mm_fault+0x118/0x1e8 do_page_fault+0x200/0x500 do_mem_abort+0x50/0xb0 el0_da+0x20/0x24 ---[ end trace a005534bd23e109f ]--- DMA-API: Mapped at: debug_dma_map_sg+0x94/0x350 cvm_mmc_request+0x3c4/0x988 __mmc_start_request+0x9c/0x1f8 mmc_start_request+0x7c/0xb0 mmc_blk_mq_issue_rq+0x5c4/0x7b8 Signed-off-by: Kevin Hao <haokexin@gmail.com> Fixes: ba3869f ("mmc: cavium: Add core MMC driver for Cavium SOCs") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b803974 upstream. This fixes the below calltrace when the CONFIG_DMA_API_DEBUG is enabled. DMA-API: thunderx_mmc 0000:01:01.4: cpu touching an active dma mapped cacheline [cln=0x000000002fdf9800] WARNING: CPU: 21 PID: 1 at kernel/dma/debug.c:596 debug_dma_assert_idle+0x1f8/0x270 Modules linked in: CPU: 21 PID: 1 Comm: init Not tainted 5.3.0-rc1-next-20190725-yocto-standard+ #64 Hardware name: Marvell OcteonTX CN96XX board (DT) pstate: 80400009 (Nzcv daif +PAN -UAO) pc : debug_dma_assert_idle+0x1f8/0x270 lr : debug_dma_assert_idle+0x1f8/0x270 sp : ffff0000113cfc10 x29: ffff0000113cfc10 x28: 0000ffff8c880000 x27: ffff800bc72a0000 x26: ffff000010ff8000 x25: ffff000010ff8940 x24: ffff000010ff8968 x23: 0000000000000000 x22: ffff000010e83700 x21: ffff000010ea2000 x20: ffff000010e835c8 x19: ffff800bc2c73300 x18: ffffffffffffffff x17: 0000000000000000 x16: 0000000000000000 x15: ffff000010e835c8 x14: 6d20616d64206576 x13: 69746361206e6120 x12: 676e696863756f74 x11: 20757063203a342e x10: 31303a31303a3030 x9 : 303020636d6d5f78 x8 : 3230303030303030 x7 : 00000000000002fd x6 : ffff000010fd57d0 x5 : 0000000000000000 x4 : ffff0000106c5210 x3 : 00000000ffffffff x2 : 0000800bee9c0000 x1 : 57d5843f4aa62800 x0 : 0000000000000000 Call trace: debug_dma_assert_idle+0x1f8/0x270 wp_page_copy+0xb0/0x688 do_wp_page+0xa8/0x5b8 __handle_mm_fault+0x600/0xd00 handle_mm_fault+0x118/0x1e8 do_page_fault+0x200/0x500 do_mem_abort+0x50/0xb0 el0_da+0x20/0x24 ---[ end trace a005534bd23e109f ]--- DMA-API: Mapped at: debug_dma_map_sg+0x94/0x350 cvm_mmc_request+0x3c4/0x988 __mmc_start_request+0x9c/0x1f8 mmc_start_request+0x7c/0xb0 mmc_blk_mq_issue_rq+0x5c4/0x7b8 Signed-off-by: Kevin Hao <haokexin@gmail.com> Fixes: ba3869f ("mmc: cavium: Add core MMC driver for Cavium SOCs") Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
arm32 uses software to simulate the instruction replaced by kprobe. some instructions may be simulated by constructing assembly functions. therefore, before executing instruction simulation, it is necessary to construct assembly function execution environment in C language through binding registers. after kasan is enabled, the register binding relationship will be destroyed, resulting in instruction simulation errors and causing kernel panic. the kprobe emulate instruction function is distributed in three files: actions-common.c actions-arm.c actions-thumb.c, so disable KASAN when compiling these files. for example, use kprobe insert on cap_capable+20 after kasan enabled, the cap_capable assembly code is as follows: <cap_capable>: e92d47f0 push {r4, r5, r6, r7, r8, r9, sl, lr} e1a05000 mov r5, r0 e280006c add r0, r0, #108 ; 0x6c e1a04001 mov r4, r1 e1a06002 mov r6, r2 e59fa090 ldr sl, [pc, #144] ; ebfc7bf8 bl c03aa4b4 <__asan_load4> e595706c ldr r7, [r5, #108] ; 0x6c e2859014 add r9, r5, #20 ...... The emulate_ldr assembly code after enabling kasan is as follows: c06f1384 <emulate_ldr>: e92d47f0 push {r4, r5, r6, r7, r8, r9, sl, lr} e282803c add r8, r2, #60 ; 0x3c e1a05000 mov r5, r0 e7e37855 ubfx r7, r5, #16, #4 e1a00008 mov r0, r8 e1a09001 mov r9, r1 e1a04002 mov r4, r2 ebf35462 bl c03c6530 <__asan_load4> e357000f cmp r7, #15 e7e36655 ubfx r6, r5, #12, #4 e205a00f and sl, r5, #15 0a000001 beq c06f13bc <emulate_ldr+0x38> e0840107 add r0, r4, r7, lsl #2 ebf3545c bl c03c6530 <__asan_load4> e084010a add r0, r4, sl, lsl #2 ebf3545a bl c03c6530 <__asan_load4> e2890010 add r0, r9, #16 ebf35458 bl c03c6530 <__asan_load4> e5990010 ldr r0, [r9, #16] e12fff30 blx r0 e356000f cm r6, #15 1a000014 bne c06f1430 <emulate_ldr+0xac> e1a06000 mov r6, r0 e2840040 add r0, r4, #64 ; 0x40 ...... when running in emulate_ldr to simulate the ldr instruction, panic occurred, and the log is as follows: Unable to handle kernel NULL pointer dereference at virtual address 00000090 pgd = ecb46400 [00000090] *pgd=2e0fa003, *pmd=00000000 Internal error: Oops: 206 [#1] SMP ARM PC is at cap_capable+0x14/0xb0 LR is at emulate_ldr+0x50/0xc0 psr: 600d0293 sp : ecd63af8 ip : 00000004 fp : c0a7c30c r10: 00000000 r9 : c30897f4 r8 : ecd63cd4 r7 : 0000000f r6 : 0000000a r5 : e59fa090 r4 : ecd63c98 r3 : c06ae294 r2 : 00000000 r1 : b7611300 r0 : bf4ec008 Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user Control: 32c5387d Table: 2d546400 DAC: 55555555 Process bash (pid: 1643, stack limit = 0xecd60190) (cap_capable) from (kprobe_handler+0x218/0x340) (kprobe_handler) from (kprobe_trap_handler+0x24/0x48) (kprobe_trap_handler) from (do_undefinstr+0x13c/0x364) (do_undefinstr) from (__und_svc_finish+0x0/0x30) (__und_svc_finish) from (cap_capable+0x18/0xb0) (cap_capable) from (cap_vm_enough_memory+0x38/0x48) (cap_vm_enough_memory) from (security_vm_enough_memory_mm+0x48/0x6c) (security_vm_enough_memory_mm) from (copy_process.constprop.5+0x16b4/0x25c8) (copy_process.constprop.5) from (_do_fork+0xe8/0x55c) (_do_fork) from (SyS_clone+0x1c/0x24) (SyS_clone) from (__sys_trace_return+0x0/0x10) Code: 0050a0e1 6c0080e2 0140a0e1 0260a0e1 (f801f0e7) Fixes: 35aa1df ("ARM kprobes: instruction single-stepping support") Fixes: 4210157 ("ARM: 9017/2: Enable KASan for ARM") Signed-off-by: huangshaobo <huangshaobo6@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
commit 8b59b0a upstream. arm32 uses software to simulate the instruction replaced by kprobe. some instructions may be simulated by constructing assembly functions. therefore, before executing instruction simulation, it is necessary to construct assembly function execution environment in C language through binding registers. after kasan is enabled, the register binding relationship will be destroyed, resulting in instruction simulation errors and causing kernel panic. the kprobe emulate instruction function is distributed in three files: actions-common.c actions-arm.c actions-thumb.c, so disable KASAN when compiling these files. for example, use kprobe insert on cap_capable+20 after kasan enabled, the cap_capable assembly code is as follows: <cap_capable>: e92d47f0 push {r4, r5, r6, r7, r8, r9, sl, lr} e1a05000 mov r5, r0 e280006c add r0, r0, #108 ; 0x6c e1a04001 mov r4, r1 e1a06002 mov r6, r2 e59fa090 ldr sl, [pc, #144] ; ebfc7bf8 bl c03aa4b4 <__asan_load4> e595706c ldr r7, [r5, #108] ; 0x6c e2859014 add r9, r5, #20 ...... The emulate_ldr assembly code after enabling kasan is as follows: c06f1384 <emulate_ldr>: e92d47f0 push {r4, r5, r6, r7, r8, r9, sl, lr} e282803c add r8, r2, #60 ; 0x3c e1a05000 mov r5, r0 e7e37855 ubfx r7, r5, #16, #4 e1a00008 mov r0, r8 e1a09001 mov r9, r1 e1a04002 mov r4, r2 ebf35462 bl c03c6530 <__asan_load4> e357000f cmp r7, #15 e7e36655 ubfx r6, r5, #12, #4 e205a00f and sl, r5, #15 0a000001 beq c06f13bc <emulate_ldr+0x38> e0840107 add r0, r4, r7, lsl #2 ebf3545c bl c03c6530 <__asan_load4> e084010a add r0, r4, sl, lsl #2 ebf3545a bl c03c6530 <__asan_load4> e2890010 add r0, r9, #16 ebf35458 bl c03c6530 <__asan_load4> e5990010 ldr r0, [r9, #16] e12fff30 blx r0 e356000f cm r6, #15 1a000014 bne c06f1430 <emulate_ldr+0xac> e1a06000 mov r6, r0 e2840040 add r0, r4, #64 ; 0x40 ...... when running in emulate_ldr to simulate the ldr instruction, panic occurred, and the log is as follows: Unable to handle kernel NULL pointer dereference at virtual address 00000090 pgd = ecb46400 [00000090] *pgd=2e0fa003, *pmd=00000000 Internal error: Oops: 206 [#1] SMP ARM PC is at cap_capable+0x14/0xb0 LR is at emulate_ldr+0x50/0xc0 psr: 600d0293 sp : ecd63af8 ip : 00000004 fp : c0a7c30c r10: 00000000 r9 : c30897f4 r8 : ecd63cd4 r7 : 0000000f r6 : 0000000a r5 : e59fa090 r4 : ecd63c98 r3 : c06ae294 r2 : 00000000 r1 : b7611300 r0 : bf4ec008 Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user Control: 32c5387d Table: 2d546400 DAC: 55555555 Process bash (pid: 1643, stack limit = 0xecd60190) (cap_capable) from (kprobe_handler+0x218/0x340) (kprobe_handler) from (kprobe_trap_handler+0x24/0x48) (kprobe_trap_handler) from (do_undefinstr+0x13c/0x364) (do_undefinstr) from (__und_svc_finish+0x0/0x30) (__und_svc_finish) from (cap_capable+0x18/0xb0) (cap_capable) from (cap_vm_enough_memory+0x38/0x48) (cap_vm_enough_memory) from (security_vm_enough_memory_mm+0x48/0x6c) (security_vm_enough_memory_mm) from (copy_process.constprop.5+0x16b4/0x25c8) (copy_process.constprop.5) from (_do_fork+0xe8/0x55c) (_do_fork) from (SyS_clone+0x1c/0x24) (SyS_clone) from (__sys_trace_return+0x0/0x10) Code: 0050a0e1 6c0080e2 0140a0e1 0260a0e1 (f801f0e7) Fixes: 35aa1df ("ARM kprobes: instruction single-stepping support") Fixes: 4210157 ("ARM: 9017/2: Enable KASan for ARM") Signed-off-by: huangshaobo <huangshaobo6@huawei.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Map the leaf SPTE when handling a TDP MMU page fault if and only if the target level is reached. A recent commit reworked the retry logic and incorrectly assumed that walking SPTEs would never "fail", as the loop either bails (retries) or installs parent SPs. However, the iterator itself will bail early if it detects a frozen (REMOVED) SPTE when stepping down. The TDP iterator also rereads the current SPTE before stepping down specifically to avoid walking into a part of the tree that is being removed, which means it's possible to terminate the loop without the guts of the loop observing the frozen SPTE, e.g. if a different task zaps a parent SPTE between the initial read and try_step_down()'s refresh. Mapping a leaf SPTE at the wrong level results in all kinds of badness as page table walkers interpret the SPTE as a page table, not a leaf, and walk into the weeds. ------------[ cut here ]------------ WARNING: CPU: 1 PID: 1025 at arch/x86/kvm/mmu/tdp_mmu.c:1070 kvm_tdp_mmu_map+0x481/0x510 Modules linked in: kvm_intel CPU: 1 PID: 1025 Comm: nx_huge_pages_t Tainted: G W 6.1.0-rc4+ #64 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kvm_tdp_mmu_map+0x481/0x510 RSP: 0018:ffffc9000072fba8 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffffc9000072fcc0 RCX: 0000000000000027 RDX: 0000000000000027 RSI: 00000000ffffdfff RDI: ffff888277c5b4c8 RBP: ffff888107d45a10 R08: ffff888277c5b4c0 R09: ffffc9000072fa48 R10: 0000000000000001 R11: 0000000000000001 R12: ffffc9000073a0e0 R13: ffff88810fc54800 R14: ffff888107d1ae60 R15: ffff88810fc54f90 FS: 00007fba9f853740(0000) GS:ffff888277c40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000010aa7a003 CR4: 0000000000172ea0 Call Trace: <TASK> kvm_tdp_page_fault+0x10c/0x130 kvm_mmu_page_fault+0x103/0x680 vmx_handle_exit+0x132/0x5a0 [kvm_intel] vcpu_enter_guest+0x60c/0x16f0 kvm_arch_vcpu_ioctl_run+0x1e2/0x9d0 kvm_vcpu_ioctl+0x271/0x660 __x64_sys_ioctl+0x80/0xb0 do_syscall_64+0x2b/0x50 entry_SYSCALL_64_after_hwframe+0x46/0xb0 </TASK> ---[ end trace 0000000000000000 ]--- Invalid SPTE change: cannot replace a present leaf SPTE with another present leaf SPTE mapping a different PFN! as_id: 0 gfn: 100200 old_spte: 600000112400bf3 new_spte: 6000001126009f3 level: 2 ------------[ cut here ]------------ kernel BUG at arch/x86/kvm/mmu/tdp_mmu.c:559! invalid opcode: 0000 [#1] SMP CPU: 1 PID: 1025 Comm: nx_huge_pages_t Tainted: G W 6.1.0-rc4+ #64 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:__handle_changed_spte.cold+0x95/0x9c RSP: 0018:ffffc9000072faf8 EFLAGS: 00010246 RAX: 00000000000000c1 RBX: ffffc90000731000 RCX: 0000000000000027 RDX: 0000000000000000 RSI: 00000000ffffdfff RDI: ffff888277c5b4c8 RBP: 0600000112400bf3 R08: ffff888277c5b4c0 R09: ffffc9000072f9a0 R10: 0000000000000001 R11: 0000000000000001 R12: 06000001126009f3 R13: 0000000000000002 R14: 0000000012600901 R15: 0000000012400b01 FS: 00007fba9f853740(0000) GS:ffff888277c40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000010aa7a003 CR4: 0000000000172ea0 Call Trace: <TASK> kvm_tdp_mmu_map+0x3b0/0x510 kvm_tdp_page_fault+0x10c/0x130 kvm_mmu_page_fault+0x103/0x680 vmx_handle_exit+0x132/0x5a0 [kvm_intel] vcpu_enter_guest+0x60c/0x16f0 kvm_arch_vcpu_ioctl_run+0x1e2/0x9d0 kvm_vcpu_ioctl+0x271/0x660 __x64_sys_ioctl+0x80/0xb0 do_syscall_64+0x2b/0x50 entry_SYSCALL_64_after_hwframe+0x46/0xb0 </TASK> Modules linked in: kvm_intel ---[ end trace 0000000000000000 ]--- Fixes: 63d28a2 ("KVM: x86/mmu: simplify kvm_tdp_mmu_map flow when guest has to retry") Cc: Robert Hoo <robert.hu@linux.intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221213033030.83345-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Don't install a leaf TDP MMU SPTE if the parent page's level doesn't match the target level of the fault, and instead have the vCPU retry the faulting instruction after warning. Continuing on is completely unnecessary as the absolute worst case scenario of retrying is DoSing the vCPU, whereas continuing on all but guarantees bigger explosions, e.g. ------------[ cut here ]------------ kernel BUG at arch/x86/kvm/mmu/tdp_mmu.c:559! invalid opcode: 0000 [#1] SMP CPU: 1 PID: 1025 Comm: nx_huge_pages_t Tainted: G W 6.1.0-rc4+ #64 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:__handle_changed_spte.cold+0x95/0x9c RSP: 0018:ffffc9000072faf8 EFLAGS: 00010246 RAX: 00000000000000c1 RBX: ffffc90000731000 RCX: 0000000000000027 RDX: 0000000000000000 RSI: 00000000ffffdfff RDI: ffff888277c5b4c8 RBP: 0600000112400bf3 R08: ffff888277c5b4c0 R09: ffffc9000072f9a0 R10: 0000000000000001 R11: 0000000000000001 R12: 06000001126009f3 R13: 0000000000000002 R14: 0000000012600901 R15: 0000000012400b01 FS: 00007fba9f853740(0000) GS:ffff888277c40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000010aa7a003 CR4: 0000000000172ea0 Call Trace: <TASK> kvm_tdp_mmu_map+0x3b0/0x510 kvm_tdp_page_fault+0x10c/0x130 kvm_mmu_page_fault+0x103/0x680 vmx_handle_exit+0x132/0x5a0 [kvm_intel] vcpu_enter_guest+0x60c/0x16f0 kvm_arch_vcpu_ioctl_run+0x1e2/0x9d0 kvm_vcpu_ioctl+0x271/0x660 __x64_sys_ioctl+0x80/0xb0 do_syscall_64+0x2b/0x50 entry_SYSCALL_64_after_hwframe+0x46/0xb0 </TASK> Modules linked in: kvm_intel ---[ end trace 0000000000000000 ]--- Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20221213033030.83345-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
'__net_initdata' becomes a no-op with CONFIG_NET_NS=y, but when this option is disabled it becomes '__initdata', which means the data can be freed after the initialization phase. This annotation is obviously incorrect for the devlink net device notifier block which is still registered after the initialization phase [1]. Fix this crash by removing the '__net_initdata' annotation. [1] general protection fault, probably for non-canonical address 0xcccccccccccccccc: 0000 [#1] PREEMPT SMP CPU: 3 PID: 117 Comm: (udev-worker) Not tainted 6.4.0-rc1-custom-gdf0acdc59b09 #64 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc37 04/01/2014 RIP: 0010:notifier_call_chain+0x58/0xc0 [...] Call Trace: <TASK> dev_set_mac_address+0x85/0x120 dev_set_mac_address_user+0x30/0x50 do_setlink+0x219/0x1270 rtnl_setlink+0xf7/0x1a0 rtnetlink_rcv_msg+0x142/0x390 netlink_rcv_skb+0x58/0x100 netlink_unicast+0x188/0x270 netlink_sendmsg+0x214/0x470 __sys_sendto+0x12f/0x1a0 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x38/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: e93c937 ("devlink: change per-devlink netdev notifier to static one") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Closes: https://lore.kernel.org/netdev/600ddf9e-589a-2aa0-7b69-a438f833ca10@samsung.com/ Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230515162925.1144416-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
[ Upstream commit 6d41d4f ] BUG: KASAN: slab-use-after-free in xfrm_policy_inexact_list_reinsert+0xb6/0x430 Read of size 1 at addr ffff8881051f3bf8 by task ip/668 CPU: 2 PID: 668 Comm: ip Not tainted 6.5.0-rc5-00182-g25aa0bebba72-dirty #64 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x72/0xa0 print_report+0xd0/0x620 kasan_report+0xb6/0xf0 xfrm_policy_inexact_list_reinsert+0xb6/0x430 xfrm_policy_inexact_insert_node.constprop.0+0x537/0x800 xfrm_policy_inexact_alloc_chain+0x23f/0x320 xfrm_policy_inexact_insert+0x6b/0x590 xfrm_policy_insert+0x3b1/0x480 xfrm_add_policy+0x23c/0x3c0 xfrm_user_rcv_msg+0x2d0/0x510 netlink_rcv_skb+0x10d/0x2d0 xfrm_netlink_rcv+0x49/0x60 netlink_unicast+0x3fe/0x540 netlink_sendmsg+0x528/0x970 sock_sendmsg+0x14a/0x160 ____sys_sendmsg+0x4fc/0x580 ___sys_sendmsg+0xef/0x160 __sys_sendmsg+0xf7/0x1b0 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x73/0xdd The root cause is: cpu 0 cpu1 xfrm_dump_policy xfrm_policy_walk list_move_tail xfrm_add_policy ... ... xfrm_policy_inexact_list_reinsert list_for_each_entry_reverse if (!policy->bydst_reinsert) //read non-existent policy xfrm_dump_policy_done xfrm_policy_walk_done list_del(&walk->walk.all); If dump_one_policy() returns err (triggered by netlink socket), xfrm_policy_walk() will move walk initialized by socket to list net->xfrm.policy_all. so this socket becomes visible in the global policy list. The head *walk can be traversed when users add policies with different prefixlen and trigger xfrm_policy node merge. The issue can also be triggered by policy list traversal while rehashing and flushing policies. It can be fixed by skip such "policies" with walk.dead set to 1. Fixes: 9cf545e ("xfrm: policy: store inexact policies in a tree ordered by destination address") Fixes: 12a169e ("ipsec: Put dumpers on the dump list") Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest with zeroes. What it does is copying enough words (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest. That works fine, *if* all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied. For most of the callers that is true - expand_fdtable() has count equal to old->max_fds, so there's no open descriptors past count, let alone fully occupied words in ->open_fds[], which is what bits in ->full_fds_bits[] correspond to. The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds), which is the smallest multiple of BITS_PER_LONG that covers all opened descriptors below max_fds. In the common case (copying on fork()) max_fds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expand_fdtable() is safe. Unfortunately, there is a case where max_fds is less than that and where we might, indeed, end up with junk in ->full_fds_bits[] - close_range(from, to, CLOSE_RANGE_UNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONE_FILES, get all descriptors in range 0..127 open, then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending up with descriptor #128, despite #64 being observably not open. The minimally invasive fix would be to deal with that in dup_fd(). If this proves to add measurable overhead, we can go that way, but let's try to fix copy_fd_bitmaps() first. * new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size). * make copy_fd_bitmaps() take the bitmap size in words, rather than bits; it's 'count' argument is always a multiple of BITS_PER_LONG, so we are not losing any information, and that way we can use the same helper for all three bitmaps - compiler will see that count is a multiple of BITS_PER_LONG for the large ones, so it'll generate plain memcpy()+memset(). Reproducer added to tools/testing/selftests/core/close_range_test.c Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
commit 9a2fa14 upstream. copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest with zeroes. What it does is copying enough words (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest. That works fine, *if* all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied. For most of the callers that is true - expand_fdtable() has count equal to old->max_fds, so there's no open descriptors past count, let alone fully occupied words in ->open_fds[], which is what bits in ->full_fds_bits[] correspond to. The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds), which is the smallest multiple of BITS_PER_LONG that covers all opened descriptors below max_fds. In the common case (copying on fork()) max_fds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expand_fdtable() is safe. Unfortunately, there is a case where max_fds is less than that and where we might, indeed, end up with junk in ->full_fds_bits[] - close_range(from, to, CLOSE_RANGE_UNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONE_FILES, get all descriptors in range 0..127 open, then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending up with descriptor frank-w#128, despite frank-w#64 being observably not open. The minimally invasive fix would be to deal with that in dup_fd(). If this proves to add measurable overhead, we can go that way, but let's try to fix copy_fd_bitmaps() first. * new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size). * make copy_fd_bitmaps() take the bitmap size in words, rather than bits; it's 'count' argument is always a multiple of BITS_PER_LONG, so we are not losing any information, and that way we can use the same helper for all three bitmaps - compiler will see that count is a multiple of BITS_PER_LONG for the large ones, so it'll generate plain memcpy()+memset(). Reproducer added to tools/testing/selftests/core/close_range_test.c Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9a2fa14 upstream. copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest with zeroes. What it does is copying enough words (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest. That works fine, *if* all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied. For most of the callers that is true - expand_fdtable() has count equal to old->max_fds, so there's no open descriptors past count, let alone fully occupied words in ->open_fds[], which is what bits in ->full_fds_bits[] correspond to. The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds), which is the smallest multiple of BITS_PER_LONG that covers all opened descriptors below max_fds. In the common case (copying on fork()) max_fds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expand_fdtable() is safe. Unfortunately, there is a case where max_fds is less than that and where we might, indeed, end up with junk in ->full_fds_bits[] - close_range(from, to, CLOSE_RANGE_UNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONE_FILES, get all descriptors in range 0..127 open, then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending up with descriptor #128, despite #64 being observably not open. The minimally invasive fix would be to deal with that in dup_fd(). If this proves to add measurable overhead, we can go that way, but let's try to fix copy_fd_bitmaps() first. * new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size). * make copy_fd_bitmaps() take the bitmap size in words, rather than bits; it's 'count' argument is always a multiple of BITS_PER_LONG, so we are not losing any information, and that way we can use the same helper for all three bitmaps - compiler will see that count is a multiple of BITS_PER_LONG for the large ones, so it'll generate plain memcpy()+memset(). Reproducer added to tools/testing/selftests/core/close_range_test.c Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9a2fa14 upstream. copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest with zeroes. What it does is copying enough words (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest. That works fine, *if* all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied. For most of the callers that is true - expand_fdtable() has count equal to old->max_fds, so there's no open descriptors past count, let alone fully occupied words in ->open_fds[], which is what bits in ->full_fds_bits[] correspond to. The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds), which is the smallest multiple of BITS_PER_LONG that covers all opened descriptors below max_fds. In the common case (copying on fork()) max_fds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expand_fdtable() is safe. Unfortunately, there is a case where max_fds is less than that and where we might, indeed, end up with junk in ->full_fds_bits[] - close_range(from, to, CLOSE_RANGE_UNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONE_FILES, get all descriptors in range 0..127 open, then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending up with descriptor #128, despite #64 being observably not open. The minimally invasive fix would be to deal with that in dup_fd(). If this proves to add measurable overhead, we can go that way, but let's try to fix copy_fd_bitmaps() first. * new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size). * make copy_fd_bitmaps() take the bitmap size in words, rather than bits; it's 'count' argument is always a multiple of BITS_PER_LONG, so we are not losing any information, and that way we can use the same helper for all three bitmaps - compiler will see that count is a multiple of BITS_PER_LONG for the large ones, so it'll generate plain memcpy()+memset(). Reproducer added to tools/testing/selftests/core/close_range_test.c Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9a2fa14 upstream. copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest with zeroes. What it does is copying enough words (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest. That works fine, *if* all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied. For most of the callers that is true - expand_fdtable() has count equal to old->max_fds, so there's no open descriptors past count, let alone fully occupied words in ->open_fds[], which is what bits in ->full_fds_bits[] correspond to. The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds), which is the smallest multiple of BITS_PER_LONG that covers all opened descriptors below max_fds. In the common case (copying on fork()) max_fds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expand_fdtable() is safe. Unfortunately, there is a case where max_fds is less than that and where we might, indeed, end up with junk in ->full_fds_bits[] - close_range(from, to, CLOSE_RANGE_UNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONE_FILES, get all descriptors in range 0..127 open, then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending up with descriptor #128, despite #64 being observably not open. The minimally invasive fix would be to deal with that in dup_fd(). If this proves to add measurable overhead, we can go that way, but let's try to fix copy_fd_bitmaps() first. * new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size). * make copy_fd_bitmaps() take the bitmap size in words, rather than bits; it's 'count' argument is always a multiple of BITS_PER_LONG, so we are not losing any information, and that way we can use the same helper for all three bitmaps - compiler will see that count is a multiple of BITS_PER_LONG for the large ones, so it'll generate plain memcpy()+memset(). Reproducer added to tools/testing/selftests/core/close_range_test.c Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9a2fa14 upstream. copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest with zeroes. What it does is copying enough words (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest. That works fine, *if* all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied. For most of the callers that is true - expand_fdtable() has count equal to old->max_fds, so there's no open descriptors past count, let alone fully occupied words in ->open_fds[], which is what bits in ->full_fds_bits[] correspond to. The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds), which is the smallest multiple of BITS_PER_LONG that covers all opened descriptors below max_fds. In the common case (copying on fork()) max_fds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expand_fdtable() is safe. Unfortunately, there is a case where max_fds is less than that and where we might, indeed, end up with junk in ->full_fds_bits[] - close_range(from, to, CLOSE_RANGE_UNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONE_FILES, get all descriptors in range 0..127 open, then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending up with descriptor #128, despite #64 being observably not open. The minimally invasive fix would be to deal with that in dup_fd(). If this proves to add measurable overhead, we can go that way, but let's try to fix copy_fd_bitmaps() first. * new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size). * make copy_fd_bitmaps() take the bitmap size in words, rather than bits; it's 'count' argument is always a multiple of BITS_PER_LONG, so we are not losing any information, and that way we can use the same helper for all three bitmaps - compiler will see that count is a multiple of BITS_PER_LONG for the large ones, so it'll generate plain memcpy()+memset(). Reproducer added to tools/testing/selftests/core/close_range_test.c Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9a2fa14 upstream. copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest with zeroes. What it does is copying enough words (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest. That works fine, *if* all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied. For most of the callers that is true - expand_fdtable() has count equal to old->max_fds, so there's no open descriptors past count, let alone fully occupied words in ->open_fds[], which is what bits in ->full_fds_bits[] correspond to. The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds), which is the smallest multiple of BITS_PER_LONG that covers all opened descriptors below max_fds. In the common case (copying on fork()) max_fds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expand_fdtable() is safe. Unfortunately, there is a case where max_fds is less than that and where we might, indeed, end up with junk in ->full_fds_bits[] - close_range(from, to, CLOSE_RANGE_UNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONE_FILES, get all descriptors in range 0..127 open, then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending up with descriptor #128, despite #64 being observably not open. The minimally invasive fix would be to deal with that in dup_fd(). If this proves to add measurable overhead, we can go that way, but let's try to fix copy_fd_bitmaps() first. * new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size). * make copy_fd_bitmaps() take the bitmap size in words, rather than bits; it's 'count' argument is always a multiple of BITS_PER_LONG, so we are not losing any information, and that way we can use the same helper for all three bitmaps - compiler will see that count is a multiple of BITS_PER_LONG for the large ones, so it'll generate plain memcpy()+memset(). Reproducer added to tools/testing/selftests/core/close_range_test.c Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9a2fa14 upstream. copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest with zeroes. What it does is copying enough words (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest. That works fine, *if* all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied. For most of the callers that is true - expand_fdtable() has count equal to old->max_fds, so there's no open descriptors past count, let alone fully occupied words in ->open_fds[], which is what bits in ->full_fds_bits[] correspond to. The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds), which is the smallest multiple of BITS_PER_LONG that covers all opened descriptors below max_fds. In the common case (copying on fork()) max_fds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expand_fdtable() is safe. Unfortunately, there is a case where max_fds is less than that and where we might, indeed, end up with junk in ->full_fds_bits[] - close_range(from, to, CLOSE_RANGE_UNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONE_FILES, get all descriptors in range 0..127 open, then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending up with descriptor #128, despite #64 being observably not open. The minimally invasive fix would be to deal with that in dup_fd(). If this proves to add measurable overhead, we can go that way, but let's try to fix copy_fd_bitmaps() first. * new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size). * make copy_fd_bitmaps() take the bitmap size in words, rather than bits; it's 'count' argument is always a multiple of BITS_PER_LONG, so we are not losing any information, and that way we can use the same helper for all three bitmaps - compiler will see that count is a multiple of BITS_PER_LONG for the large ones, so it'll generate plain memcpy()+memset(). Reproducer added to tools/testing/selftests/core/close_range_test.c Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alex reports that syzkaller has managed to trigger a use-after-free when tearing down a VM: BUG: KASAN: slab-use-after-free in kvm_put_kvm+0x300/0xe68 virt/kvm/kvm_main.c:5769 Read of size 8 at addr ffffff801c6890d0 by task syz.3.2219/10758 CPU: 3 UID: 0 PID: 10758 Comm: syz.3.2219 Not tainted 6.11.0-rc6-dirty #64 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x17c/0x1a8 arch/arm64/kernel/stacktrace.c:317 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:324 __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x94/0xc0 lib/dump_stack.c:119 print_report+0x144/0x7a4 mm/kasan/report.c:377 kasan_report+0xcc/0x128 mm/kasan/report.c:601 __asan_report_load8_noabort+0x20/0x2c mm/kasan/report_generic.c:381 kvm_put_kvm+0x300/0xe68 virt/kvm/kvm_main.c:5769 kvm_vm_release+0x4c/0x60 virt/kvm/kvm_main.c:1409 __fput+0x198/0x71c fs/file_table.c:422 ____fput+0x20/0x30 fs/file_table.c:450 task_work_run+0x1cc/0x23c kernel/task_work.c:228 do_notify_resume+0x144/0x1a0 include/linux/resume_user_mode.h:50 el0_svc+0x64/0x68 arch/arm64/kernel/entry-common.c:169 el0t_64_sync_handler+0x90/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 Upon closer inspection, it appears that we do not properly tear down the MMIO registration for a vCPU that fails creation late in the game, e.g. a vCPU w/ the same ID already exists in the VM. It is important to consider the context of commit that introduced this bug by moving the unregistration out of __kvm_vgic_vcpu_destroy(). That change correctly sought to avoid an srcu v. config_lock inversion by breaking up the vCPU teardown into two parts, one guarded by the config_lock. Fix the use-after-free while avoiding lock inversion by adding a special-cased unregistration to __kvm_vgic_vcpu_destroy(). This is safe because failed vCPUs are torn down outside of the config_lock. Cc: stable@vger.kernel.org Fixes: f616506 ("KVM: arm64: vgic: Don't hold config_lock while unregistering redistributors") Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007223909.2157336-1-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
During the migration of Soundwire runtime stream allocation from the Qualcomm Soundwire controller to SoC's soundcard drivers the sdm845 soundcard was forgotten. At this point any playback attempt or audio daemon startup, for instance on sdm845-db845c (Qualcomm RB3 board), will result in stream pointer NULL dereference: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101ecf000 [0000000000000020] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP Modules linked in: ... CPU: 5 UID: 0 PID: 1198 Comm: aplay Not tainted 6.12.0-rc2-qcomlt-arm64-00059-g9d78f315a362-dirty #18 Hardware name: Thundercomm Dragonboard 845c (DT) pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : sdw_stream_add_slave+0x44/0x380 [soundwire_bus] lr : sdw_stream_add_slave+0x44/0x380 [soundwire_bus] sp : ffff80008a2035c0 x29: ffff80008a2035c0 x28: ffff80008a203978 x27: 0000000000000000 x26: 00000000000000c0 x25: 0000000000000000 x24: ffff1676025f4800 x23: ffff167600ff1cb8 x22: ffff167600ff1c98 x21: 0000000000000003 x20: ffff167607316000 x19: ffff167604e64e80 x18: 0000000000000000 x17: 0000000000000000 x16: ffffcec265074160 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff167600ff1cec x5 : ffffcec22cfa2010 x4 : 0000000000000000 x3 : 0000000000000003 x2 : ffff167613f836c0 x1 : 0000000000000000 x0 : ffff16761feb60b8 Call trace: sdw_stream_add_slave+0x44/0x380 [soundwire_bus] wsa881x_hw_params+0x68/0x80 [snd_soc_wsa881x] snd_soc_dai_hw_params+0x3c/0xa4 __soc_pcm_hw_params+0x230/0x660 dpcm_be_dai_hw_params+0x1d0/0x3f8 dpcm_fe_dai_hw_params+0x98/0x268 snd_pcm_hw_params+0x124/0x460 snd_pcm_common_ioctl+0x998/0x16e8 snd_pcm_ioctl+0x34/0x58 __arm64_sys_ioctl+0xac/0xf8 invoke_syscall+0x48/0x104 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x34/0xe0 el0t_64_sync_handler+0x120/0x12c el0t_64_sync+0x190/0x194 Code: aa0403fb f9418400 9100e000 9400102f (f8420f22) ---[ end trace 0000000000000000 ]--- 0000000000006108 <sdw_stream_add_slave>: 6108: d503233f paciasp 610c: a9b97bfd stp x29, x30, [sp, #-112]! 6110: 910003fd mov x29, sp 6114: a90153f3 stp x19, x20, [sp, #16] 6118: a9025bf5 stp x21, x22, [sp, #32] 611c: aa0103f6 mov x22, x1 6120: 2a0303f5 mov w21, w3 6124: a90363f7 stp x23, x24, [sp, #48] 6128: aa0003f8 mov x24, x0 612c: aa0203f7 mov x23, x2 6130: a9046bf9 stp x25, x26, [sp, #64] 6134: aa0403f9 mov x25, x4 <-- x4 copied to x25 6138: a90573fb stp x27, x28, [sp, #80] 613c: aa0403fb mov x27, x4 6140: f9418400 ldr x0, [x0, #776] 6144: 9100e000 add x0, x0, #0x38 6148: 94000000 bl 0 <mutex_lock> 614c: f8420f22 ldr x2, [x25, #32]! <-- offset 0x44 ^^^ This is 0x6108 + offset 0x44 from the beginning of sdw_stream_add_slave() where data abort happens. wsa881x_hw_params() is called with stream = NULL and passes it further in register x4 (5th argument) to sdw_stream_add_slave() without any checks. Value from x4 is copied to x25 and finally it aborts on trying to load a value from address in x25 plus offset 32 (in dec) which corresponds to master_list member in struct sdw_stream_runtime: struct sdw_stream_runtime { const char * name; /* 0 8 */ struct sdw_stream_params params; /* 8 12 */ enum sdw_stream_state state; /* 20 4 */ enum sdw_stream_type type; /* 24 4 */ /* XXX 4 bytes hole, try to pack */ here-> struct list_head master_list; /* 32 16 */ int m_rt_count; /* 48 4 */ /* size: 56, cachelines: 1, members: 6 */ /* sum members: 48, holes: 1, sum holes: 4 */ /* padding: 4 */ /* last cacheline: 56 bytes */ Fix this by adding required calls to qcom_snd_sdw_startup() and sdw_release_stream() to startup and shutdown routines which restores the previous correct behaviour when ->set_stream() method is called to set a valid stream runtime pointer on playback startup. Reproduced and then fix was tested on db845c RB3 board. Reported-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: stable@vger.kernel.org Fixes: 15c7fab ("ASoC: qcom: Move Soundwire runtime stream alloc to soundcards") Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org> Tested-by: Steev Klimaszewski <steev@kali.org> # Lenovo Yoga C630 Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://patch.msgid.link/20241009213922.999355-1-alexey.klimov@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org>
commit ae8f8b3 upstream. Alex reports that syzkaller has managed to trigger a use-after-free when tearing down a VM: BUG: KASAN: slab-use-after-free in kvm_put_kvm+0x300/0xe68 virt/kvm/kvm_main.c:5769 Read of size 8 at addr ffffff801c6890d0 by task syz.3.2219/10758 CPU: 3 UID: 0 PID: 10758 Comm: syz.3.2219 Not tainted 6.11.0-rc6-dirty frank-w#64 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x17c/0x1a8 arch/arm64/kernel/stacktrace.c:317 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:324 __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x94/0xc0 lib/dump_stack.c:119 print_report+0x144/0x7a4 mm/kasan/report.c:377 kasan_report+0xcc/0x128 mm/kasan/report.c:601 __asan_report_load8_noabort+0x20/0x2c mm/kasan/report_generic.c:381 kvm_put_kvm+0x300/0xe68 virt/kvm/kvm_main.c:5769 kvm_vm_release+0x4c/0x60 virt/kvm/kvm_main.c:1409 __fput+0x198/0x71c fs/file_table.c:422 ____fput+0x20/0x30 fs/file_table.c:450 task_work_run+0x1cc/0x23c kernel/task_work.c:228 do_notify_resume+0x144/0x1a0 include/linux/resume_user_mode.h:50 el0_svc+0x64/0x68 arch/arm64/kernel/entry-common.c:169 el0t_64_sync_handler+0x90/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598 Upon closer inspection, it appears that we do not properly tear down the MMIO registration for a vCPU that fails creation late in the game, e.g. a vCPU w/ the same ID already exists in the VM. It is important to consider the context of commit that introduced this bug by moving the unregistration out of __kvm_vgic_vcpu_destroy(). That change correctly sought to avoid an srcu v. config_lock inversion by breaking up the vCPU teardown into two parts, one guarded by the config_lock. Fix the use-after-free while avoiding lock inversion by adding a special-cased unregistration to __kvm_vgic_vcpu_destroy(). This is safe because failed vCPUs are torn down outside of the config_lock. Cc: stable@vger.kernel.org Fixes: f616506 ("KVM: arm64: vgic: Don't hold config_lock while unregistering redistributors") Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241007223909.2157336-1-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d0e806b upstream. During the migration of Soundwire runtime stream allocation from the Qualcomm Soundwire controller to SoC's soundcard drivers the sdm845 soundcard was forgotten. At this point any playback attempt or audio daemon startup, for instance on sdm845-db845c (Qualcomm RB3 board), will result in stream pointer NULL dereference: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 Mem abort info: ESR = 0x0000000096000004 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault Data abort info: ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101ecf000 [0000000000000020] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000096000004 [frank-w#1] PREEMPT SMP Modules linked in: ... CPU: 5 UID: 0 PID: 1198 Comm: aplay Not tainted 6.12.0-rc2-qcomlt-arm64-00059-g9d78f315a362-dirty frank-w#18 Hardware name: Thundercomm Dragonboard 845c (DT) pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : sdw_stream_add_slave+0x44/0x380 [soundwire_bus] lr : sdw_stream_add_slave+0x44/0x380 [soundwire_bus] sp : ffff80008a2035c0 x29: ffff80008a2035c0 x28: ffff80008a203978 x27: 0000000000000000 x26: 00000000000000c0 x25: 0000000000000000 x24: ffff1676025f4800 x23: ffff167600ff1cb8 x22: ffff167600ff1c98 x21: 0000000000000003 x20: ffff167607316000 x19: ffff167604e64e80 x18: 0000000000000000 x17: 0000000000000000 x16: ffffcec265074160 x15: 0000000000000000 x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff167600ff1cec x5 : ffffcec22cfa2010 x4 : 0000000000000000 x3 : 0000000000000003 x2 : ffff167613f836c0 x1 : 0000000000000000 x0 : ffff16761feb60b8 Call trace: sdw_stream_add_slave+0x44/0x380 [soundwire_bus] wsa881x_hw_params+0x68/0x80 [snd_soc_wsa881x] snd_soc_dai_hw_params+0x3c/0xa4 __soc_pcm_hw_params+0x230/0x660 dpcm_be_dai_hw_params+0x1d0/0x3f8 dpcm_fe_dai_hw_params+0x98/0x268 snd_pcm_hw_params+0x124/0x460 snd_pcm_common_ioctl+0x998/0x16e8 snd_pcm_ioctl+0x34/0x58 __arm64_sys_ioctl+0xac/0xf8 invoke_syscall+0x48/0x104 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x34/0xe0 el0t_64_sync_handler+0x120/0x12c el0t_64_sync+0x190/0x194 Code: aa0403fb f9418400 9100e000 9400102f (f8420f22) ---[ end trace 0000000000000000 ]--- 0000000000006108 <sdw_stream_add_slave>: 6108: d503233f paciasp 610c: a9b97bfd stp x29, x30, [sp, #-112]! 6110: 910003fd mov x29, sp 6114: a90153f3 stp x19, x20, [sp, frank-w#16] 6118: a9025bf5 stp x21, x22, [sp, frank-w#32] 611c: aa0103f6 mov x22, x1 6120: 2a0303f5 mov w21, w3 6124: a90363f7 stp x23, x24, [sp, frank-w#48] 6128: aa0003f8 mov x24, x0 612c: aa0203f7 mov x23, x2 6130: a9046bf9 stp x25, x26, [sp, frank-w#64] 6134: aa0403f9 mov x25, x4 <-- x4 copied to x25 6138: a90573fb stp x27, x28, [sp, frank-w#80] 613c: aa0403fb mov x27, x4 6140: f9418400 ldr x0, [x0, #776] 6144: 9100e000 add x0, x0, #0x38 6148: 94000000 bl 0 <mutex_lock> 614c: f8420f22 ldr x2, [x25, frank-w#32]! <-- offset 0x44 ^^^ This is 0x6108 + offset 0x44 from the beginning of sdw_stream_add_slave() where data abort happens. wsa881x_hw_params() is called with stream = NULL and passes it further in register x4 (5th argument) to sdw_stream_add_slave() without any checks. Value from x4 is copied to x25 and finally it aborts on trying to load a value from address in x25 plus offset 32 (in dec) which corresponds to master_list member in struct sdw_stream_runtime: struct sdw_stream_runtime { const char * name; /* 0 8 */ struct sdw_stream_params params; /* 8 12 */ enum sdw_stream_state state; /* 20 4 */ enum sdw_stream_type type; /* 24 4 */ /* XXX 4 bytes hole, try to pack */ here-> struct list_head master_list; /* 32 16 */ int m_rt_count; /* 48 4 */ /* size: 56, cachelines: 1, members: 6 */ /* sum members: 48, holes: 1, sum holes: 4 */ /* padding: 4 */ /* last cacheline: 56 bytes */ Fix this by adding required calls to qcom_snd_sdw_startup() and sdw_release_stream() to startup and shutdown routines which restores the previous correct behaviour when ->set_stream() method is called to set a valid stream runtime pointer on playback startup. Reproduced and then fix was tested on db845c RB3 board. Reported-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: stable@vger.kernel.org Fixes: 15c7fab ("ASoC: qcom: Move Soundwire runtime stream alloc to soundcards") Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org> Tested-by: Steev Klimaszewski <steev@kali.org> # Lenovo Yoga C630 Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://patch.msgid.link/20241009213922.999355-1-alexey.klimov@linaro.org Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 60f07e2 ] We use uprobe in aarch64_be, which we found the tracee task would exit due to SIGILL when we enable the uprobe trace. We can see the replace inst from uprobe is not correct in aarch big-endian. As in Armv8-A, instruction fetches are always treated as little-endian, we should treat the UPROBE_SWBP_INSN as little-endian。 The test case is as following。 bash-4.4# ./mqueue_test_aarchbe 1 1 2 1 10 > /dev/null & bash-4.4# cd /sys/kernel/debug/tracing/ bash-4.4# echo 'p:test /mqueue_test_aarchbe:0xc30 %x0 %x1' > uprobe_events bash-4.4# echo 1 > events/uprobes/enable bash-4.4# bash-4.4# ps PID TTY TIME CMD 140 ? 00:00:01 bash 237 ? 00:00:00 ps [1]+ Illegal instruction ./mqueue_test_aarchbe 1 1 2 1 100 > /dev/null which we debug use gdb as following: bash-4.4# gdb attach 155 (gdb) disassemble send Dump of assembler code for function send: 0x0000000000400c30 <+0>: .inst 0xa00020d4 ; undefined 0x0000000000400c34 <+4>: mov x29, sp 0x0000000000400c38 <+8>: str w0, [sp, #28] 0x0000000000400c3c <+12>: strb w1, [sp, #27] 0x0000000000400c40 <+16>: str xzr, [sp, #40] 0x0000000000400c44 <+20>: str xzr, [sp, #48] 0x0000000000400c48 <+24>: add x0, sp, #0x1b 0x0000000000400c4c <+28>: mov w3, #0x0 // #0 0x0000000000400c50 <+32>: mov x2, #0x1 // #1 0x0000000000400c54 <+36>: mov x1, x0 0x0000000000400c58 <+40>: ldr w0, [sp, #28] 0x0000000000400c5c <+44>: bl 0x405e10 <mq_send> 0x0000000000400c60 <+48>: str w0, [sp, #60] 0x0000000000400c64 <+52>: ldr w0, [sp, #60] 0x0000000000400c68 <+56>: ldp x29, x30, [sp], #64 0x0000000000400c6c <+60>: ret End of assembler dump. (gdb) info b No breakpoints or watchpoints. (gdb) c Continuing. Program received signal SIGILL, Illegal instruction. 0x0000000000400c30 in send () (gdb) x/10x 0x400c30 0x400c30 <send>: 0xd42000a0 0xfd030091 0xe01f00b9 0xe16f0039 0x400c40 <send+16>: 0xff1700f9 0xff1b00f9 0xe06f0091 0x03008052 0x400c50 <send+32>: 0x220080d2 0xe10300aa (gdb) disassemble 0x400c30 Dump of assembler code for function send: => 0x0000000000400c30 <+0>: .inst 0xa00020d4 ; undefined 0x0000000000400c34 <+4>: mov x29, sp 0x0000000000400c38 <+8>: str w0, [sp, #28] 0x0000000000400c3c <+12>: strb w1, [sp, #27] 0x0000000000400c40 <+16>: str xzr, [sp, #40] Signed-off-by: junhua huang <huang.junhua@zte.com.cn> Link: https://lore.kernel.org/r/202212021511106844809@zte.com.cn Signed-off-by: Will Deacon <will@kernel.org> Stable-dep-of: 13f8f1e ("arm64: probes: Fix uprobes for big-endian kernels") Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 60f07e2 ] We use uprobe in aarch64_be, which we found the tracee task would exit due to SIGILL when we enable the uprobe trace. We can see the replace inst from uprobe is not correct in aarch big-endian. As in Armv8-A, instruction fetches are always treated as little-endian, we should treat the UPROBE_SWBP_INSN as little-endian。 The test case is as following。 bash-4.4# ./mqueue_test_aarchbe 1 1 2 1 10 > /dev/null & bash-4.4# cd /sys/kernel/debug/tracing/ bash-4.4# echo 'p:test /mqueue_test_aarchbe:0xc30 %x0 %x1' > uprobe_events bash-4.4# echo 1 > events/uprobes/enable bash-4.4# bash-4.4# ps PID TTY TIME CMD 140 ? 00:00:01 bash 237 ? 00:00:00 ps [1]+ Illegal instruction ./mqueue_test_aarchbe 1 1 2 1 100 > /dev/null which we debug use gdb as following: bash-4.4# gdb attach 155 (gdb) disassemble send Dump of assembler code for function send: 0x0000000000400c30 <+0>: .inst 0xa00020d4 ; undefined 0x0000000000400c34 <+4>: mov x29, sp 0x0000000000400c38 <+8>: str w0, [sp, #28] 0x0000000000400c3c <+12>: strb w1, [sp, #27] 0x0000000000400c40 <+16>: str xzr, [sp, #40] 0x0000000000400c44 <+20>: str xzr, [sp, #48] 0x0000000000400c48 <+24>: add x0, sp, #0x1b 0x0000000000400c4c <+28>: mov w3, #0x0 // #0 0x0000000000400c50 <+32>: mov x2, #0x1 // #1 0x0000000000400c54 <+36>: mov x1, x0 0x0000000000400c58 <+40>: ldr w0, [sp, #28] 0x0000000000400c5c <+44>: bl 0x405e10 <mq_send> 0x0000000000400c60 <+48>: str w0, [sp, #60] 0x0000000000400c64 <+52>: ldr w0, [sp, #60] 0x0000000000400c68 <+56>: ldp x29, x30, [sp], #64 0x0000000000400c6c <+60>: ret End of assembler dump. (gdb) info b No breakpoints or watchpoints. (gdb) c Continuing. Program received signal SIGILL, Illegal instruction. 0x0000000000400c30 in send () (gdb) x/10x 0x400c30 0x400c30 <send>: 0xd42000a0 0xfd030091 0xe01f00b9 0xe16f0039 0x400c40 <send+16>: 0xff1700f9 0xff1b00f9 0xe06f0091 0x03008052 0x400c50 <send+32>: 0x220080d2 0xe10300aa (gdb) disassemble 0x400c30 Dump of assembler code for function send: => 0x0000000000400c30 <+0>: .inst 0xa00020d4 ; undefined 0x0000000000400c34 <+4>: mov x29, sp 0x0000000000400c38 <+8>: str w0, [sp, #28] 0x0000000000400c3c <+12>: strb w1, [sp, #27] 0x0000000000400c40 <+16>: str xzr, [sp, #40] Signed-off-by: junhua huang <huang.junhua@zte.com.cn> Link: https://lore.kernel.org/r/202212021511106844809@zte.com.cn Signed-off-by: Will Deacon <will@kernel.org> Stable-dep-of: 13f8f1e ("arm64: probes: Fix uprobes for big-endian kernels") Signed-off-by: Sasha Levin <sashal@kernel.org>
Today I wanted to compile the Kernel on a fresh Archlinux and realized that installing all those cross-compiler packages may take a time. So I hacked a small Dockerfile for me, which I'll share with you:
Dockerfile:
Create the local image with:
docker build . --tag bpi-compile:1
Run the compile environment (adjust the path before the colon):
docker run -it -v $(pwd)/SD:/SD -v $(pwd)/BPI-R2-4.14:/srv bpi-compile:1 /bin/bash
Is it worth to create a pull request for this?
The text was updated successfully, but these errors were encountered: