lkl: Add FreeBSD platform support #1

cemeyer · 2015-11-09T18:11:18Z

Please don't pull this as-is! In particular, the first few commits are big hacks that worked around issues only present on FreeBSD.

For example, without "Hack NR_CPUS" I got some warning about an invalid range; "timeconst.bc" hacks around limitations in FreeBSD's bc(1) program by hardcoding 250 hz in the bc script; and "Build 64-bit" hardcodes CONFIG_64BIT (without which it seemed to try and build 32-bit on a 64-bit host).

Despite those kludges, there are some decent portability-improving patches in here that won't hurt Linux or NT support. And some fixes for compiler or other issues (e.g. statfs64 was missing a 3rd argument which resulted in -EINVAL on all calls).

With this patchset I have ext4 mounting r/w on FreeBSD w/ lklfuse. Neat!

…vocation

tavip · 2015-11-10T10:26:02Z

Awesome ! I will test and cherry pick most of the patches and rework some - "Build 64bit" can use the OUTPUT_FORMAT to detect that host is 64bit and enable 64bit build. Thanks !

cemeyer · 2015-11-10T15:44:55Z

Thanks!

tavip · 2015-11-11T13:38:26Z

Hi Conrad,

I have created a new branch (lkl_fbsd) which has your fixes. I've added your Signed-off-by: to your commits to adhere with the Linux process, hope that is ok.

I've also squashed a few commits and reworked some. Could you take a look and give it a try? Hope I didn't broke anything :)

Thanks,
Tavi

cemeyer · 2015-11-11T16:42:09Z

I've added your Signed-off-by: to your commits to adhere with the Linux process, hope that is ok.

Fine with me. (Edit: if it's not too difficult, can the Signed-off-by address be changed to cem@FreeBSD.org?)

I've also squashed a few commits and reworked some. Could you take a look and give it a try? Hope I didn't broke anything :)

I'll take a look, thanks.

Edit:

Re: "lkl: convert makefile echo \t to inline tab", it worked on FreeBSD :). If the inline tab works everywhere that's even better.

Re: "error: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Werror=unused-result]" (both of them) — I believe you can just cast the result to void.

(void)write(STDOUT_FILENO, str, len);

Instead of the unused stack variable.

Re: "lkl: add support for 64bit FreeBSD", excellent! It's way better than my hack, thanks!

The patches look good to me. I will try them out shortly (I suspect the bc script will still fail on FreeBSD, unfortunately). If you haven't already, I have some follow-up patches for lklfuse you could look at in https://github.com/cemeyer/linux/tree/cem/lklfuse . I don't have XFS or BTRFS actually mounting yet; currently the kernel seems to get wedged at

[    0.004000] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.004000] xor: measuring software checksum speed

cemeyer · 2015-11-11T16:56:31Z

A test build on lkl_fbsd branch:

...
  CHK     include/generated/timeconst.h
dc: not a string
  UPD     include/generated/timeconst.h
...
In file included from include/linux/jiffies.h:10:0,
                 from include/linux/ktime.h:25,
                 from include/linux/rcupdate.h:47,
                 from include/linux/srcu.h:33,
                 from include/linux/notifier.h:15,
                 from include/linux/memory_hotplug.h:6,
                 from include/linux/mmzone.h:812,
                 from include/linux/gfp.h:5,
                 from include/linux/kmod.h:22,
                 from include/linux/module.h:13,
                 from init/main.c:15:
include/generated/timeconst.h:11:2: error: #error "include/generated/timeconst.h has the wrong HZ value!"
 #error "include/generated/timeconst.h has the wrong HZ value!"
  ^
include/generated/timeconst.h:14:2: error: #error Totally bogus HZ value!
 #error Totally bogus HZ value!
  ^
include/generated/timeconst.h:15:1: error: expected identifier or '(' before numeric constant
 0
 ^
...

Yup, the bc script is still a problem. I think it may be easiest to port GNU bc to FreeBSD — the FreeBSD native bc is too limited for the needs of this script.

cemeyer · 2015-11-11T17:04:33Z

With just the BC hack from before and a second patch (-largp for fs2tar got dropped from cemeyer@8c2495a6), lkl_fbsd builds again: https://github.com/cemeyer/linux/tree/lkl_fbsd_plus Edit: And I can mount a USB ext4 under FUSE mostly fine. O_CREAT is broken without either handling it in open(2)or adding the FUSE create() method (I added it in my cem/lklfuse branch: cemeyer@ef78131).

tavip · 2015-11-11T18:58:31Z

Fine with me. (Edit: if it's not too difficult, can the Signed-off-by address be changed to >cem@FreeBSD.org?)

Yep, I'll do that. I guess I should also change the Author address, right?

Re: "error: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Werror=unused->result]" (both of them) — I believe you can just cast the result to void.

With the new compilers (at least on Ubuntu 14.04) the (void) cast trick does not work anymore apparently due to better optimizations :)

-largp for fs2tar got dropped from cemeyer/linux@8c2495a),

Oops, I'll fix that.

I suspect the bc script will still fail on FreeBSD, unfortunately

Not sure how to deal with this. Can we keep it separate for now until maybe we find a better solution?

I'll rebase the patches to address the above and force push to the same branch and way you say ok I will merge it to the main branch.

cemeyer · 2015-11-11T19:03:53Z

Yep, I'll do that. I guess I should also change the Author address, right?

Sure.

With the new compilers (at least on Ubuntu 14.04) the (void) cast trick does not work anymore apparently due to better optimizations :)

I think it is just GCC being very pedantic about warn_unused_result.

I suspect the bc script will still fail on FreeBSD, unfortunately

Not sure how to deal with this. Can we keep it separate for now until maybe we find a better solution?

Yes. I think I'll try and import GNU bc into FreeBSD ports. After that, we may want a patch that changes bc invocations to $(BC) and sets BC to bc on Linux and gbc on FreeBSD. I think that's a straightforward way of getting where we want.

I'll rebase the patches to address the above and force push to the same branch and way you say ok I will merge it to the main branch.

Great, thanks!

the driver uses libata's "tag" values from in various arrays. Since the mentioned patch bumped the ATA_TAG_INTERNAL to 32, the value of the SATA_DWC_QCMD_MAX needs to account for that. Otherwise ATA_TAG_INTERNAL usage cause similar crashes like this as reported by Tice Rex on the OpenWrt Forum and reproduced (with symbols) here: | BUG: Kernel NULL pointer dereference at 0x00000000 | Faulting instruction address: 0xc03ed4b8 | Oops: Kernel access of bad area, sig: 11 [#1] | BE PAGE_SIZE=4K PowerPC 44x Platform | CPU: 0 PID: 362 Comm: scsi_eh_1 Not tainted 5.4.163 #0 | NIP: c03ed4b8 LR: c03d27e8 CTR: c03ed36c | REGS: cfa59950 TRAP: 0300 Not tainted (5.4.163) | MSR: 00021000 <CE,ME> CR: 42000222 XER: 00000000 | DEAR: 00000000 ESR: 00000000 | GPR00: c03d27e8 cfa59a08 cfa55fe0 00000000 0fa46bc0 [...] | [..] | NIP [c03ed4b8] sata_dwc_qc_issue+0x14c/0x254 | LR [c03d27e8] ata_qc_issue+0x1c8/0x2dc | Call Trace: | [cfa59a08] [c003f4e] __cancel_work_timer+0x124/0x194 (unreliable) | [cfa59a78] [c03d27e8] ata_qc_issue+0x1c8/0x2dc | [cfa59a98] [c03d2b3c] ata_exec_internal_sg+0x240/0x524 | [cfa59b08] [c03d2e98] ata_exec_internal+0x78/0xe0 | [cfa59b58] [c03d30fc] ata_read_log_page.part.38+0x1dc/0x204 | [cfa59bc8] [c03d324] ata_identify_page_supported+0x68/0x130 | [...] This is because sata_dwc_dma_xfer_complete() NULLs the dma_pending's next neighbour "chan" (a *dma_chan struct) in this '32' case right here (line ~735): > hsdevp->dma_pending[tag] = SATA_DWC_DMA_PENDING_NONE; Then the next time, a dma gets issued; dma_dwc_xfer_setup() passes the NULL'd hsdevp->chan to the dmaengine_slave_config() which then causes the crash. With this patch, SATA_DWC_QCMD_MAX is now set to ATA_MAX_QUEUE + 1. This avoids the OOB. But please note, there was a worthwhile discussion on what ATA_TAG_INTERNAL and ATA_MAX_QUEUE is. And why there should not be a "fake" 33 command-long queue size. Ideally, the dw driver should account for the ATA_TAG_INTERNAL. In Damien Le Moal's words: "... having looked at the driver, it is a bigger change than just faking a 33rd "tag" that is in fact not a command tag at all." Fixes: 28361c4 ("libata: add extra internal command") Cc: stable@kernel.org # 4.18+ BugLink: openwrt/openwrt#9505 Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>

…date_bw [Why] Below general protection fault observed when WebGL Aquarium is run for longer duration. If drm debug logs are enabled and set to 0x1f then the issue is observed within 10 minutes of run. [ 100.717056] general protection fault, probably for non-canonical address 0x2d33302d32323032: 0000 [#1] PREEMPT SMP NOPTI [ 100.727921] CPU: 3 PID: 1906 Comm: DrmThread Tainted: G W 5.15.30 #12 d726c6a2d6ebe5cf9223931cbca6892f916fe18b [ 100.754419] RIP: 0010:CalculateSwathWidth+0x1f7/0x44f [ 100.767109] Code: 00 00 00 f2 42 0f 11 04 f0 48 8b 85 88 00 00 00 f2 42 0f 10 04 f0 48 8b 85 98 00 00 00 f2 42 0f 11 04 f0 48 8b 45 10 0f 57 c0 <f3> 42 0f 2a 04 b0 0f 57 c9 f3 43 0f 2a 0c b4 e8 8c e2 f3 ff 48 8b [ 100.781269] RSP: 0018:ffffa9230079eeb0 EFLAGS: 00010246 [ 100.812528] RAX: 2d33302d32323032 RBX: 0000000000000500 RCX: 0000000000000000 [ 100.819656] RDX: 0000000000000001 RSI: ffff99deb712c49c RDI: 0000000000000000 [ 100.826781] RBP: ffffa9230079ef50 R08: ffff99deb712460c R09: ffff99deb712462c [ 100.833907] R10: ffff99deb7124940 R11: ffff99deb7124d70 R12: ffff99deb712ae44 [ 100.841033] R13: 0000000000000001 R14: 0000000000000000 R15: ffffa9230079f0a0 [ 100.848159] FS: 00007af121212640(0000) GS:ffff99deba780000(0000) knlGS:0000000000000000 [ 100.856240] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 100.861980] CR2: 0000209000fe1000 CR3: 000000011b18c000 CR4: 0000000000350ee0 [ 100.869106] Call Trace: [ 100.871555] <TASK> [ 100.873655] ? asm_sysvec_reschedule_ipi+0x12/0x20 [ 100.878449] CalculateSwathAndDETConfiguration+0x1a3/0x6dd [ 100.883937] dml31_ModeSupportAndSystemConfigurationFull+0x2ce4/0x76da [ 100.890467] ? kallsyms_lookup_buildid+0xc8/0x163 [ 100.895173] ? kallsyms_lookup_buildid+0xc8/0x163 [ 100.899874] ? __sprint_symbol+0x80/0x135 [ 100.903883] ? dm_update_plane_state+0x3f9/0x4d2 [ 100.908500] ? symbol_string+0xb7/0xde [ 100.912250] ? number+0x145/0x29b [ 100.915566] ? vsnprintf+0x341/0x5ff [ 100.919141] ? desc_read_finalized_seq+0x39/0x87 [ 100.923755] ? update_load_avg+0x1b9/0x607 [ 100.927849] ? compute_mst_dsc_configs_for_state+0x7d/0xd5b [ 100.933416] ? fetch_pipe_params+0xa4d/0xd0c [ 100.937686] ? dc_fpu_end+0x3d/0xa8 [ 100.941175] dml_get_voltage_level+0x16b/0x180 [ 100.945619] dcn30_internal_validate_bw+0x10e/0x89b [ 100.950495] ? dcn31_validate_bandwidth+0x68/0x1fc [ 100.955285] ? resource_build_scaling_params+0x98b/0xb8c [ 100.960595] ? dcn31_validate_bandwidth+0x68/0x1fc [ 100.965384] dcn31_validate_bandwidth+0x9a/0x1fc [ 100.970001] dc_validate_global_state+0x238/0x295 [ 100.974703] amdgpu_dm_atomic_check+0x9c1/0xbce [ 100.979235] ? _printk+0x59/0x73 [ 100.982467] drm_atomic_check_only+0x403/0x78b [ 100.986912] drm_mode_atomic_ioctl+0x49b/0x546 [ 100.991358] ? drm_ioctl+0x1c1/0x3b3 [ 100.994936] ? drm_atomic_set_property+0x92a/0x92a [ 100.999725] drm_ioctl_kernel+0xdc/0x149 [ 101.003648] drm_ioctl+0x27f/0x3b3 [ 101.007051] ? drm_atomic_set_property+0x92a/0x92a [ 101.011842] amdgpu_drm_ioctl+0x49/0x7d [ 101.015679] __se_sys_ioctl+0x7c/0xb8 [ 101.015685] do_syscall_64+0x5f/0xb8 [ 101.015690] ? __irq_exit_rcu+0x34/0x96 [How] It calles populate_dml_pipes which uses doubles to initialize. Adding FPU protection avoids context switch and probable loss of vba context as there is potential contention while drm debug logs are enabled. Signed-off-by: CHANDAN VURDIGERE NATARAJ <chandan.vurdigerenataraj@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org

OF framebuffers do not have an underlying device in the Linux device hierarchy. Do a regular unregister call instead of hot unplugging such a non-existing device. Fixes a NULL dereference. An example error message on ppc64le is shown below. BUG: Kernel NULL pointer dereference on read at 0x00000060 Faulting instruction address: 0xc00000000080dfa4 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries [...] CPU: 2 PID: 139 Comm: systemd-udevd Not tainted 5.17.0-ae085d7f9365 #1 NIP: c00000000080dfa4 LR: c00000000080df9c CTR: c000000000797430 REGS: c000000004132fe0 TRAP: 0300 Not tainted (5.17.0-ae085d7f9365) MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 28228282 XER: 20000000 CFAR: c00000000000c80c DAR: 0000000000000060 DSISR: 40000000 IRQMASK: 0 GPR00: c00000000080df9c c000000004133280 c00000000169d200 0000000000000029 GPR04: 00000000ffffefff c000000004132f90 c000000004132f88 0000000000000000 GPR08: c0000000015658f8 c0000000015cd200 c0000000014f57d0 0000000048228283 GPR12: 0000000000000000 c00000003fffe300 0000000020000000 0000000000000000 GPR16: 0000000000000000 0000000113fc4a40 0000000000000005 0000000113fcfb80 GPR20: 000001000f7283b0 0000000000000000 c000000000e4a588 c000000000e4a5b0 GPR24: 0000000000000001 00000000000a0000 c008000000db0168 c0000000021f6ec0 GPR28: c0000000016d65a8 c000000004b36460 0000000000000000 c0000000016d64b0 NIP [c00000000080dfa4] do_remove_conflicting_framebuffers+0x184/0x1d0 [c000000004133280] [c00000000080df9c] do_remove_conflicting_framebuffers+0x17c/0x1d0 (unreliable) [c000000004133350] [c00000000080e4d0] remove_conflicting_framebuffers+0x60/0x150 [c0000000041333a0] [c00000000080e6f4] remove_conflicting_pci_framebuffers+0x134/0x1b0 [c000000004133450] [c008000000e70438] drm_aperture_remove_conflicting_pci_framebuffers+0x90/0x100 [drm] [c000000004133490] [c008000000da0ce4] bochs_pci_probe+0x6c/0xa64 [bochs] [...] [c000000004133db0] [c00000000002aaa0] system_call_exception+0x170/0x2d0 [c000000004133e10] [c00000000000c3cc] system_call_common+0xec/0x250 The bug [1] was introduced by commit 27599aa ("fbdev: Hot-unplug firmware fb devices on forced removal"). Most firmware framebuffers have an underlying platform device, which can be hot-unplugged before loading the native graphics driver. OF framebuffers do not (yet) have that device. Fix the code by unregistering the framebuffer as before without a hot unplug. Tested with 5.17 on qemu ppc64le emulation. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: 27599aa ("fbdev: Hot-unplug firmware fb devices on forced removal") Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Cc: Zack Rusin <zackr@vmware.com> Cc: Javier Martinez Canillas <javierm@redhat.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: stable@vger.kernel.org # v5.11+ Cc: Helge Deller <deller@gmx.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Zheyu Ma <zheyuma97@gmail.com> Cc: Xiyu Yang <xiyuyang19@fudan.edu.cn> Cc: Zhen Lei <thunder.leizhen@huawei.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Guenter Roeck <linux@roeck-us.net> Cc: linux-fbdev@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Link: https://lore.kernel.org/all/YkHXO6LGHAN0p1pq@debian/ # [1] Link: https://patchwork.freedesktop.org/patch/msgid/20220404194402.29974-1-tzimmermann@suse.de

The following crash was reported: [ 1950.279393] list_del corruption, ffff99560d485790->next is NULL [ 1950.279400] ------------[ cut here ]------------ [ 1950.279401] kernel BUG at lib/list_debug.c:49! [ 1950.279405] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 1950.279407] CPU: 11 PID: 5886 Comm: modprobe Tainted: G O 6.2.8_1 #1 [ 1950.279409] Hardware name: Gigabyte Technology Co., Ltd. B550M AORUS PRO-P/B550M AORUS PRO-P, BIOS F15c 05/11/2022 [ 1950.279410] RIP: 0010:__list_del_entry_valid+0x59/0xc0 [ 1950.279415] Code: 48 8b 01 48 39 f8 75 5a 48 8b 72 08 48 39 c6 75 65 b8 01 00 00 00 c3 cc cc cc cc 48 89 fe 48 c7 c7 08 a8 13 9e e8 b7 0a bc ff <0f> 0b 48 89 fe 48 c7 c7 38 a8 13 9e e8 a6 0a bc ff 0f 0b 48 89 fe [ 1950.279416] RSP: 0018:ffffa96d05647e08 EFLAGS: 00010246 [ 1950.279418] RAX: 0000000000000033 RBX: ffff99560d485750 RCX: 0000000000000000 [ 1950.279419] RDX: 0000000000000000 RSI: ffffffff9e107c59 RDI: 00000000ffffffff [ 1950.279420] RBP: ffffffffc19c5168 R08: 0000000000000000 R09: ffffa96d05647cc8 [ 1950.279421] R10: 0000000000000003 R11: ffffffff9ea2a568 R12: 0000000000000000 [ 1950.279422] R13: ffff99560140a2e0 R14: ffff99560127d2e0 R15: 0000000000000000 [ 1950.279422] FS: 00007f67da795380(0000) GS:ffff995d1f0c0000(0000) knlGS:0000000000000000 [ 1950.279424] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1950.279424] CR2: 00007f67da7e65c0 CR3: 00000001feed2000 CR4: 0000000000750ee0 [ 1950.279426] PKRU: 55555554 [ 1950.279426] Call Trace: [ 1950.279428] <TASK> [ 1950.279430] hwrng_unregister+0x28/0xe0 [rng_core] [ 1950.279436] tpm_chip_unregister+0xd5/0xf0 [tpm] Add the forgotten !tpm_amd_is_rng_defective() invariant to the hwrng_unregister() call site inside tpm_chip_unregister(). Cc: stable@vger.kernel.org Reported-by: Martin Dimov <martin@dmarto.com> Link: https://lore.kernel.org/linux-integrity/3d1d7e9dbfb8c96125bc93b6b58b90a7@dmarto.com/ Fixes: f1324bb ("tpm: disable hwrng for fTPM on some AMD designs") Fixes: b006c43 ("hwrng: core - start hwrng kthread also for untrusted sources") Tested-by: Martin Dimov <martin@dmarto.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

When ufshcd_err_handler() is executed, CQ event interrupt can enter waiting for the same lock. This can happen in ufshcd_handle_mcq_cq_events() and also in ufs_mtk_mcq_intr(). The following warning message will be generated when &hwq->cq_lock is used in IRQ context with IRQ enabled. Use ufshcd_mcq_poll_cqe_lock() with spin_lock_irqsave instead of spin_lock to resolve the deadlock issue. [name:lockdep&]WARNING: inconsistent lock state [name:lockdep&]-------------------------------- [name:lockdep&]inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. [name:lockdep&]kworker/u16:4/260 [HC0[0]:SC0[0]:HE1:SE1] takes: ffffff8028444600 (&hwq->cq_lock){?.-.}-{2:2}, at: ufshcd_mcq_poll_cqe_lock+0x30/0xe0 [name:lockdep&]{IN-HARDIRQ-W} state was registered at: lock_acquire+0x17c/0x33c _raw_spin_lock+0x5c/0x7c ufshcd_mcq_poll_cqe_lock+0x30/0xe0 ufs_mtk_mcq_intr+0x60/0x1bc [ufs_mediatek_mod] __handle_irq_event_percpu+0x140/0x3ec handle_irq_event+0x50/0xd8 handle_fasteoi_irq+0x148/0x2b0 generic_handle_domain_irq+0x4c/0x6c gic_handle_irq+0x58/0x134 call_on_irq_stack+0x40/0x74 do_interrupt_handler+0x84/0xe4 el1_interrupt+0x3c/0x78 <snip> Possible unsafe locking scenario: CPU0 ---- lock(&hwq->cq_lock); <Interrupt> lock(&hwq->cq_lock); *** DEADLOCK *** 2 locks held by kworker/u16:4/260: [name:lockdep&] stack backtrace: CPU: 7 PID: 260 Comm: kworker/u16:4 Tainted: G S W OE 6.1.17-mainline-android14-2-g277223301adb #1 Workqueue: ufs_eh_wq_0 ufshcd_err_handler Call trace: dump_backtrace+0x10c/0x160 show_stack+0x20/0x30 dump_stack_lvl+0x98/0xd8 dump_stack+0x20/0x60 print_usage_bug+0x584/0x76c mark_lock_irq+0x488/0x510 mark_lock+0x1ec/0x25c __lock_acquire+0x4d8/0xffc lock_acquire+0x17c/0x33c _raw_spin_lock+0x5c/0x7c ufshcd_mcq_poll_cqe_lock+0x30/0xe0 ufshcd_poll+0x68/0x1b0 ufshcd_transfer_req_compl+0x9c/0xc8 ufshcd_err_handler+0x3bc/0xea0 process_one_work+0x2f4/0x7e8 worker_thread+0x234/0x450 kthread+0x110/0x134 ret_from_fork+0x10/0x20 Fixes: ed97506 ("scsi: ufs: core: mcq: Add completion support in poll") Reviewed-by: Can Guo <quic_cang@quicinc.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Alice Chao <alice.chao@mediatek.com> Link: https://lore.kernel.org/r/20230424080400.8955-1-alice.chao@mediatek.com Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Commit 4fe8158 ("ixgbe: let the xdpdrv work with more than 64 cpus") adds support to allow XDP programs to run on systems with more than 64 CPUs by locking the XDP TX rings and indexing them using cpu % 64 (IXGBE_MAX_XDP_QS). Upon trying this out patch on a system with more than 64 cores, the kernel paniced with an array-index-out-of-bounds at the return in ixgbe_determine_xdp_ring in ixgbe.h, which means ixgbe_determine_xdp_q_idx was just returning the cpu instead of cpu % IXGBE_MAX_XDP_QS. An example splat: ========================================================================== UBSAN: array-index-out-of-bounds in /var/lib/dkms/ixgbe/5.18.6+focal-1/build/src/ixgbe.h:1147:26 index 65 is out of range for type 'ixgbe_ring *[64]' ========================================================================== BUG: kernel NULL pointer dereference, address: 0000000000000058 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 65 PID: 408 Comm: ksoftirqd/65 Tainted: G IOE 5.15.0-48-generic #54~20.04.1-Ubuntu Hardware name: Dell Inc. PowerEdge R640/0W23H8, BIOS 2.5.4 01/13/2020 RIP: 0010:ixgbe_xmit_xdp_ring+0x1b/0x1c0 [ixgbe] Code: 3b 52 d4 cf e9 42 f2 ff ff 66 0f 1f 44 00 00 0f 1f 44 00 00 55 b9 00 00 00 00 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 08 <44> 0f b7 47 58 0f b7 47 5a 0f b7 57 54 44 0f b7 76 08 66 41 39 c0 RSP: 0018:ffffbc3fcd88fcb0 EFLAGS: 00010282 RAX: ffff92a253260980 RBX: ffffbc3fe68b00a0 RCX: 0000000000000000 RDX: ffff928b5f659000 RSI: ffff928b5f659000 RDI: 0000000000000000 RBP: ffffbc3fcd88fce0 R08: ffff92b9dfc20580 R09: 0000000000000001 R10: 3d3d3d3d3d3d3d3d R11: 3d3d3d3d3d3d3d3d R12: 0000000000000000 R13: ffff928b2f0fa8c0 R14: ffff928b9be20050 R15: 000000000000003c FS: 0000000000000000(0000) GS:ffff92b9dfc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000058 CR3: 000000011dd6a002 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ixgbe_poll+0x103e/0x1280 [ixgbe] ? sched_clock_cpu+0x12/0xe0 __napi_poll+0x30/0x160 net_rx_action+0x11c/0x270 __do_softirq+0xda/0x2ee run_ksoftirqd+0x2f/0x50 smpboot_thread_fn+0xb7/0x150 ? sort_range+0x30/0x30 kthread+0x127/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x1f/0x30 </TASK> I think this is how it happens: Upon loading the first XDP program on a system with more than 64 CPUs, ixgbe_xdp_locking_key is incremented in ixgbe_xdp_setup. However, immediately after this, the rings are reconfigured by ixgbe_setup_tc. ixgbe_setup_tc calls ixgbe_clear_interrupt_scheme which calls ixgbe_free_q_vectors which calls ixgbe_free_q_vector in a loop. ixgbe_free_q_vector decrements ixgbe_xdp_locking_key once per call if it is non-zero. Commenting out the decrement in ixgbe_free_q_vector stopped my system from panicing. I suspect to make the original patch work, I would need to load an XDP program and then replace it in order to get ixgbe_xdp_locking_key back above 0 since ixgbe_setup_tc is only called when transitioning between XDP and non-XDP ring configurations, while ixgbe_xdp_locking_key is incremented every time ixgbe_xdp_setup is called. Also, ixgbe_setup_tc can be called via ethtool --set-channels, so this becomes another path to decrement ixgbe_xdp_locking_key to 0 on systems with more than 64 CPUs. Since ixgbe_xdp_locking_key only protects the XDP_TX path and is tied to the number of CPUs present, there is no reason to disable it upon unloading an XDP program. To avoid confusion, I have moved enabling ixgbe_xdp_locking_key into ixgbe_sw_init, which is part of the probe path. Fixes: 4fe8158 ("ixgbe: let the xdpdrv work with more than 64 cpus") Signed-off-by: John Hickey <jjh@daedalian.us> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230425170308.2522429-1-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>

The intel_idle_init_cstates_icpu() function includes a loop that iterates over every C-state. Inside the loop, the same C-state data is referenced 2 ways: 1. as cpuidle_state_table[cstate] 2. as drv->states[drv->state_count] (but it is a copy of #1, not the same object). Make the code be more consistent and easier to read by using only the 2nd way. So the code structure would be as follows: 1. Use cpuidle_state_table[cstate] 2. Copy cpuidle_state_table[cstate] to drv->states[drv->state_count] 3. Use only drv->states[drv->state_count] from this point. Note, this change introduces a checkpatch.pl warning (too long line), but it will be addressed in the next patch. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Hayes Wang says: ==================== r8152: fix 2.5G devices v3: For patch #2, modify the comment. v2: For patch #1, Remove inline for fc_pause_on_auto() and fc_pause_off_auto(), and update the commit message. For patch #2, define the magic value for OCP register 0xa424. v1: These patches are used to fix some issues of RTL8156. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>

When booting with 'kasan.vmalloc=off', a kernel configured with support for KASAN_HW_TAGS will explode at boot time due to bogus use of virt_to_page() on a vmalloc adddress. With CONFIG_DEBUG_VIRTUAL selected this will be reported explicitly, and with or without CONFIG_DEBUG_VIRTUAL the kernel will dereference a bogus address: | ------------[ cut here ]------------ | virt_to_phys used for non-linear address: (____ptrval____) (0xffff800008000000) | WARNING: CPU: 0 PID: 0 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0x78/0x80 | Modules linked in: | CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.3.0-rc3-00073-g83865133300d-dirty #4 | Hardware name: linux,dummy-virt (DT) | pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : __virt_to_phys+0x78/0x80 | lr : __virt_to_phys+0x78/0x80 | sp : ffffcd076afd3c80 | x29: ffffcd076afd3c80 x28: 0068000000000f07 x27: ffff800008000000 | x26: fffffbfff0000000 x25: fffffbffff000000 x24: ff00000000000000 | x23: ffffcd076ad3c000 x22: fffffc0000000000 x21: ffff800008000000 | x20: ffff800008004000 x19: ffff800008000000 x18: ffff800008004000 | x17: 666678302820295f x16: ffffffffffffffff x15: 0000000000000004 | x14: ffffcd076b009e88 x13: 0000000000000fff x12: 0000000000000003 | x11: 00000000ffffefff x10: c0000000ffffefff x9 : 0000000000000000 | x8 : 0000000000000000 x7 : 205d303030303030 x6 : 302e30202020205b | x5 : ffffcd076b41d63f x4 : ffffcd076afd3827 x3 : 0000000000000000 | x2 : 0000000000000000 x1 : ffffcd076afd3a30 x0 : 000000000000004f | Call trace: | __virt_to_phys+0x78/0x80 | __kasan_unpoison_vmalloc+0xd4/0x478 | __vmalloc_node_range+0x77c/0x7b8 | __vmalloc_node+0x54/0x64 | init_IRQ+0x94/0xc8 | start_kernel+0x194/0x420 | __primary_switched+0xbc/0xc4 | ---[ end trace 0000000000000000 ]--- | Unable to handle kernel paging request at virtual address 03fffacbe27b8000 | Mem abort info: | ESR = 0x0000000096000004 | EC = 0x25: DABT (current EL), IL = 32 bits | SET = 0, FnV = 0 | EA = 0, S1PTW = 0 | FSC = 0x04: level 0 translation fault | Data abort info: | ISV = 0, ISS = 0x00000004 | CM = 0, WnR = 0 | swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041bc5000 | [03fffacbe27b8000] pgd=0000000000000000, p4d=0000000000000000 | Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP | Modules linked in: | CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.3.0-rc3-00073-g83865133300d-dirty #4 | Hardware name: linux,dummy-virt (DT) | pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : __kasan_unpoison_vmalloc+0xe4/0x478 | lr : __kasan_unpoison_vmalloc+0xd4/0x478 | sp : ffffcd076afd3ca0 | x29: ffffcd076afd3ca0 x28: 0068000000000f07 x27: ffff800008000000 | x26: 0000000000000000 x25: 03fffacbe27b8000 x24: ff00000000000000 | x23: ffffcd076ad3c000 x22: fffffc0000000000 x21: ffff800008000000 | x20: ffff800008004000 x19: ffff800008000000 x18: ffff800008004000 | x17: 666678302820295f x16: ffffffffffffffff x15: 0000000000000004 | x14: ffffcd076b009e88 x13: 0000000000000fff x12: 0000000000000001 | x11: 0000800008000000 x10: ffff800008000000 x9 : ffffb2f8dee00000 | x8 : 000ffffb2f8dee00 x7 : 205d303030303030 x6 : 302e30202020205b | x5 : ffffcd076b41d63f x4 : ffffcd076afd3827 x3 : 0000000000000000 | x2 : 0000000000000000 x1 : ffffcd076afd3a30 x0 : ffffb2f8dee00000 | Call trace: | __kasan_unpoison_vmalloc+0xe4/0x478 | __vmalloc_node_range+0x77c/0x7b8 | __vmalloc_node+0x54/0x64 | init_IRQ+0x94/0xc8 | start_kernel+0x194/0x420 | __primary_switched+0xbc/0xc4 | Code: d34cfc08 aa1f03fa 8b081b39 d503201f (f9400328) | ---[ end trace 0000000000000000 ]--- | Kernel panic - not syncing: Attempted to kill the idle task! This is because init_vmalloc_pages() erroneously calls virt_to_page() on a vmalloc address, while virt_to_page() is only valid for addresses in the linear/direct map. Since init_vmalloc_pages() expects virtual addresses in the vmalloc range, it must use vmalloc_to_page() rather than virt_to_page(). We call init_vmalloc_pages() from __kasan_unpoison_vmalloc(), where we check !is_vmalloc_or_module_addr(), suggesting that we might encounter a non-vmalloc address. Luckily, this never happens. By design, we only call __kasan_unpoison_vmalloc() on pointers in the vmalloc area, and I have verified that we don't violate that expectation. Given that, is_vmalloc_or_module_addr() must always be true for any legitimate argument to __kasan_unpoison_vmalloc(). Correct init_vmalloc_pages() to use vmalloc_to_page(), and remove the redundant and misleading use of is_vmalloc_or_module_addr() in __kasan_unpoison_vmalloc(). Link: https://lkml.kernel.org/r/20230418164212.1775741-1-mark.rutland@arm.com Fixes: 6c2f761 ("kasan: fix zeroing vmalloc memory with HW_TAGS") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Marco Elver <elver@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Upon physical link change, firmware reports to the kernel about the change along with the details like speed, lmac_type_id, etc. Kernel derives lmac_type based on lmac_type_id received from firmware. In a few scenarios, firmware returns an invalid lmac_type_id, which is resulting in below kernel panic. This patch adds the missing validation of the lmac_type_id field. Internal error: Oops: 96000005 [#1] PREEMPT SMP [ 35.321595] Modules linked in: [ 35.328982] CPU: 0 PID: 31 Comm: kworker/0:1 Not tainted 5.4.210-g2e3169d8e1bc-dirty #17 [ 35.337014] Hardware name: Marvell CN103XX board (DT) [ 35.344297] Workqueue: events work_for_cpu_fn [ 35.352730] pstate: 40400089 (nZcv daIf +PAN -UAO) [ 35.360267] pc : strncpy+0x10/0x30 [ 35.366595] lr : cgx_link_change_handler+0x90/0x180 Fixes: 61071a8 ("octeontx2-af: Forward CGX link notifications to PFs") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>

Sai Krishna says: ==================== octeontx2: Miscellaneous fixes This patchset includes following fixes. Patch #1 Fix for the race condition while updating APR table Patch #2 Fix end bit position in NPC scan config Patch #3 Fix depth of CAM, MEM table entries Patch #4 Fix in increase the size of DMAC filter flows Patch #5 Fix driver crash resulting from invalid interface type information retrieved from firmware Patch #6 Fix incorrect mask used while installing filters involving fragmented packets Patch #7 Fixes for NPC field hash extract w.r.t IPV6 hash reduction, IPV6 filed hash configuration. Patch #8 Fix for NPC hardware parser configuration destination address hash, IPV6 endianness issues. Patch #9 Fix for skipping mbox initialization for PFs disabled by firmware. Patch #10 Fix disabling packet I/O in case of mailbox timeout. Patch #11 Fix detaching LF resources in case of VF probe fail. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>

syzkaller reported a warning below [0]. We can reproduce it by sending 0-byte data from the (AF_PACKET, SOCK_PACKET) socket via some devices whose dev->hard_header_len is 0. struct sockaddr_pkt addr = { .spkt_family = AF_PACKET, .spkt_device = "tun0", }; int fd; fd = socket(AF_PACKET, SOCK_PACKET, 0); sendto(fd, NULL, 0, 0, (struct sockaddr *)&addr, sizeof(addr)); We have a similar fix for the (AF_PACKET, SOCK_RAW) socket as commit dc63370 ("net/af_packet: check len when min_header_len equals to 0"). Let's add the same test for the SOCK_PACKET socket. [0]: skb_assert_len WARNING: CPU: 1 PID: 19945 at include/linux/skbuff.h:2552 skb_assert_len include/linux/skbuff.h:2552 [inline] WARNING: CPU: 1 PID: 19945 at include/linux/skbuff.h:2552 __dev_queue_xmit+0x1f26/0x31d0 net/core/dev.c:4159 Modules linked in: CPU: 1 PID: 19945 Comm: syz-executor.0 Not tainted 6.3.0-rc7-02330-gca6270c12e20 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:skb_assert_len include/linux/skbuff.h:2552 [inline] RIP: 0010:__dev_queue_xmit+0x1f26/0x31d0 net/core/dev.c:4159 Code: 89 de e8 1d a2 85 fd 84 db 75 21 e8 64 a9 85 fd 48 c7 c6 80 2a 1f 86 48 c7 c7 c0 06 1f 86 c6 05 23 cf 27 04 01 e8 fa ee 56 fd <0f> 0b e8 43 a9 85 fd 0f b6 1d 0f cf 27 04 31 ff 89 de e8 e3 a1 85 RSP: 0018:ffff8880217af6e0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc90001133000 RDX: 0000000000040000 RSI: ffffffff81186922 RDI: 0000000000000001 RBP: ffff8880217af8b0 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff888030045640 R13: ffff8880300456b0 R14: ffff888030045650 R15: ffff888030045718 FS: 00007fc5864da640(0000) GS:ffff88806cd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020005740 CR3: 000000003f856003 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> dev_queue_xmit include/linux/netdevice.h:3085 [inline] packet_sendmsg_spkt+0xc4b/0x1230 net/packet/af_packet.c:2066 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0x1b4/0x200 net/socket.c:747 ____sys_sendmsg+0x331/0x970 net/socket.c:2503 ___sys_sendmsg+0x11d/0x1c0 net/socket.c:2557 __sys_sendmmsg+0x18c/0x430 net/socket.c:2643 __do_sys_sendmmsg net/socket.c:2672 [inline] __se_sys_sendmmsg net/socket.c:2669 [inline] __x64_sys_sendmmsg+0x9c/0x100 net/socket.c:2669 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3c/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x72/0xdc RIP: 0033:0x7fc58791de5d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48 RSP: 002b:00007fc5864d9cc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00000000004bbf80 RCX: 00007fc58791de5d RDX: 0000000000000001 RSI: 0000000020005740 RDI: 0000000000000004 RBP: 00000000004bbf80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007fc58797e530 R15: 0000000000000000 </TASK> ---[ end trace 0000000000000000 ]--- skb len=0 headroom=16 headlen=0 tailroom=304 mac=(16,0) net=(16,-1) trans=-1 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) hash(0x0 sw=0 l4=0) proto=0x0000 pkttype=0 iif=0 dev name=sit0 feat=0x00000006401d7869 sk family=17 type=10 proto=0 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>

The cited commit moved idr initialization too early in fl_change() which allows concurrent users to access the filter that is still being initialized and is in inconsistent state, which, in turn, can cause NULL pointer dereference [0]. Since there is no obvious way to fix the ordering without reverting the whole cited commit, alternative approach taken to first insert NULL pointer into idr in order to allocate the handle but still cause fl_get() to return NULL and prevent concurrent users from seeing the filter while providing miss-to-action infrastructure with valid handle id early in fl_change(). [ 152.434728] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN [ 152.436163] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] [ 152.437269] CPU: 4 PID: 3877 Comm: tc Not tainted 6.3.0-rc4+ #5 [ 152.438110] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 152.439644] RIP: 0010:fl_dump_key+0x8b/0x1d10 [cls_flower] [ 152.440461] Code: 01 f2 02 f2 c7 40 08 04 f2 04 f2 c7 40 0c 04 f3 f3 f3 65 48 8b 04 25 28 00 00 00 48 89 84 24 00 01 00 00 48 89 c8 48 c1 e8 03 <0f> b6 04 10 84 c0 74 08 3c 03 0f 8e 98 19 00 00 8b 13 85 d2 74 57 [ 152.442885] RSP: 0018:ffff88817a28f158 EFLAGS: 00010246 [ 152.443851] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 152.444826] RDX: dffffc0000000000 RSI: ffffffff8500ae80 RDI: ffff88810a987900 [ 152.445791] RBP: ffff888179d88240 R08: ffff888179d8845c R09: ffff888179d88240 [ 152.446780] R10: ffffed102f451e48 R11: 00000000fffffff2 R12: ffff88810a987900 [ 152.447741] R13: ffffffff8500ae80 R14: ffff88810a987900 R15: ffff888149b3c738 [ 152.448756] FS: 00007f5eb2a34800(0000) GS:ffff88881ec00000(0000) knlGS:0000000000000000 [ 152.449888] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 152.450685] CR2: 000000000046ad19 CR3: 000000010b0bd006 CR4: 0000000000370ea0 [ 152.451641] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 152.452628] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 152.453588] Call Trace: [ 152.454032] <TASK> [ 152.454447] ? netlink_sendmsg+0x7a1/0xcb0 [ 152.455109] ? sock_sendmsg+0xc5/0x190 [ 152.455689] ? ____sys_sendmsg+0x535/0x6b0 [ 152.456320] ? ___sys_sendmsg+0xeb/0x170 [ 152.456916] ? do_syscall_64+0x3d/0x90 [ 152.457529] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 152.458321] ? ___sys_sendmsg+0xeb/0x170 [ 152.458958] ? __sys_sendmsg+0xb5/0x140 [ 152.459564] ? do_syscall_64+0x3d/0x90 [ 152.460122] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 152.460852] ? fl_dump_key_options.part.0+0xea0/0xea0 [cls_flower] [ 152.461710] ? _raw_spin_lock+0x7a/0xd0 [ 152.462299] ? _raw_read_lock_irq+0x30/0x30 [ 152.462924] ? nla_put+0x15e/0x1c0 [ 152.463480] fl_dump+0x228/0x650 [cls_flower] [ 152.464112] ? fl_tmplt_dump+0x210/0x210 [cls_flower] [ 152.464854] ? __kmem_cache_alloc_node+0x1a7/0x330 [ 152.465592] ? nla_put+0x15e/0x1c0 [ 152.466160] tcf_fill_node+0x515/0x9a0 [ 152.466766] ? tc_setup_offload_action+0xf0/0xf0 [ 152.467463] ? __alloc_skb+0x13c/0x2a0 [ 152.468067] ? __build_skb_around+0x330/0x330 [ 152.468814] ? fl_get+0x107/0x1a0 [cls_flower] [ 152.469503] tc_del_tfilter+0x718/0x1330 [ 152.470115] ? is_bpf_text_address+0xa/0x20 [ 152.470765] ? tc_ctl_chain+0xee0/0xee0 [ 152.471335] ? __kernel_text_address+0xe/0x30 [ 152.471948] ? unwind_get_return_address+0x56/0xa0 [ 152.472639] ? __thaw_task+0x150/0x150 [ 152.473218] ? arch_stack_walk+0x98/0xf0 [ 152.473839] ? __stack_depot_save+0x35/0x4c0 [ 152.474501] ? stack_trace_save+0x91/0xc0 [ 152.475119] ? security_capable+0x51/0x90 [ 152.475741] rtnetlink_rcv_msg+0x2c1/0x9d0 [ 152.476387] ? rtnl_calcit.isra.0+0x2b0/0x2b0 [ 152.477042] ? __sys_sendmsg+0xb5/0x140 [ 152.477664] ? do_syscall_64+0x3d/0x90 [ 152.478255] ? entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 152.479010] ? __stack_depot_save+0x35/0x4c0 [ 152.479679] ? __stack_depot_save+0x35/0x4c0 [ 152.480346] netlink_rcv_skb+0x12c/0x360 [ 152.480929] ? rtnl_calcit.isra.0+0x2b0/0x2b0 [ 152.481517] ? do_syscall_64+0x3d/0x90 [ 152.482061] ? netlink_ack+0x1550/0x1550 [ 152.482612] ? rhashtable_walk_peek+0x170/0x170 [ 152.483262] ? kmem_cache_alloc_node+0x1af/0x390 [ 152.483875] ? _copy_from_iter+0x3d6/0xc70 [ 152.484528] netlink_unicast+0x553/0x790 [ 152.485168] ? netlink_attachskb+0x6a0/0x6a0 [ 152.485848] ? unwind_next_frame+0x11cc/0x1a10 [ 152.486538] ? arch_stack_walk+0x61/0xf0 [ 152.487169] netlink_sendmsg+0x7a1/0xcb0 [ 152.487799] ? netlink_unicast+0x790/0x790 [ 152.488355] ? iovec_from_user.part.0+0x4d/0x220 [ 152.488990] ? _raw_spin_lock+0x7a/0xd0 [ 152.489598] ? netlink_unicast+0x790/0x790 [ 152.490236] sock_sendmsg+0xc5/0x190 [ 152.490796] ____sys_sendmsg+0x535/0x6b0 [ 152.491394] ? import_iovec+0x7/0x10 [ 152.491964] ? kernel_sendmsg+0x30/0x30 [ 152.492561] ? __copy_msghdr+0x3c0/0x3c0 [ 152.493160] ? do_syscall_64+0x3d/0x90 [ 152.493706] ___sys_sendmsg+0xeb/0x170 [ 152.494283] ? may_open_dev+0xd0/0xd0 [ 152.494858] ? copy_msghdr_from_user+0x110/0x110 [ 152.495541] ? __handle_mm_fault+0x2678/0x4ad0 [ 152.496205] ? copy_page_range+0x2360/0x2360 [ 152.496862] ? __fget_light+0x57/0x520 [ 152.497449] ? mas_find+0x1c0/0x1c0 [ 152.498026] ? sockfd_lookup_light+0x1a/0x140 [ 152.498703] __sys_sendmsg+0xb5/0x140 [ 152.499306] ? __sys_sendmsg_sock+0x20/0x20 [ 152.499951] ? do_user_addr_fault+0x369/0xd80 [ 152.500595] do_syscall_64+0x3d/0x90 [ 152.501185] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 152.501917] RIP: 0033:0x7f5eb294f887 [ 152.502494] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 152.505008] RSP: 002b:00007ffd2c708f78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 152.506152] RAX: ffffffffffffffda RBX: 00000000642d9472 RCX: 00007f5eb294f887 [ 152.507134] RDX: 0000000000000000 RSI: 00007ffd2c708fe0 RDI: 0000000000000003 [ 152.508113] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 [ 152.509119] R10: 00007f5eb2808708 R11: 0000000000000246 R12: 0000000000000001 [ 152.510068] R13: 0000000000000000 R14: 00007ffd2c70d1b8 R15: 0000000000485400 [ 152.511031] </TASK> [ 152.511444] Modules linked in: cls_flower sch_ingress openvswitch nsh mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter overlay zram zsmalloc fuse [last unloaded: mlx5_core] [ 152.515720] ---[ end trace 0000000000000000 ]--- Fixes: 08a0063 ("net/sched: flower: Move filter handle initialization earlier") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>

The second parameter of stmmac_pltfr_init() needs the pointer of "struct plat_stmmacenet_data". So, correct the parameter typo when calling the function. Otherwise, it may cause this alignment exception when doing suspend/resume. [ 49.067201] CPU1 is up [ 49.135258] Internal error: SP/PC alignment exception: 000000008a000000 [lkl#1] PREEMPT SMP [ 49.143346] Modules linked in: soc_imx9 crct10dif_ce polyval_ce nvmem_imx_ocotp_fsb_s400 polyval_generic layerscape_edac_mod snd_soc_fsl_asoc_card snd_soc_imx_audmux snd_soc_imx_card snd_soc_wm8962 el_enclave snd_soc_fsl_micfil rtc_pcf2127 rtc_pcf2131 flexcan can_dev snd_soc_fsl_xcvr snd_soc_fsl_sai imx8_media_dev(C) snd_soc_fsl_utils fuse [ 49.173393] CPU: 0 PID: 565 Comm: sh Tainted: G C 6.5.0-rc4-next-20230804-05047-g5781a6249dae torvalds#677 [ 49.183721] Hardware name: NXP i.MX93 11X11 EVK board (DT) [ 49.189190] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 49.196140] pc : 0x80800052 [ 49.198931] lr : stmmac_pltfr_resume+0x34/0x50 [ 49.203368] sp : ffff800082f8bab0 [ 49.206670] x29: ffff800082f8bab0 x28: ffff0000047d0ec0 x27: ffff80008186c170 [ 49.213794] x26: 0000000b5e4ff1ba x25: ffff800081e5fa74 x24: 0000000000000010 [ 49.220918] x23: ffff800081fe0000 x22: 0000000000000000 x21: 0000000000000000 [ 49.228042] x20: ffff0000001b4010 x19: ffff0000001b4010 x18: 0000000000000006 [ 49.235166] x17: ffff7ffffe007000 x16: ffff800080000000 x15: 0000000000000000 [ 49.242290] x14: 00000000000000fc x13: 0000000000000000 x12: 0000000000000000 [ 49.249414] x11: 0000000000000001 x10: 0000000000000a60 x9 : ffff800082f8b8c0 [ 49.256538] x8 : 0000000000000008 x7 : 0000000000000001 x6 : 000000005f54a200 [ 49.263662] x5 : 0000000001000000 x4 : ffff800081b93680 x3 : ffff800081519be0 [ 49.270786] x2 : 0000000080800052 x1 : 0000000000000000 x0 : ffff0000001b4000 [ 49.277911] Call trace: [ 49.280346] 0x80800052 [ 49.282781] platform_pm_resume+0x2c/0x68 [ 49.286785] dpm_run_callback.constprop.0+0x74/0x134 [ 49.291742] device_resume+0x88/0x194 [ 49.295391] dpm_resume+0x10c/0x230 [ 49.298866] dpm_resume_end+0x18/0x30 [ 49.302515] suspend_devices_and_enter+0x2b8/0x624 [ 49.307299] pm_suspend+0x1fc/0x348 [ 49.310774] state_store+0x80/0x104 [ 49.314258] kobj_attr_store+0x18/0x2c [ 49.318002] sysfs_kf_write+0x44/0x54 [ 49.321659] kernfs_fop_write_iter+0x120/0x1ec [ 49.326088] vfs_write+0x1bc/0x300 [ 49.329485] ksys_write+0x70/0x104 [ 49.332874] __arm64_sys_write+0x1c/0x28 [ 49.336783] invoke_syscall+0x48/0x114 [ 49.340527] el0_svc_common.constprop.0+0xc4/0xe4 [ 49.345224] do_el0_svc+0x38/0x98 [ 49.348526] el0_svc+0x2c/0x84 [ 49.351568] el0t_64_sync_handler+0x100/0x12c [ 49.355910] el0t_64_sync+0x190/0x194 [ 49.359567] Code: ???????? ???????? ???????? ???????? (????????) [ 49.365644] ---[ end trace 0000000000000000 ]--- Fixes: 97117eb ("net: stmmac: platform: provide stmmac_pltfr_init()") Signed-off-by: Clark Wang <xiaoning.wang@nxp.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>

Since commit 9bf19fb ("media: v4l: async: Rework internal lists"), aka v6.6-rc1~97^2~198, probing the tegra-video VI driver causes infinite recursion due tegra_vi_graph_parse_one() calling itself until: [ 1.571168] Insufficient stack space to handle exception! ... [ 1.591416] Internal error: kernel stack overflow: 0 [lkl#1] PREEMPT SMP ARM ... [ 3.861013] of_phandle_iterator_init from __of_parse_phandle_with_args+0x40/0xf0 [ 3.868497] __of_parse_phandle_with_args from of_fwnode_graph_get_remote_endpoint+0x68/0xa8 [ 3.876938] of_fwnode_graph_get_remote_endpoint from fwnode_graph_get_remote_port_parent+0x30/0x7c [ 3.885984] fwnode_graph_get_remote_port_parent from tegra_vi_graph_parse_one+0x7c/0x224 [ 3.894158] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224 [ 3.901459] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224 [ 3.908760] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224 [ 3.916061] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224 ... [ 4.857892] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224 [ 4.865193] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224 [ 4.872494] tegra_vi_graph_parse_one from tegra_vi_init+0x574/0x6d4 [ 4.878842] tegra_vi_init from host1x_device_init+0x84/0x15c [ 4.884594] host1x_device_init from host1x_video_probe+0xa0/0x114 [ 4.890770] host1x_video_probe from really_probe+0xe0/0x400 The reason is the mentioned commit changed tegra_vi_graph_find_entity() to search for an entity in the done notifier list: > @@ -1464,7 +1464,7 @@ tegra_vi_graph_find_entity(struct tegra_vi_channel *chan, > struct tegra_vi_graph_entity *entity; > struct v4l2_async_connection *asd; > > - list_for_each_entry(asd, &chan->notifier.asc_list, asc_entry) { > + list_for_each_entry(asd, &chan->notifier.done_list, asc_entry) { > entity = to_tegra_vi_graph_entity(asd); > if (entity->asd.match.fwnode == fwnode) > return entity; This is not always correct, being tegra_vi_graph_find_entity() called in three locations, in this order: 1. tegra_vi_graph_parse_one() -- called while probing 2. tegra_vi_graph_notify_bound() -- the .bound notifier op 3. tegra_vi_graph_build() -- called in the .complete notifier op Locations 1 and 2 are called before moving the entity from waiting_list to done_list, thus they won't find what they are looking for in done_list. Location 3 happens afterwards and thus it is not broken, however it means tegra_vi_graph_find_entity() should not search in the same list every time. The error appears at step 1: tegra_vi_graph_parse_one() iterates recursively until it finds the entity already notified, which now never happens. Fix by passing the specific notifier list pointer to tegra_vi_graph_find_entity() instead of the channel, so each caller can search in whatever list is correct. Also improve the tegra_vi_graph_find_entity() comment. Fixes: 9bf19fb ("media: v4l: async: Rework internal lists") Cc: Thierry Reding <thierry.reding@gmail.com> Cc: Jonathan Hunter <jonathanh@nvidia.com> Cc: Sowjanya Komatineni <skomatineni@nvidia.com> Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> [Sakari Ailus: Wrapped some long lines.] Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>

Many functions in drivers/usb/core/hub.c and drivers/usb/core/hub.h access fields inside udev->bos without checking if it was allocated and initialized. If usb_get_bos_descriptor() fails for whatever reason, udev->bos will be NULL and those accesses will result in a crash: BUG: kernel NULL pointer dereference, address: 0000000000000018 PGD 0 P4D 0 Oops: 0000 [lkl#1] PREEMPT SMP NOPTI CPU: 5 PID: 17818 Comm: kworker/5:1 Tainted: G W 5.15.108-18910-gab0e1cb584e1 lkl#1 <HASH:1f9e 1> Hardware name: Google Kindred/Kindred, BIOS Google_Kindred.12672.413.0 02/03/2021 Workqueue: usb_hub_wq hub_event RIP: 0010:hub_port_reset+0x193/0x788 Code: 89 f7 e8 20 f7 15 00 48 8b 43 08 80 b8 96 03 00 00 03 75 36 0f b7 88 92 03 00 00 81 f9 10 03 00 00 72 27 48 8b 80 a8 03 00 00 <48> 83 78 18 00 74 19 48 89 df 48 8b 75 b0 ba 02 00 00 00 4c 89 e9 RSP: 0018:ffffab740c53fcf8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffa1bc5f678000 RCX: 0000000000000310 RDX: fffffffffffffdff RSI: 0000000000000286 RDI: ffffa1be9655b840 RBP: ffffab740c53fd70 R08: 00001b7d5edaa20c R09: ffffffffb005e060 R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 R13: ffffab740c53fd3e R14: 0000000000000032 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffffa1be96540000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000018 CR3: 000000022e80c005 CR4: 00000000003706e0 Call Trace: hub_event+0x73f/0x156e ? hub_activate+0x5b7/0x68f process_one_work+0x1a2/0x487 worker_thread+0x11a/0x288 kthread+0x13a/0x152 ? process_one_work+0x487/0x487 ? kthread_associate_blkcg+0x70/0x70 ret_from_fork+0x1f/0x30 Fall back to a default behavior if the BOS descriptor isn't accessible and skip all the functionalities that depend on it: LPM support checks, Super Speed capabilitiy checks, U1/U2 states setup. Signed-off-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20230830100418.1952143-1-ricardo.canuelo@collabora.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

For ARM processor, unaligned access to device memory is not allowed. Method memcpy does not take care of alignment. USB detection failure with the unalingned address of memory, with below kernel crash. To fix the unalingned address kernel panic, replace memcpy with memcpy_toio method. Kernel crash: Unable to handle kernel paging request at virtual address ffff80000c05008a Mem abort info: ESR = 0x96000061 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x21: alignment fault Data abort info: ISV = 0, ISS = 0x00000061 CM = 0, WnR = 1 swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000000143b000 [ffff80000c05008a] pgd=100000087ffff003, p4d=100000087ffff003, pud=100000087fffe003, pmd=1000000800bcc003, pte=00680000a0010713 Internal error: Oops: 96000061 [lkl#1] SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.19-xilinx-v2022.1 lkl#1 Hardware name: ZynqMP ZCU102 Rev1.0 (DT) pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __memcpy+0x30/0x260 lr : __xudc_ep0_queue+0xf0/0x110 sp : ffff800008003d00 x29: ffff800008003d00 x28: ffff800009474e80 x27: 00000000000000a0 x26: 0000000000000100 x25: 0000000000000012 x24: ffff000800bc8080 x23: 0000000000000001 x22: 0000000000000012 x21: ffff000800bc8080 x20: 0000000000000012 x19: ffff000800bc8080 x18: 0000000000000000 x17: ffff800876482000 x16: ffff800008004000 x15: 0000000000004000 x14: 00001f09785d0400 x13: 0103020101005567 x12: 0781400000000200 x11: 00000000c5672a10 x10: 00000000000008d0 x9 : ffff800009463cf0 x8 : ffff8000094757b0 x7 : 0201010055670781 x6 : 4000000002000112 x5 : ffff80000c05009a x4 : ffff000800a15012 x3 : ffff00080362ad80 x2 : 0000000000000012 x1 : ffff000800a15000 x0 : ffff80000c050088 Call trace: __memcpy+0x30/0x260 xudc_ep0_queue+0x3c/0x60 usb_ep_queue+0x38/0x44 composite_ep0_queue.constprop.0+0x2c/0xc0 composite_setup+0x8d0/0x185c configfs_composite_setup+0x74/0xb0 xudc_irq+0x570/0xa40 __handle_irq_event_percpu+0x58/0x170 handle_irq_event+0x60/0x120 handle_fasteoi_irq+0xc0/0x220 handle_domain_irq+0x60/0x90 gic_handle_irq+0x74/0xa0 call_on_irq_stack+0x2c/0x60 do_interrupt_handler+0x54/0x60 el1_interrupt+0x30/0x50 el1h_64_irq_handler+0x18/0x24 el1h_64_irq+0x78/0x7c arch_cpu_idle+0x18/0x2c do_idle+0xdc/0x15c cpu_startup_entry+0x28/0x60 rest_init+0xc8/0xe0 arch_call_rest_init+0x10/0x1c start_kernel+0x694/0x6d4 __primary_switched+0xa4/0xac Fixes: 1f7c516 ("usb: gadget: Add xilinx usb2 device support") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/all/202209020044.CX2PfZzM-lkp@intel.com/ Cc: stable@vger.kernel.org Signed-off-by: Piyush Mehta <piyush.mehta@amd.com> Link: https://lore.kernel.org/r/20230929121514.13475-1-piyush.mehta@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

…ocalls() I received a bug report with the following signature: [ 1759.937637] BUG: unable to handle page fault for address: ffffffffffffffe8 [ 1759.944564] #PF: supervisor read access in kernel mode [ 1759.949732] #PF: error_code(0x0000) - not-present page [ 1759.954901] PGD 7ab615067 P4D 7ab615067 PUD 7ab617067 PMD 0 [ 1759.960596] Oops: 0000 1 PREEMPT SMP PTI [ 1759.964804] CPU: 15 PID: 109 Comm: cpuhp/15 Kdump: loaded Tainted: G X ------- — 5.14.0-362.3.1.el9_3.x86_64 lkl#1 [ 1759.976609] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 06/20/2018 [ 1759.985181] RIP: 0010:io_wq_for_each_worker.isra.0+0x24/0xa0 [ 1759.990877] Code: 90 90 90 90 90 90 0f 1f 44 00 00 41 56 41 55 41 54 55 48 8d 6f 78 53 48 8b 47 78 48 39 c5 74 4f 49 89 f5 49 89 d4 48 8d 58 e8 <8b> 13 85 d2 74 32 8d 4a 01 89 d0 f0 0f b1 0b 75 5c 09 ca 78 3d 48 [ 1760.009758] RSP: 0000:ffffb6f403603e20 EFLAGS: 00010286 [ 1760.015013] RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: 0000000000000000 [ 1760.022188] RDX: ffffb6f403603e50 RSI: ffffffffb11e95b0 RDI: ffff9f73b09e9400 [ 1760.029362] RBP: ffff9f73b09e9478 R08: 000000000000000f R09: 0000000000000000 [ 1760.036536] R10: ffffffffffffff00 R11: ffffb6f403603d80 R12: ffffb6f403603e50 [ 1760.043712] R13: ffffffffb11e95b0 R14: ffffffffb28531e8 R15: ffff9f7a6fbdf548 [ 1760.050887] FS: 0000000000000000(0000) GS:ffff9f7a6fbc0000(0000) knlGS:0000000000000000 [ 1760.059025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1760.064801] CR2: ffffffffffffffe8 CR3: 00000007ab610002 CR4: 00000000007706e0 [ 1760.071976] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1760.079150] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1760.086325] PKRU: 55555554 [ 1760.089044] Call Trace: [ 1760.091501] <TASK> [ 1760.093612] ? show_trace_log_lvl+0x1c4/0x2df [ 1760.097995] ? show_trace_log_lvl+0x1c4/0x2df [ 1760.102377] ? __io_wq_cpu_online+0x54/0xb0 [ 1760.106584] ? __die_body.cold+0x8/0xd [ 1760.110356] ? page_fault_oops+0x134/0x170 [ 1760.114479] ? kernelmode_fixup_or_oops+0x84/0x110 [ 1760.119298] ? exc_page_fault+0xa8/0x150 [ 1760.123247] ? asm_exc_page_fault+0x22/0x30 [ 1760.127458] ? __pfx_io_wq_worker_affinity+0x10/0x10 [ 1760.132453] ? __pfx_io_wq_worker_affinity+0x10/0x10 [ 1760.137446] ? io_wq_for_each_worker.isra.0+0x24/0xa0 [ 1760.142527] __io_wq_cpu_online+0x54/0xb0 [ 1760.146558] cpuhp_invoke_callback+0x109/0x460 [ 1760.151029] ? __pfx_io_wq_cpu_offline+0x10/0x10 [ 1760.155673] ? __pfx_smpboot_thread_fn+0x10/0x10 [ 1760.160320] cpuhp_thread_fun+0x8d/0x140 [ 1760.164266] smpboot_thread_fn+0xd3/0x1a0 [ 1760.168297] kthread+0xdd/0x100 [ 1760.171457] ? __pfx_kthread+0x10/0x10 [ 1760.175225] ret_from_fork+0x29/0x50 [ 1760.178826] </TASK> [ 1760.181022] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc vfat fat dm_multipath intel_rapl_msr intel_rapl_common isst_if_common ipmi_ssif nfit libnvdimm mgag200 i2c_algo_bit ioatdma drm_shmem_helper drm_kms_helper acpi_ipmi syscopyarea x86_pkg_temp_thermal sysfillrect ipmi_si intel_powerclamp sysimgblt ipmi_devintf coretemp acpi_power_meter ipmi_msghandler rapl pcspkr dca intel_pch_thermal intel_cstate ses lpc_ich intel_uncore enclosure hpilo mei_me mei acpi_tad fuse drm xfs sd_mod sg bnx2x nvme nvme_core crct10dif_pclmul crc32_pclmul nvme_common ghash_clmulni_intel smartpqi tg3 t10_pi mdio uas libcrc32c crc32c_intel scsi_transport_sas usb_storage hpwdt wmi dm_mirror dm_region_hash dm_log dm_mod [ 1760.248623] CR2: ffffffffffffffe8 A cpu hotplug callback was issued before wq->all_list was initialized. This results in a null pointer dereference. The fix is to fully setup the io_wq before calling cpuhp_state_add_instance_nocalls(). Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Link: https://lore.kernel.org/r/x49y1ghnecs.fsf@segfault.boston.devel.redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>

Surfaces can be backed (i.e. stored in) memory objects (mob's) which are created and managed by the userspace as GEM buffers. Surfaces grab only a ttm reference which means that the gem object can be deleted underneath us, especially in cases where prime buffer export is used. Make sure that all userspace surfaces which are backed by gem objects hold a gem reference to make sure they're not deleted before vmw surfaces are done with them, which fixes: ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 2 PID: 2632 at lib/refcount.c:28 refcount_warn_saturate+0xfb/0x150 Modules linked in: overlay vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock snd_ens1371 snd_ac97_codec ac97_bus snd_pcm gameport> CPU: 2 PID: 2632 Comm: vmw_ref_count Not tainted 6.5.0-rc2-vmwgfx lkl#1 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 RIP: 0010:refcount_warn_saturate+0xfb/0x150 Code: eb 9e 0f b6 1d 8b 5b a6 01 80 fb 01 0f 87 ba e4 80 00 83 e3 01 75 89 48 c7 c7 c0 3c f9 a3 c6 05 6f 5b a6 01 01 e8 15 81 98 ff <0f> 0b e9 6f ff ff ff 0f b> RSP: 0018:ffffbdc34344bba0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027 RDX: ffff960475ea1548 RSI: 0000000000000001 RDI: ffff960475ea1540 RBP: ffffbdc34344bba8 R08: 0000000000000003 R09: 65646e75203a745f R10: ffffffffa5b32b20 R11: 72657466612d6573 R12: ffff96037d6a6400 R13: ffff9603484805b0 R14: 000000000000000b R15: ffff9603bed06060 FS: 00007f5fd8520c40(0000) GS:ffff960475e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f5fda755000 CR3: 000000010d012005 CR4: 00000000003706e0 Call Trace: <TASK> ? show_regs+0x6e/0x80 ? refcount_warn_saturate+0xfb/0x150 ? __warn+0x91/0x150 ? refcount_warn_saturate+0xfb/0x150 ? report_bug+0x19d/0x1b0 ? handle_bug+0x46/0x80 ? exc_invalid_op+0x1d/0x80 ? asm_exc_invalid_op+0x1f/0x30 ? refcount_warn_saturate+0xfb/0x150 drm_gem_object_handle_put_unlocked+0xba/0x110 [drm] drm_gem_object_release_handle+0x6e/0x80 [drm] drm_gem_handle_delete+0x6a/0xc0 [drm] ? __pfx_vmw_bo_unref_ioctl+0x10/0x10 [vmwgfx] vmw_bo_unref_ioctl+0x33/0x40 [vmwgfx] drm_ioctl_kernel+0xbc/0x160 [drm] drm_ioctl+0x2d2/0x580 [drm] ? __pfx_vmw_bo_unref_ioctl+0x10/0x10 [vmwgfx] ? do_vmi_munmap+0xee/0x180 vmw_generic_ioctl+0xbd/0x180 [vmwgfx] vmw_unlocked_ioctl+0x19/0x20 [vmwgfx] __x64_sys_ioctl+0x99/0xd0 do_syscall_64+0x5d/0x90 ? syscall_exit_to_user_mode+0x2a/0x50 ? do_syscall_64+0x6d/0x90 ? handle_mm_fault+0x16e/0x2f0 ? exit_to_user_mode_prepare+0x34/0x170 ? irqentry_exit_to_user_mode+0xd/0x20 ? irqentry_exit+0x3f/0x50 ? exc_page_fault+0x8e/0x190 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 RIP: 0033:0x7f5fda51aaff Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 7> RSP: 002b:00007ffd536a4d30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffd536a4de0 RCX: 00007f5fda51aaff RDX: 00007ffd536a4de0 RSI: 0000000040086442 RDI: 0000000000000003 RBP: 0000000040086442 R08: 000055fa603ada50 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffd536a51b8 R13: 0000000000000003 R14: 000055fa5ebb4c80 R15: 00007f5fda90f040 </TASK> ---[ end trace 0000000000000000 ]--- A lot of the analyis on the bug was done by Murray McAllister and Ian Forbes. Reported-by: Murray McAllister <murray.mcallister@gmail.com> Cc: Ian Forbes <iforbes@vmware.com> Signed-off-by: Zack Rusin <zackr@vmware.com> Fixes: a950b98 ("drm/vmwgfx: Do not drop the reference to the handle too soon") Cc: <stable@vger.kernel.org> # v6.2+ Reviewed-by: Martin Krastev <krastevm@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230928041355.737635-1-zack@kde.org

Eddie reported that newer kernels were crashing during boot on his 476 FSP2 system: kernel tried to execute user page (b7ee2000) - exploit attempt? (uid: 0) BUG: Unable to handle kernel instruction fetch Faulting instruction address: 0xb7ee2000 Oops: Kernel access of bad area, sig: 11 [lkl#1] BE PAGE_SIZE=4K FSP-2 Modules linked in: CPU: 0 PID: 61 Comm: mount Not tainted 6.1.55-d23900f.ppcnf-fsp2 lkl#1 Hardware name: ibm,fsp2 476fpe 0x7ff520c0 FSP-2 NIP: b7ee2000 LR: 8c008000 CTR: 00000000 REGS: bffebd83 TRAP: 0400 Not tainted (6.1.55-d23900f.ppcnf-fs p2) MSR: 00000030 <IR,DR> CR: 00001000 XER: 20000000 GPR00: c00110ac bffebe63 bffebe7e bffebe88 8c008000 00001000 00000d12 b7ee2000 GPR08: 00000033 00000000 00000000 c139df10 48224824 1016c314 10160000 00000000 GPR16: 10160000 10160000 00000008 00000000 10160000 00000000 10160000 1017f5b0 GPR24: 1017fa50 1017f4f0 1017fa50 1017f740 1017f630 00000000 00000000 1017f4f0 NIP [b7ee2000] 0xb7ee2000 LR [8c008000] 0x8c008000 Call Trace: Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX ---[ end trace 0000000000000000 ]--- The problem is in ret_from_syscall where the check for icache_44x_need_flush is done. When the flush is needed the code jumps out-of-line to do the flush, and then intends to jump back to continue the syscall return. However the branch back to label 1b doesn't return to the correct location, instead branching back just prior to the return to userspace, causing bogus register values to be used by the rfi. The breakage was introduced by commit 6f76a01 ("powerpc/syscall: implement system call entry/exit logic in C for PPC32") which inadvertently removed the "1" label and reused it elsewhere. Fix it by adding named local labels in the correct locations. Note that the return label needs to be outside the ifdef so that CONFIG_PPC_47x=n compiles. Fixes: 6f76a01 ("powerpc/syscall: implement system call entry/exit logic in C for PPC32") Cc: stable@vger.kernel.org # v5.12+ Reported-by: Eddie James <eajames@linux.ibm.com> Tested-by: Eddie James <eajames@linux.ibm.com> Link: https://lore.kernel.org/linuxppc-dev/fdaadc46-7476-9237-e104-1d2168526e72@linux.ibm.com/ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://msgid.link/20231010114750.847794-1-mpe@ellerman.id.au

The cited patch change mlx5 driver so that during probe DMA operations were performed before pci_enable_device(), and during teardown DMA operations were performed after pci_disable_device(). DMA operations require PCI to be enabled. Hence, The above leads to the following oops in PPC systems[1]. On s390x systems, as reported by Niklas Schnelle, this is a problem because mlx5_pci_init() is where the DMA and coherent mask is set but mlx5_cmd_init() already does a dma_alloc_coherent(). Thus a DMA allocation is done during probe before the correct mask is set. This causes probe to fail initialization of the cmdif SW structs on s390x after that is converted to the common dma-iommu code. This is because on s390x DMA addresses below 4 GiB are reserved on current machines and unlike the old s390x specific DMA API implementation common code enforces DMA masks. Fix it by performing the DMA operations during probe after pci_enable_device() and after the dma mask is set, and during teardown before pci_disable_device(). [1] Oops: Kernel access of bad area, sig: 11 [lkl#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user iptable_nat xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 netconsole rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser ib_umad rdma_cm ib_ipoib iw_cm libiscsi scsi_transport_iscsi ib_cm ib_uverbs ib_core mlx5_core(-) ptp pps_core fuse vmx_crypto crc32c_vpmsum [last unloaded: mlx5_ib] CPU: 1 PID: 8937 Comm: modprobe Not tainted 6.5.0-rc3_for_upstream_min_debug_2023_07_31_16_02 lkl#1 Hardware name: IBM pSeries (emulated by qemu) POWER9 (raw) 0x4e1202 0xf000005 of:SLOF,HEAD hv:linux,kvm pSeries NIP: c000000000423388 LR: c0000000001e733c CTR: c0000000001e4720 REGS: c0000000055636d0 TRAP: 0380 Not tainted (6.5.0-rc3_for_upstream_min_debug_2023_07_31_16_02) MSR: 8000000000009033 CR: 24008884 XER: 20040000 CFAR: c0000000001e7338 IRQMASK: 0 NIP [c000000000423388] __free_pages+0x28/0x160 LR [c0000000001e733c] dma_direct_free+0xac/0x190 Call Trace: [c000000005563970] [5deadbeef0000100] 0x5deadbeef0000100 (unreliable) [c0000000055639b0] [c0000000003d46cc] kfree+0x7c/0x150 [c000000005563a40] [c0000000001e47c8] dma_free_attrs+0xa8/0x1a0 [c000000005563aa0] [c008000000d0064c] mlx5_cmd_cleanup+0xa4/0x100 [mlx5_core] [c000000005563ad0] [c008000000cf629c] mlx5_mdev_uninit+0xf4/0x140 [mlx5_core] [c000000005563b00] [c008000000cf6448] remove_one+0x160/0x1d0 [mlx5_core] [c000000005563b40] [c000000000958540] pci_device_remove+0x60/0x110 [c000000005563b80] [c000000000a35e80] device_remove+0x70/0xd0 [c000000005563bb0] [c000000000a37a38] device_release_driver_internal+0x2a8/0x330 [c000000005563c00] [c000000000a37b8c] driver_detach+0x8c/0x160 [c000000005563c40] [c000000000a35350] bus_remove_driver+0x90/0x110 [c000000005563c80] [c000000000a38948] driver_unregister+0x48/0x90 [c000000005563cf0] [c000000000957e38] pci_unregister_driver+0x38/0x150 [c000000005563d40] [c008000000eb6140] mlx5_cleanup+0x38/0x90 [mlx5_core] Fixes: 06cd555 ("net/mlx5: split mlx5_cmd_init() to probe and reload routines") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Hold RTNL lock when calling xdp_set_features() with a registered netdev, as the call triggers the netdev notifiers. This could happen when switching from nic profile to uplink representor for example. Similar logic which fixed a similar scenario was previously introduced in the following commit: commit 72cc654 net/mlx5e: Take RTNL lock when needed before calling xdp_set_features(). This fixes the following assertion and warning call trace: RTNL: assertion failed at net/core/dev.c (1961) WARNING: CPU: 13 PID: 2529 at net/core/dev.c:1961 call_netdevice_notifiers_info+0x7c/0x80 Modules linked in: rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_core zram zsmalloc fuse CPU: 13 PID: 2529 Comm: devlink Not tainted 6.5.0_for_upstream_min_debug_2023_09_07_20_04 lkl#1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:call_netdevice_notifiers_info+0x7c/0x80 Code: 8f ff 80 3d 77 0d 16 01 00 75 c5 ba a9 07 00 00 48 c7 c6 c4 bb 0d 82 48 c7 c7 18 c8 06 82 c6 05 5b 0d 16 01 01 e8 44 f6 8c ff <0f> 0b eb a2 0f 1f 44 00 00 55 48 89 e5 41 54 48 83 e4 f0 48 83 ec RSP: 0018:ffff88819930f7f0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffff8309f740 RCX: 0000000000000027 RDX: ffff88885fb5b5c8 RSI: 0000000000000001 RDI: ffff88885fb5b5c0 RBP: 0000000000000028 R08: ffff88887ffabaa8 R09: 0000000000000003 R10: ffff88887fecbac0 R11: ffff88887ff7bac0 R12: ffff88819930f810 R13: ffff88810b7fea40 R14: ffff8881154e8fd8 R15: ffff888107e881a0 FS: 00007f3ad248f800(0000) GS:ffff88885fb40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000563b85f164e0 CR3: 0000000113b5c006 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __warn+0x79/0x120 ? call_netdevice_notifiers_info+0x7c/0x80 ? report_bug+0x17c/0x190 ? handle_bug+0x3c/0x60 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? call_netdevice_notifiers_info+0x7c/0x80 call_netdevice_notifiers+0x2e/0x50 mlx5e_set_xdp_feature+0x21/0x50 [mlx5_core] mlx5e_build_rep_params+0x97/0x130 [mlx5_core] mlx5e_init_ul_rep+0x9f/0x100 [mlx5_core] mlx5e_netdev_init_profile+0x76/0x110 [mlx5_core] mlx5e_netdev_attach_profile+0x1f/0x90 [mlx5_core] mlx5e_netdev_change_profile+0x92/0x160 [mlx5_core] mlx5e_vport_rep_load+0x329/0x4a0 [mlx5_core] mlx5_esw_offloads_rep_load+0x9e/0xf0 [mlx5_core] esw_offloads_enable+0x4bc/0xe90 [mlx5_core] mlx5_eswitch_enable_locked+0x3c8/0x570 [mlx5_core] ? kmalloc_trace+0x25/0x80 mlx5_devlink_eswitch_mode_set+0x224/0x680 [mlx5_core] ? devlink_get_from_attrs_lock+0x9e/0x110 devlink_nl_cmd_eswitch_set_doit+0x60/0xe0 genl_family_rcv_msg_doit+0xd0/0x120 genl_rcv_msg+0x180/0x2b0 ? devlink_get_from_attrs_lock+0x110/0x110 ? devlink_nl_cmd_eswitch_get_doit+0x290/0x290 ? devlink_pernet_pre_exit+0xf0/0xf0 ? genl_family_rcv_msg_dumpit+0xf0/0xf0 netlink_rcv_skb+0x54/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x1fc/0x2c0 netlink_sendmsg+0x232/0x4a0 sock_sendmsg+0x38/0x60 ? _copy_from_user+0x2a/0x60 __sys_sendto+0x110/0x160 ? handle_mm_fault+0x161/0x260 ? do_user_addr_fault+0x276/0x620 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f3ad231340a Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007ffd70aad4b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000c36b00 RCX:00007f3ad231340a RDX: 0000000000000038 RSI: 0000000000c36b00 RDI: 0000000000000003 RBP: 0000000000c36910 R08: 00007f3ad2625200 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001 </TASK> ---[ end trace 0000000000000000 ]--- ------------[ cut here ]------------ Fixes: 4d5ab0a ("net/mlx5e: take into account device reconfiguration for xdp_features flag") Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

When testing on risc-v QEMU environment with "crashkernel=" parameter enabled, a problem occurred with the following message: [ 0.000000] crashkernel low memory reserved: 0xf8000000 - 0x100000000 (128 MB) [ 0.000000] crashkernel reserved: 0x0000000177e00000 - 0x0000000277e00000 (4096 MB) [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/resource.c:779 __insert_resource+0x8e/0xd0 [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0-rc2-next-20230920 lkl#1 [ 0.000000] Hardware name: riscv-virtio,qemu (DT) [ 0.000000] epc : __insert_resource+0x8e/0xd0 [ 0.000000] ra : insert_resource+0x28/0x4e [ 0.000000] epc : ffffffff80017344 ra : ffffffff8001742e sp : ffffffff81203db0 [ 0.000000] gp : ffffffff812ece98 tp : ffffffff8120dac0 t0 : ff600001f7ff2b00 [ 0.000000] t1 : 0000000000000000 t2 : 3428203030303030 s0 : ffffffff81203dc0 [ 0.000000] s1 : ffffffff81211e18 a0 : ffffffff81211e18 a1 : ffffffff81289380 [ 0.000000] a2 : 0000000277dfffff a3 : 0000000177e00000 a4 : 0000000177e00000 [ 0.000000] a5 : ffffffff81289380 a6 : 0000000277dfffff a7 : 0000000000000078 [ 0.000000] s2 : ffffffff81289380 s3 : ffffffff80a0bac8 s4 : ff600001f7ff2880 [ 0.000000] s5 : 0000000000000280 s6 : 8000000a00006800 s7 : 000000000000007f [ 0.000000] s8 : 0000000080017038 s9 : 0000000080038ea0 s10: 0000000000000000 [ 0.000000] s11: 0000000000000000 t3 : ffffffff80a0bc00 t4 : ffffffff80a0bc00 [ 0.000000] t5 : ffffffff80a0bbd0 t6 : ffffffff80a0bc00 [ 0.000000] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003 [ 0.000000] [<ffffffff80017344>] __insert_resource+0x8e/0xd0 [ 0.000000] ---[ end trace 0000000000000000 ]--- [ 0.000000] Failed to add a Crash kernel resource at 177e00000 The crashkernel memory has been allocated successfully, whereas it failed to insert into iomem_resource. This is due to the unique reserving logic in risc-v arch specific code, i.e. crashk_res/crashk_low_res will be added into iomem_resource later in init_resources(), which is not aligned with current unified reserving logic in reserve_crashkernel_{generic,low}() and therefore leads to the failure of crashkernel reservation. Removing the arch specific code within #ifdef CONFIG_KEXEC_CORE in init_resources() to fix above problem. Fixes: 3154915 ("riscv: kdump: use generic interface to simplify crashkernel reservation") Signed-off-by: Chen Jiahao <chenjiahao16@huawei.com> Link: https://lore.kernel.org/r/20230925024333.730964-1-chenjiahao16@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

On driver load, scsi_add_host() can fail. This triggers the free path to call qla2x00_mem_free() multiple times. This causes NULL pointer access of ha->base_qpair. Add check before access. BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 IP: [<ffffffffc118f73c>] qla2x00_mem_free+0x51c/0xcb0 [qla2xxx] PGD 8000001fcfe4a067 PUD 1fc8f0a067 PMD 0 Oops: 0000 [lkl#1] SMP RIP: 0010:[<ffffffffc118f73c>] [<ffffffffc118f73c>] qla2x00_mem_free+0x51c/0xcb0 [qla2xxx] RSP: 0018:ffff8ace97a93a30 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8ace8efd0000 RCX: 000000000000488f RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff8ace97a93a60 R08: 000000000001f040 R09: ffffffff8678209b R10: ffff8acf7d6df040 R11: ffffc591c0fcc980 R12: ffffffff87034800 R13: ffff8acf0e3cc740 R14: ffff8ace8efd0000 R15: 00000000fffffff4 FS: 00007f4cf5449740(0000) GS:ffff8acf7d6c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000030 CR3: 0000001fc2f6c000 CR4: 00000000007607e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: [<ffffffff86781f18>] ? kobject_put+0x28/0x60 [<ffffffffc119a59c>] qla2x00_probe_one+0x19fc/0x3040 [qla2xxx] Fixes: efeda3b ("scsi: qla2xxx: Move resource to allow code reuse") Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Link: https://lore.kernel.org/r/20231016101749.5059-1-njavali@marvell.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

…ress If we specify a valid CQ ring address but an invalid SQ ring address, we'll correctly spot this and free the allocated pages and clear them to NULL. However, we don't clear the ring page count, and hence will attempt to free the pages again. We've already cleared the address of the page array when freeing them, but we don't check for that. This causes the following crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Oops [lkl#1] Modules linked in: CPU: 0 PID: 20 Comm: kworker/u2:1 Not tainted 6.6.0-rc5-dirty lkl#56 Hardware name: ucbbar,riscvemu-bare (DT) Workqueue: events_unbound io_ring_exit_work epc : io_pages_free+0x2a/0x58 ra : io_rings_free+0x3a/0x50 epc : ffffffff808811a2 ra : ffffffff80881406 sp : ffff8f80000c3cd0 status: 0000000200000121 badaddr: 0000000000000000 cause: 000000000000000d [<ffffffff808811a2>] io_pages_free+0x2a/0x58 [<ffffffff80881406>] io_rings_free+0x3a/0x50 [<ffffffff80882176>] io_ring_exit_work+0x37e/0x424 [<ffffffff80027234>] process_one_work+0x10c/0x1f4 [<ffffffff8002756e>] worker_thread+0x252/0x31c [<ffffffff8002f5e4>] kthread+0xc4/0xe0 [<ffffffff8000332a>] ret_from_fork+0xa/0x1c Check for a NULL array in io_pages_free(), but also clear the page counts when we free them to be on the safer side. Reported-by: rtm@csail.mit.edu Fixes: 03d89a2 ("io_uring: support for user allocated memory for rings/sqes") Cc: stable@vger.kernel.org Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>

when the checked address is illegal,the corresponding shadow address from kasan_mem_to_shadow may have no mapping in mmu table. Access such shadow address causes kernel oops. Here is a sample about oops on arm64(VA 39bit) with KASAN_SW_TAGS and KASAN_OUTLINE on: [ffffffb80aaaaaaa] pgd=000000005d3ce003, p4d=000000005d3ce003, pud=000000005d3ce003, pmd=0000000000000000 Internal error: Oops: 0000000096000006 [lkl#1] PREEMPT SMP Modules linked in: CPU: 3 PID: 100 Comm: sh Not tainted 6.6.0-rc1-dirty lkl#43 Hardware name: linux,dummy-virt (DT) pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __hwasan_load8_noabort+0x5c/0x90 lr : do_ib_ob+0xf4/0x110 ffffffb80aaaaaaa is the shadow address for efffff80aaaaaaaa. The problem is reading invalid shadow in kasan_check_range. The generic kasan also has similar oops. It only reports the shadow address which causes oops but not the original address. Commit 2f004ee("x86/kasan: Print original address on #GP") introduce to kasan_non_canonical_hook but limit it to KASAN_INLINE. This patch extends it to KASAN_OUTLINE mode. Link: https://lkml.kernel.org/r/20231009073748.159228-1-haibo.li@mediatek.com Fixes: 2f004ee("x86/kasan: Print original address on #GP") Signed-off-by: Haibo Li <haibo.li@mediatek.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Haibo Li <haibo.li@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

As drm_dp_get_mst_branch_device_by_guid() is called from drm_dp_get_mst_branch_device_by_guid(), mstb parameter has to be checked, otherwise NULL dereference may occur in the call to the memcpy() and cause following: [12579.365869] BUG: kernel NULL pointer dereference, address: 0000000000000049 [12579.365878] #PF: supervisor read access in kernel mode [12579.365880] #PF: error_code(0x0000) - not-present page [12579.365882] PGD 0 P4D 0 [12579.365887] Oops: 0000 [lkl#1] PREEMPT SMP NOPTI ... [12579.365895] Workqueue: events_long drm_dp_mst_up_req_work [12579.365899] RIP: 0010:memcmp+0xb/0x29 [12579.365921] Call Trace: [12579.365927] get_mst_branch_device_by_guid_helper+0x22/0x64 [12579.365930] drm_dp_mst_up_req_work+0x137/0x416 [12579.365933] process_one_work+0x1d0/0x419 [12579.365935] worker_thread+0x11a/0x289 [12579.365938] kthread+0x13e/0x14f [12579.365941] ? process_one_work+0x419/0x419 [12579.365943] ? kthread_blkcg+0x31/0x31 [12579.365946] ret_from_fork+0x1f/0x30 As get_mst_branch_device_by_guid_helper() is recursive, moving condition to the first line allow to remove a similar one for step over of NULL elements inside a loop. Fixes: 5e93b82 ("drm/dp/mst: move GUID storage from mgr, port to only mst branch") Cc: <stable@vger.kernel.org> # 4.14+ Signed-off-by: Lukasz Majczak <lma@semihalf.com> Reviewed-by: Radoslaw Biernacki <rad@chromium.org> Signed-off-by: Manasi Navare <navaremanasi@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230922063410.23626-1-lma@semihalf.com

Erhard reported that his G5 was crashing with v6.6-rc kernels: mpic: Setting up HT PICs workarounds for U3/U4 BUG: Unable to handle kernel data access at 0xfeffbb62ffec65fe Faulting instruction address: 0xc00000000005dc40 Oops: Kernel access of bad area, sig: 11 [lkl#1] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 PowerMac Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G T 6.6.0-rc3-PMacGS lkl#1 Hardware name: PowerMac11,2 PPC970MP 0x440101 PowerMac NIP: c00000000005dc40 LR: c000000000066660 CTR: c000000000007730 REGS: c0000000022bf510 TRAP: 0380 Tainted: G T (6.6.0-rc3-PMacGS) MSR: 9000000000001032 <SF,HV,ME,IR,DR,RI> CR: 44004242 XER: 00000000 IRQMASK: 3 GPR00: 0000000000000000 c0000000022bf7b0 c0000000010c0b00 00000000000001ac GPR04: 0000000003c80000 0000000000000300 c0000000f20001ae 0000000000000300 GPR08: 0000000000000006 feffbb62ffec65ff 0000000000000001 0000000000000000 GPR12: 9000000000001032 c000000002362000 c000000000f76b80 000000000349ecd8 GPR16: 0000000002367ba8 0000000002367f08 0000000000000006 0000000000000000 GPR20: 00000000000001ac c000000000f6f920 c0000000022cd985 000000000000000c GPR24: 0000000000000300 00000003b0a3691d c0003e008030000e 0000000000000000 GPR28: c00000000000000c c0000000f20001ee feffbb62ffec65fe 00000000000001ac NIP hash_page_do_lazy_icache+0x50/0x100 LR __hash_page_4K+0x420/0x590 Call Trace: hash_page_mm+0x364/0x6f0 do_hash_fault+0x114/0x2b0 data_access_common_virt+0x198/0x1f0 --- interrupt: 300 at mpic_init+0x4bc/0x10c4 NIP: c000000002020a5c LR: c000000002020a04 CTR: 0000000000000000 REGS: c0000000022bf9f0 TRAP: 0300 Tainted: G T (6.6.0-rc3-PMacGS) MSR: 9000000000001032 <SF,HV,ME,IR,DR,RI> CR: 24004248 XER: 00000000 DAR: c0003e008030000e DSISR: 40000000 IRQMASK: 1 ... NIP mpic_init+0x4bc/0x10c4 LR mpic_init+0x464/0x10c4 --- interrupt: 300 pmac_setup_one_mpic+0x258/0x2dc pmac_pic_init+0x28c/0x3d8 init_IRQ+0x90/0x140 start_kernel+0x57c/0x78c start_here_common+0x1c/0x20 A bisect pointed to the breakage beginning with commit 9fee28b ("powerpc: implement the new page table range API"). Analysis of the oops pointed to a struct page with a corrupted compound_head being loaded via page_folio() -> _compound_head() in hash_page_do_lazy_icache(). The access by the mpic code is to an MMIO address, so the expectation is that the struct page for that address would be initialised by init_unavailable_range(), as pointed out by Aneesh. Instrumentation showed that was not the case, which eventually lead to the realisation that pfn_valid() was returning false for that address, causing the struct page to not be initialised. Because the system is using FLATMEM, the version of pfn_valid() in memory_model.h is used: static inline int pfn_valid(unsigned long pfn) { ... return pfn >= pfn_offset && (pfn - pfn_offset) < max_mapnr; } Which relies on max_mapnr being initialised. Early in boot max_mapnr is zero meaning no PFNs are valid. max_mapnr is initialised in mem_init() called via: start_kernel() mm_core_init() # init/main.c:928 mem_init() But that is too late for the usage in init_unavailable_range() called via: start_kernel() setup_arch() # init/main.c:893 paging_init() free_area_init() init_unavailable_range() Although max_mapnr is currently set in mem_init(), the value is actually already available much earlier, as soon as mem_topology_setup() has completed, which is also before paging_init() is called. So move the initialisation there, which causes paging_init() to correctly initialise the struct page and fixes the bug. This bug seems to have been lurking for years, but went unnoticed because the pre-folio code was inspecting the uninitialised page->flags but not dereferencing it. Thanks to Erhard and Aneesh for help debugging. Reported-by: Erhard Furtner <erhard_f@mailbox.org> Closes: https://lore.kernel.org/all/20230929132750.3cd98452@yea/ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20231023112500.1550208-1-mpe@ellerman.id.au

cemeyer added 22 commits November 8, 2015 14:23

time/Kconfig: Hack NR_CPUS range for cross-build on FreeBSD

54dda2f

Kbuild/timeconst.bc: Hack to build with FreeBSD bc(1)

a6dee54

lkl/Kconfig: Build 64-bit

549f49e

checksyscalls.sh: Turn off missing syscall warning spam

900fac9

lkl/Makefile: On FreeBSD, use hw.ncpu sysctl instead of Linux nproc(1)

af529c9

headers_install: Use env(1) to locate python binary

2c0e501

lkl/Makefile: Drop lkl.o:lkl.o circular dependency

f8065a1

Documentation/lkl.txt: Document FreeBSD build process

7d26e42

lkl/Makefile: Use echo -e to print tab

a047066

lkl/lib: Fix several compiler warnings from Clang

2251107

lkl/Makefile: Build with GCC and turn on the optimizer; warnings

caaf737

lkl/Makefile: Add kernel clean to clean target

855af94

lkl/lib/virtio.h: On FreeBSD, endian.h is in sys/

542a717

lkl/Makefile: Detect elf64-x86-64-freebsd as a 64-bit platform

010e86c

lkl/Makefile: Use OUTPUT_FORMAT variable instead of duplicating LD in…

a1b8b2e

…vocation

lkl/Makefile: Drop useless warns and turn on the Werror hammer

841ee1f

lkl/cptofs: Fix build on FreeBSD

92bac99

lkl/cptofs: Fix compiler warning on bogus error checking

58aed8f

lkl/lklfuse: Fix compiler warnings

c7c2288

lkl/fs2tar: Fix compiler warnings

8c2495a

lklfuse: sys_statfs64 takes a middle size parameter

00be943

lkl/Makefile: ignore bad aliasing in virtio.c

903eac6

D0nYu mentioned this pull request Dec 11, 2024

Lkl hid fuzzer #515

Merged

rodionov mentioned this pull request Dec 26, 2024

LKL MMU support #551

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

lkl: Add FreeBSD platform support #1

lkl: Add FreeBSD platform support #1

Uh oh!

cemeyer commented Nov 9, 2015

Uh oh!

tavip commented Nov 10, 2015

Uh oh!

cemeyer commented Nov 10, 2015

Uh oh!

tavip commented Nov 11, 2015

Uh oh!

cemeyer commented Nov 11, 2015

Uh oh!

cemeyer commented Nov 11, 2015

Uh oh!

cemeyer commented Nov 11, 2015

Uh oh!

tavip commented Nov 11, 2015

Uh oh!

cemeyer commented Nov 11, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lkl: Add FreeBSD platform support #1

lkl: Add FreeBSD platform support #1

Uh oh!

Conversation

cemeyer commented Nov 9, 2015

Uh oh!

tavip commented Nov 10, 2015

Uh oh!

cemeyer commented Nov 10, 2015

Uh oh!

tavip commented Nov 11, 2015

Uh oh!

cemeyer commented Nov 11, 2015

Uh oh!

cemeyer commented Nov 11, 2015

Uh oh!

cemeyer commented Nov 11, 2015

Uh oh!

tavip commented Nov 11, 2015

Uh oh!

cemeyer commented Nov 11, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants