5.15.5 on intel 6th gen CPU: long startup, long reboot, NO powerdown or suspend unless Vt-d is disabled. #86

Closed
spxak1 opened this issue Dec 1, 2021 · 16 comments · Fixed by #105

@spxak1

spxak1 commented Dec 1, 2021

This is on 21.04 (system details at the end).

Lots of edits on this, sorry; I am putting things together as I go.

Slow after POST

Once the EFI stub is read, the kernel takes a while to load. Normally this starts right away, but now it takes a good 8-10 seconds from the message "EFI stub: Loaded initrd from command line option" until the kernel actually loads and the boot process gets started. With 5.13 this was instant.

Slow boot

It looks like a timeout, but the delay actually comes from the two oopses.

Slow reboot

From sudo reboot to the POST screen takes under 10 seconds with 5.13, but over 27 seconds with 5.15.5. The added time falls between the OS finishing and the laptop restarting (before POST), which links to the next item.

Here's the weird part: the laptop does not shut down.

The process appears to complete as usual (some output, I'll check later for details): the screen goes off and the power button LED goes off, but the keyboard backlight, the keyboard LEDs and the fans all stay on. I need to hold the power button for 4 seconds to turn it off.

Same behaviour when suspending to RAM. Obviously there is no waking up from it: the power button is unresponsive and opening the lid changes nothing, so the only thing left to do is hold the power button for a hard power-off and then restart as normal.

All of the above are resolved once VT-d is disabled in the BIOS. So something in this kernel doesn't agree with my laptop's (or CPU's) VT-d. I will keep it disabled and keep testing (although I will need it back on for my VMs).

Info

NAME="Pop!_OS"
VERSION="21.04"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 21.04"
VERSION_ID="21.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=hirsute
UBUNTU_CODENAME=hirsute
LOGO=distributor-logo-pop-os
Linux weywot 5.15.5-76051505-generic #202111250933~1638201579~21.04~09f1aa7-Ubuntu SMP Tue Nov 30 02: x86_64 x86_64 x86_64 GNU/Linux
@spxak1
Author

spxak1 commented Dec 1, 2021

Update:

Removing acpi-call-dkms and rebooting makes the oopses stop.

Edit: This seems to be the same problem as this. I'm still trying to figure out if the same workaround works. Edit: It is the same issue, solved with the same workaround while we wait for a new version of acpi-call-dkms (>1.2) to appear in the Ubuntu repositories.

@spxak1 spxak1 changed the title from "5.15.5 on i7-6820hq (Thinkpad T460p) Oops: 0010 [#1] SMP NOPTI and no shutdown." to "5.15.5 on i7-6820hq (Thinkpad T460p) Oops: 0010 [#1] SMP NOPTI, long startup, long reboot, NO powerdown or suspend." Dec 1, 2021
@spxak1 spxak1 changed the title from "5.15.5 on i7-6820hq (Thinkpad T460p) Oops: 0010 [#1] SMP NOPTI, long startup, long reboot, NO powerdown or suspend." to "5.15.5 on i7-6820hq (Thinkpad T460p) Oops: 0010 [#1] SMP NOPTI, long startup, long reboot, NO powerdown or suspend when Vt-d enabled." Dec 2, 2021
@spxak1
Author

spxak1 commented Dec 2, 2021

More testing on the rebooting/booting/poweroff/suspend issue.

I have done a separate installation of 21.10 beta to test what is coming my way soon.
Steps:

  • Enable VT-d in BIOS
  • Clean install 21.10 beta (.2 iso)
  • 5.15.0 out of the box works fine. That is, booting is as fast as expected, ditto with rebooting. The system powers down properly and suspends and resumes as expected.
  • 5.15.5 installed via apt-get.
  • Same behaviour as with 21.04. From the EFI stub to loading the initramfs takes 12 seconds, reboot is extremely slow to reach POST (i.e. before loading the EFI stub), and the system will not power off or suspend: it turns the power LED off but keeps the keyboard LEDs and fans running indefinitely.
  • Reboot, disable VT-d.
  • System behaves as expected (as with 5.15.0). All issues resolved.
  • Reboot, enable VT-d.
  • 5.15.6 mainline installed.
  • Same behaviour as 5.15.5. No improvement.
  • Reboot, disable VT-d.
  • System behaves as expected (as with 5.15.0). All issues resolved.
  • Reboot, enable VT-d.
  • Remove 5.15.6 mainline, install 5.15.6 xanmod, reboot to xanmod
  • System behaves as expected, all issues resolved. No need to disable VT-d.

So evidently, at the same kernel version, the xanmod build does not cause this issue, while the vanilla mainline build does.

I hope this helps.

Thanks.

@spxak1 spxak1 changed the title from "5.15.5 on i7-6820hq (Thinkpad T460p) Oops: 0010 [#1] SMP NOPTI, long startup, long reboot, NO powerdown or suspend when Vt-d enabled." to "5.15.5 on intel 6th gen CPU: long startup, long reboot, NO powerdown or suspend unless Vt-d is disabled." Dec 3, 2021
@fouadzouraibi

fouadzouraibi commented Dec 3, 2021

Pop!_OS 21.04: after the kernel 5.15 update, no suspend, reboot or shutdown. Disabling VT-d in the BIOS seems to fix the issue.
i7 6700hq gtx 960M

@BaronKrause

I can confirm. Laptop with Intel 6th gen CPU, same issues after kernel 5.15. Resolved after disabling VT-d.

@dmitriinor

Same issue here, hotfixed by disabling VT-d in the BIOS.
Laptop: Thinkpad T460, i5 6300u
OS: Pop!_OS 21.04, kernel 5.15.5

@abrighton

Same problem with Lenovo Thinkpad P70, OS: Pop!_OS 21.04, kernel 5.15.5.
Reverting to the previous 5.13 kernel is also a temporary workaround.

@spxak1
Author

spxak1 commented Dec 14, 2021

I can confirm this is a problem with Haswell CPUs too. The issue is focused on graphics.

Here's a video of the issue: https://www.youtube.com/watch?v=XprqYp9iMtc

The solution is either to use a kernel older than 5.15.5 (5.15.4 and older work fine) or to disable VT-d in the BIOS.
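To check a machine against the versions reported so far, something like the following Python sketch could compare the `uname -r` string with 5.15.5. The function names and the `first_bad` default are my own illustration. The check only mirrors the reports in this thread (Pop!_OS builds with VT-d enabled) and says nothing about which later release fixes the problem.

```python
import re


def kernel_version_tuple(release):
    """Turn a release string such as '5.15.5-76051505-generic' into (5, 15, 5)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def at_or_after_first_reported_bad(release, first_bad=(5, 15, 5)):
    """True if the release is 5.15.5 or newer, the first version reported as affected here.

    Assumption: this is purely a version comparison; other builds of the same
    version (e.g. Xanmod) were reported as unaffected.
    """
    return kernel_version_tuple(release) >= first_bad


print(at_or_after_first_reported_bad("5.15.5-76051505-generic"))
print(at_or_after_first_reported_bad("5.13.0-7620-generic"))
```

On a live system the release string can be taken from `platform.release()` in the standard library.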

Thank you.

@polhaghverdian

I have the same problem. 6th gen Xeon laptop.

Lenovo ThinkPad P50
Intel® Xeon(R) CPU E3-1505M v5 @ 2.80GHz × 8
NVIDIA Corporation GM107GLM [Quadro M2000M] / Quadro M2000M/PCIe/SSE2

@spxak1
Author

spxak1 commented Dec 21, 2021

Just to confirm, 5.15.8 has not fixed the issue.

Also to confirm that Xanmod, Manjaro and Fedora on the same kernel work fine.

@jackpot51 jackpot51 self-assigned this Dec 21, 2021
@jackpot51
Member

I diffed the configs for the Arch kernel and the Pop kernel of the same version. It looks like our kernel has CONFIG_INTEL_IOMMU_DEFAULT_ON set to y while it is unset for the Arch kernel. I wonder if booting with intel_iommu=off as a kernel parameter works around the issues. This would mimic the config of the Arch kernel.
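For anyone who wants to repeat the comparison, here is a minimal Python sketch of such a config diff. The function names and the two sample config fragments are made up for illustration; real configs are much longer. It treats "# CONFIG_X is not set" as the value "n", which is my own simplification.

```python
import re


def parse_kernel_config(text):
    """Return {option: value} for kernel config text; '# X is not set' maps to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            options[unset.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_configs(a_text, b_text):
    """Return {option: (value_in_a, value_in_b)} for options that differ.

    An option missing from one file entirely shows up as None on that side.
    """
    a, b = parse_kernel_config(a_text), parse_kernel_config(b_text)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


# Hypothetical two-line excerpts standing in for the Pop and Arch configs.
pop_config = "CONFIG_INTEL_IOMMU=y\nCONFIG_INTEL_IOMMU_DEFAULT_ON=y\n"
arch_config = "CONFIG_INTEL_IOMMU=y\n# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set\n"
print(diff_configs(pop_config, arch_config))
```

On Ubuntu-based systems the running kernel's config is usually available as /boot/config-$(uname -r); where the other distribution keeps its config may differ, so treat the input paths as something to look up.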

@polhaghverdian

polhaghverdian commented Dec 21, 2021

@jackpot51 That is basically disabling VT-d, if I am not mistaken. That is one solution, but not viable for those who need it.

@spxak1
Author

spxak1 commented Dec 21, 2021

> I diffed the configs for the Arch kernel and the Pop kernel of the same version. It looks like our kernel has CONFIG_INTEL_IOMMU_DEFAULT_ON set to y while it is unset for the Arch kernel. I wonder if booting with intel_iommu=off as a kernel parameter works around the issues. This would mimic the config of the Arch kernel.

@jackpot51 I'm testing it now. In the meantime, please note that the Xanmod kernel doesn't have this issue in any version of 5.15.

Edit:

OK, adding intel_iommu=off does solve the problem.
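To confirm the parameter really reached the running kernel, the command line in /proc/cmdline can be inspected. A small Python sketch follows; the function name is mine, and the sample command line is invented for illustration.

```python
def intel_iommu_options(cmdline):
    """Return every comma-separated value passed via intel_iommu= on a kernel command line."""
    values = []
    for token in cmdline.split():
        if token.startswith("intel_iommu="):
            # A single parameter may carry several options, e.g. intel_iommu=on,igfx_off
            values.extend(v for v in token.split("=", 1)[1].split(",") if v)
    return values


# Invented example command line; on a real system read /proc/cmdline instead.
sample = "initrd=\\EFI\\Pop_OS\\initrd.img root=UUID=0000 ro quiet intel_iommu=off"
print(intel_iommu_options(sample))
```

On the affected machine the same function can be fed `open("/proc/cmdline").read()`; an empty list means the kernel is running with its compiled-in default.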

I just checked Xanmod's config for 5.15.10 and indeed it has:

# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set

However, booting Xanmod and booting 5.15.8 with intel_iommu=off give different output for sudo dmesg | grep DMAR.

This is 5.15.8 with intel_iommu=off:

[    0.012792] ACPI: DMAR 0x00000000AFFCD000 0000A8 (v01 LENOVO TP-R07   00002330 PTEC 00000002)
[    0.012822] ACPI: Reserving DMAR table memory at [mem 0xaffcd000-0xaffcd0a7]
[    0.063886] DMAR: IOMMU disabled
[    0.130876] DMAR: Host address width 39
[    0.130877] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[    0.130881] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 1c0000c40660462 ecap 7e3ff0505e
[    0.130883] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[    0.130885] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
[    0.130887] DMAR: RMRR base: 0x000000af25f000 end: 0x000000af27efff
[    0.130888] DMAR: RMRR base: 0x000000ba000000 end: 0x000000bc7fffff
[    0.130890] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 1
[    0.130891] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[    0.130892] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.132496] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    0.239893] iommu: Default domain type: Translated 
[    0.239893] iommu: DMA domain TLB invalidation policy: lazy mode 

You can clearly see IOMMU disabled (as expected) on the third line.
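For comparing logs between kernels, a rough Python sketch like the one below can summarise the DMAR lines. The strings it searches for are assumptions based on my reading of the 5.15 Intel IOMMU driver: as far as I can tell, "IOMMU disabled" and "IOMMU enabled" are only logged when intel_iommu=off or intel_iommu=on is passed, and "Intel(R) Virtualization Technology for Directed I/O" is logged when DMA remapping is actually initialised. If that is right, a missing "IOMMU disabled" line does not by itself prove that DMA remapping is active. Please verify against your own kernel before relying on it.

```python
def summarize_dmar(dmesg_text):
    """Summarise what the DMAR lines of a dmesg dump say about the Intel IOMMU."""
    lines = [line for line in dmesg_text.splitlines() if "DMAR" in line]
    return {
        # Assumed to appear only when intel_iommu=off was given on the command line.
        "explicitly_disabled": any("DMAR: IOMMU disabled" in l for l in lines),
        # Assumed to appear only when intel_iommu=on was given on the command line.
        "explicitly_enabled": any("DMAR: IOMMU enabled" in l for l in lines),
        # Assumed to appear once DMA remapping has been set up.
        "dma_remapping_initialized": any(
            "Intel(R) Virtualization Technology for Directed I/O" in l for l in lines
        ),
        # Interrupt remapping is reported separately from DMA remapping.
        "irq_remapping_enabled": any("DMAR-IR: Enabled IRQ remapping" in l for l in lines),
    }


# Three lines copied from the 5.15.8 output above.
sample = """\
[    0.063886] DMAR: IOMMU disabled
[    0.130876] DMAR: Host address width 39
[    0.132496] DMAR-IR: Enabled IRQ remapping in x2apic mode
"""
print(summarize_dmar(sample))
```

Running it on the full output of both kernels should make it easier to see whether they differ in anything beyond the one line.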

The output of the same command on Xanmod (without the kernel option used to disable it on 5.15.8) is identical except for that line, i.e. it appears IOMMU is enabled.

So while the kernel configs are different, Xanmod does not (?) disable IOMMU, whereas with that kernel option on 5.15.8 it is explicitly disabled.

I am no expert in VT-d, but disabling IOMMU in 5.15.8 does not appear to be the same as whatever config Xanmod has.

But at this point it's a welcome workaround, although I'm not certain of the effects of disabling IOMMU.

Thank you for the attention.

@moeffju

moeffju commented Dec 21, 2021

I have the same issue with a Thinkpad T460. The previous Pop!_OS release worked just fine, including sleep and resume. After upgrading to 21.10, sleep/resume no longer works, as described in this thread.

CPU is i5-6300U with Intel HD 520 Graphics.
I've disabled VT-d for now.

@pbui

pbui commented Dec 22, 2021

@spxak1 Could you try intel_iommu=igfx_off instead of intel_iommu=off? That would keep iommu on but not for the igpu. That is what I needed to stop the flickering issue (on 8th gen Intel).

I would guess that CONFIG_INTEL_IOMMU_DEFAULT_ON=y is equivalent to intel_iommu=on, while intel_iommu=off is equivalent to CONFIG_INTEL_IOMMU=n... which is not exactly the same as what xanmod and archlinux are doing.

By unsetting CONFIG_INTEL_IOMMU_DEFAULT_ON, it may allow certain kernel subsystems to selectively enable the iommu where they consider it appropriate, rather than force-enabling it by default across the board.

Either way, it seems as though unsetting CONFIG_INTEL_IOMMU_DEFAULT_ON would be a good fix if it resolves this issue. Users can always force it on with intel_iommu=on.
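To make the guess concrete, here is a deliberately simplified Python model of how the compile-time default and the boot parameter might combine. It is an assumption based on my reading of the driver, not its actual code: it only handles on, off and igfx_off, and ignores everything else that can influence the outcome (firmware opt-in, other intel_iommu= options). The function and key names are invented.

```python
def effective_iommu_state(default_on, intel_iommu_param=None):
    """Model the DMA-remapping state from CONFIG_INTEL_IOMMU_DEFAULT_ON and intel_iommu=.

    default_on: True if CONFIG_INTEL_IOMMU_DEFAULT_ON=y, False if it is not set.
    intel_iommu_param: the value after 'intel_iommu=', e.g. 'off' or 'on,igfx_off'.
    """
    dma_remapping = default_on  # the config option only picks the starting value
    map_igfx = True             # integrated graphics is translated unless told otherwise
    for option in (intel_iommu_param or "").split(","):
        if option == "on":
            dma_remapping = True
        elif option == "off":
            dma_remapping = False
        elif option == "igfx_off":
            map_igfx = False
    return {
        "dma_remapping": dma_remapping,
        "igfx_translated": dma_remapping and map_igfx,
    }


print(effective_iommu_state(True))               # Pop kernel as shipped
print(effective_iommu_state(False))              # Arch/Xanmod-style config
print(effective_iommu_state(True, "igfx_off"))   # the option suggested above
```

If this model is right, an unset CONFIG_INTEL_IOMMU_DEFAULT_ON with no parameter ends up in the same DMA-remapping state as intel_iommu=off, and intel_iommu=on restores the current behaviour for those who need it.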

BTW, here are some bug reports related to the iommu setting and why archlinux has disabled it:

https://bugs.archlinux.org/task/55629
https://bugs.archlinux.org/task/65362
https://bugzilla.kernel.org/show_bug.cgi?id=197029

@jackpot51 jackpot51 mentioned this issue Dec 22, 2021
@spxak1
Author

spxak1 commented Dec 22, 2021

> @spxak1 Could you try intel_iommu=igfx_off instead of intel_iommu=off? That would keep iommu on but not for the igpu. That is what I needed to stop the flickering issue (on 8th gen Intel).
>
> I would guess that CONFIG_INTEL_IOMMU_DEFAULT_ON=y is equivalent to intel_iommu=on, while intel_iommu=off is equivalent to CONFIG_INTEL_IOMMU=n... which is not exactly the same as what xanmod and archlinux are doing.
>
> By unsetting CONFIG_INTEL_IOMMU_DEFAULT_ON, it may allow certain kernel subsystems to selectively enable the iommu where they consider it appropriate, rather than force-enabling it by default across the board.
>
> Either way, it seems as though unsetting CONFIG_INTEL_IOMMU_DEFAULT_ON would be a good fix if it resolves this issue. Users can always force it on with intel_iommu=on.
>
> BTW, here are some bug reports related to the iommu setting and why archlinux has disabled it:
>
> https://bugs.archlinux.org/task/55629
> https://bugs.archlinux.org/task/65362
> https://bugzilla.kernel.org/show_bug.cgi?id=197029

Thanks. I'll play with it later when I have time and report back. I need to do some reading on IOMMU. I am not even sure how much of VT-d it affects, or how much of it my peers and I actually use in our VMs.

Thanks again.

P.S. I see there is a pull request for 5.15.10 that will settle this. Thanks @jackpot51

@jackpot51 jackpot51 mentioned this issue Dec 22, 2021
jackpot51 pushed a commit that referenced this issue Dec 13, 2022
…ging

commit 1fdbed6 upstream.

The following bug is reported to be triggered when starting X on x86-32
system with i915:

  [  225.777375] kernel BUG at mm/memory.c:2664!
  [  225.777391] invalid opcode: 0000 [#1] PREEMPT SMP
  [  225.777405] CPU: 0 PID: 2402 Comm: Xorg Not tainted 6.1.0-rc3-bdg+ #86
  [  225.777415] Hardware name:  /8I865G775-G, BIOS F1 08/29/2006
  [  225.777421] EIP: __apply_to_page_range+0x24d/0x31c
  [  225.777437] Code: ff ff 8b 55 e8 8b 45 cc e8 0a 11 ec ff 89 d8 83 c4 28 5b 5e 5f 5d c3 81 7d e0 a0 ef 96 c1 74 ad 8b 45 d0 e8 2d 83 49 00 eb a3 <0f> 0b 25 00 f0 ff ff 81 eb 00 00 00 40 01 c3 8b 45 ec 8b 00 e8 76
  [  225.777446] EAX: 00000001 EBX: c53a3b58 ECX: b5c00000 EDX: c258aa00
  [  225.777454] ESI: b5c00000 EDI: b5900000 EBP: c4b0fdb4 ESP: c4b0fd80
  [  225.777462] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010202
  [  225.777470] CR0: 80050033 CR2: b5900000 CR3: 053a3000 CR4: 000006d0
  [  225.777479] Call Trace:
  [  225.777486]  ? i915_memcpy_init_early+0x63/0x63 [i915]
  [  225.777684]  apply_to_page_range+0x21/0x27
  [  225.777694]  ? i915_memcpy_init_early+0x63/0x63 [i915]
  [  225.777870]  remap_io_mapping+0x49/0x75 [i915]
  [  225.778046]  ? i915_memcpy_init_early+0x63/0x63 [i915]
  [  225.778220]  ? mutex_unlock+0xb/0xd
  [  225.778231]  ? i915_vma_pin_fence+0x6d/0xf7 [i915]
  [  225.778420]  vm_fault_gtt+0x2a9/0x8f1 [i915]
  [  225.778644]  ? lock_is_held_type+0x56/0xe7
  [  225.778655]  ? lock_is_held_type+0x7a/0xe7
  [  225.778663]  ? 0xc1000000
  [  225.778670]  __do_fault+0x21/0x6a
  [  225.778679]  handle_mm_fault+0x708/0xb21
  [  225.778686]  ? mt_find+0x21e/0x5ae
  [  225.778696]  exc_page_fault+0x185/0x705
  [  225.778704]  ? doublefault_shim+0x127/0x127
  [  225.778715]  handle_exception+0x130/0x130
  [  225.778723] EIP: 0xb700468a

Recently pud_huge() got aware of non-present entry by commit 3a194f3
("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present
pud entry") to handle some special states of gigantic page.  However, it's
overlooked that pud_none() always returns false when running with 2-level
paging, and as a result pud_huge() can return true pointlessly.

Introduce "#if CONFIG_PGTABLE_LEVELS > 2" to pud_huge() to deal with this.

Link: https://lkml.kernel.org/r/20221107021010.2449306-1-naoya.horiguchi@linux.dev
Fixes: 3a194f3 ("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
13r0ck pushed a commit that referenced this issue Feb 9, 2023
commit 031af50 upstream.

The inline assembly for arm64's cmpxchg_double*() implementations use a
+Q constraint to hazard against other accesses to the memory location
being exchanged. However, the pointer passed to the constraint is a
pointer to unsigned long, and thus the hazard only applies to the first
8 bytes of the location.

GCC can take advantage of this, assuming that other portions of the
location are unchanged, leading to a number of potential problems.

This is similar to what we fixed back in commit:

  fee960b ("arm64: xchg: hazard against entire exchange variable")

... but we forgot to adjust cmpxchg_double*() similarly at the same
time.

The same problem applies, as demonstrated with the following test:

| struct big {
|         u64 lo, hi;
| } __aligned(128);
|
| unsigned long foo(struct big *b)
| {
|         u64 hi_old, hi_new;
|
|         hi_old = b->hi;
|         cmpxchg_double_local(&b->lo, &b->hi, 0x12, 0x34, 0x56, 0x78);
|         hi_new = b->hi;
|
|         return hi_old ^ hi_new;
| }

... which GCC 12.1.0 compiles as:

| 0000000000000000 <foo>:
|    0:   d503233f        paciasp
|    4:   aa0003e4        mov     x4, x0
|    8:   1400000e        b       40 <foo+0x40>
|    c:   d2800240        mov     x0, #0x12                       // #18
|   10:   d2800681        mov     x1, #0x34                       // #52
|   14:   aa0003e5        mov     x5, x0
|   18:   aa0103e6        mov     x6, x1
|   1c:   d2800ac2        mov     x2, #0x56                       // #86
|   20:   d2800f03        mov     x3, #0x78                       // #120
|   24:   48207c82        casp    x0, x1, x2, x3, [x4]
|   28:   ca050000        eor     x0, x0, x5
|   2c:   ca060021        eor     x1, x1, x6
|   30:   aa010000        orr     x0, x0, x1
|   34:   d2800000        mov     x0, #0x0                        // #0    <--- BANG
|   38:   d50323bf        autiasp
|   3c:   d65f03c0        ret
|   40:   d2800240        mov     x0, #0x12                       // #18
|   44:   d2800681        mov     x1, #0x34                       // #52
|   48:   d2800ac2        mov     x2, #0x56                       // #86
|   4c:   d2800f03        mov     x3, #0x78                       // #120
|   50:   f9800091        prfm    pstl1strm, [x4]
|   54:   c87f1885        ldxp    x5, x6, [x4]
|   58:   ca0000a5        eor     x5, x5, x0
|   5c:   ca0100c6        eor     x6, x6, x1
|   60:   aa0600a6        orr     x6, x5, x6
|   64:   b5000066        cbnz    x6, 70 <foo+0x70>
|   68:   c8250c82        stxp    w5, x2, x3, [x4]
|   6c:   35ffff45        cbnz    w5, 54 <foo+0x54>
|   70:   d2800000        mov     x0, #0x0                        // #0     <--- BANG
|   74:   d50323bf        autiasp
|   78:   d65f03c0        ret

Notice that at the lines with "BANG" comments, GCC has assumed that the
higher 8 bytes are unchanged by the cmpxchg_double() call, and that
`hi_old ^ hi_new` can be reduced to a constant zero, for both LSE and
LL/SC versions of cmpxchg_double().

This patch fixes the issue by passing a pointer to __uint128_t into the
+Q constraint, ensuring that the compiler hazards against the entire 16
bytes being modified.

With this change, GCC 12.1.0 compiles the above test as:

| 0000000000000000 <foo>:
|    0:   f9400407        ldr     x7, [x0, #8]
|    4:   d503233f        paciasp
|    8:   aa0003e4        mov     x4, x0
|    c:   1400000f        b       48 <foo+0x48>
|   10:   d2800240        mov     x0, #0x12                       // #18
|   14:   d2800681        mov     x1, #0x34                       // #52
|   18:   aa0003e5        mov     x5, x0
|   1c:   aa0103e6        mov     x6, x1
|   20:   d2800ac2        mov     x2, #0x56                       // #86
|   24:   d2800f03        mov     x3, #0x78                       // #120
|   28:   48207c82        casp    x0, x1, x2, x3, [x4]
|   2c:   ca050000        eor     x0, x0, x5
|   30:   ca060021        eor     x1, x1, x6
|   34:   aa010000        orr     x0, x0, x1
|   38:   f9400480        ldr     x0, [x4, #8]
|   3c:   d50323bf        autiasp
|   40:   ca0000e0        eor     x0, x7, x0
|   44:   d65f03c0        ret
|   48:   d2800240        mov     x0, #0x12                       // #18
|   4c:   d2800681        mov     x1, #0x34                       // #52
|   50:   d2800ac2        mov     x2, #0x56                       // #86
|   54:   d2800f03        mov     x3, #0x78                       // #120
|   58:   f9800091        prfm    pstl1strm, [x4]
|   5c:   c87f1885        ldxp    x5, x6, [x4]
|   60:   ca0000a5        eor     x5, x5, x0
|   64:   ca0100c6        eor     x6, x6, x1
|   68:   aa0600a6        orr     x6, x5, x6
|   6c:   b5000066        cbnz    x6, 78 <foo+0x78>
|   70:   c8250c82        stxp    w5, x2, x3, [x4]
|   74:   35ffff45        cbnz    w5, 5c <foo+0x5c>
|   78:   f9400480        ldr     x0, [x4, #8]
|   7c:   d50323bf        autiasp
|   80:   ca0000e0        eor     x0, x7, x0
|   84:   d65f03c0        ret

... sampling the high 8 bytes before and after the cmpxchg, and
performing an EOR, as we'd expect.

For backporting, I've tested this atop linux-4.9.y with GCC 5.5.0. Note
that linux-4.9.y is oldest currently supported stable release, and
mandates GCC 5.1+. Unfortunately I couldn't get a GCC 5.1 binary to run
on my machines due to library incompatibilities.

I've also used a standalone test to check that we can use a __uint128_t
pointer in a +Q constraint at least as far back as GCC 4.8.5 and LLVM
3.9.1.

Fixes: 5284e1b ("arm64: xchg: Implement cmpxchg_double")
Fixes: e9a4b79 ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU")
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/lkml/Y6DEfQXymYVgL3oJ@boqun-archlinux/
Reported-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/Y6GXoO4qmH9OIZ5Q@hirez.programming.kicks-ass.net/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230104151626.3262137-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mmstick pushed a commit that referenced this issue Jul 6, 2024
[ Upstream commit 8ecf3c1 ]

Recent additions in BPF like cpu v4 instructions, test_bpf module
exhibits the following failures:

  test_bpf: #82 ALU_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #83 ALU_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #84 ALU64_MOVSX | BPF_B jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #85 ALU64_MOVSX | BPF_H jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)
  test_bpf: #86 ALU64_MOVSX | BPF_W jited:1 ret 2 != 1 (0x2 != 0x1)FAIL (1 times)

  test_bpf: #165 ALU_SDIV_X: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)
  test_bpf: #166 ALU_SDIV_K: -6 / 2 = -3 jited:1 ret 2147483645 != -3 (0x7ffffffd != 0xfffffffd)FAIL (1 times)

  test_bpf: #169 ALU_SMOD_X: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)
  test_bpf: #170 ALU_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: #172 ALU64_SMOD_K: -7 % 2 = -1 jited:1 ret 1 != -1 (0x1 != 0xffffffff)FAIL (1 times)

  test_bpf: #313 BSWAP 16: 0x0123456789abcdef -> 0xefcd
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 301 PASS
  test_bpf: #314 BSWAP 32: 0x0123456789abcdef -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 555 PASS
  test_bpf: #315 BSWAP 64: 0x0123456789abcdef -> 0x67452301
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 268 PASS
  test_bpf: #316 BSWAP 64: 0x0123456789abcdef >> 32 -> 0xefcdab89
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 269 PASS
  test_bpf: #317 BSWAP 16: 0xfedcba9876543210 -> 0x1032
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 460 PASS
  test_bpf: #318 BSWAP 32: 0xfedcba9876543210 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 320 PASS
  test_bpf: #319 BSWAP 64: 0xfedcba9876543210 -> 0x98badcfe
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 222 PASS
  test_bpf: #320 BSWAP 64: 0xfedcba9876543210 >> 32 -> 0x10325476
  eBPF filter opcode 00d7 (@2) unsupported
  jited:0 273 PASS

  test_bpf: #344 BPF_LDX_MEMSX | BPF_B
  eBPF filter opcode 0091 (@5) unsupported
  jited:0 432 PASS
  test_bpf: #345 BPF_LDX_MEMSX | BPF_H
  eBPF filter opcode 0089 (@5) unsupported
  jited:0 381 PASS
  test_bpf: #346 BPF_LDX_MEMSX | BPF_W
  eBPF filter opcode 0081 (@5) unsupported
  jited:0 505 PASS

  test_bpf: #490 JMP32_JA: Unconditional jump: if (true) return 1
  eBPF filter opcode 0006 (@1) unsupported
  jited:0 261 PASS

  test_bpf: Summary: 1040 PASSED, 10 FAILED, [924/1038 JIT'ed]

Fix them by adding missing processing.

Fixes: daabb2b ("bpf/tests: add tests for cpuv4 instructions")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/91de862dda99d170697eb79ffb478678af7e0b27.1709652689.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>