[LTS 9.2] net: sched: sch_qfq: Fix UAF in qfq_dequeue() #120

pvts-mat · 2025-02-12T02:57:37Z

CVE-2023-4921
VULN-6733

Problem

https://www.cve.org/CVERecord?id=CVE-2023-4921

A use-after-free vulnerability in the Linux kernel's net/sched: sch_qfq component can be exploited to achieve local privilege escalation. When the plug qdisc is used as a class of the qfq qdisc, sending network packets triggers use-after-free in qfq_dequeue() due to the incorrect .peek handler of sch_plug and lack of error checking in agg_dequeue()

Solution

A single commit was identified as a fix for this issue: 8fc134fee27f2263988ae38920bc03da416b03d8

kABI check: passed

python3 /mnt/code/kernel-dist-git/SOURCES/check-kabi \
        -k /mnt/code/kernel-dist-git/SOURCES/Module.kabi_$(uname -m) \
        -s /mnt/build_files/kernel-src-tree-ciqlts9_2-CVE-2023-4921/Module.symvers; echo $?

0

kernel-dist-git state:

Switched to branch 'el-9.2'
Your branch is up to date with 'origin/el-9.2'.

Boot test: passed

See Specific tests for implied boot test passing.

Kselftests: (in progress)

Specific tests: passed

Bug replication

The bug can be replicated with the following commands, as mention in the commit's message:

tc qdisc add dev lo root handle 1: qfq
tc class add dev lo parent 1: classid 1:1 qfq weight 1 maxpkt 512
tc qdisc add dev lo parent 1:1 handle 2: plug
tc filter add dev lo parent 1: basic classid 1:1
ping -c1 127.0.0.1

The tests were performed on the referential and patched kernel.

Prerequisites

The tc commands above require the following kernel options to be enabled: CONFIG_NET_SCHED, CONFIG_NET_SCH_INGRESS, CONFIG_NET_SCH_QFQ, CONFIG_NET_SCH_PLUG, CONFIG_NET_CLS_BASIC.

This required change in the default configs/kernel-5.14.0-x86_64.config configuration file for the tested x86_64 platform.

CONFIG_NET_SCHED=y
CONFIG_NET_SCH_INGRESS=m
# CONFIG_NET_SCH_QFQ is not set
# CONFIG_NET_SCH_PLUG is not set
# CONFIG_NET_CLS_BASIC is not set

The following values were used:

CONFIG_NET_SCH_QFQ=y
CONFIG_NET_SCH_PLUG=y
CONFIG_NET_CLS_BASIC=y

Reference `ciqlts9_2` (`26e535ef9bf864de514c667b4f98e75354af4bde` with necessary config changes)

Bug replicated successfully. Kernel crashed and machine automatically rebooted.

[root@ciqlts92 pvts]# tc qdisc add dev lo root handle 1: qfq
tc qdisc add dev lo root handle 1: qfq
[root@ciqlts92 pvts]# tc class add dev lo parent 1: classid 1:1 qfq weight 1 maxpkt 512
tc class add dev lo parent 1: classid 1:1 qfq weight 1 maxpkt 512
[root@ciqlts92 pvts]# tc qdisc add dev lo parent 1:1 handle 2: plug
tc qdisc add dev lo parent 1:1 handle 2: plug
[root@ciqlts92 pvts]# tc filter add dev lo parent 1: basic classid 1:1
tc filter add dev lo parent 1: basic classid 1:1
[root@ciqlts92 pvts]# ping -c1 127.0.0.1
ping -c1 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
[   34.471432] ------------[ cut here ]------------
[   34.473552] kernel BUG at include/linux/skbuff.h:2420!
[   34.475764] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[   34.477888] CPU: 9 PID: 1566 Comm: ping Kdump: loaded Not tainted 5.14.0-ciqlts9_2 #5
[   34.481156] Hardware name: Red Hat KVM/RHEL, BIOS 1.16.3-2.el9_5.1 04/01/2014
[   34.483633] RIP: 0010:eth_type_trans+0xd3/0x140
[   34.484938] Code: 80 00 00 00 eb c1 8b 47 70 2b 47 74 48 8b 97 c8 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb a5 <0f> 0b b8 00 01 00 00 eb 9c 48 85 ff 74 eb 31 f6 b9 02 00 00 00 48
[   34.490165] RSP: 0018:ffff9d5ac0280e80 EFLAGS: 00010a87
[   34.491652] RAX: 0000000054488239 RBX: ffff8f7fc10e4000 RCX: ffff9d5ac0280f04
[   34.493273] RDX: ffff8f7fc888ee64 RSI: ffff8f7fc10e4000 RDI: ffff8f7fc10a0100
[   34.494690] RBP: ffff8f7fc10a0100 R08: ffff8f7fc47420ac R09: 0000000000000001
[   34.496115] R10: 0000000000000004 R11: ffffffffa1a060c0 R12: 0000000000000000
[   34.497543] R13: 0000000000000000 R14: ffff8f7fc10e4000 R15: ffff8f7fc10a0100
[   34.498970] FS:  00007fead6210b80(0000) GS:ffff8f871fc80000(0000) knlGS:0000000000000000
[   34.500592] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   34.501746] CR2: 00007ffcaa98e000 CR3: 00000001081ea005 CR4: 0000000000370ee0
[   34.503062] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   34.504154] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   34.505247] Call Trace:
[   34.505637]  <IRQ>
[   34.505963]  loopback_xmit+0xbb/0x120
[   34.506548]  dev_hard_start_xmit+0xc4/0x210
[   34.507206]  sch_direct_xmit+0x9e/0x370
[   34.507801]  __qdisc_run+0x65/0xd0
[   34.508367]  net_tx_action+0x1c0/0x270
[   34.508959]  __do_softirq+0xc7/0x2ac
[   34.509526]  do_softirq+0x63/0x90
[   34.510047]  </IRQ>
[   34.510382]  <TASK>
[   34.510719]  __local_bh_enable_ip+0x5a/0x70
[   34.511382]  ip_finish_output2+0x188/0x420
[   34.512032]  ip_push_pending_frames+0xa2/0xb0
[   34.512744]  ping_v4_sendmsg+0x435/0x800
[   34.513376]  ? find_next_iomem_res+0xb2/0x110
[   34.514067]  ? release_sock+0x40/0x90
[   34.514647]  ? sock_sendmsg+0x5b/0x70
[   34.515226]  sock_sendmsg+0x5b/0x70
[   34.515776]  __sys_sendto+0xf0/0x160
[   34.516357]  ? security_file_ioctl+0x2f/0x50
[   34.517040]  __x64_sys_sendto+0x20/0x30
[   34.517637]  do_syscall_64+0x59/0x90
[   34.518205]  ? handle_mm_fault+0xc5/0x2a0
[   34.518837]  ? do_user_addr_fault+0x1dd/0x6b0
[   34.519536]  ? exc_page_fault+0x62/0x150
[   34.520152]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[   34.520946] RIP: 0033:0x7fead614fb1a
[   34.521517] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[   34.524411] RSP: 002b:00007ffcaa8c9f28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[   34.525601] RAX: ffffffffffffffda RBX: 00007ffcaa8cb710 RCX: 00007fead614fb1a
[   34.526705] RDX: 0000000000000040 RSI: 00005574cac78e00 RDI: 0000000000000003
[   34.527827] RBP: 00005574cac78e00 R08: 00007ffcaa8cd98c R09: 0000000000000010
[   34.528939] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[   34.530047] R13: 00007ffcaa8cb688 R14: 00007ffcaa8c9f30 R15: 00007ffcaa8cb710
[   34.531159]  </TASK>
[   34.531513] Modules linked in: rfkill vfat fat intel_rapl_msr intel_rapl_common isst_if_common nfit libnvdimm kvm_intel iTCO_wdt kvm iTCO_vendor_support virtio_gpu irqbypass rapl virtio_dma_buf drm_shmem_helper drm_kms_helper i2c_i801 pcspkr syscopyarea i2c_smbus lpc_ich sysfillrect virtio_balloon sysimgblt fb_sys_fops joydev drm xfs libcrc32c sr_mod cdrom sg ahci libahci crct10dif_pclmul crc32_pclmul virtio_blk crc32c_intel virtio_console virtio_net net_failover ghash_clmulni_intel libata failover virtiofs serio_raw sunrpc dm_mirror dm_region_hash dm_log dm_mod fuse
[    0.000000] Linux version 5.14.0-ciqlts9_2 (pvts@ciqlts92) (gcc (GCC) 11.3.1 20221121 (Red Hat 11.3.1-4), GNU ld version 2.35.2-37.el9) #5 SMP PREEMPT_DYNAMIC Mon Feb 10 22:07:16 UTC 2025
[    0.000000] The list of certified hardware and cloud instances for Enterprise Linux 9 can be viewed at the Red Hat Ecosystem Catalog, https://catalog.redhat.com.
[    0.000000] Command line: elfcorehdr=0x6f000000 BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.14.0-ciqlts9_2 ro console=ttyS0,115200n8 no_timer_check net.ifnames=0  irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 acpi_no_memhotplug transparent_hugepage=never nokaslr hest_disable novmcoredd cma=0 hugetlb_cma=0 disable_cpu_apicid=0 iTCO_wdt.pretimeout=0
...

Full log:
bug-replication–ciqlts9_2.log

Patch

Patch efficacy verified successfully. Repeating the steps doesn't result in a crash and the machine doesn't reboot.

Full log:
bug-replication–ciqlts9_2-CVE-2023-4921.log

A warning can be observed

qfq_dequeue: non-workconserving leaf

issued at sch_qfq.c. The work-conserving queue discipline is a qdisc which never leaves the outbound interface in idle unless the queue is empty, while non-work-conserving qdiscs may delay packets, for example to shape the traffic¹. The plug qdisc, being the leaf in the hierarchical qdiscs configuration used, is non-work-conserving, as it is able to suspend the packet flow. Multiple types of qdiscs are apparently not designed to work with non-work-conserving child qdiscs and issue similar warnings on skb == NULL condition: ets, hfsc, htb, drr. See also the message in b00355db3f88d96810a60011a30cfb2c3469409d

Patrick McHardy <kaber@trash.net> suggested:
> How about making this flag and the warning message (in a out-of-line
> function) globally available? Other qdiscs (f.i. HFSC) can't deal with
> inner non-work-conserving qdiscs as well.

and in 6d25d1dc76bf5943a5c1f4bb74d66d5eac58eb77, which includes qfq to the group above

A helper function for printing non-work-conserving alarms is added in
commit b00355db3f88 ("pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving
 warning handler."). In this commit, use qdisc_warn_nonwc() instead of
WARN_ONCE() to handle the non-work-conserving warning in qfq Qdisc.

The point is: the warning is related to the way the qdisc hierarchy has been defined in the bug replication script and not to the introduced changes.

Footnotes

¹ https://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.qdisc.terminology.html

jira VULN-6733 cve CVE-2023-4921 commit-author valis <sec@valis.email> commit 8fc134f When the plug qdisc is used as a class of the qfq qdisc it could trigger a UAF. This issue can be reproduced with following commands: tc qdisc add dev lo root handle 1: qfq tc class add dev lo parent 1: classid 1:1 qfq weight 1 maxpkt 512 tc qdisc add dev lo parent 1:1 handle 2: plug tc filter add dev lo parent 1: basic classid 1:1 ping -c1 127.0.0.1 and boom: [ 285.353793] BUG: KASAN: slab-use-after-free in qfq_dequeue+0xa7/0x7f0 [ 285.354910] Read of size 4 at addr ffff8880bad312a8 by task ping/144 [ 285.355903] [ 285.356165] CPU: 1 PID: 144 Comm: ping Not tainted 6.5.0-rc3+ ctrliq#4 [ 285.357112] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 [ 285.358376] Call Trace: [ 285.358773] <IRQ> [ 285.359109] dump_stack_lvl+0x44/0x60 [ 285.359708] print_address_description.constprop.0+0x2c/0x3c0 [ 285.360611] kasan_report+0x10c/0x120 [ 285.361195] ? qfq_dequeue+0xa7/0x7f0 [ 285.361780] qfq_dequeue+0xa7/0x7f0 [ 285.362342] __qdisc_run+0xf1/0x970 [ 285.362903] net_tx_action+0x28e/0x460 [ 285.363502] __do_softirq+0x11b/0x3de [ 285.364097] do_softirq.part.0+0x72/0x90 [ 285.364721] </IRQ> [ 285.365072] <TASK> [ 285.365422] __local_bh_enable_ip+0x77/0x90 [ 285.366079] __dev_queue_xmit+0x95f/0x1550 [ 285.366732] ? __pfx_csum_and_copy_from_iter+0x10/0x10 [ 285.367526] ? __pfx___dev_queue_xmit+0x10/0x10 [ 285.368259] ? __build_skb_around+0x129/0x190 [ 285.368960] ? ip_generic_getfrag+0x12c/0x170 [ 285.369653] ? __pfx_ip_generic_getfrag+0x10/0x10 [ 285.370390] ? csum_partial+0x8/0x20 [ 285.370961] ? raw_getfrag+0xe5/0x140 [ 285.371559] ip_finish_output2+0x539/0xa40 [ 285.372222] ? __pfx_ip_finish_output2+0x10/0x10 [ 285.372954] ip_output+0x113/0x1e0 [ 285.373512] ? __pfx_ip_output+0x10/0x10 [ 285.374130] ? icmp_out_count+0x49/0x60 [ 285.374739] ? __pfx_ip_finish_output+0x10/0x10 [ 285.375457] ip_push_pending_frames+0xf3/0x100 [ 285.376173] raw_sendmsg+0xef5/0x12d0 [ 285.376760] ? do_syscall_64+0x40/0x90 [ 285.377359] ? __static_call_text_end+0x136578/0x136578 [ 285.378173] ? do_syscall_64+0x40/0x90 [ 285.378772] ? kasan_enable_current+0x11/0x20 [ 285.379469] ? __pfx_raw_sendmsg+0x10/0x10 [ 285.380137] ? __sock_create+0x13e/0x270 [ 285.380673] ? __sys_socket+0xf3/0x180 [ 285.381174] ? __x64_sys_socket+0x3d/0x50 [ 285.381725] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 285.382425] ? __rcu_read_unlock+0x48/0x70 [ 285.382975] ? ip4_datagram_release_cb+0xd8/0x380 [ 285.383608] ? __pfx_ip4_datagram_release_cb+0x10/0x10 [ 285.384295] ? preempt_count_sub+0x14/0xc0 [ 285.384844] ? __list_del_entry_valid+0x76/0x140 [ 285.385467] ? _raw_spin_lock_bh+0x87/0xe0 [ 285.386014] ? __pfx__raw_spin_lock_bh+0x10/0x10 [ 285.386645] ? release_sock+0xa0/0xd0 [ 285.387148] ? preempt_count_sub+0x14/0xc0 [ 285.387712] ? freeze_secondary_cpus+0x348/0x3c0 [ 285.388341] ? aa_sk_perm+0x177/0x390 [ 285.388856] ? __pfx_aa_sk_perm+0x10/0x10 [ 285.389441] ? check_stack_object+0x22/0x70 [ 285.390032] ? inet_send_prepare+0x2f/0x120 [ 285.390603] ? __pfx_inet_sendmsg+0x10/0x10 [ 285.391172] sock_sendmsg+0xcc/0xe0 [ 285.391667] __sys_sendto+0x190/0x230 [ 285.392168] ? __pfx___sys_sendto+0x10/0x10 [ 285.392727] ? kvm_clock_get_cycles+0x14/0x30 [ 285.393328] ? set_normalized_timespec64+0x57/0x70 [ 285.393980] ? _raw_spin_unlock_irq+0x1b/0x40 [ 285.394578] ? __x64_sys_clock_gettime+0x11c/0x160 [ 285.395225] ? __pfx___x64_sys_clock_gettime+0x10/0x10 [ 285.395908] ? _copy_to_user+0x3e/0x60 [ 285.396432] ? exit_to_user_mode_prepare+0x1a/0x120 [ 285.397086] ? syscall_exit_to_user_mode+0x22/0x50 [ 285.397734] ? do_syscall_64+0x71/0x90 [ 285.398258] __x64_sys_sendto+0x74/0x90 [ 285.398786] do_syscall_64+0x64/0x90 [ 285.399273] ? exit_to_user_mode_prepare+0x1a/0x120 [ 285.399949] ? syscall_exit_to_user_mode+0x22/0x50 [ 285.400605] ? do_syscall_64+0x71/0x90 [ 285.401124] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 285.401807] RIP: 0033:0x495726 [ 285.402233] Code: ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 09 [ 285.404683] RSP: 002b:00007ffcc25fb618 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 285.405677] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 0000000000495726 [ 285.406628] RDX: 0000000000000040 RSI: 0000000002518750 RDI: 0000000000000000 [ 285.407565] RBP: 00000000005205ef R08: 00000000005f8838 R09: 000000000000001c [ 285.408523] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000002517634 [ 285.409460] R13: 00007ffcc25fb6f0 R14: 0000000000000003 R15: 0000000000000000 [ 285.410403] </TASK> [ 285.410704] [ 285.410929] Allocated by task 144: [ 285.411402] kasan_save_stack+0x1e/0x40 [ 285.411926] kasan_set_track+0x21/0x30 [ 285.412442] __kasan_slab_alloc+0x55/0x70 [ 285.412973] kmem_cache_alloc_node+0x187/0x3d0 [ 285.413567] __alloc_skb+0x1b4/0x230 [ 285.414060] __ip_append_data+0x17f7/0x1b60 [ 285.414633] ip_append_data+0x97/0xf0 [ 285.415144] raw_sendmsg+0x5a8/0x12d0 [ 285.415640] sock_sendmsg+0xcc/0xe0 [ 285.416117] __sys_sendto+0x190/0x230 [ 285.416626] __x64_sys_sendto+0x74/0x90 [ 285.417145] do_syscall_64+0x64/0x90 [ 285.417624] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 285.418306] [ 285.418531] Freed by task 144: [ 285.418960] kasan_save_stack+0x1e/0x40 [ 285.419469] kasan_set_track+0x21/0x30 [ 285.419988] kasan_save_free_info+0x27/0x40 [ 285.420556] ____kasan_slab_free+0x109/0x1a0 [ 285.421146] kmem_cache_free+0x1c2/0x450 [ 285.421680] __netif_receive_skb_core+0x2ce/0x1870 [ 285.422333] __netif_receive_skb_one_core+0x97/0x140 [ 285.423003] process_backlog+0x100/0x2f0 [ 285.423537] __napi_poll+0x5c/0x2d0 [ 285.424023] net_rx_action+0x2be/0x560 [ 285.424510] __do_softirq+0x11b/0x3de [ 285.425034] [ 285.425254] The buggy address belongs to the object at ffff8880bad31280 [ 285.425254] which belongs to the cache skbuff_head_cache of size 224 [ 285.426993] The buggy address is located 40 bytes inside of [ 285.426993] freed 224-byte region [ffff8880bad31280, ffff8880bad31360) [ 285.428572] [ 285.428798] The buggy address belongs to the physical page: [ 285.429540] page:00000000f4b77674 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xbad31 [ 285.430758] flags: 0x100000000000200(slab|node=0|zone=1) [ 285.431447] page_type: 0xffffffff() [ 285.431934] raw: 0100000000000200 ffff88810094a8c0 dead000000000122 0000000000000000 [ 285.432757] raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000 [ 285.433562] page dumped because: kasan: bad access detected [ 285.434144] [ 285.434320] Memory state around the buggy address: [ 285.434828] ffff8880bad31180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 285.435580] ffff8880bad31200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 285.436264] >ffff8880bad31280: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 285.436777] ^ [ 285.437106] ffff8880bad31300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ 285.437616] ffff8880bad31380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 285.438126] ================================================================== [ 285.438662] Disabling lock debugging due to kernel taint Fix this by: 1. Changing sch_plug's .peek handler to qdisc_peek_dequeued(), a function compatible with non-work-conserving qdiscs 2. Checking the return value of qdisc_dequeue_peeked() in sch_qfq. Fixes: 462dbc9 ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost") Reported-by: valis <sec@valis.email> Signed-off-by: valis <sec@valis.email> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20230901162237.11525-1-jhs@mojatatu.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> (cherry picked from commit 8fc134f) Signed-off-by: Marcin Wcisło <marcin.wcislo@conclusive.pl>

PlaidCat · 2025-02-12T21:29:01Z

I didn't catch this when assigning work but the CVE is not impacted on Rocky9 / RHEL9
The code is contained in CONFIG Flags that are not built by the default kernel
https://access.redhat.com/security/cve/CVE-2023-4921

JIRA: https://issues.redhat.com/browse/RHEL-68940 commit 031af50 Author: Mark Rutland <mark.rutland@arm.com> Date: Wed, 4 Jan 2023 15:16:26 +0000 arm64: cmpxchg_double*: hazard against entire exchange variable The inline assembly for arm64's cmpxchg_double*() implementations use a +Q constraint to hazard against other accesses to the memory location being exchanged. However, the pointer passed to the constraint is a pointer to unsigned long, and thus the hazard only applies to the first 8 bytes of the location. GCC can take advantage of this, assuming that other portions of the location are unchanged, leading to a number of potential problems. This is similar to what we fixed back in commit: fee960b ("arm64: xchg: hazard against entire exchange variable") ... but we forgot to adjust cmpxchg_double*() similarly at the same time. The same problem applies, as demonstrated with the following test: | struct big { | u64 lo, hi; | } __aligned(128); | | unsigned long foo(struct big *b) | { | u64 hi_old, hi_new; | | hi_old = b->hi; | cmpxchg_double_local(&b->lo, &b->hi, 0x12, 0x34, 0x56, 0x78); | hi_new = b->hi; | | return hi_old ^ hi_new; | } ... which GCC 12.1.0 compiles as: | 0000000000000000 <foo>: | 0: d503233f paciasp | 4: aa0003e4 mov x4, x0 | 8: 1400000e b 40 <foo+0x40> | c: d2800240 mov x0, #0x12 // #18 | 10: d2800681 mov x1, #0x34 // #52 | 14: aa0003e5 mov x5, x0 | 18: aa0103e6 mov x6, x1 | 1c: d2800ac2 mov x2, #0x56 // #86 | 20: d2800f03 mov x3, #0x78 // #120 | 24: 48207c82 casp x0, x1, x2, x3, [x4] | 28: ca050000 eor x0, x0, x5 | 2c: ca060021 eor x1, x1, x6 | 30: aa010000 orr x0, x0, x1 | 34: d2800000 mov x0, #0x0 // #0 <--- BANG | 38: d50323bf autiasp | 3c: d65f03c0 ret | 40: d2800240 mov x0, #0x12 // #18 | 44: d2800681 mov x1, #0x34 // #52 | 48: d2800ac2 mov x2, #0x56 // #86 | 4c: d2800f03 mov x3, #0x78 // #120 | 50: f9800091 prfm pstl1strm, [x4] | 54: c87f1885 ldxp x5, x6, [x4] | 58: ca0000a5 eor x5, x5, x0 | 5c: ca0100c6 eor x6, x6, x1 | 60: aa0600a6 orr x6, x5, x6 | 64: b5000066 cbnz x6, 70 <foo+0x70> | 68: c8250c82 stxp w5, x2, x3, [x4] | 6c: 35ffff45 cbnz w5, 54 <foo+0x54> | 70: d2800000 mov x0, #0x0 // #0 <--- BANG | 74: d50323bf autiasp | 78: d65f03c0 ret Notice that at the lines with "BANG" comments, GCC has assumed that the higher 8 bytes are unchanged by the cmpxchg_double() call, and that `hi_old ^ hi_new` can be reduced to a constant zero, for both LSE and LL/SC versions of cmpxchg_double(). This patch fixes the issue by passing a pointer to __uint128_t into the +Q constraint, ensuring that the compiler hazards against the entire 16 bytes being modified. With this change, GCC 12.1.0 compiles the above test as: | 0000000000000000 <foo>: | 0: f9400407 ldr x7, [x0, #8] | 4: d503233f paciasp | 8: aa0003e4 mov x4, x0 | c: 1400000f b 48 <foo+0x48> | 10: d2800240 mov x0, #0x12 // #18 | 14: d2800681 mov x1, #0x34 // #52 | 18: aa0003e5 mov x5, x0 | 1c: aa0103e6 mov x6, x1 | 20: d2800ac2 mov x2, #0x56 // #86 | 24: d2800f03 mov x3, #0x78 // #120 | 28: 48207c82 casp x0, x1, x2, x3, [x4] | 2c: ca050000 eor x0, x0, x5 | 30: ca060021 eor x1, x1, x6 | 34: aa010000 orr x0, x0, x1 | 38: f9400480 ldr x0, [x4, #8] | 3c: d50323bf autiasp | 40: ca0000e0 eor x0, x7, x0 | 44: d65f03c0 ret | 48: d2800240 mov x0, #0x12 // #18 | 4c: d2800681 mov x1, #0x34 // #52 | 50: d2800ac2 mov x2, #0x56 // #86 | 54: d2800f03 mov x3, #0x78 // #120 | 58: f9800091 prfm pstl1strm, [x4] | 5c: c87f1885 ldxp x5, x6, [x4] | 60: ca0000a5 eor x5, x5, x0 | 64: ca0100c6 eor x6, x6, x1 | 68: aa0600a6 orr x6, x5, x6 | 6c: b5000066 cbnz x6, 78 <foo+0x78> | 70: c8250c82 stxp w5, x2, x3, [x4] | 74: 35ffff45 cbnz w5, 5c <foo+0x5c> | 78: f9400480 ldr x0, [x4, #8] | 7c: d50323bf autiasp | 80: ca0000e0 eor x0, x7, x0 | 84: d65f03c0 ret ... sampling the high 8 bytes before and after the cmpxchg, and performing an EOR, as we'd expect. For backporting, I've tested this atop linux-4.9.y with GCC 5.5.0. Note that linux-4.9.y is oldest currently supported stable release, and mandates GCC 5.1+. Unfortunately I couldn't get a GCC 5.1 binary to run on my machines due to library incompatibilities. I've also used a standalone test to check that we can use a __uint128_t pointer in a +Q constraint at least as far back as GCC 4.8.5 and LLVM 3.9.1. Fixes: 5284e1b ("arm64: xchg: Implement cmpxchg_double") Fixes: e9a4b79 ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU") Reported-by: Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/lkml/Y6DEfQXymYVgL3oJ@boqun-archlinux/ Reported-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/Y6GXoO4qmH9OIZ5Q@hirez.programming.kicks-ass.net/ Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: stable@vger.kernel.org Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230104151626.3262137-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Waiman Long <longman@redhat.com>

pvts-mat marked this pull request as draft February 12, 2025 03:02

pvts-mat marked this pull request as ready for review February 12, 2025 21:21

PlaidCat closed this Feb 12, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[LTS 9.2] net: sched: sch_qfq: Fix UAF in qfq_dequeue() #120

[LTS 9.2] net: sched: sch_qfq: Fix UAF in qfq_dequeue() #120

Uh oh!

pvts-mat commented Feb 12, 2025

Uh oh!

PlaidCat commented Feb 12, 2025 •

edited

Loading

Uh oh!

Uh oh!

[LTS 9.2] net: sched: sch_qfq: Fix UAF in qfq_dequeue() #120

[LTS 9.2] net: sched: sch_qfq: Fix UAF in qfq_dequeue() #120

Uh oh!

Conversation

pvts-mat commented Feb 12, 2025

Problem

Solution

kABI check: passed

Boot test: passed

Kselftests: (in progress)

Specific tests: passed

Bug replication

Prerequisites

Reference ciqlts9_2 (26e535ef9bf864de514c667b4f98e75354af4bde with necessary config changes)

Patch

Footnotes

Uh oh!

PlaidCat commented Feb 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reference `ciqlts9_2` (`26e535ef9bf864de514c667b4f98e75354af4bde` with necessary config changes)

PlaidCat commented Feb 12, 2025 •

edited

Loading