-
Notifications
You must be signed in to change notification settings - Fork 12
[LTS 9.4] net/sched: flower: Fix chain template offload #559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
include/net/sch_generic.h
Outdated
|
||
RH_KABI_EXTEND(void (*tmplt_reoffload)(struct tcf_chain *chain, | ||
bool add, | ||
flow_setup_cb_t *cb, | ||
void *cb_priv)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding new fields to struct tcf_proto_ops
is not kABI-safe because this struct is allocated from within drivers themselves.
Example: net/sched/cls_fw.c:423:static struct tcf_proto_ops cls_fw_ops __read_mostly = {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but aren't these drivers bundled with the kernel? The way I understand the kABI issue the binary incompatibility can arise if a driver is developed out-of-tree. Is anyone implementing a custom traffic class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, just realized you address this above:
Considering that the tcf_proto_ops struct is allocated statically, but in a highly controlled and limited fashion, closely related to the definitions of kernel-native traffic classes and unlikely to ever be replicated by an out-of-tree code, and that it's deeply buried, exclusively through pointers, in the substructure of the Qdisc struct that the users actually come to contact with in the whitelisted functions, it was determined that modifying it won't cause binary compatibility problems.
While I don't disagree, we have no way of proving this. If this assumption is wrong, there will be a clear out-of-bounds access and potentially exploitable indirect branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is anyone implementing a custom traffic class?
I wouldn't be surprised. There are plenty of experimental Linux networking modules out there that exist out-of-tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like RH just decided to break the kABI with this fix, taking advantage of the 9.4 to 9.5 transition to do so. So we can't look to RH for guidance on this one it seems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let me dig a bit more, not arguing for the patch in the current form, just wanted to better understand this topic - how about this part:
if tcf_proto_ops was ever used in such way then the user would request to put the tcf_proto_ops symbol on the whitelist directly, yet it's missing.
Is it wrong to assume that someone implementing a custom traffic class would ask for tcf_proto_ops to be whitelisted explicitly, as part of driver's API? I'm assuming here we only care about the whitelisted symbols and not trying to keep ABI with everyone.
Oh! Good point. Since register_tcf_proto_ops
isn't in the kABI stablelist, and it is the only way a a tcf_proto_ops pointer can be reached through struct Qdisc
, there's no kABI breakage in the strictest sense.
I think that adding the new member onto the end of struct tcf_proto_ops
could be misleading and make it seem like struct tcf_proto_ops
needs kABI stability. Instead, I think the new member should be added to the same place as in the upstream patch, and then use RH_KABI_EXCLUDE
to appease check-kabi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the KABI checker is mad with this :\
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the KABI checker is mad with this :\
It seems like - from the descriptions of RH_* macros alone - the RH_KABI_BROKEN_INSERT
would be most appropriate
* RH_KABI_BROKEN_INSERT
* RH_KABI_BROKEN_REMOVE
* Insert a field to the middle of a struct / delete a field from a struct.
* Note that this breaks kABI! It can be done only when it's certain that
* no 3rd party driver can validly reach into the struct. A typical
* example is a struct that is: both (a) referenced only through a long
* chain of pointers from another struct that is part of a whitelisted
* symbol and (b) kernel internal only, it should have never been visible
* to genksyms in the first place.
I just tested it and check-kabi
is cool with it. (I actually checked the RH_KABI_EXCLUDE
before commiting too - turns out I checked it for another CVE by mistake, sorry)
This doesn't answer the question of why RH_KABI_EXCLUDE
didn't work while it should though…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would have worked if we stuck the inserted code at the end like you originally did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically shouldn't be different since those two macros are defined the same, which is to make the code disappear when the genksyms pass looks at the header.
b0f5ac1
to
1d1f18a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢
jira VULN-8198 cve CVE-2024-26669 commit-author Ido Schimmel <idosch@nvidia.com> commit 32f2a0a upstream-diff | Adding `tmplt_reoffload' field to the `tcf_proto_ops' struct breaks kABI for the whitelisted symbols: - flow_block_cb_alloc - flow_block_cb_free - flow_block_cb_lookup - flow_block_cb_setup_simple - flow_indr_block_cb_alloc - flow_indr_dev_register - flow_indr_dev_unregister - qdisc_reset Added it with the `RH_KABI_BROKEN_INSERT' tag anyway because `tcf_proto_ops' was not put on the whitelist directly and changing it affects kABI only through the `Qdisc' struct used as parameter in the functions listed above. Since `register_tcf_proto_ops' isn't in the kABI stablelist, and it is the only way a `tcf_proto_ops' pointer can be reached through struct `Qdisc', there's no kABI breakage in the strictest sense. When a qdisc is deleted from a net device the stack instructs the underlying driver to remove its flow offload callback from the associated filter block using the 'FLOW_BLOCK_UNBIND' command. The stack then continues to replay the removal of the filters in the block for this driver by iterating over the chains in the block and invoking the 'reoffload' operation of the classifier being used. In turn, the classifier in its 'reoffload' operation prepares and emits a 'FLOW_CLS_DESTROY' command for each filter. However, the stack does not do the same for chain templates and the underlying driver never receives a 'FLOW_CLS_TMPLT_DESTROY' command when a qdisc is deleted. This results in a memory leak [1] which can be reproduced using [2]. Fix by introducing a 'tmplt_reoffload' operation and have the stack invoke it with the appropriate arguments as part of the replay. Implement the operation in the sole classifier that supports chain templates (flower) by emitting the 'FLOW_CLS_TMPLT_{CREATE,DESTROY}' command based on whether a flow offload callback is being bound to a filter block or being unbound from one. As far as I can tell, the issue happens since cited commit which reordered tcf_block_offload_unbind() before tcf_block_flush_all_chains() in __tcf_block_put(). The order cannot be reversed as the filter block is expected to be freed after flushing all the chains. [1] unreferenced object 0xffff888107e28800 (size 2048): comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s) hex dump (first 32 bytes): b1 a6 7c 11 81 88 ff ff e0 5b b3 10 81 88 ff ff ..|......[...... 01 00 00 00 00 00 00 00 e0 aa b0 84 ff ff ff ff ................ backtrace: [<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320 [<ffffffff81ab374e>] __kmalloc+0x4e/0x90 [<ffffffff832aec6d>] mlxsw_sp_acl_ruleset_get+0x34d/0x7a0 [<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180 [<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280 [<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340 [<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0 [<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170 [<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0 [<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440 [<ffffffff83ac6270>] netlink_unicast+0x540/0x820 [<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0 [<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80 [<ffffffff8379d29a>] ___sys_sendmsg+0x13a/0x1e0 [<ffffffff8379d50c>] __sys_sendmsg+0x11c/0x1f0 [<ffffffff843b9ce0>] do_syscall_64+0x40/0xe0 unreferenced object 0xffff88816d2c0400 (size 1024): comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s) hex dump (first 32 bytes): 40 00 00 00 00 00 00 00 57 f6 38 be 00 00 00 00 @.......W.8..... 10 04 2c 6d 81 88 ff ff 10 04 2c 6d 81 88 ff ff ..,m......,m.... backtrace: [<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320 [<ffffffff81ab36c1>] __kmalloc_node+0x51/0x90 [<ffffffff81a8ed96>] kvmalloc_node+0xa6/0x1f0 [<ffffffff82827d03>] bucket_table_alloc.isra.0+0x83/0x460 [<ffffffff82828d2b>] rhashtable_init+0x43b/0x7c0 [<ffffffff832aed48>] mlxsw_sp_acl_ruleset_get+0x428/0x7a0 [<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180 [<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280 [<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340 [<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0 [<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170 [<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0 [<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440 [<ffffffff83ac6270>] netlink_unicast+0x540/0x820 [<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0 [<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80 [2] # tc qdisc add dev swp1 clsact # tc chain add dev swp1 ingress proto ip chain 1 flower dst_ip 0.0.0.0/32 # tc qdisc del dev swp1 clsact # devlink dev reload pci/0000:06:00.0 Fixes: bbf7383 ("net: sched: traverse chains in block with tcf_get_next_chain()") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 32f2a0a) Signed-off-by: Marcin Wcisło <marcin.wcislo@conclusive.pl>
1d1f18a
to
86113dc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So many kABI macros that all do the same thing @_@
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[LTS 9.4]
CVE-2024-26669
VULN-8198
Problem
https://web.git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=32f2a0afa95fae0d1ceec2ff06e0e816939964b8
Affected: yes
The bug-affected (or rather fix-modified) files are:
net/sched/cls_flower.c
net/sched/cls_api.c
include/net/sch_generic.h
The "flower" traffic class implemented by
net/sched/cls_flower.c
is inabled withCONFIG_NET_CLS_FLOWER
option, set tom
in all LTS 9.4 configs:Naturally, the
CONFIG_NET_CLS
option guarding thenet/sched/cls_api.c
file is enabled as well.The mainline fix 32f2a0a identifies bbf7383 as the culprit - it's present in
ciqlts9_4
history natively.Solution
The mainline fix cherry-picks cleanly but breaks the kABI. From the
check-kabi
's message:The breakage is caused by the introduction of
tmplt_reoffload
field to thetcf_proto_ops
struct:32f2a0a#diff-acd1ebb1376db4fecebf48e5007639bb90d4ae3e36fe459c2d8bb282fcd218f2R378-R381
The field was added nonetheless, although moved to the end of the struct and wrapped in the
RH_KABI_EXTEND
macro, which is the sole diff from the mainline fix. The rest of this section argues that it was safe to do so.Modified-whitelisted symbols relation
Using the same method as in #475 the relation between the affected whitelisted symbols and the modified one can be established: kabi-break-chain.log. It can be summarized in a hierarchical list as:
[struct] tcf_proto_ops
→[struct] tcf_chain
→[struct] tcf_block
→[struct] Qdisc_class_ops
→[struct] Qdisc_ops
→[struct] Qdisc
[func] qdisc_reset
[func] flow_indr_block_cb_alloc
[typedef] flow_indr_block_bind_cb_t
[func] flow_indr_dev_unregister
[func] flow_indr_dev_register
[struct] flow_block_offload
→[func] flow_block_cb_setup_simple
[struct] flow_block_indr
→[struct] flow_block_cb
[func] flow_block_cb_lookup
[func] flow_block_cb_free
[func] flow_block_cb_alloc
How to read this summary:
Notation "
X
→Y
" means "X
is used in the definition ofY
".The same meaning has the child-parent relation, that is
X
Y
is the same as "
X
→Y
".At the root of the hierarchy is the modified symbol
tcf_proto_ops
.The leaves are the whitelisted symbols.
For example, the
tcf_proto_ops
influences the definition offlow_block_cb_lookup
by the chain of[struct] tcf_proto_ops
→[struct] tcf_chain
→[struct] tcf_block
→[struct] Qdisc_class_ops
→[struct] Qdisc_ops
→[struct] Qdisc
→[struct] flow_block_indr
→[struct] flow_block_cb
→[func] flow_block_cb_lookup
.Here's an initial chain between
[struct] tcf_proto_ops
and[struct] Qdisc
common for all whitelisted symbols:[struct] tcf_proto_ops
:kernel-src-tree/include/net/sch_generic.h
Line 341 in 32c89d9
[struct] tcf_chain
:kernel-src-tree/include/net/sch_generic.h
Line 455 in 32c89d9
[struct] tcf_block
:kernel-src-tree/include/net/sch_generic.h
Line 479 in 32c89d9
[struct] Qdisc_class_ops
:kernel-src-tree/include/net/sch_generic.h
Lines 268 to 270 in 32c89d9
[struct] Qdisc_ops
:kernel-src-tree/include/net/sch_generic.h
Line 291 in 32c89d9
[struct] Qdisc
:kernel-src-tree/include/net/sch_generic.h
Line 98 in 32c89d9
Usage of the
tcf_proto_ops
structAlthough analyzing the symbol's usage within the patched kernel itself doesn't make sense kABI-preservation-wise (kABI will always be preserved by the virtue of a single cohesive build process) it paints the picture of how this symbol can be expected to be used in some downstream custom kernel branch.
Below is the complete list of
tcf_proto_ops
usages inciqlts9_4
, based on gtags, divided by the category of useAs a function argument
kernel-src-tree/include/net/pkt_cls.h
Line 25 in 32c89d9
kernel-src-tree/include/net/pkt_cls.h
Line 26 in 32c89d9
kernel-src-tree/net/sched/cls_api.c
Line 280 in 32c89d9
kernel-src-tree/net/sched/cls_api.c
Line 300 in 32c89d9
kernel-src-tree/net/sched/cls_api.c
Line 648 in 32c89d9
kernel-src-tree/net/sched/cls_api.c
Line 650 in 32c89d9
kernel-src-tree/net/sched/cls_api.c
Line 2822 in 32c89d9
kernel-src-tree/net/sched/cls_api.c
Line 2908 in 32c89d9
kernel-src-tree/net/sched/cls_api.c
Line 2969 in 32c89d9
As a local variable
kernel-src-tree/net/sched/cls_api.c
Line 232 in 32c89d9
kernel-src-tree/net/sched/cls_api.c
Line 252 in 32c89d9
kernel-src-tree/net/sched/cls_api.c
Line 282 in 32c89d9
kernel-src-tree/net/sched/cls_api.c
Line 302 in 32c89d9
kernel-src-tree/net/sched/cls_api.c
Line 354 in 32c89d9
kernel-src-tree/net/sched/cls_api.c
Line 659 in 32c89d9
kernel-src-tree/net/sched/cls_api.c
Line 2830 in 32c89d9
kernel-src-tree/net/sched/cls_api.c
Line 2937 in 32c89d9
As a struct field
kernel-src-tree/include/net/sch_generic.h
Line 420 in 32c89d9
kernel-src-tree/include/net/sch_generic.h
Line 455 in 32c89d9
As a return value
kernel-src-tree/net/sched/cls_api.c
Line 230 in 32c89d9
kernel-src-tree/net/sched/cls_api.c
Line 248 in 32c89d9
As a file-scope static variable
kernel-src-tree/net/sched/cls_basic.c
Line 318 in 32c89d9
kernel-src-tree/net/sched/cls_bpf.c
Line 682 in 32c89d9
kernel-src-tree/net/sched/cls_cgroup.c
Line 200 in 32c89d9
kernel-src-tree/net/sched/cls_flow.c
Line 693 in 32c89d9
kernel-src-tree/net/sched/cls_flower.c
Line 3580 in 32c89d9
kernel-src-tree/net/sched/cls_fw.c
Line 423 in 32c89d9
kernel-src-tree/net/sched/cls_matchall.c
Line 387 in 32c89d9
kernel-src-tree/net/sched/cls_route.c
Line 656 in 32c89d9
kernel-src-tree/net/sched/cls_u32.c
Line 1442 in 32c89d9
Observations and conclusion
Whenever
tcf_proto_ops
is passed as a function argument it's always through a pointer. This ensures preservation of stack addresses.Whenever
tcf_proto_ops
is used as a local variable it's always by a pointer. This ensures preservation of stack addresses.Whenever
tcf_proto_ops
is used as a struct field it's always through a pointer. This allows the containing struct to be used without any kABI-related restraints.Whenever
tcf_proto_ops
is returned from a function it's always through a pointer. In fact this use case showcases how atcf_proto_ops
object can be obtained in general - through the use oftcf_proto_lookup_ops(…)
function exclusively. This definestcf_proto_ops
"creation" API excluding automatic allocation on the stack. However, it does internally depend on static allocation, which is addressed in the last point.The
tcf_proto_ops
struct is used as a static variable in each and every one of the traffic classes modules. This usage defines all allocations oftcf_proto_ops
struct. Theoretically it's possible that a user defined his own custom network traffic classmyprecious
in a way analogous to those defined in the mainline kernelu32
,route
, etc., likeThis binary module could then be incompatible with the patched kernel. However, this scenario is highly unlikely, and if
tcf_proto_ops
was ever used in such way then the user would request to put thetcf_proto_ops
symbol on the whitelist directly, yet it's missing.Considering that the
tcf_proto_ops
struct is allocated statically, but in a highly controlled and limited fashion, closely related to the definitions of kernel-native traffic classes and unlikely to ever be replicated by an out-of-tree code, and that it's deeply buried, exclusively through pointers, in the substructure of theQdisc
struct that the users actually come to contact with in the whitelisted functions, it was determined that modifying it won't cause binary compatibility problems.kABI check: passed
Boot test: passed
boot-test.log
Kselftests: passed relative
Coverage
Only the net-specific tests were run (collections
net
,net/forwarding
,net/mptcp
,netfilter
)Reference
kselftests–ciqlts9_4–run1.log
Patch
kselftests–ciqlts9_4-CVE-2024-26669–run1.log
kselftests–ciqlts9_4-CVE-2024-26669–run2.log
Comparison
The tests results for the reference and patched kernel are the same.
Specific tests: skipped