Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TCP listen control #15

Open
l0kod opened this issue Jan 19, 2024 · 22 comments
Open

TCP listen control #15

l0kod opened this issue Jan 19, 2024 · 22 comments
Assignees
Labels
enhancement New feature or request

Comments

@l0kod
Copy link
Member

l0kod commented Jan 19, 2024

LANDLOCK_ACCESS_NET_BIND_TCP is useful to limit the scope of "bindable" ports to forbid a malicious sandboxed process to impersonate a legitimate server process. However, bind(2) might be used by (TCP) clients to set the source port to a (legitimate) value. This use case is an issue because we cannot allow a whole range of ports (e.g., >= 1024).

A LANDLOCK_ACCESS_NET_LISTEN_TCP would make more sense to control incoming connections to a specific port.

Being able to restrict listen(2) may lead to a cover channel where the sandboxed process can infer some properties (e.g. port) from the socket file descriptior (e.g. if it was passed from another process). This doesn't seem to be a security issue though.

Controlling accept(2) might not be worth it because:

  • performance impact (i.e. evaluated for each new client connection);
  • it would make more sense to be able to control access according to the client address/port but
  • sandboxing is to protect the outside of the sandbox from the inside.

See thread: https://lore.kernel.org/all/b4440d19-93b9-e234-007b-4fc4f987550b@digikod.net/

Related to #6

Cc @BoardzMaster

v1: https://lore.kernel.org/all/20240728002602.3198398-1-ivanov.mikhail1@huawei-partners.com/
v0: https://lore.kernel.org/all/20240408094747.1761850-1-ivanov.mikhail1@huawei-partners.com/

This was referenced Jan 19, 2024
@l0kod l0kod changed the title Listen TCP control TCP listen control Jan 19, 2024
@l0kod l0kod added the enhancement New feature or request label Jan 19, 2024
@l0kod l0kod added this to Landlock Feb 1, 2024
@l0kod l0kod moved this to In Progress in Landlock Feb 1, 2024
@sm1ling-knight
Copy link

sm1ling-knight commented Feb 21, 2024

The problem is that (TCP) client is unable to use ports that are restricted for server, right?

@l0kod
Copy link
Member Author

l0kod commented Feb 21, 2024

The problem is that (TCP) client is unable to use ports that are restricted for server, right?

Yes, a client would not be allowed to bind to a specific port (for a newly initiated outgoing connection) if its domain handles LANDLOCK_ACCESS_NET_BIND_TCP but no rule explicitly allows binding on this port. This should not be an issue for most use cases, but some systems fine tune their TCP connections.

@sm1ling-knight
Copy link

The problem is that (TCP) client is unable to use ports that are restricted for server, right?

Yes, a client would not be allowed to bind to a specific port (for a newly initiated outgoing connection) if its domain handles LANDLOCK_ACCESS_NET_BIND_TCP but no rule explicitly allows binding on this port. This should not be an issue for most use cases, but some systems fine tune their TCP connections.

For example, a use case where some kind of server process forks the client and the client 's available ports are limited due to the parent ruleset, right?

@l0kod
Copy link
Member Author

l0kod commented Feb 21, 2024

For example, a use case where some kind of server process forks the client and the client 's available ports are limited due to the parent ruleset, right?

If a new process is spawned per client connection, then the new process doesn't need any specific right to reply on the opened socket/file descriptor. Indeed, in this case, there is no call to connect(2) nor bind(2), only send/write and recv/read.

A use case where a client process does an explicit bind(2) and then a connect(2) on the same socket can happen on high-volume networks because the kernel may take too much time to find an available port. In this case a specific algorithm can be used by user space to efficiently deal with port allocation. This should probably not be used on a public network for security reasons though.

Another use case is to use a specific network interface or IP address.

@sm1ling-knight
Copy link

A use case where a client process does an explicit bind(2) and then a connect(2) on the same socket can happen on high-volume networks because the kernel may take too much time to find an available port. In this case a specific algorithm can be used by user space to efficiently deal with port allocation. This should probably not be used on a public network for security reasons though.

But why in this case client's domain might need to handle LANDLOCK_ACCESS_NET_BIND_TCP?

@l0kod
Copy link
Member Author

l0kod commented Feb 22, 2024

A use case where a client process does an explicit bind(2) and then a connect(2) on the same socket can happen on high-volume networks because the kernel may take too much time to find an available port. In this case a specific algorithm can be used by user space to efficiently deal with port allocation. This should probably not be used on a public network for security reasons though.

But why in this case client's domain might need to handle LANDLOCK_ACCESS_NET_BIND_TCP?

They shouldn't unless we can specify a port range (#16). What would make more sense is to use LANDLOCK_ACCESS_NET_LISTEN_TCP instead.

@l0kod
Copy link
Member Author

l0kod commented Mar 6, 2024

@BoardzMaster, do you still want to work on this or should we let this open for others?

@BoardzMaster
Copy link

Hi @l0kod. @sm1ling-knight he is my colleague and he works on this now. I will give him a hand.

@sm1ling-knight
Copy link

listen(2) can be called without any previous bind(2) call, so a malicious sandboxed process can accidentally impersonate some legitimate server process (if listen(2) assigns it a server port number). LANDLOCK_ACCESS_NET_LISTEN_TCP can also be useful to prevent such vulnerability.

@l0kod
Copy link
Member Author

l0kod commented Mar 14, 2024

I guess you meant that when listen(2) is called without explicit bind(2) call, the kernel automatically binds the socket to an ephemeral port. So this could indeed be an issue. Thanks for bringing that up.

I wrote a test and this indeed works, whereas it should not. Could you please send a patch to the LSM mailing list (with me in CC), to explain the issue and propose a fix for Landlock. I guess this should mainly add a socket_listen() hook implementation and check if binding on port 0 is allowed.

@sm1ling-knight
Copy link

I wrote a test and this indeed works, whereas it should not. Could you please send a patch to the LSM mailing list (with me in CC), to explain the issue and propose a fix for Landlock.

Ok, i'll do it.

I guess this should mainly add a socket_listen() hook implementation

I thought about adding another security plug function that would be called by the ip layer at the end of the listen() handler execution.

and check if binding on port 0 is allowed.

What do you mean by that? Binding on 0 port is also seems to be an issue, maybe we should add one more hook for LANDLOCK_ACCESS_NET_LISTEN_TCP to prevent automatic binding to a forbidden (by Landlock) port.

@l0kod
Copy link
Member Author

l0kod commented Mar 14, 2024

I guess this should mainly add a socket_listen() hook implementation

I thought about adding another security plug function that would be called by the ip layer at the end of the listen() handler execution.

Well, it might be seen as an LSM issue, so yes, it would be nice to add security_socket_bind()-like calls where appropriate.

and check if binding on port 0 is allowed.

What do you mean by that? Binding on 0 port is also seems to be an issue, maybe we should add one more hook for LANDLOCK_ACCESS_NET_LISTEN_TCP to prevent automatic binding to a forbidden (by Landlock) port.

Binding on port 0 translates to binding to an ephemeral port, see the port_specific.bind_connect_zero test. That could help for the new hook call.

This should be a fix of LANDLOCK_ACCESS_NET_BIND_TCP, so no LANDLOCK_ACCESS_NET_LISTEN_TCP for now.

@sm1ling-knight
Copy link

So, we have to make following things for managing actions with ephemeral ports, right?

case to-do
1. binding on 0 port fix for the LANDLOCK_ACCESS_NET_BIND_TCP
2. call to listen(2) without explicit bind(2) call detail of the LANDLOCK_ACCESS_NET_LISTEN_TCP implementation

@l0kod
Copy link
Member Author

l0kod commented Mar 14, 2024

Explicit binding on port 0 is OK. It was discussed with @BoardzMaster, it reflects the bind(2) behavior, and it can be controlled with a LANDLOCK_ACCESS_NET_BIND_TCP rule on port 0.

However, binding on an ephemeral port with listen(2) (i.e. same as bind(2) on port 0) should only be allowed if there is a rule with LANDLOCK_ACCESS_NET_BIND_TCP on port 0 (or if LANDLOCK_ACCESS_NET_BIND_TCP is not handled).

The LANDLOCK_ACCESS_NET_LISTEN_TCP right should be complementary to LANDLOCK_ACCESS_NET_BIND_TCP. A sandboxed process should first be allowed to bind to a port (if it is restricted with LANDLOCK_ACCESS_NET_BIND_TCP), and then this same process should only be allowed to listen to the same port if LANDLOCK_ACCESS_NET_LISTEN_TCP is handled and if a LANDLOCK_ACCESS_NET_LISTEN_TCP rule exists for such port.
To say it another way, if a sandbox handles LANDLOCK_ACCESS_NET_LISTEN_TCP but not LANDLOCK_ACCESS_NET_BIND_TCP, then a rule on port 0 for LANDLOCK_ACCESS_NET_LISTEN_TCP would be required to successfully call listen(2) without explicit binding.

@sm1ling-knight
Copy link

Thanks, got it)

@sm1ling-knight
Copy link

Hello, @l0kod ! Here you added this check in net.c:

		/*
		 * For compatibility reason, accept AF_UNSPEC for bind
		 * accesses (mapped to AF_INET) only if the address is
		 * INADDR_ANY (cf. __inet_bind).  Checking the address is
		 * required to not wrongfully return -EACCES instead of
		 * -EAFNOSUPPORT.
		 */
		if (access_request == LANDLOCK_ACCESS_NET_BIND_TCP) {
			const struct sockaddr_in *const sockaddr =
				(struct sockaddr_in *)address;

			if (sockaddr->sin_addr.s_addr != htonl(INADDR_ANY))
				return -EAFNOSUPPORT;
		}

I didn't understand, why Landlock design requires to make this network stack level check (why return of -EACCES is wrongful)?

@l0kod
Copy link
Member Author

l0kod commented Mar 26, 2024

I didn't understand, why Landlock design requires to make this network stack level check (why return of -EACCES is wrongful)?

Because network LSM hooks are called before any check on the network request (e.g. legitimate address or protocol), hooks implementations first need to check that the provided data is complete to avoid introducing bugs, and they should also return the appropriate error code (e.g. EAFNOSUPPORT) to let legitimate user space know that their request is wrong and that they should fix their code. As a side effect, this also makes Landlock (or any other LSM in theory) transparent for legitimate use case (e.g. a bad address). Returning EACCES would mask some useful information. An alternative would be to return 0 and let the network stack handle this check, but this is risky because if the network stack changes this check we would end up allowing this request. This is similar to bbf5a1d and related checks.

@sm1ling-knight
Copy link

sm1ling-knight commented Mar 26, 2024

I didn't understand, why Landlock design requires to make this network stack level check (why return of -EACCES is wrongful)?

Because network LSM hooks are called before any check on the network request (e.g. legitimate address or protocol), hooks implementations first need to check that the provided data is complete to avoid introducing bugs, and they should also return the appropriate error code (e.g. EAFNOSUPPORT) to let legitimate user space know that their request is wrong and that they should fix their code. As a side effect, this also makes Landlock (or any other LSM in theory) transparent for legitimate use case (e.g. a bad address). Returning EACCES would mask some useful information. An alternative would be to return 0 and let the network stack handle this check, but this is risky because if the network stack changes this check we would end up allowing this request. This is similar to bbf5a1d and related checks.

Am I correct in understanding that this only applies to not-socket argument checking? (e.g. addr, addrlen)

E.g. inet stack restricts calling to bind(2) with active socket (not TCP_CLOSE). But above test fails because bind_variant() returns -EACCES instead of -EINVAL - Landlock mask information about socket invalid state.

We don't want to provide appropriate check in hook, because it's not related to addr or addrlen (bind(2)) arguments, right?

TEST_F(port_specific, listen_bind)
{
	int bind_fd, ret;
	int port0, port1;

	port0 = 1024;
	port1 = 1025;

	/* Adds a rule layer with bind and connect actions. */
	if (variant->sandbox == TCP_SANDBOX) {
		const struct landlock_ruleset_attr ruleset_attr = {
			.handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP
		};
		const struct landlock_net_port_attr tcp_bind_port0 = {
			.allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP,
			.port = port0,
		};
		const struct landlock_net_port_attr tcp_bind_0 = {
			.allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP,
			.port = 0,
		};
		int ruleset_fd;

		ruleset_fd = landlock_create_ruleset(&ruleset_attr,
						     sizeof(ruleset_attr), 0);
		ASSERT_LE(0, ruleset_fd);

		/* Allow calling to listen(2) without explicit binding. */
		EXPECT_EQ(0,
			  landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
					    &tcp_bind_0, 0));
		EXPECT_EQ(0,
			  landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
					    &tcp_bind_port0, 0));

		enforce_ruleset(_metadata, ruleset_fd);
		EXPECT_EQ(0, close(ruleset_fd));
	}

	bind_fd = socket_variant(&self->srv0);
	ASSERT_LE(0, bind_fd);

	/* Port for bind_fd is chosen randomly by kernel. */
	ret = listen(bind_fd, backlog);
	EXPECT_EQ(0, ret);

	set_port(&self->srv0, port1);
	ret = bind_variant(bind_fd, &self->srv0);
	EXPECT_EQ(-EINVAL, ret);

	EXPECT_EQ(0, close(bind_fd));
}

@l0kod
Copy link
Member Author

l0kod commented Mar 27, 2024

Am I correct in understanding that this only applies to not-socket argument checking? (e.g. addr, addrlen)

E.g. inet stack restricts calling to bind(2) with active socket (not TCP_CLOSE). But above test fails because bind_variant() returns -EACCES instead of -EINVAL - Landlock mask information about socket invalid state.

We don't want to provide appropriate check in hook, because it's not related to addr or addrlen (bind(2)) arguments, right?

I think error codes should be consistent. In this case, we (and probably other LSMs) forgot to check socket's state and I think we should (if the additional check is not too complex, which should not be the case). Please, send a fix, we'll continue the discussion on the mailing list.

I have two patches that should improve the current state for all LSMs. I'll refresh them and send them. Feel free to pick them or reply with your own checks.

@l0kod
Copy link
Member Author

l0kod commented Mar 27, 2024

I have two patches that should improve the current state for all LSMs. I'll refresh them and send them. Feel free to pick them or reply with your own checks.

https://lore.kernel.org/r/20240327120036.233641-1-mic@digikod.net

@sm1ling-knight
Copy link

Hello, @l0kod !
Can I use this in the rationale?

LANDLOCK_ACCESS_NET_BIND_TCP is useful to limit the scope of "bindable" ports to forbid a malicious sandboxed process to impersonate a legitimate server process. However, bind(2) might be used by (TCP) clients to set the source port to a (legitimate) value.

@l0kod
Copy link
Member Author

l0kod commented Jul 10, 2024

Hello, @l0kod ! Can I use this in the rationale?

LANDLOCK_ACCESS_NET_BIND_TCP is useful to limit the scope of "bindable" ports to forbid a malicious sandboxed process to impersonate a legitimate server process. However, bind(2) might be used by (TCP) clients to set the source port to a (legitimate) value.

Sure! Feel free to get inspiration or even copy some parts of these discussions as long as authors are in Cc.

You should also use this tag for a related patch: Closes: https://github.com/landlock-lsm/linux/issues/15

@l0kod l0kod moved this from In Progress to In review in Landlock Aug 8, 2024
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this issue Aug 14, 2024
LANDLOCK_ACCESS_NET_BIND_TCP is useful to limit the scope of "bindable"
ports to forbid a malicious sandboxed process to impersonate a legitimate
server process. However, bind(2) might be used by (TCP) clients to set the
source port to a (legitimate) value. Controlling the ports that can be
used for listening would allow (TCP) clients to explicitly bind to ports
that are forbidden for listening.

Such control is implemented with a new LANDLOCK_ACCESS_NET_LISTEN_TCP
access right that restricts listening on undesired ports with listen(2).

It's worth noticing that this access right doesn't affect changing
backlog value using listen(2) on already listening socket.

* Create new LANDLOCK_ACCESS_NET_LISTEN_TCP flag.
* Add hook to socket_listen(), which checks whether the socket is allowed
  to listen on a binded local port.
* Add check_tcp_socket_can_listen() helper, which validates socket
  attributes before the actual access right check.
* Update `struct landlock_net_port_attr` documentation with control of
  binding to ephemeral port with listen(2) description.
* Change ABI version to 6.

Closes: landlock-lsm#15
Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
l0kod pushed a commit that referenced this issue Sep 9, 2024
[ Upstream commit a699781 ]

A sysfs reader can race with a device reset or removal, attempting to
read device state when the device is not actually present. eg:

     [exception RIP: qed_get_current_link+17]
  #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede]
  #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3
 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4
 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300
 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c
 #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b
 #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3
 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1
 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f
 #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb

 crash> struct net_device.state ffff9a9d21336000
    state = 5,

state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100).
The device is not present, note lack of __LINK_STATE_PRESENT (0b10).

This is the same sort of panic as observed in commit 4224cfd
("net-sysfs: add check for netdevice being present to speed_show").

There are many other callers of __ethtool_get_link_ksettings() which
don't have a device presence check.

Move this check into ethtool to protect all callers.

Fixes: d519e17 ("net: export device speed and duplex via sysfs")
Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
l0kod pushed a commit that referenced this issue Dec 16, 2024
[ Upstream commit 4a74da0 ]

KASAN reports an out of bounds read:
BUG: KASAN: slab-out-of-bounds in __kuid_val include/linux/uidgid.h:36
BUG: KASAN: slab-out-of-bounds in uid_eq include/linux/uidgid.h:63 [inline]
BUG: KASAN: slab-out-of-bounds in key_task_permission+0x394/0x410
security/keys/permission.c:54
Read of size 4 at addr ffff88813c3ab618 by task stress-ng/4362

CPU: 2 PID: 4362 Comm: stress-ng Not tainted 5.10.0-14930-gafbffd6c3ede #15
Call Trace:
 __dump_stack lib/dump_stack.c:82 [inline]
 dump_stack+0x107/0x167 lib/dump_stack.c:123
 print_address_description.constprop.0+0x19/0x170 mm/kasan/report.c:400
 __kasan_report.cold+0x6c/0x84 mm/kasan/report.c:560
 kasan_report+0x3a/0x50 mm/kasan/report.c:585
 __kuid_val include/linux/uidgid.h:36 [inline]
 uid_eq include/linux/uidgid.h:63 [inline]
 key_task_permission+0x394/0x410 security/keys/permission.c:54
 search_nested_keyrings+0x90e/0xe90 security/keys/keyring.c:793

This issue was also reported by syzbot.

It can be reproduced by following these steps(more details [1]):
1. Obtain more than 32 inputs that have similar hashes, which ends with the
   pattern '0xxxxxxxe6'.
2. Reboot and add the keys obtained in step 1.

The reproducer demonstrates how this issue happened:
1. In the search_nested_keyrings function, when it iterates through the
   slots in a node(below tag ascend_to_node), if the slot pointer is meta
   and node->back_pointer != NULL(it means a root), it will proceed to
   descend_to_node. However, there is an exception. If node is the root,
   and one of the slots points to a shortcut, it will be treated as a
   keyring.
2. Whether the ptr is keyring decided by keyring_ptr_is_keyring function.
   However, KEYRING_PTR_SUBTYPE is 0x2UL, the same as
   ASSOC_ARRAY_PTR_SUBTYPE_MASK.
3. When 32 keys with the similar hashes are added to the tree, the ROOT
   has keys with hashes that are not similar (e.g. slot 0) and it splits
   NODE A without using a shortcut. When NODE A is filled with keys that
   all hashes are xxe6, the keys are similar, NODE A will split with a
   shortcut. Finally, it forms the tree as shown below, where slot 6 points
   to a shortcut.

                      NODE A
              +------>+---+
      ROOT    |       | 0 | xxe6
      +---+   |       +---+
 xxxx | 0 | shortcut  :   : xxe6
      +---+   |       +---+
 xxe6 :   :   |       |   | xxe6
      +---+   |       +---+
      | 6 |---+       :   : xxe6
      +---+           +---+
 xxe6 :   :           | f | xxe6
      +---+           +---+
 xxe6 | f |
      +---+

4. As mentioned above, If a slot(slot 6) of the root points to a shortcut,
   it may be mistakenly transferred to a key*, leading to a read
   out-of-bounds read.

To fix this issue, one should jump to descend_to_node if the ptr is a
shortcut, regardless of whether the node is root or not.

[1] https://lore.kernel.org/linux-kernel/1cfa878e-8c7b-4570-8606-21daf5e13ce7@huaweicloud.com/

[jarkko: tweaked the commit message a bit to have an appropriate closes
 tag.]
Fixes: b2a4df2 ("KEYS: Expand the capacity of a keyring")
Reported-by: syzbot+5b415c07907a2990d1a3@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000cbb7860611f61147@google.com/T/
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
l0kod pushed a commit that referenced this issue Dec 16, 2024
[ Upstream commit 4a74da0 ]

KASAN reports an out of bounds read:
BUG: KASAN: slab-out-of-bounds in __kuid_val include/linux/uidgid.h:36
BUG: KASAN: slab-out-of-bounds in uid_eq include/linux/uidgid.h:63 [inline]
BUG: KASAN: slab-out-of-bounds in key_task_permission+0x394/0x410
security/keys/permission.c:54
Read of size 4 at addr ffff88813c3ab618 by task stress-ng/4362

CPU: 2 PID: 4362 Comm: stress-ng Not tainted 5.10.0-14930-gafbffd6c3ede #15
Call Trace:
 __dump_stack lib/dump_stack.c:82 [inline]
 dump_stack+0x107/0x167 lib/dump_stack.c:123
 print_address_description.constprop.0+0x19/0x170 mm/kasan/report.c:400
 __kasan_report.cold+0x6c/0x84 mm/kasan/report.c:560
 kasan_report+0x3a/0x50 mm/kasan/report.c:585
 __kuid_val include/linux/uidgid.h:36 [inline]
 uid_eq include/linux/uidgid.h:63 [inline]
 key_task_permission+0x394/0x410 security/keys/permission.c:54
 search_nested_keyrings+0x90e/0xe90 security/keys/keyring.c:793

This issue was also reported by syzbot.

It can be reproduced by following these steps(more details [1]):
1. Obtain more than 32 inputs that have similar hashes, which ends with the
   pattern '0xxxxxxxe6'.
2. Reboot and add the keys obtained in step 1.

The reproducer demonstrates how this issue happened:
1. In the search_nested_keyrings function, when it iterates through the
   slots in a node(below tag ascend_to_node), if the slot pointer is meta
   and node->back_pointer != NULL(it means a root), it will proceed to
   descend_to_node. However, there is an exception. If node is the root,
   and one of the slots points to a shortcut, it will be treated as a
   keyring.
2. Whether the ptr is keyring decided by keyring_ptr_is_keyring function.
   However, KEYRING_PTR_SUBTYPE is 0x2UL, the same as
   ASSOC_ARRAY_PTR_SUBTYPE_MASK.
3. When 32 keys with the similar hashes are added to the tree, the ROOT
   has keys with hashes that are not similar (e.g. slot 0) and it splits
   NODE A without using a shortcut. When NODE A is filled with keys that
   all hashes are xxe6, the keys are similar, NODE A will split with a
   shortcut. Finally, it forms the tree as shown below, where slot 6 points
   to a shortcut.

                      NODE A
              +------>+---+
      ROOT    |       | 0 | xxe6
      +---+   |       +---+
 xxxx | 0 | shortcut  :   : xxe6
      +---+   |       +---+
 xxe6 :   :   |       |   | xxe6
      +---+   |       +---+
      | 6 |---+       :   : xxe6
      +---+           +---+
 xxe6 :   :           | f | xxe6
      +---+           +---+
 xxe6 | f |
      +---+

4. As mentioned above, If a slot(slot 6) of the root points to a shortcut,
   it may be mistakenly transferred to a key*, leading to a read
   out-of-bounds read.

To fix this issue, one should jump to descend_to_node if the ptr is a
shortcut, regardless of whether the node is root or not.

[1] https://lore.kernel.org/linux-kernel/1cfa878e-8c7b-4570-8606-21daf5e13ce7@huaweicloud.com/

[jarkko: tweaked the commit message a bit to have an appropriate closes
 tag.]
Fixes: b2a4df2 ("KEYS: Expand the capacity of a keyring")
Reported-by: syzbot+5b415c07907a2990d1a3@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000cbb7860611f61147@google.com/T/
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
l0kod pushed a commit that referenced this issue Dec 16, 2024
commit 9af2efe upstream.

The fields in the hist_entry are filled on-demand which means they only
have meaningful values when relevant sort keys are used.

So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in
the hist entry can be garbage.  So it shouldn't access it
unconditionally.

I got a segfault, when I wanted to see cgroup profiles.

  $ sudo perf record -a --all-cgroups --synth=cgroup true

  $ sudo perf report -s cgroup

  Program received signal SIGSEGV, Segmentation fault.
  0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
  48		return RC_CHK_ACCESS(map)->dso;
  (gdb) bt
  #0  0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48
  #1  0x00005555557aa39b in map__load (map=0x0) at util/map.c:344
  #2  0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385
  #3  0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true)
      at util/hist.c:644
  #4  0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
      block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761
  #5  0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0,
      sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779
  #6  0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015
  #7  0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0)
      at util/hist.c:1260
  #8  0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0,
      machine=0x5555560388e8) at builtin-report.c:334
  #9  0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128,
      sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232
  #10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128,
      sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271
  #11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0,
      file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354
  #12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132
  #13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245
  #14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324
  #15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342
  #16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60)
      at util/session.c:780
  #17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688,
      file_path=0x555556038ff0 "perf.data") at util/session.c:1406

As you can see the entry->ms.map was NULL even if he->ms.map has a
value.  This is because 'sym' sort key is not given, so it cannot assume
whether he->ms.sym and entry->ms.sym is the same.  I only checked the
'sym' sort key here as it implies 'dso' behavior (so maps are the same).

Fixes: ac01c8c ("perf hist: Update hist symbol when updating maps")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Matt Fleming <matt@readmodwrite.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20240826221045.1202305-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: In review
Development

No branches or pull requests

3 participants