-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TCP listen control #15
Comments
The problem is that (TCP) client is unable to use ports that are restricted for server, right? |
Yes, a client would not be allowed to bind to a specific port (for a newly initiated outgoing connection) if its domain handles |
For example, a use case where some kind of server process forks the client and the client 's available ports are limited due to the parent ruleset, right? |
If a new process is spawned per client connection, then the new process doesn't need any specific right to reply on the opened socket/file descriptor. Indeed, in this case, there is no call to A use case where a client process does an explicit Another use case is to use a specific network interface or IP address. |
But why in this case client's domain might need to handle |
They shouldn't unless we can specify a port range (#16). What would make more sense is to use |
@BoardzMaster, do you still want to work on this or should we let this open for others? |
Hi @l0kod. @sm1ling-knight he is my colleague and he works on this now. I will give him a hand. |
|
I guess you meant that when I wrote a test and this indeed works, whereas it should not. Could you please send a patch to the LSM mailing list (with me in CC), to explain the issue and propose a fix for Landlock. I guess this should mainly add a |
Ok, i'll do it.
I thought about adding another security plug function that would be called by the ip layer at the end of the listen() handler execution.
What do you mean by that? Binding on 0 port is also seems to be an issue, maybe we should add one more hook for |
Well, it might be seen as an LSM issue, so yes, it would be nice to add
Binding on port 0 translates to binding to an ephemeral port, see the This should be a fix of |
So, we have to make following things for managing actions with ephemeral ports, right?
|
Explicit binding on port 0 is OK. It was discussed with @BoardzMaster, it reflects the However, binding on an ephemeral port with The |
Thanks, got it) |
Hello, @l0kod ! Here you added this check in net.c: /*
* For compatibility reason, accept AF_UNSPEC for bind
* accesses (mapped to AF_INET) only if the address is
* INADDR_ANY (cf. __inet_bind). Checking the address is
* required to not wrongfully return -EACCES instead of
* -EAFNOSUPPORT.
*/
if (access_request == LANDLOCK_ACCESS_NET_BIND_TCP) {
const struct sockaddr_in *const sockaddr =
(struct sockaddr_in *)address;
if (sockaddr->sin_addr.s_addr != htonl(INADDR_ANY))
return -EAFNOSUPPORT;
} I didn't understand, why Landlock design requires to make this network stack level check (why return of |
Because network LSM hooks are called before any check on the network request (e.g. legitimate address or protocol), hooks implementations first need to check that the provided data is complete to avoid introducing bugs, and they should also return the appropriate error code (e.g. |
Am I correct in understanding that this only applies to not-socket argument checking? (e.g. E.g. inet stack restricts calling to We don't want to provide appropriate check in hook, because it's not related to TEST_F(port_specific, listen_bind)
{
int bind_fd, ret;
int port0, port1;
port0 = 1024;
port1 = 1025;
/* Adds a rule layer with bind and connect actions. */
if (variant->sandbox == TCP_SANDBOX) {
const struct landlock_ruleset_attr ruleset_attr = {
.handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP
};
const struct landlock_net_port_attr tcp_bind_port0 = {
.allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP,
.port = port0,
};
const struct landlock_net_port_attr tcp_bind_0 = {
.allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP,
.port = 0,
};
int ruleset_fd;
ruleset_fd = landlock_create_ruleset(&ruleset_attr,
sizeof(ruleset_attr), 0);
ASSERT_LE(0, ruleset_fd);
/* Allow calling to listen(2) without explicit binding. */
EXPECT_EQ(0,
landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
&tcp_bind_0, 0));
EXPECT_EQ(0,
landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT,
&tcp_bind_port0, 0));
enforce_ruleset(_metadata, ruleset_fd);
EXPECT_EQ(0, close(ruleset_fd));
}
bind_fd = socket_variant(&self->srv0);
ASSERT_LE(0, bind_fd);
/* Port for bind_fd is chosen randomly by kernel. */
ret = listen(bind_fd, backlog);
EXPECT_EQ(0, ret);
set_port(&self->srv0, port1);
ret = bind_variant(bind_fd, &self->srv0);
EXPECT_EQ(-EINVAL, ret);
EXPECT_EQ(0, close(bind_fd));
} |
I think error codes should be consistent. In this case, we (and probably other LSMs) forgot to check socket's state and I think we should (if the additional check is not too complex, which should not be the case). Please, send a fix, we'll continue the discussion on the mailing list. I have two patches that should improve the current state for all LSMs. I'll refresh them and send them. Feel free to pick them or reply with your own checks. |
https://lore.kernel.org/r/20240327120036.233641-1-mic@digikod.net |
Hello, @l0kod !
|
Sure! Feel free to get inspiration or even copy some parts of these discussions as long as authors are in Cc. You should also use this tag for a related patch: |
LANDLOCK_ACCESS_NET_BIND_TCP is useful to limit the scope of "bindable" ports to forbid a malicious sandboxed process to impersonate a legitimate server process. However, bind(2) might be used by (TCP) clients to set the source port to a (legitimate) value. Controlling the ports that can be used for listening would allow (TCP) clients to explicitly bind to ports that are forbidden for listening. Such control is implemented with a new LANDLOCK_ACCESS_NET_LISTEN_TCP access right that restricts listening on undesired ports with listen(2). It's worth noticing that this access right doesn't affect changing backlog value using listen(2) on already listening socket. * Create new LANDLOCK_ACCESS_NET_LISTEN_TCP flag. * Add hook to socket_listen(), which checks whether the socket is allowed to listen on a binded local port. * Add check_tcp_socket_can_listen() helper, which validates socket attributes before the actual access right check. * Update `struct landlock_net_port_attr` documentation with control of binding to ephemeral port with listen(2) description. * Change ABI version to 6. Closes: landlock-lsm#15 Signed-off-by: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
[ Upstream commit a699781 ] A sysfs reader can race with a device reset or removal, attempting to read device state when the device is not actually present. eg: [exception RIP: qed_get_current_link+17] #8 [ffffb9e4f2907c48] qede_get_link_ksettings at ffffffffc07a994a [qede] #9 [ffffb9e4f2907cd8] __rh_call_get_link_ksettings at ffffffff992b01a3 #10 [ffffb9e4f2907d38] __ethtool_get_link_ksettings at ffffffff992b04e4 #11 [ffffb9e4f2907d90] duplex_show at ffffffff99260300 #12 [ffffb9e4f2907e38] dev_attr_show at ffffffff9905a01c #13 [ffffb9e4f2907e50] sysfs_kf_seq_show at ffffffff98e0145b #14 [ffffb9e4f2907e68] seq_read at ffffffff98d902e3 #15 [ffffb9e4f2907ec8] vfs_read at ffffffff98d657d1 #16 [ffffb9e4f2907f00] ksys_read at ffffffff98d65c3f #17 [ffffb9e4f2907f38] do_syscall_64 at ffffffff98a052fb crash> struct net_device.state ffff9a9d21336000 state = 5, state 5 is __LINK_STATE_START (0b1) and __LINK_STATE_NOCARRIER (0b100). The device is not present, note lack of __LINK_STATE_PRESENT (0b10). This is the same sort of panic as observed in commit 4224cfd ("net-sysfs: add check for netdevice being present to speed_show"). There are many other callers of __ethtool_get_link_ksettings() which don't have a device presence check. Move this check into ethtool to protect all callers. Fixes: d519e17 ("net: export device speed and duplex via sysfs") Fixes: 4224cfd ("net-sysfs: add check for netdevice being present to speed_show") Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com> Link: https://patch.msgid.link/8bae218864beaa44ed01628140475b9bf641c5b0.1724393671.git.jamie.bainbridge@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4a74da0 ] KASAN reports an out of bounds read: BUG: KASAN: slab-out-of-bounds in __kuid_val include/linux/uidgid.h:36 BUG: KASAN: slab-out-of-bounds in uid_eq include/linux/uidgid.h:63 [inline] BUG: KASAN: slab-out-of-bounds in key_task_permission+0x394/0x410 security/keys/permission.c:54 Read of size 4 at addr ffff88813c3ab618 by task stress-ng/4362 CPU: 2 PID: 4362 Comm: stress-ng Not tainted 5.10.0-14930-gafbffd6c3ede #15 Call Trace: __dump_stack lib/dump_stack.c:82 [inline] dump_stack+0x107/0x167 lib/dump_stack.c:123 print_address_description.constprop.0+0x19/0x170 mm/kasan/report.c:400 __kasan_report.cold+0x6c/0x84 mm/kasan/report.c:560 kasan_report+0x3a/0x50 mm/kasan/report.c:585 __kuid_val include/linux/uidgid.h:36 [inline] uid_eq include/linux/uidgid.h:63 [inline] key_task_permission+0x394/0x410 security/keys/permission.c:54 search_nested_keyrings+0x90e/0xe90 security/keys/keyring.c:793 This issue was also reported by syzbot. It can be reproduced by following these steps(more details [1]): 1. Obtain more than 32 inputs that have similar hashes, which ends with the pattern '0xxxxxxxe6'. 2. Reboot and add the keys obtained in step 1. The reproducer demonstrates how this issue happened: 1. In the search_nested_keyrings function, when it iterates through the slots in a node(below tag ascend_to_node), if the slot pointer is meta and node->back_pointer != NULL(it means a root), it will proceed to descend_to_node. However, there is an exception. If node is the root, and one of the slots points to a shortcut, it will be treated as a keyring. 2. Whether the ptr is keyring decided by keyring_ptr_is_keyring function. However, KEYRING_PTR_SUBTYPE is 0x2UL, the same as ASSOC_ARRAY_PTR_SUBTYPE_MASK. 3. When 32 keys with the similar hashes are added to the tree, the ROOT has keys with hashes that are not similar (e.g. slot 0) and it splits NODE A without using a shortcut. When NODE A is filled with keys that all hashes are xxe6, the keys are similar, NODE A will split with a shortcut. Finally, it forms the tree as shown below, where slot 6 points to a shortcut. NODE A +------>+---+ ROOT | | 0 | xxe6 +---+ | +---+ xxxx | 0 | shortcut : : xxe6 +---+ | +---+ xxe6 : : | | | xxe6 +---+ | +---+ | 6 |---+ : : xxe6 +---+ +---+ xxe6 : : | f | xxe6 +---+ +---+ xxe6 | f | +---+ 4. As mentioned above, If a slot(slot 6) of the root points to a shortcut, it may be mistakenly transferred to a key*, leading to a read out-of-bounds read. To fix this issue, one should jump to descend_to_node if the ptr is a shortcut, regardless of whether the node is root or not. [1] https://lore.kernel.org/linux-kernel/1cfa878e-8c7b-4570-8606-21daf5e13ce7@huaweicloud.com/ [jarkko: tweaked the commit message a bit to have an appropriate closes tag.] Fixes: b2a4df2 ("KEYS: Expand the capacity of a keyring") Reported-by: syzbot+5b415c07907a2990d1a3@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000cbb7860611f61147@google.com/T/ Signed-off-by: Chen Ridong <chenridong@huawei.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4a74da0 ] KASAN reports an out of bounds read: BUG: KASAN: slab-out-of-bounds in __kuid_val include/linux/uidgid.h:36 BUG: KASAN: slab-out-of-bounds in uid_eq include/linux/uidgid.h:63 [inline] BUG: KASAN: slab-out-of-bounds in key_task_permission+0x394/0x410 security/keys/permission.c:54 Read of size 4 at addr ffff88813c3ab618 by task stress-ng/4362 CPU: 2 PID: 4362 Comm: stress-ng Not tainted 5.10.0-14930-gafbffd6c3ede #15 Call Trace: __dump_stack lib/dump_stack.c:82 [inline] dump_stack+0x107/0x167 lib/dump_stack.c:123 print_address_description.constprop.0+0x19/0x170 mm/kasan/report.c:400 __kasan_report.cold+0x6c/0x84 mm/kasan/report.c:560 kasan_report+0x3a/0x50 mm/kasan/report.c:585 __kuid_val include/linux/uidgid.h:36 [inline] uid_eq include/linux/uidgid.h:63 [inline] key_task_permission+0x394/0x410 security/keys/permission.c:54 search_nested_keyrings+0x90e/0xe90 security/keys/keyring.c:793 This issue was also reported by syzbot. It can be reproduced by following these steps(more details [1]): 1. Obtain more than 32 inputs that have similar hashes, which ends with the pattern '0xxxxxxxe6'. 2. Reboot and add the keys obtained in step 1. The reproducer demonstrates how this issue happened: 1. In the search_nested_keyrings function, when it iterates through the slots in a node(below tag ascend_to_node), if the slot pointer is meta and node->back_pointer != NULL(it means a root), it will proceed to descend_to_node. However, there is an exception. If node is the root, and one of the slots points to a shortcut, it will be treated as a keyring. 2. Whether the ptr is keyring decided by keyring_ptr_is_keyring function. However, KEYRING_PTR_SUBTYPE is 0x2UL, the same as ASSOC_ARRAY_PTR_SUBTYPE_MASK. 3. When 32 keys with the similar hashes are added to the tree, the ROOT has keys with hashes that are not similar (e.g. slot 0) and it splits NODE A without using a shortcut. When NODE A is filled with keys that all hashes are xxe6, the keys are similar, NODE A will split with a shortcut. Finally, it forms the tree as shown below, where slot 6 points to a shortcut. NODE A +------>+---+ ROOT | | 0 | xxe6 +---+ | +---+ xxxx | 0 | shortcut : : xxe6 +---+ | +---+ xxe6 : : | | | xxe6 +---+ | +---+ | 6 |---+ : : xxe6 +---+ +---+ xxe6 : : | f | xxe6 +---+ +---+ xxe6 | f | +---+ 4. As mentioned above, If a slot(slot 6) of the root points to a shortcut, it may be mistakenly transferred to a key*, leading to a read out-of-bounds read. To fix this issue, one should jump to descend_to_node if the ptr is a shortcut, regardless of whether the node is root or not. [1] https://lore.kernel.org/linux-kernel/1cfa878e-8c7b-4570-8606-21daf5e13ce7@huaweicloud.com/ [jarkko: tweaked the commit message a bit to have an appropriate closes tag.] Fixes: b2a4df2 ("KEYS: Expand the capacity of a keyring") Reported-by: syzbot+5b415c07907a2990d1a3@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000cbb7860611f61147@google.com/T/ Signed-off-by: Chen Ridong <chenridong@huawei.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 9af2efe upstream. The fields in the hist_entry are filled on-demand which means they only have meaningful values when relevant sort keys are used. So if neither of 'dso' nor 'sym' sort keys are used, the map/symbols in the hist entry can be garbage. So it shouldn't access it unconditionally. I got a segfault, when I wanted to see cgroup profiles. $ sudo perf record -a --all-cgroups --synth=cgroup true $ sudo perf report -s cgroup Program received signal SIGSEGV, Segmentation fault. 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 48 return RC_CHK_ACCESS(map)->dso; (gdb) bt #0 0x00005555557a8d90 in map__dso (map=0x0) at util/map.h:48 #1 0x00005555557aa39b in map__load (map=0x0) at util/map.c:344 #2 0x00005555557aa592 in map__find_symbol (map=0x0, addr=140736115941088) at util/map.c:385 #3 0x00005555557ef000 in hists__findnew_entry (hists=0x555556039d60, entry=0x7fffffffa4c0, al=0x7fffffffa8c0, sample_self=true) at util/hist.c:644 #4 0x00005555557ef61c in __hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, block_info=0x0, sample=0x7fffffffaa90, sample_self=true, ops=0x0) at util/hist.c:761 #5 0x00005555557ef71f in hists__add_entry (hists=0x555556039d60, al=0x7fffffffa8c0, sym_parent=0x0, bi=0x0, mi=0x0, ki=0x0, sample=0x7fffffffaa90, sample_self=true) at util/hist.c:779 #6 0x00005555557f00fb in iter_add_single_normal_entry (iter=0x7fffffffa900, al=0x7fffffffa8c0) at util/hist.c:1015 #7 0x00005555557f09a7 in hist_entry_iter__add (iter=0x7fffffffa900, al=0x7fffffffa8c0, max_stack_depth=127, arg=0x7fffffffbce0) at util/hist.c:1260 #8 0x00005555555ba7ce in process_sample_event (tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at builtin-report.c:334 #9 0x00005555557b30c8 in evlist__deliver_sample (evlist=0x555556039010, tool=0x7fffffffbce0, event=0x7ffff7c14128, sample=0x7fffffffaa90, evsel=0x555556039ad0, machine=0x5555560388e8) at util/session.c:1232 #10 0x00005555557b32bc in machines__deliver_event (machines=0x5555560388e8, evlist=0x555556039010, event=0x7ffff7c14128, sample=0x7fffffffaa90, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1271 #11 0x00005555557b3848 in perf_session__deliver_event (session=0x5555560386d0, event=0x7ffff7c14128, tool=0x7fffffffbce0, file_offset=110888, file_path=0x555556038ff0 "perf.data") at util/session.c:1354 #12 0x00005555557affaf in ordered_events__deliver_event (oe=0x555556038e60, event=0x555556135aa0) at util/session.c:132 #13 0x00005555557bb605 in do_flush (oe=0x555556038e60, show_progress=false) at util/ordered-events.c:245 #14 0x00005555557bb95c in __ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND, timestamp=0) at util/ordered-events.c:324 #15 0x00005555557bba46 in ordered_events__flush (oe=0x555556038e60, how=OE_FLUSH__ROUND) at util/ordered-events.c:342 #16 0x00005555557b1b3b in perf_event__process_finished_round (tool=0x7fffffffbce0, event=0x7ffff7c15bb8, oe=0x555556038e60) at util/session.c:780 #17 0x00005555557b3b27 in perf_session__process_user_event (session=0x5555560386d0, event=0x7ffff7c15bb8, file_offset=117688, file_path=0x555556038ff0 "perf.data") at util/session.c:1406 As you can see the entry->ms.map was NULL even if he->ms.map has a value. This is because 'sym' sort key is not given, so it cannot assume whether he->ms.sym and entry->ms.sym is the same. I only checked the 'sym' sort key here as it implies 'dso' behavior (so maps are the same). Fixes: ac01c8c ("perf hist: Update hist symbol when updating maps") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Matt Fleming <matt@readmodwrite.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20240826221045.1202305-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
LANDLOCK_ACCESS_NET_BIND_TCP
is useful to limit the scope of "bindable" ports to forbid a malicious sandboxed process to impersonate a legitimate server process. However,bind(2)
might be used by (TCP) clients to set the source port to a (legitimate) value. This use case is an issue because we cannot allow a whole range of ports (e.g., >= 1024).A
LANDLOCK_ACCESS_NET_LISTEN_TCP
would make more sense to control incoming connections to a specific port.Being able to restrict
listen(2)
may lead to a cover channel where the sandboxed process can infer some properties (e.g. port) from the socket file descriptior (e.g. if it was passed from another process). This doesn't seem to be a security issue though.Controlling accept(2) might not be worth it because:
See thread: https://lore.kernel.org/all/b4440d19-93b9-e234-007b-4fc4f987550b@digikod.net/
Related to #6
Cc @BoardzMaster
v1: https://lore.kernel.org/all/20240728002602.3198398-1-ivanov.mikhail1@huawei-partners.com/
v0: https://lore.kernel.org/all/20240408094747.1761850-1-ivanov.mikhail1@huawei-partners.com/
The text was updated successfully, but these errors were encountered: