Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pend() assertion can allow user threads to crash the kernel #22999

Closed
andrewboie opened this issue Feb 21, 2020 · 3 comments · Fixed by #23000
Closed

pend() assertion can allow user threads to crash the kernel #22999

andrewboie opened this issue Feb 21, 2020 · 3 comments · Fixed by #23000
Assignees
Labels
bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@andrewboie
Copy link
Contributor

andrewboie commented Feb 21, 2020

Describe the bug
There's an assertion deep in the kernel scheduling code which, when a thread is being pended on a wait queue, to raise an assertion of the timeout is negative and not equal to 0. For some reason, this is immediately followed by a runtime check to just set timeout to 0 if assertions aren't built in.

static void pend(struct k_thread *thread, _wait_q_t *wait_q, s32_t timeout)
{
	...

	if (timeout != K_FOREVER) {
		s32_t ticks;

		__ASSERT(timeout >= 0,
			"Only non-negative values are accepted.");

		if (timeout < 0) {
			timeout = 0;
		}

		ticks = _TICK_ALIGN + k_ms_to_ticks_ceil32(timeout);

		z_add_thread_timeout(thread, ticks);
	}
}

Unfortunately, this assertion is problematic, as any system call with a timeout parameter can cause the kernel to fail this assertion. A user thread can make a call like k_sem_take(&sem, -2) and cause a crash in kernel mode. This is particularly bad because callers of pend() are either holding some kind of spinlock or have locked interrupts; a kernel crash in a critical section is an unrecoverable condition.

User threads making system calls are not allowed to ever cause the kernel to fail an assertion or otherwise trigger a kernel panic due to the risk of kernel corruption beyond the scope of the calling thread. Fatal errors in system calls are only permitted in a controlled fashion using Z_OOPS() in syscall verification functions.

I can conceive of three ways of addressing this:

  1. Delete the assertion. We have a runtime correction any way, what's the point?
  2. Delete the assertion and modify pend() to propagate a return value all the way up to callers of z_pend_* APIs
  3. Add bounds checks to the implementation functions of all system calls that accept a timeout parameter.

To Reproduce

diff --git a/tests/kernel/semaphore/semaphore/src/main.c b/tests/kernel/semaphore/semaphore/src/main.c
index 29ef44ac80..a2f8825bf5 100644
--- a/tests/kernel/semaphore/semaphore/src/main.c
+++ b/tests/kernel/semaphore/semaphore/src/main.c
@@ -432,7 +432,7 @@ void test_sem_take_timeout_forever(void)
 
        k_sem_reset(&simple_sem);
 
-       ret_value = k_sem_take(&simple_sem, K_FOREVER);
+       ret_value = k_sem_take(&simple_sem, -2);
        zassert_true(ret_value == 0, "k_sem_take failed");
        k_thread_abort(&sem_tid);
 
@andrewboie andrewboie added the bug The issue is a bug, or the PR is fixing a bug label Feb 21, 2020
@andrewboie
Copy link
Contributor Author

@nordic-krch according to git blame you added this assertion, do you have time to fix this?

cc @pabigot @andyross

@andrewboie andrewboie added the priority: medium Medium impact/importance bug label Feb 21, 2020
@andrewboie andrewboie changed the title timeout assertion can allow user threads to crash the kernel pend() assertion can allow user threads to crash the kernel Feb 21, 2020
andrewboie pushed a commit to andrewboie/zephyr that referenced this issue Feb 21, 2020
This assertion, if built in, allows users threads to crash
the kernel in a critical section by passing a negative timeout
value, creating a DoS attack vector.

Remove this assertion, immediately below it there's a check
which just resets it to 0 anyway.

Fixes: zephyrproject-rtos#22999

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
@nordic-krch
Copy link
Contributor

@andrewboie it's been discussed #19591 when conversion functions changed signing and was giving different results than before for negative numbers. Because of that behavior changed when negative number was provided. Assert was added to detect cases when negative number is used and fix them.

We can remove assert but then we will not detect misuse and api states:

Non-negative waiting period to take the semaphore (in milliseconds), or one of the special values K_NO_WAIT and  K_FOREVER.

@pabigot
Copy link
Collaborator

pabigot commented Feb 21, 2020

The override of an invalid timeout to a timeout with defined behavior is there specifically to avoid unspecified behavior (probably a sleep for a very long time) when assertions are disabled, and was added because doing runtime validation of the parameter was rejected during review as being too expensive.

I'm ok with any of the proposed solutions. The one with least impact on existing behavior and code is deleting the assertion. I suspect the functional safety approved solution would be to validate the inputs at the system calls.

andrewboie pushed a commit that referenced this issue Feb 21, 2020
This assertion, if built in, allows users threads to crash
the kernel in a critical section by passing a negative timeout
value, creating a DoS attack vector.

Remove this assertion, immediately below it there's a check
which just resets it to 0 anyway.

Fixes: #22999

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants