Skip to content

Conversation

@koalatux
Copy link
Contributor

@koalatux koalatux commented Apr 1, 2025

Add a reboot implementation for the native sim which restarts the current executable keeping the command line arguments after closing all open file descriptors.

Test it for example with: west build -p -b native_sim samples/subsys/shell/shell_module -t run -DCONFIG_REBOOT=y -DCONFIG_POSIX_API=n and then enter in the zephyr shell kernel reboot.

Note: The pseudotty disconnects while the application restarts.

I could not get it working with CONFIG_POSIX_API enabled.

Probably only works on Linux because it relies on the proc filesystem.

@github-actions github-actions bot added the area: native port Host native arch port (native_sim) label Apr 1, 2025
@github-actions github-actions bot requested a review from aescolar April 1, 2025 16:04
@koalatux koalatux force-pushed the af/native-sim-reboot branch 5 times, most recently from c2e3845 to e4f4a85 Compare April 2, 2025 12:05
Copy link
Member

@aescolar aescolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @koalatux
The reason why it does not build with the POSIX_API and why it won't work in some other cases, is because some of calls you intend to do to the Linux host are getting linked into the embedded code.
You need to split the reboot file in 2 files.
One of them, containing the code that interacts with the host side, which needs to be compiled in the native_simulator runner context.
The other, which provides sys_arch_reboot() and calls into a function provided from the first one, which is compiled with the embedded side of things.
Similarly as how we do with the native_sim drivers which interact with the host:
https://github.com/zephyrproject-rtos/zephyr/blob/main/drivers/entropy/CMakeLists.txt#L22-L27

see this branch: https://github.com/aescolar/zephyr/tree/native-sim-reboot

@aescolar
Copy link
Member

aescolar commented Apr 3, 2025

Probably only works on Linux because it relies on the proc filesystem.

native_sim only works on Linux, so no issue there :)

Note: The pseudotty disconnects while the application restarts.

A new file descriptor will be opened for the new pty. So that would be expected. But if one uses --attach_uart the new PTY will be attached in the same way as the first.

@koalatux
Copy link
Contributor Author

koalatux commented Apr 3, 2025

@aescolar thanks a lot for your explanations, your branch looks quite good to me.

Just one question, is it guaranteed that maybe_reboot() will only be called once and not from multiple threads simultaneously? Or can we ignore such a case?

Also I found there is a more efficient close_range(), which is available since Linux 5.9 and glibc 2.34. Is it ok to have this as a requirement?

@aescolar
Copy link
Member

aescolar commented Apr 3, 2025

Just one question, is it guaranteed that maybe_reboot() will only be called once and not from multiple threads simultaneously? Or can we ignore such a case?

It should only be called from the first native simulator thread (which takes care of running the HW models/HW scheduling)

Also I found there is a more efficient close_range(), which is available since Linux 5.9 and glibc 2.34. Is it ok to have this as a requirement?

Better no. Lot's of users out there run on old kernels, and the performance benefit is really irrelevant in this case.

By the way, if I do

diff --git a/samples/hello_world/src/main.c b/samples/hello_world/src/main.c
index c550ab461cb..2a4f295186e 100644
--- a/samples/hello_world/src/main.c
+++ b/samples/hello_world/src/main.c
@@ -5,10 +5,13 @@
  */
 
 #include <stdio.h>
+#include <zephyr/kernel.h>
+#include <zephyr/sys/reboot.h>
 
 int main(void)
 {
        printf("Hello World! %s\n", CONFIG_BOARD_TARGET);
-
+       k_sleep(K_SECONDS(1));
+       sys_reboot(0);
        return 0;
 }

And run it as zephyr.exe --no-rt (so it reboots in a loop very fast) I see crashes every now and then (while not using ASAN). I haven't dug enough to figure out why. But you are welcome to. At this point I'm guessing a glibc race.
Anyhow, even with that occasional rare crash, I think it would be fine.

@koalatux
Copy link
Contributor Author

koalatux commented Apr 3, 2025

And run it as zephyr.exe --no-rt (so it reboots in a loop very fast) I see crashes every now and then (while not using ASAN). I haven't dug enough to figure out why. But you are welcome to. At this point I'm guessing a glibc race.
Anyhow, even with that occasional rare crash, I think it would be fine.

I get

libgcc_s.so.1 must be installed for pthread_exit to work
Aborted (core dumped)

from time to time

@koalatux koalatux force-pushed the af/native-sim-reboot branch from e4f4a85 to 3934c70 Compare April 3, 2025 16:07
@koalatux
Copy link
Contributor Author

koalatux commented Apr 3, 2025

@aescolar I pushed the changes including the ones from your branch. Do you want me to add a Co-authored-by: line to the commit message?

@aescolar
Copy link
Member

aescolar commented Apr 3, 2025

I get..

same here (ubuntu 24.04, kernel 6.8.0-57, glibc 2.39-0ubuntu8.4)

@aescolar
Copy link
Member

aescolar commented Apr 3, 2025

Do you want me to add a Co-authored-by:

Up to you

config NATIVE_SIM_REBOOT
bool "Reboot native sim"
depends on REBOOT
default y
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
default y
default n

This does not work with valgrind, and I guess with some other tools.
And I'd expect that a few developers only trigger reboots in error conditions (so they will prefer native_sim to error out), so I'm inclined to leave the default as n.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it. I did not have this use case in mind, I'm using native_sim to test behaviour where the real hardware also does a reboot.

@koalatux
Copy link
Contributor Author

koalatux commented Apr 4, 2025

I get..

same here (ubuntu 24.04, kernel 6.8.0-57, glibc 2.39-0ubuntu8.4)

Debian 12 (bookworm), kernel 6.12.12+bpo-amd64, glibc 2.36-9+deb12u10

@koalatux
Copy link
Contributor Author

koalatux commented Apr 4, 2025

Just one question, is it guaranteed that maybe_reboot() will only be called once and not from multiple threads simultaneously? Or can we ignore such a case?

It should only be called from the first native simulator thread (which takes care of running the HW models/HW scheduling)

I also assume native_set_reboot_on_exit() will also only be called only once, because of the call to irq_lock() before sys_arch_reboot() in lib/os/reboot.c.

@koalatux koalatux force-pushed the af/native-sim-reboot branch 2 times, most recently from d0772e8 to 30b1c85 Compare April 4, 2025 08:48
@aescolar
Copy link
Member

aescolar commented Apr 4, 2025

I also assume native_set_reboot_on_exit() will also only be called only once,

Yes, unless somebody calls that API from somewhere else. But it should not really matter how many times it is called as it only sets that flag to true.

@koalatux
Copy link
Contributor Author

koalatux commented Apr 4, 2025

By the way, if I do

diff --git a/samples/hello_world/src/main.c b/samples/hello_world/src/main.c
index c550ab461cb..2a4f295186e 100644
--- a/samples/hello_world/src/main.c
+++ b/samples/hello_world/src/main.c
@@ -5,10 +5,13 @@
  */
 
 #include <stdio.h>
+#include <zephyr/kernel.h>
+#include <zephyr/sys/reboot.h>
 
 int main(void)
 {
        printf("Hello World! %s\n", CONFIG_BOARD_TARGET);
-
+       k_sleep(K_SECONDS(1));
+       sys_reboot(0);
        return 0;
 }

And run it as zephyr.exe --no-rt (so it reboots in a loop very fast) I see crashes every now and then (while not using ASAN). I haven't dug enough to figure out why. But you are welcome to. At this point I'm guessing a glibc race. Anyhow, even with that occasional rare crash, I think it would be fine.

The crash seems to only happen if the reboot is done really early at startup. Adding a bit more delay like the following is enough for the crash not to happen (it is now running since hours on my machine):

	for (int i = 0; i<100; i++) {
		k_sleep(K_SECONDS(1));
	}

@koalatux koalatux requested a review from aescolar April 4, 2025 11:40
aescolar
aescolar previously approved these changes Apr 4, 2025
@aescolar
Copy link
Member

aescolar commented Apr 4, 2025

The CI failure is due to #88159
Once that is merged we need to retrigger the failed CI job in this PR

@koalatux
Copy link
Contributor Author

koalatux commented Apr 4, 2025

The crash seems to only happen if the reboot is done really early at startup. Adding a bit more delay like the following is enough for the crash not to happen (it is now running since hours on my machine):

still crashed after 205m23.264s and 2931398 reboots 😞

Add a reboot implementation for the native_sim target which restarts
the current executable keeping the command line arguments after closing
all open file descriptors.

Signed-off-by: Adrian Friedli <adrian.friedli@husqvarnagroup.com>
Signed-off-by: Alberto Escolar Piedras <alberto.escolar.piedras@nordicsemi.no>
Co-authored-by: Alberto Escolar Piedras <alberto.escolar.piedras@nordicsemi.no>
@kartben kartben merged commit ffdc50c into zephyrproject-rtos:main Apr 29, 2025
23 checks passed
@aescolar
Copy link
Member

@koalatux why was it that this code was closing all open file descriptors?

@koalatux
Copy link
Contributor Author

@koalatux why was it that this code was closing all open file descriptors?

The reason for closing all open file descriptors is, that they are not leaked into the restarted process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: native port Host native arch port (native_sim)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants