
Conversation

@tpambor
Contributor

@tpambor tpambor commented Aug 13, 2025

Add support for the interrupt-driven API. Interrupts are emulated using a polling thread.

A previous version of this was merged as #93957 but caused problems (see #94425) and was reverted in #94426.

The issue stemmed from false RX interrupts being triggered when select indicated that data was available to read, but the subsequent read operation failed. In the updated implementation, a dedicated polling thread now feeds incoming data into a ring buffer. Interrupts are no longer triggered directly by select; instead, the ring buffer is used to signal valid RX interrupts, ensuring more reliable and accurate handling.
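A minimal sketch of that pattern (illustrative only: names such as `pty_rx_thread` and `raise_rx_irq_if_enabled` are made up, and native_sim's split between host-side and Zephyr-side code is glossed over):

```c
#include <stdint.h>
#include <unistd.h>
#include <zephyr/kernel.h>
#include <zephyr/sys/ring_buffer.h>

RING_BUF_DECLARE(rx_rb, 256); /* holds data already read from the host fd */

static void raise_rx_irq_if_enabled(void); /* hypothetical IRQ-emulation hook */

static void pty_rx_thread(void *p1, void *p2, void *p3)
{
	int fd = (int)(intptr_t)p1; /* non-blocking pty/stdin file descriptor */
	uint8_t chunk[64];

	while (true) {
		ssize_t n = read(fd, chunk, sizeof(chunk));

		if (n > 0) {
			/* Only bytes that were actually read end up in the ring
			 * buffer, so an RX interrupt is only ever signalled for
			 * data that really exists. */
			ring_buf_put(&rx_rb, chunk, n);
			raise_rx_irq_if_enabled();
		} else {
			/* EOF (0) or nothing available (-1): back off briefly */
			k_sleep(K_MSEC(1));
		}
	}
}
```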

Locally, this now passes samples/sensor/sensor_shell and tests/drivers/can/host, which previously failed (see #94425).

@zephyrbot zephyrbot added area: native port Host native arch port (native_sim) area: UART Universal Asynchronous Receiver-Transmitter area: CAN labels Aug 13, 2025
@tpambor tpambor force-pushed the pty-irq-2 branch 3 times, most recently from 1aa7fbf to ff8994d Compare August 13, 2025 21:10
@dcpleung
Member

I wonder why CI didn't catch it last time.

@tpambor
Contributor Author

tpambor commented Aug 13, 2025

> I wonder why CI didn't catch it last time.

tests/drivers/can/host isn't run in CI.

For samples/sensor/sensor_shell, there was also 7c0e4ae, which helped with the symptoms and was sufficient for the test to pass. With this PR, the sensor shell would actually pass even with 7c0e4ae reverted, but I think that commit does no harm, so there is no reason to revert it.

Member

@henrikbrixandersen henrikbrixandersen left a comment


This change makes tests using the native_sim UART with pytest and the twister_harness Shell much slower than before.

As an example, running tests/drivers/can/host without this change:

INFO - 1/1 native_sim/native/64 drivers.can.host PASSED (native 1.865s <host>)

But with this change:

INFO - 1/1 native_sim/native/64 drivers.can.host PASSED (native 40.506s <host>)

@tpambor
Contributor Author

tpambor commented Aug 14, 2025

> This change makes tests using the native_sim UART with pytest and the twister_harness Shell much slower than before.
>
> As an example, running tests/drivers/can/host without this change:
>
> INFO - 1/1 native_sim/native/64 drivers.can.host PASSED (native 1.865s <host>)
>
> But with this change:
>
> INFO - 1/1 native_sim/native/64 drivers.can.host PASSED (native 40.506s <host>)

This performance degradation is due to a locking issue in the shell code. Enabling CONFIG_LOG=y (and, by default, CONFIG_SHELL_LOG_BACKEND=y), which changes how locking in the shell works, results in performance comparable to the polling implementation.
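For reference, a prj.conf fragment for that workaround might look like this (the second option is the default described above, shown explicitly for clarity):

```
# Enabling logging changes how the shell's locking behaves
CONFIG_LOG=y
# Defaults to y once CONFIG_LOG is enabled
CONFIG_SHELL_LOG_BACKEND=y
```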

@tpambor tpambor requested a review from jakub-uC August 14, 2025 08:45
Member

@aescolar aescolar left a comment


I think this approach is too complicated and heavy (i.e. no need for ring buffers).
I think something like this would work instead:
When we want to check if there is something ready in stdin, we read 1 character.

  • If read() == 0, we know we reached EOF. We flag it and never try to read again.
  • If read() == -1, we know there was nothing there yet.
  • If read() == 1, we save the character we just read ahead in data (data->char_read_ahead or whatever) and set a flag (data->stdin_ready_char or whatever).

Next time we want to check whether there is data, we first check this flag (if the flag is set we already answer yes; otherwise we do the one-character check above).

When there is an attempt to actually read data by the user (len > 0):

  • If the flag (data->stdin_ready_char) was set, we first copy that character to the user buffer, clear data->stdin_ready_char, and continue trying to read len-1.
  • If the flag was not set, we attempt to read len. If we get read() == 0 => we flag EOF.

That should solve the issue and greatly reduce the number of syscalls.
We should probably check if we need to do the select() at all, or if we can just use read() to detect whether data is available in stdin. (For a real pty we don't need the select.)
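A rough C sketch of that read-ahead idea (field names follow the placeholders above; the fd is assumed to be non-blocking):

```c
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

struct pty_data {
	int fd;
	bool stdin_eof;         /* read() returned 0 once; never read again */
	bool stdin_ready_char;  /* char_read_ahead holds one buffered byte */
	uint8_t char_read_ahead;
};

/* "Is there data?" check: reads at most one byte ahead. */
static bool stdin_has_data(struct pty_data *d)
{
	if (d->stdin_eof) {
		return false;
	}
	if (d->stdin_ready_char) {
		return true;
	}

	ssize_t rc = read(d->fd, &d->char_read_ahead, 1);

	if (rc == 1) {
		d->stdin_ready_char = true;
		return true;
	}
	if (rc == 0) {
		d->stdin_eof = true; /* EOF: flag it and never read again */
	}
	return false; /* rc < 0: nothing there yet */
}

/* User read: drain the read-ahead byte first, then read the rest. */
static size_t pty_read(struct pty_data *d, uint8_t *buf, size_t len)
{
	size_t n = 0;

	if (len > 0 && d->stdin_ready_char) {
		buf[n++] = d->char_read_ahead;
		d->stdin_ready_char = false;
	}
	if (n < len && !d->stdin_eof) {
		ssize_t rc = read(d->fd, buf + n, len - n);

		if (rc > 0) {
			n += (size_t)rc;
		} else if (rc == 0) {
			d->stdin_eof = true;
		}
	}
	return n;
}
```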

@ghost

ghost commented Aug 14, 2025

> I think this approach is too complicated and heavy (i.e. no need for ring buffers). I think something like this would work instead: When we want to check if there is something ready in stdin, we read 1 character. If read() == 0, we know we reached EOF. We flag it and never try to read again. If read() == -1, we know there was nothing there yet. If read() == 1, we save the character we just read ahead in data (data->char_read_ahead or whatever) and set a flag (data->stdin_ready_char or whatever). Next time we want to check whether there is data, we first check this flag (if the flag is set we already answer yes; otherwise we do the one-character check above).
>
> When there is an attempt to actually read data by the user (len > 0): If the flag (data->stdin_ready_char) was set, we first copy that character to the user buffer, clear data->stdin_ready_char, and continue trying to read len-1. If the flag was not set, we attempt to read len. If we get read() == 0 => we flag EOF.
>
> That should solve the issue and greatly reduce the number of syscalls. We should probably check if we need to do the select() at all, or if we can just use read() to detect whether data is available in stdin. (For a real pty we don't need the select.)

How does removing the ring buffer help? Would we not expect better performance with the ring buffer (e.g., fewer system calls)? How would your approach help?

The performance issue may be unrelated to the driver.

Some thoughts:

  1. Maybe someone could make an analysis using tracing
  2. can_shell stores the const struct shell * and uses it later from the work queue to implement "async" shell commands. This seems a bit unique to me; I am not sure that way of doing "async" shell commands is supported/recommended. An attempt to make them synchronous was rejected.
  3. There is probably a deadlock (or livelock) situation in can_shell that can be worked around with CONFIG_SHELL_TX_TIMEOUT_MS=0, but it needs actual fixing, since that workaround degrades performance when CONFIG_LOG is disabled. Without CONFIG_SHELL_TX_TIMEOUT_MS=0 we have the deadlock with CONFIG_LOG disabled.

@tpambor
Contributor Author

tpambor commented Aug 15, 2025

I identified the performance problem. The sleep in the polling thread caused high latency for the emulated TX interrupt used to write data. Combined with the short writes issued by the shell backend (max. 8 bytes), this throttled TX throughput to under 400 B/s.

This caused the shell to be unable to keep up, which manifested, among other things, in failures to obtain the shell lock and dropped messages.

The issue is now resolved by waiting for either the TX IRQ to be activated or a short timeout, after which we poll for new input data.

In addition, I simplified the poll_in functions and dropped the select() call, so this now works using read() only.
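Sketched out, the new wait looks roughly like this (names are assumptions, not the literal patch):

```c
#include <stdbool.h>
#include <zephyr/kernel.h>
#include <zephyr/drivers/uart.h>

struct pty_data {                      /* illustrative subset of driver state */
	const struct device *dev;
	uart_irq_callback_user_data_t cb;  /* registered ISR callback */
	void *cb_user_data;
	struct k_sem tx_enable_sem;        /* given by the irq_tx_enable() path */
	bool tx_irq_enabled;
};

static bool rx_data_available(struct pty_data *d); /* hypothetical helper */

static void irq_emul_loop(struct pty_data *d)
{
	while (true) {
		/* Wake immediately when TX is enabled; otherwise time out
		 * shortly so incoming RX data is still polled. */
		(void)k_sem_take(&d->tx_enable_sem, K_MSEC(1));

		if (d->tx_irq_enabled || rx_data_available(d)) {
			d->cb(d->dev, d->cb_user_data); /* run the UART ISR callback */
		}
	}
}
```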

I tried the following tests with this PR

  • tests/drivers/can/host - Tests pass, performance improved over polling
  • samples/sensor/sensor_shell - Tests pass
  • samples/sensor/sensor_shell with 7c0e4ae reverted - Tests pass
  • (sleep 0.01; echo -e "help\n") | ./build/zephyr/zephyr.exe -uart_stdinout | head -n 80 - works as expected
  • samples/subsys/shell/shell_module + echo -e "help\n" | ./build/zephyr/zephyr.exe -uart_stdinout | head -n 80 - no output, same as before, see #94436 (Shell: Race during init w CONFIG_SHELL_BACKEND_SERIAL_API_INTERRUPT_DRIVEN=y)
  • samples/subsys/shell/shell_module + ./build/zephyr/zephyr.exe -uart_stdinout - shell works over stdio as expected
  • samples/subsys/shell/shell_module + ./build/zephyr/zephyr.exe - shell works over pty as expected

@henrikbrixandersen I would appreciate it if you could verify the performance of tests/drivers/can/host from your side.

@henrikbrixandersen
Member

The updated patch greatly improves the performance, but it is still significantly slower than when using polling (which seems odd from a user perspective):

As an example, running tests/drivers/can/host without this change:

INFO - 1/1 native_sim/native/64 drivers.can.host PASSED (native 1.865s <host>)

But with this updated patch:

INFO - 1/1 native_sim/native/64 drivers.can.host PASSED (native 3.797s <host>)

@tpambor
Contributor Author

tpambor commented Aug 15, 2025

> The updated patch greatly improves the performance, but it is still significantly slower than when using polling (which seems odd from a user perspective):
>
> As an example, running tests/drivers/can/host without this change:
>
> INFO - 1/1 native_sim/native/64 drivers.can.host PASSED (native 1.865s <host>)
>
> But with this updated patch:
>
> INFO - 1/1 native_sim/native/64 drivers.can.host PASSED (native 3.797s <host>)

This behavior isn't too surprising: under the hood, the IRQ emulation also relies on polling. However, it introduces an additional abstraction layer, which leads to significantly more context switches. Data is transferred in blocks of up to 8 bytes between the shell task and the IRQ emulation task, which then polls the data out.

That said, this PR is not focused on performance improvements. Its goal is to enhance feature completeness, enabling testing of device drivers or other functionality that only supports IRQ mode.

If shell performance is a concern for this sample, one can set SHELL_BACKEND_SERIAL_INTERRUPT_DRIVEN=n. Alternatively, we could consider making this the default for native_sim, so there would be no performance regression by default, and users could opt in to IRQ mode when needed.
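A sketch of what such a default could look like in the serial shell backend's Kconfig (the conditional default is an illustration of the idea, not the merged change):

```
config SHELL_BACKEND_SERIAL_INTERRUPT_DRIVEN
	bool "Interrupt driven"
	depends on SERIAL_SUPPORT_INTERRUPT
	# Prefer polling on native_sim's PTY UART to avoid the emulation cost
	default n if UART_NATIVE_PTY
	default y
```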

@henrikbrixandersen
Member

> That said, this PR is not focused on performance improvements. Its goal is to enhance feature completeness, enabling testing of device drivers or other functionality that only supports IRQ mode.

Sure, but if introducing a new feature negatively impacts performance of an existing feature, we cannot just pull it in and say we'll address the introduced performance issue at a later time.

> If shell performance is a concern for this sample, one can set SHELL_BACKEND_SERIAL_INTERRUPT_DRIVEN=n. Alternatively, we could consider making this the default for native_sim, so there would be no performance regression by default, and users could opt in to IRQ mode when needed.

I reckon most developers will be using the native_sim UART for logging and shell functionality, so defaulting to polling there might not be a bad idea. I'll leave that up to @aescolar.

@aescolar
Member

@tpambor thanks. I was out yesterday; I will take a look at this next week and probably ping you on Discord to ask about the motivation for some of the choices.

@aescolar
Member

aescolar commented Aug 18, 2025

Thanks @tpambor. I tried to optimize the code a bit here: https://github.com/aescolar/zephyr/tree/pr_94478
and fix a couple of issues. The changes are:

  • Do not change stdin and stdout to non-blocking.
  • Do not start/stop the thread when there is no work; instead let it sleep forever and wake it (starting/stopping threads has quite a big overhead). Removed the semaphore and just use k_sleep and k_wakeup instead (see the sketch below).
  • The IRQ thread loop tries to be a bit leaner and hopefully waits more when possible.
  • The wait time is now 10 ms instead of 1 ms when there is no data (20 ms or even higher would also be fine: if there was no data, the chance of having data within a couple of ms is very low, and anyhow the default tick period is 10 ms).
  • Removed the ring buffer; instead we try to read only one char ahead to detect data/EOF. The great majority (~99%) of the attempts will find nothing to read, and when there is data, it is better to let the user read as much as it can on its own.
  • The WARNing on a missing callback is changed to an error (otherwise I'd expect an infinite loop in that case).
  • The FIFO write now writes n bytes at once instead of looping per byte.
  • Once stdin disconnects, it stops trying to do any RX work on it altogether (incl. scheduling the thread).
  • (Compared to the original code: select is replaced with poll, so checking for data takes 1/3 of the cycles.)
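A sketch of the park/unpark pattern from the second bullet (assumed names, not the branch code):

```c
#include <stdbool.h>
#include <zephyr/kernel.h>

struct pty_data {            /* illustrative subset of the driver state */
	k_tid_t irq_thread;
	bool tx_irq_enabled;
	bool rx_irq_enabled;
};

static void irq_emul_loop(struct pty_data *d)
{
	while (true) {
		if (!d->tx_irq_enabled && !d->rx_irq_enabled) {
			k_sleep(K_FOREVER); /* parked until k_wakeup() */
			continue;
		}
		/* ... emulate TX/RX interrupts here ... */
		k_sleep(K_MSEC(10)); /* back off when there was no data */
	}
}

static void pty_irq_tx_enable(struct pty_data *d)
{
	d->tx_irq_enabled = true;
	k_wakeup(d->irq_thread); /* unpark; no thread start/stop overhead */
}
```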

With these changes that CAN test is "only" ~75% slower in real time (with the native_sim code using >2x the instructions).
The way the shell runs in IRQ mode is quite heavy overall.

@aescolar
Member

If you want to profile it, you can do:

diff --git a/scripts/pylib/pytest-twister-harness/src/twister_harness/device/binary_adapter.py b/scripts/pylib/pytest-twister-harness/src/twister_harness/device/binary_adapter.py
index a6fe30da9ec..00ad7304878 100755
--- a/scripts/pylib/pytest-twister-harness/src/twister_harness/device/binary_adapter.py
+++ b/scripts/pylib/pytest-twister-harness/src/twister_harness/device/binary_adapter.py
@@ -43,7 +43,9 @@ class BinaryAdapterBase(DeviceAdapter, abc.ABC):
             msg = 'Run command is empty, please verify if it was generated properly.'
             logger.error(msg)
             raise TwisterHarnessException(msg)
+        self.command= ["valgrind"] + ["--tool=callgrind"] + self.command
         log_command(logger, 'Running command', self.command, level=logging.DEBUG)
+
         try:
             self._process = subprocess.Popen(self.command, **self.process_kwargs)
         except subprocess.SubprocessError as exc:

Run the test normally (after setting things up as described in https://github.com/zephyrproject-rtos/zephyr/blob/main/tests/drivers/can/host/README.rst#running-on-native_sim) with:

twister -v -p native_sim/native/64 -X can:zcan0 -T tests/drivers/can/host/ -n

Find the latest callgrind output:

find twister-out -name "callgrind*"

and open it with kcachegrind:

kcachegrind ./twister-out/native_sim_native_64/host/tests/drivers/can/host/drivers.can.host/callgrind.out.xxxxxx &

@aescolar
Member

aescolar commented Aug 18, 2025

> Alternatively, we could consider making this the default for native_sim, so there would be no performance regression by default, and users could opt in to IRQ mode when needed.

I think that unless we can improve the IRQ mode performance to be practically the same as the polling mode (which I doubt), we will want to do this.

@aescolar
Member

aescolar commented Aug 18, 2025

Here is profiling of tests/drivers/can/host/ showing the thread loop; note the number of calls from the shell into the driver for TX and so forth:
[screenshot: kcachegrind call graph of the IRQ emulation thread loop]

@ghost

ghost commented Aug 19, 2025

Thank you @aescolar @henrikbrixandersen @tpambor for working on this.
So the approach would be to merge your developments of the IRQ mode, @aescolar, but make it default to polling mode on native_sim so as not to impact performance for most users, who do not want to explicitly test a driver that requires IRQ mode?

@aescolar
Member

> So the approach would be to merge your developments of the IRQ mode, @aescolar, but make it default to polling mode on native_sim so as not to impact performance for most users, who do not want to explicitly test a driver that requires IRQ mode?

I think so, yes.

@zephyrbot zephyrbot added the area: Shell Shell subsystem label Aug 19, 2025
@zephyrbot zephyrbot requested a review from carlescufi August 19, 2025 09:46
@tpambor
Contributor Author

tpambor commented Aug 19, 2025

Thanks for the optimizations @aescolar. I pulled your changes and added another commit on top that changes the shell default to polling mode for uart_native_pty.

Add support for the interrupt-driven API. Interrupts are
emulated using a polling thread.

Signed-off-by: Tim Pambor <tim.pambor@codewrights.de>
Signed-off-by: Alberto Escolar Piedras <alberto.escolar.piedras@nordicsemi.no>

The interrupt-driven UART API is emulated via polling on native_sim,
which introduces additional overhead. Defaulting to poll mode improves
performance by avoiding this emulation cost.

Signed-off-by: Tim Pambor <tim.pambor@codewrights.de>

@aescolar
Member

aescolar commented Aug 19, 2025

@dcpleung @jakub-uC given how the shell interacts with this driver in IRQ mode, I wonder if there is something misunderstood about the driver API on either side.

@dcpleung
Member

> @dcpleung @jakub-uC given how the shell interacts with this driver in IRQ mode, I wonder if there is something misunderstood about the driver API on either side.

Come to think of it, maybe using the UART async API would be better for this than the pure interrupt API?

@kartben kartben merged commit e3a99bd into zephyrproject-rtos:main Aug 19, 2025
34 of 36 checks passed