serial: uart_native_pty: IRQ support #94478
1aa7fbf to ff8994d
|
I wonder why CI didn't catch it last time. |
For |
This change makes tests using the native_sim UART with pytest and the twister_harness Shell much slower than before.
As an example, running tests/drivers/can/host without this change:
INFO - 1/1 native_sim/native/64 drivers.can.host PASSED (native 1.865s <host>)
But with this change:
INFO - 1/1 native_sim/native/64 drivers.can.host PASSED (native 40.506s <host>)
This performance degradation is due to a locking issue in the shell code. Enabling |
I think this approach is too complicated and heavy (i.e. there is no need for ring buffers).
I think something like this would work instead:
When we want to check whether something is ready in stdin, we read 1 character:
If read() == 0, we know we reached EOF. We flag it and never try to read again.
If read() == -1, we know there was nothing there yet.
If read() == 1, we save the character we just read ahead in data (data->char_read_ahead or whatever) and set a flag (data->stdin_ready_char or whatever).
The next time we want to check whether there is data, we first check this flag (if the flag is set we answer yes right away; otherwise we do the one-character check above).
When the user actually attempts to read data (len > 0):
If the flag (data->stdin_ready_char) was set, we first copy that character to the user buffer, clear data->stdin_ready_char, and continue by trying to read len-1.
If the flag was not set, we attempt to read len. If we get read() == 0, we flag EOF.
That should solve the issue and greatly reduce the number of syscalls.
We should probably check whether we need the select() at all, or whether we can just use read() to detect if data is available in stdin (for a real pty we don't need the select).
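The read-ahead steps proposed above can be sketched roughly as follows. This is a Python model of the logic only, not driver code: the class and attribute names are hypothetical (loosely following the `data->char_read_ahead` and `data->stdin_ready_char` placeholders in the comment), and a non-blocking `os.read()` stands in for the C `read()` call, with `BlockingIOError` playing the role of `read() == -1`.

```python
import os


class ReadAheadInput:
    """Hypothetical model of the one-character read-ahead idea."""

    def __init__(self, fd):
        self.fd = fd
        os.set_blocking(fd, False)   # read must return immediately when idle
        self.char_read_ahead = b""   # byte fetched while probing for data
        self.ready_char = False      # True while char_read_ahead holds a byte
        self.eof = False             # once set, the fd is never read again

    def data_ready(self):
        """Answer 'is there input?' with at most one read call."""
        if self.ready_char:
            return True
        if self.eof:
            return False
        try:
            c = os.read(self.fd, 1)
        except BlockingIOError:      # C equivalent: read() == -1, nothing yet
            return False
        if c == b"":                 # C equivalent: read() == 0, EOF
            self.eof = True
            return False
        self.char_read_ahead, self.ready_char = c, True
        return True

    def read(self, length):
        """Return up to `length` bytes, handing out the saved byte first."""
        out = b""
        if length > 0 and self.ready_char:
            out, self.ready_char = self.char_read_ahead, False
            length -= 1
        if length > 0 and not self.eof:
            try:
                chunk = os.read(self.fd, length)
            except BlockingIOError:
                chunk = None         # nothing more available right now
            if chunk == b"":
                self.eof = True
            elif chunk:
                out += chunk
        return out
```

In this model a readiness check costs one system call at most, and repeated checks cost none while a byte is already held.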
How does removing the ring buffer help? Would we not expect better performance with the ring buffer (for example, fewer system calls)? How would your approach help? The performance issue may be unrelated to the driver. Some thoughts:
|
0ca5771 to 5af186c
|
I identified the performance problem. The sleep in the polling thread caused high latency for the emulated TX interrupt used to write data. This, together with short writes issued by the shell backend (max. 8 bytes), resulted in throttling TX performance to <400 Bps. This caused the shell to be unable to keep up, which manifested, among other things, in failures to obtain the shell lock and dropped messages. The issue is now resolved by waiting for either the TX IRQ to be activated or a short period of time to poll for new input data. In addition, I simplified the I tried the following tests with this PR
@henrikbrixandersen I would appreciate it if you could verify the performance of |
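The fix described above, waiting for either the TX IRQ to be enabled or a short poll period to elapse, can be illustrated with a small Python model. Everything here is hypothetical (class, attribute and function names, and the 0.5 s poll period, which is deliberately long so the effect is visible); the actual driver is C code running on native_sim. As a rough check of the quoted figures, if each write of at most 8 bytes had to wait for one wake-up of the polling thread, a rate below 400 B/s would correspond to more than 20 ms per wake-up; this is an inference from the numbers in the comment, not a measured value.

```python
import threading
import time


class PollThread:
    """Illustrative stand-in for the driver's polling thread."""

    def __init__(self, poll_period_s):
        self.poll_period_s = poll_period_s
        self.tx_irq_enabled = threading.Event()  # set when TX IRQ is enabled
        self.tx_served = threading.Event()       # set once TX has been handled
        self.stop = threading.Event()
        self.thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self.stop.is_set():
            # wait() returns True as soon as TX is requested and
            # False when the poll period expires without a request.
            if self.tx_irq_enabled.wait(timeout=self.poll_period_s):
                self.tx_irq_enabled.clear()
                self.tx_served.set()   # here the TX callback would run
            # else: poll period elapsed, time to look for new RX data


def tx_latency(poll_period_s):
    """Measure how long a TX request waits before the thread serves it."""
    p = PollThread(poll_period_s)
    p.thread.start()
    t0 = time.monotonic()
    p.tx_irq_enabled.set()
    served = p.tx_served.wait(timeout=5.0)
    latency = time.monotonic() - t0
    p.stop.set()
    p.tx_irq_enabled.set()             # wake the thread so it can exit
    p.thread.join(timeout=5.0)
    return served, latency
```

With a plain `time.sleep(poll_period_s)` in the loop instead of the event wait, a TX request could sit idle for up to a full poll period; with the event wait it is served as soon as the thread is scheduled.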
|
The updated patch greatly improves the performance, but it is still significantly slower than when using polling (which seems odd from a user perspective): As an example, running
But with this updated patch:
|
This behavior isn’t too surprising: under the hood, the IRQ emulation also relies on polling. However, it introduces an additional abstraction layer, which leads to significantly more context switches. Data is transferred in blocks of up to 8 bytes between the shell task and the IRQ emulation task, which then polls the data out. That said, this PR is not focused on performance improvements. Its goal is to enhance feature completeness, enabling testing of device drivers or other functionality that only supports IRQ mode. If shell performance is a concern for this sample, one can set |
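To get a feel for the extra hand-offs mentioned above, here is a small estimate. It assumes, as a simplification that the comment does not state, that every block of up to 8 bytes costs at least one hand-off between the shell task and the IRQ emulation task; the function name is hypothetical.

```python
import math

SHELL_BLOCK_BYTES = 8  # maximum block size quoted in the comment above


def handoffs(n_bytes, block=SHELL_BLOCK_BYTES):
    """Minimum number of shell-task to emulation-task transfers for n_bytes."""
    return math.ceil(n_bytes / block)
```

Under this assumption an 80-byte shell line needs at least `handoffs(80)` = 10 transfers, whereas polling mode has no such intermediate task to switch to.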
Sure, but if introducing a new feature negatively impacts performance of an existing feature, we cannot just pull it in and say we'll address the introduced performance issue at a later time.
I reckon most developers will be using the native_sim UART for logging and shell functionality, so defaulting to polling there might not be a bad idea. I'll leave that up to @aescolar. |
|
@tpambor thanks. I was out yesterday; I will take a look at this next week and probably ping you on Discord to ask about the motivation for some of the choices. |
|
Thanks @tpambor. I tried to optimize the code a bit here: https://github.com/aescolar/zephyr/tree/pr_94478
With these changes that CAN test is "only" ~75% slower in real time (with the native_sim code using >2x the instructions). |
|
If you want to profile it, you can do:
diff --git a/scripts/pylib/pytest-twister-harness/src/twister_harness/device/binary_adapter.py b/scripts/pylib/pytest-twister-harness/src/twister_harness/device/binary_adapter.py
index a6fe30da9ec..00ad7304878 100755
--- a/scripts/pylib/pytest-twister-harness/src/twister_harness/device/binary_adapter.py
+++ b/scripts/pylib/pytest-twister-harness/src/twister_harness/device/binary_adapter.py
@@ -43,7 +43,9 @@ class BinaryAdapterBase(DeviceAdapter, abc.ABC):
             msg = 'Run command is empty, please verify if it was generated properly.'
             logger.error(msg)
             raise TwisterHarnessException(msg)
+        self.command = ["valgrind"] + ["--tool=callgrind"] + self.command
         log_command(logger, 'Running command', self.command, level=logging.DEBUG)
+
         try:
             self._process = subprocess.Popen(self.command, **self.process_kwargs)
         except subprocess.SubprocessError as exc:
Run the test normally (after setting up things as described in https://github.com/zephyrproject-rtos/zephyr/blob/main/tests/drivers/can/host/README.rst#running-on-native_sim) with |
I think that unless we can improve the IRQ mode performance to be practically the same as the polling mode (which I doubt) we will want to do this. |
|
Thank you @aescolar @henrikbrixandersen @tpambor for working on this. |
I think so, yes. |
|
Thanks for the optimizations @aescolar. I pulled your changes and added another commit on top that makes the shell default to polling mode for uart_native_pty. |
Add support for the interrupt-driven API. Interrupts are emulated using a polling thread.
Signed-off-by: Tim Pambor <tim.pambor@codewrights.de>
Signed-off-by: Alberto Escolar Piedras <alberto.escolar.piedras@nordicsemi.no>
The interrupt-driven UART API is emulated via polling on native_sim, which introduces additional overhead. Defaulting to poll mode improves performance by avoiding this emulation cost.
Signed-off-by: Tim Pambor <tim.pambor@codewrights.de>
|




Add support for the interrupt-driven API. Interrupts are emulated using a polling thread.
A previous version of this change was merged as #93957 but caused problems (see #94425) and was reverted in #94426.
The issue stemmed from false RX interrupts being triggered when select indicated that data was available to read, but the subsequent read operation failed. In the updated implementation, a dedicated polling thread now feeds incoming data into a ring buffer. Interrupts are no longer triggered directly by select; instead, the ring buffer is used to signal valid RX interrupts, ensuring more reliable and accurate handling.
(Locally) this is now passing samples/sensor/sensor_shell and tests/drivers/can/host, which previously failed in #94425.
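The difference between raising an RX interrupt on select and raising it on buffered data can be sketched as below. This is a Python model under stated assumptions, not the driver: the class, its attributes and the 64-byte capacity are hypothetical, a `deque` stands in for the ring buffer, and a closed pipe is used as one example of select reporting a descriptor as readable while the read yields no data.

```python
import os
import select
from collections import deque


class RxEmulator:
    """Hypothetical model: RX 'interrupts' come from buffered data only."""

    def __init__(self, fd, capacity=64):
        self.fd = fd
        os.set_blocking(fd, False)
        self.ring = deque()        # stand-in for the driver's ring buffer
        self.capacity = capacity
        self.rx_irq_count = 0      # number of RX interrupts raised

    def poll_once(self, timeout_s=0.0):
        """One iteration of the polling thread."""
        readable, _, _ = select.select([self.fd], [], [], timeout_s)
        space = self.capacity - len(self.ring)
        if not readable or space == 0:
            return
        try:
            data = os.read(self.fd, space)
        except BlockingIOError:
            data = b""
        if data:                   # select alone never raises the interrupt
            self.ring.extend(data)
            self.rx_irq_count += 1

    def fifo_read(self, n):
        """What the ISR would call: drain up to n bytes from the buffer."""
        count = min(n, len(self.ring))
        return bytes(self.ring.popleft() for _ in range(count))
```

For example, after the writer closes its end of a pipe, select keeps reporting the descriptor as readable, but `poll_once()` reads nothing and therefore raises no further RX interrupt.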