rtio: pollable devices #62605
Conversation
The doc comment relating to mpsc atomics and caches was worded poorly; remove it. Signed-off-by: Tom Burdick <thomas.burdick@intel.com>
In some instances IO devices are better off being polled for work completion or to move their state machines forward. Shifting the responsibility of device polling to the context avoids duplicating this need in each iodev. It also allows submit to be a simple queue push while poll is where the real work is done if needed, avoiding a possibly deep call stack where submitting results in more submits and interrupts. Signed-off-by: Tom Burdick <thomas.burdick@intel.com>
@kmeinhar maybe take a look at the idea here and comment if interesting
Pollable devices sounds like a great addition to the RTIO subsystem! Sadly, for now we moved away from the RTIO system and implemented something similar to such a pollable device API. In our use case we have a sensor that was accessed by multiple Zephyr threads which might access different sensor values. I am only a junior dev so I might have missed how the RTIO system could be used in such a scenario. If you have an idea I am very open to it!
In the scenario you describe you might have a struct rtio per thread. You could view each rtio context as a single producer/consumer where the requests get fanned out to iodevs and the completions get fanned in from iodevs. So if you wish to have multiple threads accessing the same device you'd have multiple contexts; that's a supported path. Notably these things do not need to be threads though, they can be ISR-driven state machines as well, which is beneficial in the case of sensors where you may wish to have the sensor itself drive a context without needing to allocate stack for a thread. E.g. the sensor triggers a fifo watermark/motion detect/etc., in the gpio ISR you can generate a spi read request, on completion of that you can generate another spi read request, and so on, ultimately perhaps feeding back to a thread asking to read the FIFO on a sensor provided trigger. See #60063
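To make that ISR-driven flow concrete, here is a minimal sketch against the RTIO calls as they exist today. The `imu_iodev` name, buffer, and ISR wiring are illustrative assumptions, not code from the tree:

```c
#include <zephyr/drivers/gpio.h>
#include <zephyr/rtio/rtio.h>

RTIO_DEFINE(imu_ctx, 4, 4);          /* 4 submission entries, 4 completion entries */

extern struct rtio_iodev imu_iodev;  /* assumed: iodev exposed by the SPI/sensor driver */
static uint8_t fifo_buf[64];

static void imu_drdy_isr(const struct device *port,
			 struct gpio_callback *cb, uint32_t pins)
{
	struct rtio_sqe *sqe = rtio_sqe_acquire(&imu_ctx);

	if (sqe == NULL) {
		return; /* submission queue full, count the overrun */
	}

	/* Queue a FIFO read directly from the ISR; the completion shows up
	 * later as a CQE for a thread (or a further ISR step) to consume.
	 */
	rtio_sqe_prep_read(sqe, &imu_iodev, RTIO_PRIO_HIGH,
			   fifo_buf, sizeof(fifo_buf), fifo_buf);
	rtio_submit(&imu_ctx, 0);
}
```

No thread or work queue is needed on the submission side; only the consumer of the completions has to run in thread context if it does real computation.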
Thanks for the suggestion! Regarding the polling API I would like to explain how I understand its design. That would probably make it easier to understand where I am coming from with my feedback: My questions regarding this approach:
For my understanding it would help to see this API implemented by a driver.
That's exactly right: every execution context (a thread is one such context) has its own rtio context, and each iodev accepts submissions from them all, ideally in a lock-free manner (the lock-free mpsc queue). An example can be seen, I suppose, in the spi_sam.c driver, which has a built-in context for dealing with blocking calls but also accepts submissions from any other rtio context, such as any number of sensor drivers. On the tdk_robokit1, where I did a lot of the initial work, that would mean for example the icm42688 IMU sensor.
The thinking I had was that the IMU sensor could drive a state machine through interrupts. The cost of starting a SPI transfer is low (a few register writes), so why is a whole context swap to a thread/work queue, with the potential downsides of each, needed at all? Let the IMU driver directly start a transfer or put it in the queue in the gpio ISR. So in a way this state machine starts from the gpio ISR (e.g. signaling a fifo watermark or data ready event), the state machine starts a SPI transfer, the SPI transfer completes and then gives the data to a thread where actual computation can be done. That avoids the entire burden of either having a thread per sensor and its associated trigger, or trying to queue everything up in a work queue with head-of-line blocking from, say, one slow transfer to a sensor on an i2c bus.
The async calls weren't enough for the above though, as the drivers don't have a queue of requests but a queue of pending threads taking ownership for blocking calls. So the idea of shifting to a queue of operations rather than a queue of threads made good sense to me; you can add to the queue of requests from any context given it's lock-free! Bonus: some hardware lets you have the hardware run the queue itself (DMA with intelligent peripheral handling, e.g. CS line selects and all can be done in DMA requests), meaning more work is done by the hardware and the CPU is doing even less. I think some diagrams in the RTIO docs would greatly help clarify this point.
By fanning out, I mean a single rtio_submit can result in concurrent I/O operations happening from the caller's perspective, where each submission is in effect a little task you've asked to happen (I/O specific, mind you) and these may take varying amounts of time to complete. E.g. sqe0 is reading 6 bytes from an i2c connected sensor, then rtio_submit() is called. From the caller's perspective, unless you've linked any of these submissions, they are all sent in parallel to the peripheral devices (iodevs) from one call context, and now you can sleep your thread dealing with the results as they come in via rtio_cqe_consume_block().
If you start linking sqes or creating transactions, then other rules apply. They become a chain of futures in effect, where sqe0 perhaps finishes and then sqe1 starts, sqe1 finishes and sqe2 starts, and so on. If anything fails then the remaining requests in the chain also fail.
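Roughly, the fan-out looks like this in code. The iodev names and buffers are placeholders for whatever the i2c/spi/uart drivers actually expose, and NULL checks on rtio_sqe_acquire() are omitted for brevity:

```c
#include <zephyr/rtio/rtio.h>

/* Placeholders: assumed to be provided by the respective bus/sensor drivers */
extern struct rtio_iodev i2c_sensor_iodev, spi_sensor_iodev, uart_iodev;

RTIO_DEFINE(ctx, 8, 8);

static uint8_t i2c_buf[6], spi_buf[32], tx_buf[16];

void fan_out_example(void)
{
	struct rtio_sqe *sqe;

	/* sqe0: read 6 bytes from the i2c connected sensor */
	sqe = rtio_sqe_acquire(&ctx);
	rtio_sqe_prep_read(sqe, &i2c_sensor_iodev, RTIO_PRIO_NORM,
			   i2c_buf, sizeof(i2c_buf), i2c_buf);

	/* sqe1: read a block from the spi connected sensor */
	sqe = rtio_sqe_acquire(&ctx);
	rtio_sqe_prep_read(sqe, &spi_sensor_iodev, RTIO_PRIO_NORM,
			   spi_buf, sizeof(spi_buf), spi_buf);

	/* sqe2: write a buffer out the uart */
	sqe = rtio_sqe_acquire(&ctx);
	rtio_sqe_prep_write(sqe, &uart_iodev, RTIO_PRIO_NORM,
			    tx_buf, sizeof(tx_buf), tx_buf);

	/* All three are in flight "in parallel" from the caller's perspective */
	rtio_submit(&ctx, 0);

	/* Consume completions in whatever order the hardware finishes them */
	for (int i = 0; i < 3; i++) {
		struct rtio_cqe *cqe = rtio_cqe_consume_block(&ctx);

		/* cqe->result holds the status, cqe->userdata identifies the request */
		rtio_cqe_release(&ctx, cqe);
	}
}
```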
That's absolutely spot on. Imagine the above scenario again with i2c, spi, and uart involvement, but you'd rather not use interrupts because maybe you are bit banging, or maybe the peripheral IP doesn't have a sensible FIFO (e.g. sam's peripheral blocks), or maybe interrupts would be detrimental to performance/power management and you'd rather poll these things. rtio_poll would help solve that exact scenario. You fan out N requests, M require polling, so the calling context continuously polls the devices (from the thread, with whatever priority it has), ensuring that progress is made by, hopefully, directly polling hardware registers and updating things as necessary.
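As a sketch of how that caller-driven loop could look, reusing the `ctx` from the previous snippet. The `rtio_poll()` name and signature here are placeholders, since the PR deliberately leaves the iteration strategy open:

```c
/* Fan out N requests, M of which need polling */
rtio_submit(&ctx, 0);

struct rtio_cqe *cqe;

/* Drive the iodevs that asked to be polled until a completion is available */
while ((cqe = rtio_cqe_consume(&ctx)) == NULL) {
	rtio_poll(&ctx);   /* placeholder: iterate the context's poll queue */
	k_yield();         /* or busy-wait / sleep, at the caller's discretion */
}

rtio_cqe_release(&ctx, cqe);
```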
Submissions to iodevs are always FIFO; as you noted in #60767, there's no priority boosting of I/O requests today. That's effectively true of the blocking APIs in Zephyr as well though, as SPI uses a semaphore, and k_mutex needs, I believe, some Kconfigs set to actually perform priority inheritance and boosting. If you want to maintain ordering of submissions across iodevs you need to use chaining/linking.
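For example, a chained pair keeps ordering explicit (a fragment with placeholder iodev and buffer names, using the existing RTIO_SQE_CHAINED flag):

```c
struct rtio_sqe *sqe0 = rtio_sqe_acquire(&ctx);
struct rtio_sqe *sqe1 = rtio_sqe_acquire(&ctx);

/* sqe1 will not start until sqe0 completes, and fails if sqe0 fails */
rtio_sqe_prep_write(sqe0, &spi_iodev, RTIO_PRIO_NORM, cmd_buf, sizeof(cmd_buf), NULL);
sqe0->flags |= RTIO_SQE_CHAINED;

rtio_sqe_prep_read(sqe1, &spi_iodev, RTIO_PRIO_NORM, rsp_buf, sizeof(rsp_buf), NULL);

/* Block until both completions are in */
rtio_submit(&ctx, 2);
```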
I was tempted to, and considered, allowing the iodev poll call to return something like this, to signal the caller that polling is not yet done. This might actually make good sense, as maybe poll is really fast and pushing/popping from an mpsc queue could slow it down. Otherwise my thought was: something calls the iodev's poll function, and if it needs to be polled again the iodev re-adds itself to the poll queue of the caller's rtio context.
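In other words, the two shapes being weighed look roughly like this. This is purely illustrative; neither signature is what the PR actually defines, and `my_hw_step()` / `rtio_poll_add()` are invented stand-ins:

```c
#include <errno.h>
#include <zephyr/rtio/rtio.h>

/* Stand-ins, not real APIs */
extern bool my_hw_step(const struct rtio_iodev *iodev);                     /* advance hardware one step */
extern void rtio_poll_add(struct rtio *r, const struct rtio_iodev *iodev);  /* hypothetical helper */

/* Option A: the poll call itself reports whether more polling is needed */
static int my_iodev_poll(const struct rtio_iodev *iodev)
{
	bool done = my_hw_step(iodev);      /* poll registers, move the transfer forward */

	return done ? 0 : -EAGAIN;          /* -EAGAIN: please poll me again */
}

/* Option B: the iodev re-adds itself to the caller's poll queue when needed */
static void my_iodev_poll_requeue(struct rtio *r, const struct rtio_iodev *iodev)
{
	if (!my_hw_step(iodev)) {
		rtio_poll_add(r, iodev);    /* placeholder, not a real API */
	}
}
```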
Absolutely, it's on my long TODO list. The sam spi driver would be a good start as it already has a janky blocking call for short SPI transfers today.
@yperess ping
Sorry, haven't had a chance to write the unit tests for this PR yet
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed, otherwise this pull request will automatically be closed in 14 days. Note that you can always re-open a closed pull request at any time.
Removing stale label so it's kept open until we turn it into an issue.
Not every I/O peripheral is best suited to being driven by interrupts; at times polling is the best option. Being able to move work out of the submit call avoids deep call trees and enables polling hardware for I/O completion or incrementally advancing a state machine that an I/O device (iodev) may internalize.
This adds the initial API without implementing much of the structural work needed to support polling mode devices.
The idea is that on submit, if the iodev needs to be polled, it adds itself to the context's poll queue. This poll queue is then iterated over at some point in the future. The iteration could be done in a timer ISR, a dedicated thread, a work queue task, or any other sort of deferred-work setup. That includes doing it directly inline after something like rtio_submit with a blocking wait count is called, or when rtio_cqe_consume (blocking or not) is called.
No specific polling iterator is set up here, as it's not clear what the best option, if there even is a best general option, would be.
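The rough shape being aimed for, with placeholder helper names. Only the iodev submit hook and the rtio types follow the existing API; the poll-queue helpers below are not the PR's actual symbols:

```c
#include <zephyr/rtio/rtio.h>

/* Placeholders, not the PR's symbols */
extern void rtio_poll_queue_add(struct rtio *r, struct rtio_iodev_sqe *iodev_sqe);
extern void rtio_poll_queue_drain(struct rtio *r);

/* In an iodev that wants to be polled, submit does no I/O; it just parks the
 * request on the submitting context's poll queue.
 */
static void my_iodev_submit(struct rtio_iodev_sqe *iodev_sqe)
{
	struct rtio *r = iodev_sqe->r;

	rtio_poll_queue_add(r, iodev_sqe);
}

/* One possible iterator: run from a timer ISR, a thread, a work item, or
 * inline from the blocking rtio_submit()/rtio_cqe_consume() paths.
 */
void my_poll_pass(struct rtio *r)
{
	rtio_poll_queue_drain(r);
}
```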
A use case in tree today is the sam spi driver, which may be best served by doing blocking spi transceives for small transfers for which it does not make sense to set up interrupts. In this scenario a small transfer request would result in the iodev adding itself to the poll queue and only performing the transfer once polled, further polling the SPI peripheral registers and moving the transfer forward on each poll call. In effect it would externalize some of the looping that occurs in transceive.
This way, rather than blocking on submit (which is wrong), the work is effectively deferred until it makes sense to do it in the poll call. Each device may break the blocking work up into multiple steps, e.g. poll hardware registers, move the work forward, and repeat until completion.
As an added benefit this removes the deeply nested call tree that could result from a blocking call like this, where submit blocks, does work, and calls complete, resulting in more work perhaps being started for the same iodev (inadvertent recursion/deep call stack).
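For the sam spi case, a hedged sketch of what one poll step could look like. The driver data fields and register helpers are invented for illustration; rtio_iodev_sqe_ok() is the existing RTIO completion helper:

```c
#include <zephyr/device.h>
#include <zephyr/rtio/rtio.h>

/* Invented driver data layout for the sketch */
struct my_spi_data {
	struct rtio_iodev_sqe *iodev_sqe;   /* request currently being serviced */
	const uint8_t *tx_buf;
	uint8_t *rx_buf;
	size_t tx_remaining;
	size_t rx_remaining;
};

/* Stand-ins for poking the SPI peripheral registers */
extern bool my_spi_tx_ready(const struct device *dev);
extern bool my_spi_rx_ready(const struct device *dev);
extern void my_spi_write_byte(const struct device *dev, uint8_t b);
extern uint8_t my_spi_read_byte(const struct device *dev);

static void my_spi_poll_step(const struct device *dev)
{
	struct my_spi_data *data = dev->data;

	/* Advance only as far as the hardware allows right now, then return;
	 * the next poll call picks up where this one left off.
	 */
	while (data->tx_remaining > 0 && my_spi_tx_ready(dev)) {
		my_spi_write_byte(dev, *data->tx_buf++);
		data->tx_remaining--;
	}

	while (data->rx_remaining > 0 && my_spi_rx_ready(dev)) {
		*data->rx_buf++ = my_spi_read_byte(dev);
		data->rx_remaining--;
	}

	if (data->tx_remaining == 0 && data->rx_remaining == 0) {
		/* Complete the in-flight request through the normal RTIO path */
		rtio_iodev_sqe_ok(data->iodev_sqe, 0);
	}
}
```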