-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Hexagon] Asynchronous DMA support #12411
Conversation
72be034
to
cdbc2be
Compare
6691907
to
0247fd9
Compare
Attention @arangasa, please review. Are the |
Hi Krzysztof @kparzysz-quic , Could you please suggest Adam @adstraw ? Thank you. |
The guards are necessary: earlier architectures don't have DMA, and if a program attempts to execute a DMA instruction, it will trigger a hardware exception. For earlier architectures we could just do a synchronous memcpy. |
0247fd9
to
2375006
Compare
@kparzysz-quic and @arangasa I put the Hexagon arch check around the calls to the 1D sync DMA flow in the Hexagon Device API and also |
8ba2682
to
06b4ef8
Compare
06b4ef8
to
bd1f7d5
Compare
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It occurred to me that if we maintain only a single command queue / descriptor chain over the lifetime of a program, that eventually a DMA descriptor that was complete, may be "resurected" by the ring buffer as a new descriptor. E.g.
Ringbuffer[size=4]
Initial state:
desc1@0[finished] , desc2@1[finished] , desc3@2[finished] , desc4@3[finished]
New DMA needed:
desc5@0[inflight] , desc2@1[finished] , desc3@2[finished] , desc4@3[finished] , desc5@0[inflight]
With the last descriptor occupying the same memory space as the first.
Do you think this will be problematic? An integration tests with the UserDMA class to test this could reassure us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would change your example slightly to make the descriptor IDs zero based. This is how it works in the code. The first descriptor has ID=0 and uses ring buffer slot 0. The second has ID=1 and uses slot 1. When ID is equivalent to ring buffer size we reuse ring buffer slot 0. And so on.
Ringbuffer[size=4]
Initial state:
desc0@0[finished] , desc1@1[finished] , desc2@2[finished] , desc3@3[finished]
New DMA needed:
desc4@0[inflight] , desc1@1[finished] , desc2@2[finished] , desc3@3[finished]
The case you have described above should work OK. I believe we can cover it with RingBuffer
unit tests. I believe existing tests already hit this case, to some degree. I will add a test case to explicitly cover this scenario.
When we go to add desc4
we first check if the ring buffer is full by checking that InFlight
count (0) is less than the size of the ring buffer (4). All good. Then, we GetAddr
of the "next" ID = 4 which reuses ring buffer slot 0. Then, we increment the "next" ID to 5. Then we return a pointer to ring buffer slot 0.
There is one issue with this flow that I thought of ... the DMA descriptor at ring buffer slot 0 is "finished" and we simply return a pointer to it and expect that the user will add an "inflight" DMA descriptor at that location. If the user were to call InFlight
before adding the "inflight" descriptor then ring buffer slot 0 would be treated as "finished" from the perspective of the RingBuffer
class where it is still "inflight" from the perspective of the user.
I had initially coded an Add
function which took a function pointer which would add a T to the RingBuffer rather that Next
to get around this issue. It felt to complicated but it does expose a gap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, it's interesting you were thinking about corner cases because I woke up thinking about the Next
vs. Add
API as described above. And, I also realized there is a bug in the InFlight
method where we might increment the "oldest" ID beyond the "next" ID. Will add a test case that can hit that bug and fix it in the next commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, it's interesting you were thinking about corner cases because I woke up thinking about the Next vs. Add API as described above. And, I also realized there is a bug in the InFlight method where we might increment the "oldest" ID beyond the "next" ID. Will add a test case that can hit that bug and fix it in the next commit.
This is now fixed.
There is one issue with this flow that I thought of ... the DMA descriptor at ring buffer slot 0 is "finished" and we simply return a pointer to it and expect that the user will add an "inflight" DMA descriptor at that location. If the user were to call InFlight before adding the "inflight" descriptor then ring buffer slot 0 would be treated as "finished" from the perspective of the RingBuffer class where it is still "inflight" from the perspective of the user.
I decided to leave this as-is and encoded this corner case in a test called wrap_corner
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked with @csullivan and his point was less about the RingBuffer
class and more about the HexagonUserDMA engine and whether there would be any issue if we are overwriting descriptors (even if they are done
) in a HexagonUserDMA chain. I.e. is it OK to overwrite the head of the descriptor chain once it is done? Perhaps @kparzysz-quic or @arangasa have some comment. A test could not hurt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a unit test which overflows the HexagonUserDMA ring buffer for DMA descriptors. The test is named overflow_ring_buffer
. Also added a return of DMA_RETRY
from the HexagonUserDMA Copy
function in case the ring buffer is full. This passes the onus to the user of the class to retry the Copy
. The other way to handle it is for Copy
to Wait
until there is a free slot in the ring buffer to use. Not sure what is best at this moment. I am hoping the unit test provides the desired coverage. Would like to add integration tests in a separate test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good, thanks for the great feature and improvements over the past week! We can clarify the DMA ownership (remove static singleton) in a follow up PR after we refactor the overall resource ownership model.
|
||
#define DMA_SUCCESS 0 | ||
#define DMA_FAILURE -1 | ||
#define DMA_RETRY 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: A docstring on the meaning of DMA_RETRY would be helpful for quicker understanding.
Adds adds asynchronous DMA support through the Hexagon User DMA engine with unit tests to validate basic functionality. Asynchronous DMA support here means the ability to "kick off" asynchronously a number of DMAs using the Copy API and then to Poll for or Wait on a number of "in flight" (not done) DMAs. Enables future testing and development for asynchronous memory copy on Hexagon. For now, Hexagon DMA support remains synchronous in nature through existing hexagon_user_dma_1d_sync interface which uses asynchronous capable HexagonUserDMA class in a synchronous way --- calling Copy and Wait back to back for each request. * use ring buffer to store DMA descriptors * add RingBuffer class; used by HexUserDMA to store descriptors * add test to overflow the HexagonUserDMA ring buffer
@adstraw @kparzysz-quic It seems that this PR removed the guard |
Adds adds asynchronous DMA support through the Hexagon User DMA engine with unit tests to validate basic functionality. Asynchronous DMA support here means the ability to "kick off" asynchronously a number of DMAs using the
Copy
API and then toPoll
for orWait
on a number of "in flight" (not done) DMAs. Enables future testing and development for asynchronous memory copy on Hexagon. For now, Hexagon DMA support remains synchronous in nature through existinghexagon_user_dma_1d_sync
interface which uses asynchronous capableHexagonUserDMA
class in a synchronous way --- callingCopy
andWait
back to back for each request.cc @mehrdadh