-
Notifications
You must be signed in to change notification settings - Fork 324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dma-trace: Align dma trace buffer correctly, fixed Zephyr dtrace issue #5329
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first one needs to be made platform-specific, but otherwise look good and test results show this is fixing #5120 which has been causing a lot of CI/validation pain.
3e51b82
to
f37830a
Compare
Force DMA buffer to be correctly aligned. If the buffer is only 64-byte aligned, when using hda-dma, then the last 64-bytes of the DMA buffer is not transferred correctly to host side. Signed-off-by: Jyri Sarha <jyri.sarha@intel.com>
Add check for DMA buffer address alignment to hda_dma_set_config(). Wrongly aligned DMA buffer does not work correctly. Signed-off-by: Jyri Sarha <jyri.sarha@intel.com>
It is possible that that in trace_work() sof_ipc_dma_trace_posn ipc message reaches host side sof driver before the dma_copy_to_host_nowait() is finished. Change the dma_copy_to_host_nowait() implementation to dma_copy_to_host() by adding DMA_COPY_BLOCKING flag to dma_copy() command. The flag is already there on the !defined(CONFIG_DMA_GW) version of the function and the old name of the function was a bit misleading. This commit changes also probe.c probe_task() so that it also uses the new blocking DMA copy. I could not see any justification to use nowait version there either. Signed-off-by: Jyri Sarha <jyri.sarha@intel.com>
f37830a
to
0f65d69
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems very reasonable from my perspective. @teburd will want to note this behavior for the Zephyr driver too. In Zephyrese that would probably be expressed as an assertion in the driver where it picks up the buffer pointers I guess, though a soft error isn't impossible (not sure if there's a return code where needed in our DMA API).
So... do we get to claim that Zephyr saw this problem more severely because its allocator is more fragmentation-resistant and thus more likely to produce a misaligned output buffer? It pleases me to think that, anyway.
Yes, I had this in mind and had an assertion on alignment in fact in a first pass as I noted the alignment mentioned in the docs, before I got sucked into a rabbit hole of getting a test working. |
This is fascinating: the bug disappeared in the latest daily test run without merging this PR. The bug saw the PR, got scared and ran away. It's magic.
|
XTOS works just by change, or actually because of its different heap implementation. If one enforces the DMA buffer to be only 64-byte aligned but not 128-byte aligned (the way it usually is in the Zephyr version), the xtos version fails as miserably as Zephyr. |
My main point was: this PR does not fix the issue at all in my (Zephyr) configuration. I'm testing different Zephyr versions now. EDIT: I combined this PR with all the SOF and Zephyr commits used by the daily runs references above. In every combination the sof-logger prints This is with kernel 920d034ba4f2 in case that matters. I don't have any TGL device at home. |
I tried to put together a branch with all the mentioned pieces, even used the same linux hash, and tested that on my TGL board, but can not reproduce the problem. @marc-hb could you share exact branches and tests you run so I could try again? Also (a remote) access to same piece of HW could be helpful. My debug hacks here should help in finding out what is still wrong with the dtrace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jsarha , looks good! I think even there are stilll more issues, this seems like a step to the right direction and makes the DMA trace code more robust.
Ack - this fixes the HW alignment as mentioned in the spec. We can now focus on why its not working with @marc-hb setup. |
Apologies, last night I was cherry-picking only the last commit in this PR :-( When cherry-picking the first, alignment commit then then problem does instantly go away in my configuration: from systematic corruption to none. Yesterday was a way too long day. It's still fascinating that the issue went mostly away even before this PR was merged (involving zero fat finger of mine) but that's not new: we always knew this corruption was "randomly deterministic"! Thanks a lot @jsarha , you've hopefully just made a huge difference to test results. |
The first commit should be the the only one needed to fix #5120 .
The second commit should make sure the same problem does not bite us again. The third commit is pretty much optional, but I think the fix is correct and it is also cleaning up the code a bit.