dma-trace: Fixes aimed to make the re-configuration robust and cleaner #4879
src/trace/dma-trace.c
Outdated
mtrace_printf(LOG_LEVEL_ERROR, "dma_trace_enable: buffer_init failed");
goto out;
/* Allocate and initialize the dma trace buffer if needed */
if (!d->dmatb.addr) {
Probably this check should be moved within dma_trace_buffer_alloc()?
b335a12
to
bb3e8b4
Compare
Changes since v1:
|
OK. By the way, I think I noticed only a single red box on QB this morning, and now there are more, related to Stream start. Let's try again.
SOFCI TEST |
It can only be the last commit that introduces any functional changes, but it should only have an effect in case of reconfiguration, and I don't see that. So far this is not really doing what it is supposed to do. The other possible issue is that the check for the addr might need to be inside a lock?
@ujfalusi best to bisect the PR and we can incrementally merge. |
d5ab186
to
9a101ae
Compare
I'm not certain if it is related to this PR, with the latest device test (https://sof-ci.01.org/sofpr/PR4879/build10753/devicetest/): On the previous one (https://sof-ci.01.org/sofpr/PR4879/build10750/devicetest/) without the two topmost patches: Scheduled another PR test run.
The PR test run (7496) also has two failures: The two dmesg logs start identically with a firmware boot issue:
But no issue on TGLH_RVP_HDA this time. |
Another PR test (7500) again has a different failure pattern: Looks like the board does not wake up and the playback version has:
in the logs. |
I have tried to bisect via CI the CML TIMEOUT we have in this PR but it was inconclusive as with an empty commit on top of |
SOFCI TEST |
9a101ae
to
750c177
Compare
@@ -321,6 +328,9 @@ static int dma_trace_start(struct dma_trace_data *d)
d->dc.chan = NULL;
err = dma_copy_set_stream_tag(&d->dc, d->stream_tag);
}

/* Re-initialize the dtrace buffer */
dma_trace_buffer_init(d, NULL);
How can a memory leak happen? Is start() called without a stop(), because the latter does free the buffer? Is stop() never called in such a case, just a repeated start()? Maybe worth mentioning this in the commit message.
For the record: the trace stop is SOF_IPC_TRACE_DMA_FREE and dma_trace_disable(), which, when called, stops and puts the DMA channel and frees the buffer.
Let me check without this series tomorrow against the SOF clients version. I might be able to just close this PR.
dma_trace_buffer_init() can only fail if the rballoc fails, so there is no need to call dma_trace_buffer_free() as the buffer has not been allocated. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
The current dma_trace_buffer_init(), which allocates and initializes the dtrace buffer, is re-tasked to simply (re-)initialize the dtrace buffer along with placing the markers in the same function. The allocation of the buffer is split out to dma_trace_buffer_alloc(), which calls dma_trace_buffer_init() to get the buffer initialized. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
…nfig We should stop the task before the dtrace channel is stopped when we need to re-configure it. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Do not attempt to allocate the dtrace buffer again if it has already been allocated. This is only possible if we are re-configuring the dtrace, which implies that we also must have the DMA channel. In such a case, skip the allocation and do a re-init of the buffer after the DMA channel has been stopped. Reported-by: Keyon Jie <yang.jie@linux.intel.com> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
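The alloc/init split and the "allocate only if needed" guard described in the commits above can be sketched roughly as follows. This is a simplified, standalone illustration and not the SOF code: struct trace_buf, trace_buf_init(), trace_buf_alloc(), trace_buf_free() and TRACE_BUF_SIZE are hypothetical names, and malloc() stands in for rballoc().

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TRACE_BUF_SIZE 4096 /* hypothetical size, not taken from SOF */

/* Simplified stand-in for the trace buffer state; not the real SOF struct. */
struct trace_buf {
	char *addr;   /* base address, NULL while unallocated */
	char *w_ptr;  /* write pointer */
	char *r_ptr;  /* read pointer */
	size_t size;  /* buffer size in bytes */
	size_t avail; /* bytes waiting to be copied out */
};

/* (Re-)initialize an already allocated buffer: clear it, reset the pointers. */
static void trace_buf_init(struct trace_buf *b)
{
	memset(b->addr, 0, b->size);
	b->w_ptr = b->addr;
	b->r_ptr = b->addr;
	b->avail = 0;
}

/* Allocate only when needed, then initialize. Returns 0, or -1 on failure. */
static int trace_buf_alloc(struct trace_buf *b)
{
	if (b->addr)
		return 0; /* re-configuration: keep the existing buffer */

	b->addr = malloc(TRACE_BUF_SIZE);
	if (!b->addr)
		return -1; /* nothing was allocated, so nothing to free */

	b->size = TRACE_BUF_SIZE;
	trace_buf_init(b);
	return 0;
}

/* Release the buffer and return the state to "unallocated". */
static void trace_buf_free(struct trace_buf *b)
{
	free(b->addr);
	memset(b, 0, sizeof(*b));
}
```

With this shape, calling trace_buf_alloc() a second time leaves the first buffer in place, so a repeated enable without a free in between does not leak. The caller re-initializes the buffer separately with trace_buf_init() once the DMA channel has been stopped.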
750c177
to
81aff7b
Compare
Changes since v2:
|
I still see two TIMEOUTs on the tests which makes me a bit nervous. I don't see how this PR can cause it to be honest, but let me try to clarify if it is really due to this or something else. Please do not merge it! |
@ujfalusi can you rebase and repush, this will retest with latest revert |
Drafts cannot be merged, I mean not without deliberately changing them to non-drafts.
No need to rebase and create I keep saying this because I notice people keep missing that tests are always performed on a moving target whether they want that or not. Asking to rebase maintains this misconception. |
The situation is complicated because the current design (before this PR) does not make sense: dma_trace_init_early() pretends to:
Do this early so we can log at initialization time even before the DMA runs. The rest happens later in dma_trace_init_complete() and dma_trace_enable()
But in fact what dma_trace_init_early() does is nowhere near enough because it does not initialize the DMA trace buffer (= what this PR touches).
So why is dma_trace_init_early() lying and why do we have this very gradual initialization split over multiple functions called at different points in time? Very good question. The answer is: because old commit eca2089 was merged without a commit message and without code review. Before that commit, dma_trace_init_early() used to do what it says and earlier DMA traces (which still exist!) were not silently discarded. For more details check "Restore early tracing" #4334, which I want to go and try to revive now that a bunch of DMA issues have been fixed.
As #4334 is incompatible with this (and apparently new to people here) I'm defensively blocking this for now until someone proves that #4334 is a bad idea and this is a better one.
Back to the initial point: why should the DMA trace buffer be freed and re-allocated? Why not make it immutable?
Yes, this comment completely and without a doubt does not stand as of today, regardless of this PR.
I agree, with the current code you can not store the dtrace messages between
What you need to take into account is that we now have a new IPC (
I don't see any incompatibility, this PR is not freeing up the buffer. It is making sure that we re-use it if it has been already allocated - which is now unlikely as the
It is a good question. If we release the buffer while the dtrace is disabled and not in use, then we do lose the messages (and I guess we cannot fall back to mtrace for them). Either way, I don't see this PR in any way blocking the early dtrace, which I would rather rename to nonstop dtrace support.
I gave up bisecting the TIMEOUT on CML, it is just not worth the time (#4925). I'll move it out of draft and let's continue the discussion around this.
@ujfalusi correct me but I believe this one is a nice to have compared to some other trace PRs in flight like for instance thesofproject/linux#3136 (or IPC4). One significant trace change at a time. |
I think the memleak can still happen if the fuzzer is used but not with Linux. I did run a test with current sof-dev and with the client support without this PR and all looked fine (we do release the trace buffer and DMA channel on dtrace free). |
Closing this PR as #5106 contains most of the fixes and extends them further to plug one more leak.
Within #4860 from @keyonjie there was a fix for a memory leak in dtrace which would have broken the SOF client support - we need to be able to remove and insert the dtrace, re-configuration is normal.
The problem is valid, but it has to be handled in a different way. This PR provides that with a cleaner (I hope) implementation and at the same time makes the dtrace reconfiguration a bit nicer.
I have tested it with the SOF client support (thesofproject/linux#3136):
run
aplay -Dplughw:1,0 -fdat /dev/urandom
to keep the DSP on and run in loop: