-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dma-trace: Full support for re-configuration and make dmatb persistent from the time it is allocated #5106
dma-trace: Full support for re-configuration and make dmatb persistent from the time it is allocated #5106
Conversation
src/trace/dma-trace.c
Outdated
@@ -291,6 +302,8 @@ static int dma_trace_start(struct dma_trace_data *d) | |||
} | |||
|
|||
if (d->dc.chan) { | |||
unsigned int flags; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just noticed, that this file uses unsigned int
for spin-lock flags. It should actually be uint32_t
. Not something to change in this PR, but a separate PR could update that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll fix it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced with k_spinlock_key_t
as we have switched lock API...
src/trace/dma-trace.c
Outdated
spin_lock_irq(&d->lock, flags); | ||
|
||
d->old_host_offset = 0; | ||
d->posn.host_offset = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
above dma_stop()
or dma_copy_set_stream_tag()
could've failed. Don't you need to abort in some of those cases? I'd reformat this function with the usual goto
s to make error handling more obvious
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lyakh, we do abort at line 339.
This is a bit tricky. I would say that we can (and need) to reset the the host_offset in any case, so this can be moved beyond line 341.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved after the error check/abort.
SOFCI TEST https://sof-ci.01.org/sofpr/PR5106/build11466/devicetest/?model=TGLU_UP_HDA_ZEPHYR&testcase=check-playback-all-formats looks like issue #5120 #5120 seems "randomly deterministic", meaning it always happen in some configurations and never in other configurations. So it is possible that this PR causes a deterministic TGLU_UP_HDA_ZEPHYR regression for no obvious reason. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me (when other comments are addressed), but you'll need to rebase this after my dtrace buffer alignment PR.
6a858e2
to
115743a
Compare
Changes since v1 (I think):
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ujfalusi some red showing on Zephyr APL, can you split this PR into more parts, based on risk and change size. This way it should be more obvious which parts are causing the CI to report issues. Thanks.
@lgirdwood, we have exactly the same failures present in unrelated kernel pr, like this: https://sof-ci.01.org/linuxpr/PR3424/build7156/devicetest/ I can split the PR into two, but they can not go in parallel as the patches are depending on each other. |
@ujfalusi no need to split, just seen the other PR showing red too in the same place for APL Zephyr and this has been happening for a while now, we must have been lucky on Friday when the CI consistently showed green. |
src/trace/dma-trace.c
Outdated
d->old_host_offset = 0; | ||
d->posn.host_offset = 0; | ||
|
||
k_spin_unlock(&d->lock, key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, not sure I understand the use of locking well here. I looked through dma-trace.c and I couldn't understand what this lock is actually protecting. As for these host_offset
fields - looks like in other locations they are accessed and modified without holding the lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, there is no need for the lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you going to remove the lock ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I will.
key = k_spin_lock(&d->lock); | ||
|
||
rfree(buffer->addr); | ||
memset(buffer, 0, sizeof(*buffer)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed? I see, that it is just moved from another location, just wondering if we know why we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is to reset the variables within buffer.
SOFCI TEST |
Rerun the CI since this is a trace change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what effect it will have on
I haven't found the time for a full review yet sorry.
src/trace/dma-trace.c
Outdated
*/ | ||
#define SOF_BANNER_COMMON \ | ||
"FW ABI 0x%x DBG ABI 0x%x tags SOF:" SOF_GIT_TAG ZEPHYR_VER_OPT \ | ||
" src hash 0x%08x (ldc hash " META_QUOTE(SOF_SRC_HASH) ")" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please move these macros down, back together with the only place they are used?
I think a simple revert as it is in #4334 will no longer going to cut it to have early dtrace buffer as the We should revisit the early dtrace buffer allocation, I agree. I would also add a Kconfig option to make it selectable with opt-in. I would also add mtrace for wrap notification when the host is not yet (or withdrawn it) provided a host buffer to flush the dtrace buffer to. |
dma_trace_buffer_init() can only fail if the rballoc fails so there is no need to call dma_trace_buffer_free() as the buffer has not been allocated Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Printing the error via dtrace is not going to work when the memory for the dtrace is not allocated. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Move the tag printing out from dma_trace_enable() to dma_trace_buffer_init() to be close proximity of the buffer allocation for easier reading. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
…nfig We should stop the task before the dtrace channel is stopped when we need to re-configure it. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
115743a
to
6698e43
Compare
I have found other memory leak. I will update this PR with an additional patch (the host buffer's elem_array is not freed on error and on dtrace_free) |
We need to free up the allocated SG in case of a failure after a successful ipc_process_host_buffer() call, which does the allocation. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Do not attempt to allocate the dtrace buffer again if it has been already allocated. This only possible if we are re-configuring the dtrace which implies that we also must have the DMA channel. In such case, skip the allocation and do a re-init of the buffer after the DMA channel has been stopped. Reported-by: Keyon Jie <yang.jie@linux.intel.com> Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
When the dtrace is disabled (via the SOF_IPC_TRACE_DMA_FREE IPC message) we need to also free up the allocated dma_sg_elem_array to avoid leaking memory on the next dtrace start. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
…ocations dma_sg_alloc() is called with SOF_MEM_ZONE_SYS which can not be freed up. dma config or start error skip the free as it is not going to work anyways. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
…tion Do not free the dmatb so that the trace code can continue collecting information continuously in it's buffer. We also have another memory leak (when CONFIG_DMA_GW is set) which can be handled this way: Every time the dma_trace_start() is called we allocate a new dma_sg. This is fine when the dma_trace_start() is only called once right after the DSP finished booting, but multiple calls would eventually going to lead to OOM. Move the dma_sg allocation at the same place where the dmatb is allocated to make it persistent as well. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
6698e43
to
8a051c2
Compare
Changes since v3:
|
Hi,
This PR is replacing #4879.
Since last time I have noticed another path for a memory leak which was a hard one to crack:
Within dma_trace_start() when CONFIG_DMA_GW is set we were allocating a dma_sg (to SOF_MEM_ZONE_SYS) and we would do that for every start. The config is only used in that function to configure the DMA channel, but the dma_sg can not be freed up as it is allocated to SOF_MEM_ZONE_SYS. Other zone allocation failed on my tgl-h laptop.
The solution is to make dmatb and this dma_sg persistent from the point they are allocated.
This comes with a bonus that we can still collect dtrace logs to SOF firmware for some extent (if the would be dtrace client is removed we would still have some memory of the events).
The other change compared to #4879 is that I don't reset the firmware dmatb, only the host pointers as when the re-configuration happens it is indicating that we are going to have a new host destination along with 0 offsets.