-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Thread analyzer adsp fix #71408
Thread analyzer adsp fix #71408
Conversation
@ujfalusi , @kv2019i , @lyakh and @lgirdwood , could you give this a brief SOF internal review, before I mark this ready for review. |
I think the code look good for wider review. I think the rationale could be expanded a bit. Which piece of data was problematic without this patch? Zephyr allocates most of the its data uncached on xtensa as well, so a bit surprising the caches are causing a problem here on Zephyr side. E.g. global static data is uncached by default and should be safe to use across cores (evein on these xtensa platforms). It would help to pinpoint the exact data object (and where it is accessed) that is causing problems, and explain why these additions are needed. |
ecd5d4e
to
8958b2c
Compare
The kernel thread list can be looped through, but it appeared that calling k_thread_runtime_stats_all_get() hung the DSP (I never got any crash, the DSP just became dead), also asking isr stack from neighboring core hung the DSP. There was a build error for non SMP build, I'll try to fix all the automatic testing issues and the mark this ready for review. |
d1abd85
to
adedca7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kv2019i Surely the problem here is that on SMP builds with adsp, the thread stacks which this utility needs to inspect are in cached/CPU-local memory. So the data it sees will be an essentially arbitrary mix of stale, previously-flushed cache blocks and not the sequentially consistent data it expects. But agree that the comments should be clearer on what this is doing.
Nitpick-wise: Might be worth tying this directly to CONFIG_KERNEL_COHERENCE, which is the actual feature that enables the cache-incoherent thread stacks. If you don't have that =y there's no need for the per-CPU thread analyzer.
Likewise this is using CONFIG_SCHED_CPU_MASK APIs, which is an optional feature that should be listed in a "depends on" in kconfig.
But on the whole this is a reasonable and straightforward workaround for the problem.
|
||
static K_THREAD_STACK_ARRAY_DEFINE(analyzer_thread_stacks, CONFIG_MP_MAX_NUM_CPUS, | ||
CONFIG_THREAD_ANALYZER_AUTO_STACK_SIZE); | ||
static struct k_thread analyzer_thread[CONFIG_MP_MAX_NUM_CPUS]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't know that you need N separate threads, it would be enough to have just one that sequentially re-pins itself to each core in sequence, via e.g. a k_timer or something (you can't pin a running thread, so it can't move itself directly).
But then I guess that requires knowing which CPUs are actually unsuspended, which would be difficult.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should in theory be possible to survive with two threads. One all the time at core 0, and the other being pinned by the core 0 thread on different cores, synchronized with semaphores, and having some timeout mechanism to skip suspended cores... but then again I don't think the save is worth all that complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, the savings is very significant. A thread stack is the largest kernel data structure you're going to allocate. The DSPs are big but ~8k of stack memory on MTL is still a big waste. And again, I don't see that this is so complicated? Just pend the thread for each change of CPU and bounce to a k_timer to do the migration. You'll pay a cost of 0-1 ticks (MTL is still 12kHz ticks I think?), which seems reasonable for a debug feature.
I tried setting CONFIG_KERNEL_COHERENCE=n and taking away my changes, but the FW still crashed with the change. So there is something else to it than just that.
Thanks, I'll add the dependency.
|
d271f5d
to
ca6a826
Compare
@nashif , Liam suggested me to ping you about this PR. |
No, I meant predicate this feature so that it gets enabled when COHERENCE=y. You can't meaningfully run SMP kernels on intel_adsp without COHERENCE (well, you can, but it maps all memory as uncached -- this used to work but might have bitrotten). |
8d5390d
to
8c570a5
Compare
703c6e9
to
4d69f51
Compare
Applied the change suggested here. |
4d69f51
to
bbbc478
Compare
Fix CI errors by adding couple of ARG_UNUSED(). |
@npitre this is now done. Is there something more I should do here to get this merged? |
bbbc478
to
273f9cd
Compare
Oops, k_thread_foreach_filter_by_cpu() implementation had wrong name. Apparently I only tested the unlocked version. Now fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsarha Latter two patches ok, but one clarifying question on first patch. There seems to be a semantics difference between locked and unlocked versions. Am I missing something?
…sion Add functions k_thread_foreach_unlocked_filter_by_cpu() and k_thread_foreach_filter_by_cpu() to loop through the threads on the specified cpu only. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
Add k_thread_runtime_stats_cpu_get() to get runtime statistics of the specified core. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
273f9cd
to
5bce654
Compare
…tely Add implementation to analyze threads on each cpu separately. This feature can be enabled with THREAD_ANALYZER_AUTO_SEPARATE_CORES Kconfig option. If enabled, an analyzer thread is started for each cpu, and the threads will only analyze thread on the cpu its running on. This feature is needed for Intel ADSP platform, where cpu specific caches are not synchronized between the cpu. It is also probably needed by other platforms having CONFIG_KERNEL_COHERENCE=y, so default to THREAD_ANALYZER_AUTO_SEPARATE_CORES=y for those platform. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>
5bce654
to
17e7a84
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good now!
@andyross , there was one-line bug-fix, but otherwise this should be the same as the version you approved. |
Argh, forgot about that this is still lying around. @andyross , a gentle ping! @lgirdwood I'am not sure how to get this moving. |
@peter-mitsis , could you spare time to review this? |
@jsarha - Taking a look at this again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still looks good here
@kv2019i @lyakh might have input too. But broadly SOF tracks Zephy upstream closely. At least so far there hasn't been an attempt to tie a particular SOF release to a Zephyr LTS. And honestly even if there were, this is a fix to a debug feature that isn't used in production audio firmware, so seems like it would be low priority. (Just FWIW: I'm fairly conservative about bugfix backports. Security bugs, 100%. Bugs reported by actual shipping apps, absolutely. But just general fixes? There are too many and IMHO the churn caused by the fixes themselves are a higher risk than the bugs they fix in most cases.) |
Hi @jsarha! To celebrate this milestone and showcase your contribution, we'd love to award you the Zephyr Technical Contributor badge. If you're interested, please claim your badge by filling out this form: Claim Your Zephyr Badge. Thank you for your valuable input, and we look forward to seeing more of your contributions in the future! 🪁 |
Due to Intel ADSP's cache architecture the Zephyr thread analyzer does not work out of the box on SOF platform. The core specific caches are not synchronized between the cores and accessing thread date from other core usually causes a crash. This PR adds on option for thread analyzer thread to only "analyze" threads on the same core its running on, and starts thread for each core.