Hard IRQ latency tracking #248

Merged: 4 commits, Aug 16, 2021


UmanShahzad
Contributor

@UmanShahzad UmanShahzad changed the title from "[WIP] Hard IRQ latency tracking" to "Hard IRQ latency tracking" Aug 6, 2021
@UmanShahzad UmanShahzad requested a review from thiagoftsm August 6, 2021 10:57
@UmanShahzad UmanShahzad marked this pull request as ready for review August 6, 2021 10:58
@UmanShahzad UmanShahzad requested a review from Ferroin as a code owner August 6, 2021 10:58
@UmanShahzad UmanShahzad requested a review from vlvkobal August 6, 2021 10:58
@UmanShahzad
Contributor Author

@thiagoftsm @vlvkobal I have yet to create the netdata PR for this, but the current model should suffice: it maps IRQ IDs, which can be found in /proc/interrupts (and which proc.plugin already parses), to their names and their total, incremental latency values.

The netdata PR will involve a new hard IRQ thread which traverses all IRQ IDs in /proc/interrupts and looks them up in all per-CPU maps exposed by this eBPF collector, from user space. A stacked chart using incremental values will be created where each dimension's key is the IRQ name and value is the total latency.
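
As a rough illustration of the user-space side described above, the sketch below pulls IRQ IDs and names out of /proc/interrupts-style text and sums a set of per-CPU latency values for one IRQ. The function names `list_irqs` and `sum_percpu`, the sample input, and the one-value-per-line stand-in for the per-CPU map values are all assumptions made for illustration. They are not taken from the netdata code, which would read the eBPF maps directly rather than text.

```shell
#!/bin/sh
# Hypothetical helpers; names and input formats are assumptions for illustration.

# Print "<irq id> <name>" for every numeric IRQ line read from stdin.
# Non-numeric rows (NMI, LOC, ...) and the CPU header line are skipped.
# Simplification: the name is taken to be the last whitespace-separated field.
list_irqs() {
    awk '$1 ~ /^[0-9]+:$/ { id = $1; sub(/:$/, "", id); print id, $NF }'
}

# Sum one integer per line from stdin, standing in for the values that a
# per-CPU map lookup would return for a single IRQ ID.
sum_percpu() {
    total=0
    while read -r value; do
        total=$((total + value))
    done
    echo "$total"
}

# Made-up sample in the layout of /proc/interrupts.
list_irqs <<'EOF'
           CPU0       CPU1
  0:         33          0   IO-APIC   2-edge      timer
  8:          0          1   IO-APIC   8-edge      rtc0
NMI:          0          0   Non-maskable interrupts
EOF

# Made-up per-CPU latency values for one IRQ.
printf '%s\n' 10 20 5 | sum_percpu
```

On a live system `list_irqs < /proc/interrupts` would give the set of keys to look up; the summed value per IRQ name is what would feed one dimension of the stacked chart.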

thiagoftsm previously approved these changes Aug 6, 2021
vlvkobal previously approved these changes Aug 9, 2021
@UmanShahzad
Contributor Author

Before we merge, I noticed that some hardware interrupt types are not being recorded by the irq/irq_handler_entry tracepoint. There are many such interrupts which require their own, special tracepoints!

The full list of tracepoints related to IRQs can be found using:

$ cat /sys/kernel/debug/tracing/available_events | grep irq

But to narrow things down, looking at irq_vectors:* is enough. And among them, vector_* ones seem to be meta and not what we want:

$ cat /sys/kernel/debug/tracing/available_events | grep 'irq_vectors' | grep -v ':vector_'
irq_vectors:thermal_apic_exit
irq_vectors:thermal_apic_entry
irq_vectors:deferred_error_apic_exit
irq_vectors:deferred_error_apic_entry
irq_vectors:threshold_apic_exit
irq_vectors:threshold_apic_entry
irq_vectors:call_function_single_exit
irq_vectors:call_function_single_entry
irq_vectors:call_function_exit
irq_vectors:call_function_entry
irq_vectors:reschedule_exit
irq_vectors:reschedule_entry
irq_vectors:irq_work_exit
irq_vectors:irq_work_entry
irq_vectors:x86_platform_ipi_exit
irq_vectors:x86_platform_ipi_entry
irq_vectors:error_apic_exit
irq_vectors:error_apic_entry
irq_vectors:spurious_apic_exit
irq_vectors:spurious_apic_entry
irq_vectors:local_timer_exit
irq_vectors:local_timer_entry

So I'm going to have to add literally 22 new tracepoints in this PR.

Fortunately, all of these new ones will be almost exactly the same, so I'll just use a macro to easily generate them.

$ for i in `cat /sys/kernel/debug/tracing/available_events | grep irq_vectors | grep -v ':vector_'` ; do cat /sys/kernel/debug/tracing/events/irq_vectors/${i#"irq_vectors:"}/format | tail -n+4 > /tmp/${i#"irq_vectors:"}.format ; done
$ diff --from-file /tmp/*.format
# no difference in formats.
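
Since the tracepoints come in entry/exit pairs, a macro would presumably be instantiated once per vector rather than once per tracepoint. The sketch below groups the event names by vector to show which pairs would need covering. The function name `irq_vector_names` and the short sample input are assumptions for illustration and are not part of this PR.

```shell
#!/bin/sh
# Hypothetical helper; the name and the sample input are assumptions.

# Read available_events-style lines from stdin and print each irq_vectors
# vector once, with the "irq_vectors:" prefix and the _entry/_exit suffix
# removed. The vector_* events are excluded, as in the commands above.
irq_vector_names() {
    grep '^irq_vectors:' |
        grep -v ':vector_' |
        sed -e 's/^irq_vectors://' -e 's/_entry$//' -e 's/_exit$//' |
        sort -u
}

# Small illustrative sample; a real list is much longer.
irq_vector_names <<'EOF'
irq:irq_handler_entry
irq_vectors:vector_setup
irq_vectors:local_timer_exit
irq_vectors:local_timer_entry
irq_vectors:reschedule_exit
irq_vectors:reschedule_entry
EOF
```

On a live system the function could be fed /sys/kernel/debug/tracing/available_events instead (reading it usually requires root). Applied to the 22 tracepoints listed above, it reduces them to 11 vector names, one per entry/exit pair.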

@UmanShahzad
Contributor Author

@thiagoftsm @vlvkobal @underhood this is ready for review and will be needed to properly review netdata/netdata#11410

@thiagoftsm
Contributor

netdata/product#2155

Please add to the description the kernels you tested this PR on, thank you!

Contributor

@thiagoftsm thiagoftsm left a comment

I tested this PR on kernels 5.13.8, 4.14.239, and 3.10.0-1160 (the last on CentOS 7.9.2009). Everything worked as expected, so LGTM!
Congratulations @UmanShahzad !

@UmanShahzad
Contributor Author

UmanShahzad commented Aug 16, 2021

Please ignore my previous (deleted) message: looks like I didn't refresh the page properly 😆

@UmanShahzad UmanShahzad merged commit 55a47f1 into netdata:master Aug 16, 2021
@UmanShahzad UmanShahzad deleted the uman/hardirq branch August 16, 2021 07:59