-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Timer based profiler #6642
Timer based profiler #6642
Conversation
…bring up on embedded ("bare metal") systems that may not have a full OS with threads.
in multiprocessor situations. (Both as an option on systems where threads would be fine and on embedded systems which don't have time shared threads but cores are dedicated to the Halide threadpool.)
@@ -1843,6 +1843,13 @@ extern struct halide_profiler_state *halide_profiler_get_state(); | |||
* This function grabs the global profiler state's lock on entry. */ | |||
extern struct halide_profiler_pipeline_stats *halide_profiler_get_pipeline_state(const char *pipeline_name); | |||
|
|||
/** Collects profiling information. Intended to be called from a timer | |||
* interrupt handler if timer based profiling is being used. | |||
* State argument is acquired via halide_profiler_get_pipeline_state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if you call this (and other new calls) from thread-based profiling? Even if it's just UB maybe worth calling out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In thread based profiling, Halide fires up a thread to call halide_profiler_sample
. It can be called directly as well, though I can't think of a reason to do so. It would simply result in profile samples being collected more often than otherwise I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other calls probably result in undefined references as they are only provided for a timer profiling based runtime.
Any sense of the relative accuracy of this vs a sampling thread? Can test/performance/profiler.cpp somehow be modified to test this as well? I'm a bit wary of merging this without any test. |
I tested it by manually changing the profiler implementation to use timer based profiling and then running performance_profiler, the test Andrew mentioned, on Linux using the posix_timer_profiler support. It produces about the same values. The timer implementation was running about 10ms higher on ~1450ms total across an average of a few runs of each. I did not push to get statistically valid results nor try a more sophisticated pipeline with multiple Funcs, etc. (Though the reason this code exists is because I needed it for real work, so I have used it to profile real stuff.) The only obstacle to testing is how to control whether it is enabled or not. The easiest way would be to add a target flag and then change the above mentioned test to execute both kinds. Happy to do that, just wasn't sure it was worth burning a target flag for it. Theoretically the bare metal target I'll be adding would test it, but it is unlikely we'll be able to run most tests there. The other place I think this may be useful is WebAssembly, but it's not clear yet if that is the case. |
What's the status on this PR? |
I think it's stalled on me trying it out. Doing so now. |
This PR adds support for timer interrupt based profiling. The intent is to use this support in embedded applications without much of an OS, where just using threads is not possible. (It is also potentially useful for applications where a process should not have any threads other than the main thread of control. These are however very rare and I'm not sure that is worth supporting.)
This is in preparation for adding a target that uses this support (SiFive bare metal SDK on RISC V). I have tested the Posix support to ensure it works, but did not feel it was worth adding a target option to select this mode vs. the threads based mode currently used.