Description
Heya,
I've noticed that the repository recently added an initial draft of an SDK that profiling and tracing tools can use to add support for Intel GPUs to their applications more easily.
I installed the current version on my system (Ubuntu 22.04, Intel Core i7-1260P), and it mostly worked fine. However, I ran into some issues with xtpi because oneAPI is installed as a module on my system, and CMake did not find it.
From skimming through the headers and available methods, the interface looks fine, though I would need to integrate it into a tool to check whether it fits my requirements. However, I have already noticed one thing: right now, I don't see a way to convert the timestamps delivered by PTI-SDK.
Timestamp conversion
As far as I can see, PTI-SDK uses nanosecond-resolution timers to collect its events. That's perfect, since some operations take only a very short time to complete. However, UNIX systems may offer several timers to choose from rather than just one (on Linux, for example, clock_gettime accepts CLOCK_REALTIME, CLOCK_MONOTONIC, or CLOCK_MONOTONIC_RAW). A tool may let the user choose among them, but that choice only changes the timer used by the application itself, while PTI-SDK keeps delivering the same timestamps.
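To illustrate why the choice of clock matters, here is a small POSIX C sketch (not PTI-SDK code) that reads two standard clocks back to back. Their values differ by an arbitrary offset, so a raw timestamp is only meaningful together with the clock it came from. Which clock PTI-SDK uses internally is not stated here, and the sketch does not assume one.

```c
#define _POSIX_C_SOURCE 200809L /* expose clock_gettime() and the CLOCK_* ids */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Read the given POSIX clock and return its value in nanoseconds,
 * or 0 if the clock is not supported on this system. */
uint64_t read_clock_ns(clockid_t id)
{
    struct timespec ts;
    if (clock_gettime(id, &ts) != 0) {
        return 0;
    }
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Print both clocks taken at (almost) the same instant. The epochs differ
 * (Unix epoch vs. an unspecified starting point such as boot), so the raw
 * values cannot be compared directly. */
void print_clocks(void)
{
    uint64_t realtime  = read_clock_ns(CLOCK_REALTIME);
    uint64_t monotonic = read_clock_ns(CLOCK_MONOTONIC);
    printf("CLOCK_REALTIME : %llu ns\n", (unsigned long long)realtime);
    printf("CLOCK_MONOTONIC: %llu ns\n", (unsigned long long)monotonic);
}
```

A tool that stamps its own host events with one of these clocks therefore needs to know how PTI-SDK's timestamps relate to that same clock before it can place host and device events on one timeline.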
For simply computing how long an action took, this is fine. However, a more detailed analysis of program executions may rely on comparing timestamps of host and device activities, and this is where the current implementation of PTI-SDK fails.
This is just one example; there are other reasons to convert timestamps, for example requirements imposed by output formats.
Other interfaces have similar issues. OpenMP, for example, does define a translate_time function in its OMPT specification. However, the implementation in ROCm 5.7.1 translates those timestamps to seconds, which makes them useless for meaningful analysis. CUDA also had no native way to translate timestamps when using CUPTI until CUDA 11.6, which introduced a direct callback: tools can now register their own timestamp function via cuptiActivityRegisterTimestampCallback.
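One concrete way a seconds-based translation can lose information is floating-point precision. The C sketch below is only an illustration and makes no claim about how ROCm implements translate_time: it assumes the translated value is a double holding seconds since the Unix epoch, converts a nanosecond timestamp to that representation and back, and reports the error. Near 1.7e9 seconds, adjacent double values are about 0.24 µs apart, so individual nanoseconds cannot be represented.

```c
#include <stdint.h>

/* Convert a nanosecond timestamp to seconds stored in a double,
 * i.e. the representation a seconds-based translation would return. */
double ns_to_seconds(uint64_t ns)
{
    return (double)ns / 1e9;
}

/* Convert seconds (double) back to nanoseconds, rounding to nearest. */
uint64_t seconds_to_ns(double seconds)
{
    return (uint64_t)(seconds * 1e9 + 0.5);
}

/* Absolute error in nanoseconds after a round trip through double seconds. */
uint64_t roundtrip_error_ns(uint64_t ns)
{
    uint64_t back = seconds_to_ns(ns_to_seconds(ns));
    return back > ns ? back - ns : ns - back;
}
```

For a small value such as 1000 ns the round trip is exact, whereas for an epoch-sized timestamp like 1700000000123456789 ns (November 2023) the result is off by tens to hundreds of nanoseconds, which is the same order of magnitude as very short device operations.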
For those interfaces, timestamp conversion had to be done manually: acquire a pair of timestamps (one from each clock) at least twice during program execution, then calculate a conversion rate and offset from those pairs.
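The manual approach can be sketched as a linear mapping: take one (device, host) timestamp pair near the start of the run and another near the end, derive the rate from the two deltas, and map every device timestamp onto the host clock. This generic sketch assumes both clocks tick at a constant relative rate between the two samples; the type and function names are hypothetical and are not part of PTI-SDK, CUPTI, or OMPT.

```c
#include <stdint.h>

/* One (near-)simultaneous sample of both clocks (hypothetical helper type). */
typedef struct {
    uint64_t device_ns; /* timestamp from the interface's own clock    */
    uint64_t host_ns;   /* timestamp from the tool's chosen host clock */
} clock_pair;

/* Linear mapping: host = host0 + rate * (device - device0). */
typedef struct {
    uint64_t device0;
    uint64_t host0;
    double   rate; /* host nanoseconds per device nanosecond */
} clock_conversion;

/* Derive the mapping from two samples; `end` must be taken after `start`. */
clock_conversion make_conversion(clock_pair start, clock_pair end)
{
    clock_conversion c;
    c.device0 = start.device_ns;
    c.host0   = start.host_ns;
    c.rate    = (double)(end.host_ns - start.host_ns) /
                (double)(end.device_ns - start.device_ns);
    return c;
}

/* Translate a device timestamp to the host clock. The delta is computed as a
 * signed value, so timestamps slightly before `start` are handled too.
 * Assumes timestamps stay below 2^63 ns. */
uint64_t to_host_ns(const clock_conversion *c, uint64_t device_ns)
{
    double delta = (double)(int64_t)(device_ns - c->device0);
    return (uint64_t)((int64_t)c->host0 + (int64_t)(delta * c->rate));
}
```

Working on deltas rather than absolute values keeps the double arithmetic exact to the nanosecond for spans shorter than 2^53 ns (about 104 days). Sampling more than two pairs and fitting a line would reduce the influence of jitter in any single sample.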
For PTI-SDK, however, this approach faces additional obstacles. As far as I can tell, we do not get any events outside of buffer requests and buffer completions at this point, and PTI-SDK itself offers no function to query its current timestamp, like cuptiGetTimestamp in CUPTI or get_device_time in OMPT. Without either, there's no real way to convert timestamps at all. I'm not familiar enough with Level0 to know whether timestamps could be acquired that way, but a direct way through PTI-SDK would be preferable.
Proposal
There are two ways to solve this issue. Either add a function to get the current timestamp used inside PTI-SDK, for example via
```c
uint64_t PTI_EXPORT pti[prefix]GetTimestamp();
```
or add the option to use tool-defined timestamps via a callback function, as CUPTI already does (see here).
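To make both options more concrete, here is a sketch of how an SDK could expose its current timestamp and accept a tool-provided timestamp function, modeled on the CUPTI approach where a tool registers a function returning a uint64_t timestamp. Every pti_* name below is hypothetical and only illustrates the idea. The default clock (CLOCK_MONOTONIC) is likewise an assumption, not what PTI-SDK actually uses.

```c
#define _POSIX_C_SOURCE 200809L /* clock_gettime() */
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback type: returns the current time in nanoseconds. */
typedef uint64_t (*pti_timestamp_fn)(void);

/* Hypothetical SDK-internal default clock (assumed CLOCK_MONOTONIC here). */
static uint64_t pti_default_timestamp(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Clock the SDK would use when stamping records; tools may override it. */
static pti_timestamp_fn pti_current_timestamp_fn = pti_default_timestamp;

/* Option 1 (hypothetical): expose the SDK's current timestamp to tools,
 * so they can sample (SDK, tool) pairs for a manual conversion. */
uint64_t pti_get_timestamp(void)
{
    return pti_current_timestamp_fn();
}

/* Option 2 (hypothetical): let a tool register its own clock.
 * Passing NULL restores the default. Returns 0 on success. */
int pti_register_timestamp_callback(pti_timestamp_fn fn)
{
    pti_current_timestamp_fn = (fn != NULL) ? fn : pti_default_timestamp;
    return 0;
}

/* Example tool-side clock: returns a fixed value just to show the override. */
uint64_t example_tool_clock(void)
{
    return 42;
}
```

A tool would register its callback once, before enabling collection, so that every record is stamped with its clock. If the pointer could change while records are being written, a real implementation would need to make the swap atomic.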