-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
2024.10.17 preset extensions #284
base: master
Are you sure you want to change the base?
Conversation
} | ||
|
||
if ( print_tabular ) { | ||
printf( " Name Code " ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The header of Name Code Avail Deriv Description (Note)
, currently sits off centered for both the PAPI Preset Events and PAPI Component Presets:
Name Code Avail Deriv Description (Note)
Start printing index = 0
PAPI_L1_DCM 0x80000000 No No Level 1 data cache misses
Name Code Avail Deriv Description (Note)
PAPI_CUDA_HP_FMA 0x80000080 Yes No CUDA Half precision FMA instructions
:device=0
Mandatory device qualifier [0]
The master code does not have this:
Name Code Avail Deriv Description (Note)
PAPI_L1_DCM 0x80000000 Yes No Level 1 data cache misses
|
||
if ( !print_event_info ) { | ||
if ( print_avail_only == PAPI_PRESET_ENUM_CPU_AVAIL || print_avail_only == PAPI_PRESET_ENUM_AVAIL ) { | ||
printf( "Of %d available events, %d ", avail_count, deriv_count ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a suggestive comment, I think it may be a good idea to have two different printouts. One printout for PAPI Preset Events and one for PAPI Component Presets.
|
||
/* check to see if a valid modifier is provided */ | ||
if (modifier != PAPI_ENUM_EVENTS && | ||
modifier != PAPI_ENUM_FIRST && | ||
modifier != PAPI_ENUM_ALL && | ||
modifier != PAPI_PRESET_ENUM_AVAIL && | ||
modifier != PAPI_PRESET_ENUM_CPU && | ||
modifier != PAPI_PRESET_ENUM_CPU_AVAIL && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update the doxygen for PAPI_enum_event
to reflect the addition of PAPI_PRESET_ENUM_CPU
, PAPI_PRESET_ENUM_CPU_AVAIL
, and PAPI_ENUM_FIRST_COMP
?
i &= PAPI_PRESET_AND_MASK; | ||
|
||
/* Iterate over all or all available presets. */ | ||
if ( modifier == PAPI_ENUM_EVENTS || modifier == PAPI_PRESET_ENUM_AVAIL ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested both of these modifiers:
PAPI_ENUM_EVENTS
, does go over all available CPU Events, but for the Cuda presets I outputPAPI_CUDA_HP_FMA
(the first listed Cuda preset on hopper1 at Oregon) and the other three events returnPAPI_ENOTPRESET
.PAPI_PRESET_ENUM_AVAIL
, does not output just the available preset events.
i &= PAPI_PRESET_AND_MASK; | ||
|
||
/* Iterate over all or all available presets. */ | ||
if ( modifier == PAPI_ENUM_EVENTS || modifier == PAPI_PRESET_ENUM_AVAIL ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the CPU:
PAPI_PRESET_ENUM_AVAIL
is showing 108 available preset events on the machine I am testing on. While the master branch is only showing 47 (we should get 47 back).
For the GPU:
PAPI_ENUM_EVENTS
is only showing PAPI_CUDA_HP_FMA
when starting at 0x80000080. Likewise for PAPI_PRESET_ENUM_AVAIL
.
prstPtr->num_quals++; | ||
|
||
/* Store the qualifier in the preset struct. */ | ||
prstPtr->quals[prstPtr->num_quals-1] = (char*)malloc((1+strlen(qualPtr))*sizeof(char)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not increment prstPtr->num_quals++
further down? Then you would not have to do prstPtr->num_quals-1
. I may be missing something though and you actually need to increment at that point.
/* Store the qualifier's description in the preset struct. */ | ||
prstPtr->descrs[prstPtr->num_quals-1] = (char*)malloc((1+strlen(qualPtr))*sizeof(char)); | ||
if( NULL != prstPtr->descrs[prstPtr->num_quals-1] ) { | ||
status = sprintf(prstPtr->descrs[prstPtr->num_quals-1], "%s", qualPtr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned previously, I would probably go with snprintf
instead of sprintf
. There are a few other occurrences below this as well.
935ea80
to
eaa020e
Compare
b57602c
to
6c536d1
Compare
Create fields in the vector struct for components to define presets. These changes have been tested on the NVIDIA Hopper architecture.
6c536d1
to
1212e11
Compare
Add functions to facilitate CUDA presets. These changes have been tested on the NVIDIA Hopper architecture.
Update configure to track both the number of presets per component and the arrays of presets belonging to each component. These changes have been tested on the NVIDIA Hopper architecture.
Updates to framework to facilitate preset events defined by native events of non-perf_event components. These changes have been tested on the NVIDIA Hopper architecture.
This is an aesthetic change to improve the development process. These changes have been tested on the NVIDIA Grace-Hopper architecture.
Replace modifiers with only those that enumerate the CPU preset events. These changes have been tested on the NVIDIA Grace-Hopper architecture.
Enumerate presets for components as well as the CPU. These changes have been tested on the NVIDIA Grace-Hopper architecture.
1212e11
to
6406ed1
Compare
Pull Request Description
This PR adds support for presets defined using native events from non-CPU components. The only component that has been modified in this PR is the 'cuda' component, which is limited to loading the preset table for 'cuda' presets.
These changes have been tested on the NVIDIA Grace-Hopper architecture.
Author Checklist
Why this PR exists. Reference all relevant information, including background, issues, test failures, etc
Commits are self contained and only do one thing
Commits have a header of the form:
module: short description
Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
The PR needs to pass all the tests