Add scoped NVTX ranges for improved profiling #827
Merged
15 commits
b263e87 Basic implementation of `ScopedDeviceProfiling` using nvtx (esseivaju)
e311888 function declaration in non-cuda build (esseivaju)
ba58528 Address comments (esseivaju)
ce4c7b9 Register string message (esseivaju)
564d126 Renamed class as it is no longer restricted to device profiling (esseivaju)
45f8538 disable `operator new` (esseivaju)
25ebd02 Add input struct to specify payload, category and color (esseivaju)
5a2624e insert and check return value to determine if we need to register the… (esseivaju)
ef4e8a2 address comments (esseivaju)
12e953e Add note about nvtx api usage (esseivaju)
96f45af comments, disable copy and move ctor and assignment operator (esseivaju)
74a4eae docs, includes and constructor for input (esseivaju)
490366a review comments (esseivaju)
70199e9 Explicitly enable profiling using environment variable (esseivaju)
85428a2 Merge branch 'develop' into nvtx-range (esseivaju)
The solution is to just `insert` with a `nullptr` for the handle and, if the insertion succeeded, to register the string and save the handle in the returned iterator.
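
The insert-then-register pattern suggested above could look roughly like the sketch below. It is not the code from this PR: `StringHandle`, `register_string`, and `message_handle` are hypothetical stand-ins so the example compiles without NVTX, and the stand-in registration function simply returns 0, 1, 2, ... on successive calls. Thread safety is deliberately left out.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an opaque profiler string handle.
using StringHandle = std::uintptr_t;

// Hypothetical stand-in for the real registration call. It counts how
// many times it is invoked and returns 0, 1, 2, ... in order.
int g_num_registrations = 0;

StringHandle register_string(std::string const&)
{
    return static_cast<StringHandle>(g_num_registrations++);
}

// Return the handle for a message name, registering it only on first use.
StringHandle message_handle(std::string const& name)
{
    // Not thread safe: a real implementation would need a lock here.
    static std::unordered_map<std::string, StringHandle> registry;

    // Insert a placeholder handle. `inserted` is true only when the name
    // was not already in the map, so a single lookup answers both
    // "is it registered?" and "where do I store the handle?".
    auto [iter, inserted] = registry.insert({name, StringHandle{}});
    if (inserted)
    {
        iter->second = register_string(name);
    }
    // No check on the handle value: zero can be a legitimate result.
    return iter->second;
}
```

Calling `message_handle("step")` twice registers the string once and returns the same handle both times; whether registration happened is decided by the insertion result, never by the handle value.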
@sethrj Actually, we can't really `CELER_ENSURE` that `iter->second` is non-null. If we're not running the application through a tool that can handle NVTX, e.g. `ncu` with the `--nvtx` option, then all API calls will return 0 (ref). Also, we can't really make any assumptions about what values are returned. I tried having multiple nested ranges and printing each value returned by `nvtxDomainRegisterStringA`. The first call returns 0 and the second call returns 1, so they're not really pointers as you'd expect (relevant doc). TL;DR: just don't call `CELER_ENSURE` and trust the implementation?
Oh interesting. I guess "handle" doesn't mean "pointer" in this case. (Maybe replace `nullptr` with `{}`?) That's really good to know. Could you add a note in the implementation that the API calls may fail if the profiler is disabled?
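
As a small illustration of the `{}` suggestion, the sketch below value-initializes a handle without assuming whether it is a pointer or an integer-like token. The names `OpaqueRegistration`, `PointerHandle`, `IntegerHandle`, and `RegistryEntry` are hypothetical and do not come from NVTX or from this PR. Because a zero handle may be a valid return value (or the result of a call made with no profiler attached), the registration state is tracked in a separate flag.

```cpp
#include <cstdint>

// Hypothetical handle flavors: one pointer-like, one integer-like.
struct OpaqueRegistration;
using PointerHandle = OpaqueRegistration*;
using IntegerHandle = std::uint64_t;

// Hypothetical registry entry. `{}` value-initializes either kind of
// handle (null pointer or zero), so nothing here depends on the handle
// actually being a pointer.
template<class Handle>
struct RegistryEntry
{
    Handle handle{};
    // Tracked separately: a zero handle does not mean "not registered".
    // Note: registration calls may return zero when no profiler is attached.
    bool registered{false};
};
```

With this layout, `RegistryEntry<PointerHandle>{}` and `RegistryEntry<IntegerHandle>{}` both start with a zero handle and `registered == false`, and no code needs to assert that the handle is non-null.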