-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add scoped NVTX ranges for improved profiling #827
Conversation
auto message_handle = message_registry().find(message); | ||
// recheck that nobody inserted the same message as we had to release the | ||
// shared lock not strictly necessary as insert will do nothing if the key | ||
// already exists | ||
if (message_handle == message_registry().end()) | ||
{ | ||
auto handle | ||
= nvtxDomainRegisterStringA(domain_handle(), message.c_str()); | ||
message_registry().insert({message, handle}); | ||
return handle; | ||
} | ||
else | ||
{ | ||
return message_handle->second; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The solution is to just insert
with a nullptr
for the handle, and if insertion succededed to register it and save in the returned iterator.
auto message_handle = message_registry().find(message); | |
// recheck that nobody inserted the same message as we had to release the | |
// shared lock not strictly necessary as insert will do nothing if the key | |
// already exists | |
if (message_handle == message_registry().end()) | |
{ | |
auto handle | |
= nvtxDomainRegisterStringA(domain_handle(), message.c_str()); | |
message_registry().insert({message, handle}); | |
return handle; | |
} | |
else | |
{ | |
return message_handle->second; | |
} | |
auto [iter, inserted] = message_registry().insert({message, nullptr}); | |
if (inserted) | |
{ | |
iter->second | |
= nvtxDomainRegisterStringA(domain_handle(), message.c_str()); | |
} | |
CELER_ENSURE(iter->second); | |
return iter->second; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sethrj Actually, we can't really CELER_ENSURE
that iter->second is non-null. If we're not running the application through a tool that can handle nvtx, i.e. ncu with --nvtx opthon, then all API calls will return 0 (ref).
Also we can't really make any assumption on what values are returned. I tried having multiple nested ranges and printing each value returned by nvtxDomainRegisterStringA
. The first call returns 0, the second call returns 1 so they're not really pointers as you'd expect (relevant doc).
TLDR; just don't call CELER_ENSURE and trust the implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting. I guess "handle" doesn't mean "pointer" in this case. (Maybe replace nullptr
with {}
?) That's really good to know. Could you add a note in the implementation that the API calls may fail if the profiler is disabled?
ScopedDeviceProfiling
using nvtx as backend
src/corecel/sys/ScopedProfiling.hh
Outdated
// Deactivate profiling | ||
~ScopedProfiling(); | ||
// RAII semantics | ||
void* operator new(std::size_t count) = delete; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've never seen this pattern before... is it necessary? Do we need to add it to all our RAII classes? For comparison, scoped_lock deletes the assignment operator but doesn't delete new
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not technically needed but per the name of the class, dynamically allocating a new instance is not "scoped profiling", so does it make sense to allow it? The C++ API of nvtx does that for their implementation of the scoped_range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think delete
and such should be to prevent accidental misuses, since it's not possible for us to prevent all intentional misuse.
Besides, if wanted to (for example) put it in a unique_ptr
or shared_ptr
to change/share the scope at runtime, I think that's a reasonable use case. A dumb user could new
it and never release it, and then it just becomes an open range? Which seems like the expected behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I agree, but for this class and the stack semantics of nested ranges I'm not sure it would be legal to outlive the scope. Suppose we have Scope1
active from a caller function and we instantiate Scope2
and have it managed by a unique_ptr
to outlive Scope1
. Then, the Scope1
dtor would, I assume, end the range from the nested scope.
Also those range are thread_local so passing the ownership to another thread shouldn't be done which could easily happen with smart pointers or heap allocation.
auto message_handle = message_registry().find(message); | ||
// recheck that nobody inserted the same message as we had to release the | ||
// shared lock not strictly necessary as insert will do nothing if the key | ||
// already exists | ||
if (message_handle == message_registry().end()) | ||
{ | ||
auto handle | ||
= nvtxDomainRegisterStringA(domain_handle(), message.c_str()); | ||
message_registry().insert({message, handle}); | ||
return handle; | ||
} | ||
else | ||
{ | ||
return message_handle->second; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting. I guess "handle" doesn't mean "pointer" in this case. (Maybe replace nullptr
with {}
?) That's really good to know. Could you add a note in the implementation that the API calls may fail if the profiler is disabled?
src/corecel/sys/ScopedProfiling.hh
Outdated
struct ScopedProfilingInput | ||
{ | ||
std::string name; //!< Name of the range | ||
uint32_t color{0xFF00FF00}; //!< ARGB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requires #include <cstdint>
and since it's unsigned:
uint32_t color{0xFF00FF00}; //!< ARGB | |
uint32_t color{0xFF00FF00u}; //!< ARGB |
src/corecel/sys/ScopedProfiling.hh
Outdated
// Can't outlive the scope it was created in to correctly match each push / | ||
// pop. Must strictly follows RAII semantics (automatic storage duration) | ||
void* operator new(std::size_t) = delete; | ||
ScopedProfiling(ScopedProfiling const&) = delete; | ||
ScopedProfiling& operator=(ScopedProfiling const&) = delete; | ||
ScopedProfiling(ScopedProfiling&&) = delete; | ||
ScopedProfiling& operator=(ScopedProfiling&&) = delete; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the operator new
specialization, and the &&
instances are automatically deleted if &
is deleted so you can omit those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see allowing operator new
even though using it is asking for trouble.
Regarding the move operations, should we be explicit and follow C.21?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. It looks like ScopedMem
, Stream
, and ScopedSignalHandler
all need that as well, and a few "scoped" classes are entirely missing the deleted copy/assign:
- src/celeritas/ext/ScopedGeantExceptionHandler.hh
- src/celeritas/ext/ScopedGeantLogger.hh
- src/celeritas/ext/ScopedRootErrorHandler.hh
- src/corecel/io/ScopedStreamFormat.hh
- src/corecel/io/ScopedStreamRedirect.hh
- src/corecel/io/ScopedTimeAndRedirect.hh
- src/corecel/io/ScopedTimeLog.hh
- src/corecel/sys/ScopedMpiInit.hh
Should those updates be a separate PR too? If you haven't pushed the restoration of ..&&) = delete
I think we should do them all in a separate best-practices merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, we can do that in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work Julien. Do you want to add these to the individual actions/kernel launches in a separate PR?
Yeah, I can do that in a separate PR though there is one thing I don't like; If nvtx isn't enabled, you still end up paying a small cost calling |
Maybe to avoid the cost, we use the same trick as |
yeah, that's sounds like a good solution. I'd say in this case we want to explicitly enable profiling with |
Use the NVTX library instead of cuda profiler. For now this implements the bare minimum functionalities to be useful. We'd probably want to use registered strings and possibly allow to specify a payload / category?