-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove halide_abort_if_false() usage in runtime/metal #6398
Conversation
This converts all the usage of `halide_abort_if_false()` in runtime/metal into either an explicit runtime check-and-return-error-code (if the check looks plausible), or `halide_debug_assert()` (if the check seems to be stating an invariant that shouldn't be possible in well-structured code). These changes are admittedly subjective, so feedback is especially welcome. Also, driveby change to sync-common.h to use `halide_debug_assert()` rather than a local equivalent.
@@ -340,7 +340,7 @@ extern "C" { | |||
// previous call (if any) has not yet been released via halide_release_metal_context. | |||
WEAK int halide_metal_acquire_context(void *user_context, mtl_device **device_ret, | |||
mtl_command_queue **queue_ret, bool create) { | |||
halide_abort_if_false(user_context, &thread_lock != nullptr); | |||
halide_debug_assert(user_context, &thread_lock != nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should actually trigger an abort, if I understand the check correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how this could ever fail, TBH -- we're taking the address of a (weak) global var. How can that fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're probably right-- it just looks strange to potentially deref a null pointer, even if it can't be null. If it really can't happen, I wonder if we should elide the check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modulo nits & the one place where I think we should fail hard, LGTM
halide_abort_if_false(user_context, (device == nullptr) || (queue != nullptr)); | ||
if (device != nullptr && queue == nullptr) { | ||
error(user_context) << "halide_metal_acquire_context: device initialized but queue is not.\n"; | ||
return -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: do we have a set policy on returning -1 vs returning an error code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the prevalence (here and elsewhere) of just returning -1 or similar, apparently not :-/
(This is something we really should improve; at least making a typedef for the return codes and attempting to return error values. But that's probably a separate issue.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, runtime functions that are only ever called by other runtime functions often return anything non-zero back upwards, and then the calling function selects an appropriate error code (e.g. if this acquire context failed while trying to do a device copy, we ultimately return a device_copy_failed error). Sometimes you need to know the calling context to return the appropriate error.
This convention sort of makes sense, but it's not very clear in the code that there even is a convention happening here. There's no reason this couldn't return an failed_to_acquire_context error or something, and then the calling function could replace the error code if it wants to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
runtime functions that are only ever called by other runtime functions
there's also not a clean demarcation here -- I will bet you a dollar that we can find 'public' runtime functions that shortcut out in this way too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The demarcation is supposed to be that the exposed functions in device_interface.cpp are the entrypoints, and return real error codes. The specific backend implementations, and the default implementations in device_interface.cpp don't need to.
src/runtime/metal.cpp
Outdated
@@ -542,10 +555,11 @@ WEAK int halide_metal_initialize_kernels(void *user_context, void **state_ptr, c | |||
mtl_library *library{}; | |||
if (!compilation_cache.kernel_state_setup(user_context, state_ptr, metal_context.device, library, | |||
new_library_with_source, metal_context.device, | |||
source, source_size)) { | |||
source, source_size) || | |||
library == nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: indentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This what clang-format did, let me see if I can pacify it otherwise
src/runtime/metal.cpp
Outdated
halide_abort_if_false(user_context, from_host || src->device); | ||
halide_abort_if_false(user_context, to_host || dst->device); | ||
if (!(from_host || src->device)) { | ||
error(user_context) << "halide_metal_buffer_copy: impossible copy source\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: impossible -> invalid
The change on line 343 seems to be the only controversial thing? |
I'm ok to merge as is, honestly. I just find it a weird check to do if it can't fail. |
This converts all the usage of
halide_abort_if_false()
in runtime/metal into either an explicit runtime check-and-return-error-code (if the check looks plausible), orhalide_debug_assert()
(if the check seems to be stating an invariant that shouldn't be possible in well-structured code). These changes are admittedly subjective, so feedback is especially welcome.Also, driveby change to sync-common.h to use
halide_debug_assert()
rather than a local equivalent.