-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use aligned_alloc() as default allocator for halide_malloc() on most platforms #7189
Conversation
Not sure how useful this will be, but this will allow us to experiment to see if it pays for itself.
On platforms where it's available, why would we ever not use aligned_alloc? Is the target feature just for testing it out? |
For now, yes. (Long-term it would be preferable to move to making it mandatory on all platforms that support it, of course.) |
check(c(in2, offset4, out4)); | ||
}); | ||
const float megapixels = (W * H) / (1024.f * 1024.f); | ||
printf("Benchmark: %dx%d -> %f mpix/s for %s\n", W, H, megapixels / time, t.to_string().c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a performance test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe eventually? Need to see real-world examples first to see where we can expect to use it.
Thinking out loud... maybe the right option is actually to make aligned_alloc the default and have a |
PTAL |
|
||
void run_test(Target t) { | ||
// Must call this so that the changes in cached runtime are noticed | ||
Halide::Internal::JITSharedRuntime::release_all(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised we don't invalidate the cache if the target changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me too. Want me to open an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL |
…latforms (See also #7189) Modify H::R::Buffer to default to using `aligned_alloc()` instead of `malloc()`, except: - If user code passes a non-null `allocate_fn` or `deallocate_fn`, we always use those (and/or malloc/free) - If the code is compiling under MSVC, never use `aligned_alloc` (Windows doesn't support it) - If HALIDE_RUNTIME_BUFFER_USE_ALIGNED_ALLOC is defined to be 0, never use `aligned_alloc` (this is to allow for usage on e.g. older Android and OSX versions which don't provide `aligned_alloc()` in the stdlib, regardless of C++ versions.) Also, as with #7189, this ensures that the allocated space has the start of the host data as 128-aligned, and also now ensures that the size allocated 128-aligned (rounding up as needed).
@steven-johnson - Thanks for this. You are right, the pointer math tricks can be avoided. I think we can replace this with memalign on QuRT. I am working from India at present but only for a few days more before I am out on vacation, so, I'll get to this for Qurt once I am back in early Jan. |
Currently, halide_malloc_alignment() is WEAK_INLINE, so even if you attempt to override it, it won't have any effect (the call sites in halide_malloc() will have inlined the implementation selected by LLVM_Runtime_Linker). This promotes halide_malloc_alignment() to a documented piece of the runtime you can override, with the caveat that it must return the same value every time it is called. To reduce overhead in halide_malloc(), we copy the value into a global at startup time and use that instead. Also, added the (oops) missing documentation to HalideRuntime.h about the new alignment requirements.
Currently, halide_malloc_alignment() is WEAK_INLINE, so even if you attempt to override it, it won't have any effect (the call sites in halide_malloc() will have inlined the implementation selected by LLVM_Runtime_Linker). This promotes halide_malloc_alignment() to a documented piece of the runtime you can override, with the caveat that it must return the same value every time it is called. To reduce overhead in halide_malloc(), we copy the value into a global at startup time and use that instead. By itself, this isn't that useful, but it's essential to have in order for #7189 to be robust; I'm pulling this change into a separate PR so that it can land separately, enabling downstream users to be able to update their halide_malloc() implementations ahead of the "real" change.
halide_malloc_alignment was weak_inline so that it would resolve to a constant at compile time, which makes all the masking etc operations on constants. Making it overridable adds a function call overhead to halide_malloc and possibly makes the alignment math more expensive. I don't think it's worth it. Users that want custom alignment should just override halide_malloc entirely. Edit: Ignore the above, I only read the title and reacted to that and didn't look at the details of the PR. |
Closing this in favor of #7206 |
…latforms (#7190) Use aligned_alloc() as default allocator for HalideBuffer.h on most platforms (See also #7189) Modify H::R::Buffer to default to using `aligned_alloc()` instead of `malloc()`, except: - If user code passes a non-null `allocate_fn` or `deallocate_fn`, we always use those (and/or malloc/free) - If the code is compiling under MSVC, never use `aligned_alloc` (Windows doesn't support it) - If HALIDE_RUNTIME_BUFFER_USE_ALIGNED_ALLOC is defined to be 0, never use `aligned_alloc` (this is to allow for usage on e.g. older Android and OSX versions which don't provide `aligned_alloc()` in the stdlib, regardless of C++ versions.) Also, as with #7189, this ensures that the allocated space has the start of the host data as 128-aligned, and also now ensures that the size allocated 128-aligned (rounding up as needed).
…latforms (halide#7190) Use aligned_alloc() as default allocator for HalideBuffer.h on most platforms (See also halide#7189) Modify H::R::Buffer to default to using `aligned_alloc()` instead of `malloc()`, except: - If user code passes a non-null `allocate_fn` or `deallocate_fn`, we always use those (and/or malloc/free) - If the code is compiling under MSVC, never use `aligned_alloc` (Windows doesn't support it) - If HALIDE_RUNTIME_BUFFER_USE_ALIGNED_ALLOC is defined to be 0, never use `aligned_alloc` (this is to allow for usage on e.g. older Android and OSX versions which don't provide `aligned_alloc()` in the stdlib, regardless of C++ versions.) Also, as with halide#7189, this ensures that the allocated space has the start of the host data as 128-aligned, and also now ensures that the size allocated 128-aligned (rounding up as needed).
C++17 includes C11, which includes
aligned_alloc
as part of the standard library. We should move to prefer using it overmalloc()
inhalide_malloc()
, since we currently have to do an overallocate-and-adjust trick with malloc() to get what we want.We can't move to it everywhere, alas, because:
aligned_alloc
is offered there and if so, what version. (Would welcome input from @pranavb-ca et al on this)That said, on ~all modern Unixy platforms, it should be available and arguably preferable, so this PR makes it the default, with a
no_aligned_alloc
feature to allow dropping back to using plain malloc()-with-hackery. (This option is ignored for Windows and QuRT, of course.)It's worth pointing out that this subtlely modifies the (implicit) contract of
halide_malloc
: in addition to guaranteeing that the returned pointer is aligned tohalide_malloc_alignment()
, thealigned_alloc()
allocator also requires/guarantees that the memory returned is an integral number of align-sizes long (rounding up, obviously). In order to avoid subtle behavior differences between the two malloc-based allocators remaining (posix and qurt) I modified them to also make the same guarantee. (I guess I should document this and make sure that all the overrides ofhalide_malloc()
I know of in Google and other places make this same guarantee...)I considered adding a new
halide_aligned_alloc()
runtime function instead, and then migrating all the calls we make for buffer memory there (since that's the only place we really care about this alignment), leavinghalide_malloc()
to be used for whatever places we need to allocate memory that doesn't need special alignment... but (1) I'm not sure if there are enough of the latter to be worthwhile, and (2) we'd have to 'dumb down'halide_malloc()
to make the exercise worthwhile, and that would leave us risk of hard-to-repro crashes if we got one of the allocations wrong.This will likely need some torture testing, since it's possible that some platforms offer
aligned_alloc
but with inferior performance.