merge latest upstream bits into our fork #7
Commits on Mar 31, 2022
`-mtune=`/`-mcpu=` support for x86 AMD CPUs (halide#6655)
* `-mtune=`/`-mcpu=` support for x86 AMD CPUs * Move processor tune into its own enum, out of features * clang-format * Target: make Processor more optional * Processor: add explanatory comments which CPU is what * Drop outdated changes * Make comments in Processor more readable / fix BtVer2 comment * Target: don't require passing Processor * Make processor more optional in the features string serialization/verification * Address review notes * Undo introduction of halide_target_processor_t * Fix year for btver2/jaguar
Commit 40f895d
Commits on Apr 1, 2022
Fix GPU depredication/scalarization (halide#6669)
* Scalarize predicated Loads * Cleanup * Fix gpu_vectorize scalarization for D3D12 * Fix OpenCL scalarization * Minor fixes * Formatting * Address review comments * Move Shuffle impl to CodeGen_GPU_C class * Extra space removal Co-authored-by: Shoaib Kamil <kamil@adobe.com>
Commit f56614e
Commits on Apr 5, 2022
Commit 43af5b6
Commit fdd6500
Fix ctors for Realization (halide#6675)
For vector-of-Buffers, the ctor took a non-const ref to the argument, which was weird and nonsensical. Replaced with a const-ref version and an rvalue-ref version; it turns out that literally *all* of the internal calls were able to use the latter, trivially saving some copies.
Commit 9866df2
Commits on Apr 6, 2022
`-mtune=native` CPU autodetection for AMD Zen 3 CPU (halide#6648)
* `-mtune=native` CPU autodetection for AMD Zen 3 CPU * Address review notes. * Fix MSVC build * Address review notes
Commit 72ad2e6
Bump development Halide version to 15.0.0 (halide#6678)
* Bump development Halide version to 15.0.0 * trigger buildbots
Commit 12270a5
Clean up Python extensions in python_bindings (halide#6670)
* Remove the nobuild/partialbuildmethod tests from python_bindings/ They no longer serve a purpose and are redundant to other tests. * WIP * Update pystub.py * wip * wip * wip * Update TargetExportScript.cmake * Update PythonExtensionHelpers.cmake * PyExtensionGen didn't handle zero-dimensional buffers
Commit ad0408e
Commit 1d1b556
Commits on Apr 7, 2022
Fix "set but not used" warnings/errors (halide#6683)
* Fix "set but not used" warnings/errors Apparently XCode 13.3 has smarter warnings about unused code and emits warnings/errors for these, so let's clean them up. * Also fix missing `ssize_t` usage
Commit fe96aaa
Remove deprecated `Halide::Output` type (halide#6685)
It was deprecated (in favor of `OutputFileType`) in Halide 14; let's remove it entirely for Halide 15.
Commit f64bd08
Commits on Apr 8, 2022
Remove deprecated `build()` support from Generators (halide#6684)
This was deprecated in Halide 14; let's remove it entirely for Halide 15.
Commit e549be7
Drop support for LLVM12 (halide#6686)
* Drop support for LLVM12 Halide 15 only needs to support LLVM13 and LLVM13. Drop all the special-casing for LLVM12. * Update packaging.yml * Update presubmit.yml * 13 * more * Update presubmit.yml * woo * Update presubmit.yml * Update run-clang-tidy.sh * Update run-clang-tidy.sh * Update .clang-tidy * Update .clang-tidy * wer * Update Random.cpp * wer * sdf * sdf * Update packaging.yml
Commit b5840f7
Upgrade to clang-format 13 (halide#6689)
Goal here: eliminate the need for a local version of llvm/clang-12, and don't stay too far behind the toolchain. As always, clang-format doesn't promise backwards compatibility, but the main differences in formatting are: - more regularization of spaces at the start of comments (I like this change) - minor difference of formatting of function-pointer-type declarations (not a fan of this, but I can't find a way to disable it and it's only really used in a handful of places in the Python bindings)
Commit 887d340
Always mark _ucon as 'unused' in Codegen_C (halide#6691)
* Always mark _ucon as 'unused' in Codegen_C, even if asserts are enabled, since generated closure functions may not use it * halide_unused -> halide_maybe_unused * fix test_internal * More halide_unused -> halide_maybe_unused
Commit 54f3977
Commits on Apr 11, 2022
Commit d568469
Silence "unknown warning" in Clang 13 (halide#6693)
Clang 13 removed the `return-std-move-in-c++11` warning entirely, so specifying it now warns that the warning is unknown.
Commit f906eba
Commit 08325a4
Commits on Apr 12, 2022
Faster `widening_mul(int16x, int16x) -> int32x` for x86 (AVX2 and SSE2) (halide#6677)
* add widening_mul using vpmaddwd for AVX2 * add vpmaddwd/pmaddwd test * add widening_mul with pmaddwd for SSE2
Commit 3944fb0
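For readers unfamiliar with the instruction, here is a rough scalar model of how `pmaddwd` can compute a widening 16-bit multiply. `pmaddwd` multiplies int16 lanes pairwise and adds each adjacent pair of products into one int32 lane. Padding every second lane with zero (an assumption made for this sketch; the actual lowering in the commit may arrange the lanes differently) turns that into a plain widening multiply. All function names below are hypothetical.

```python
def to_int32(x):
    """Wrap an integer to signed 32-bit, as a 32-bit SIMD lane would."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def pmaddwd(a, b):
    """Scalar model of pmaddwd: multiply int16 lanes pairwise, then add
    each adjacent pair of products into a single int32 lane."""
    assert len(a) == len(b) and len(a) % 2 == 0
    return [to_int32(a[i] * b[i] + a[i + 1] * b[i + 1])
            for i in range(0, len(a), 2)]


def interleave_with_zeros(v):
    """[v0, v1, ...] -> [v0, 0, v1, 0, ...]"""
    out = []
    for x in v:
        out.extend((x, 0))
    return out


def widening_mul_i16(a, b):
    """int16 x int16 -> int32 per lane, expressed through pmaddwd.
    The zero lanes make the second product of every pair vanish."""
    return pmaddwd(interleave_with_zeros(a), interleave_with_zeros(b))


print(widening_mul_i16([3, -32768], [-4, -32768]))
```

With zero padding the sum can never overflow, since a single int16 product always fits in int32; the only overflowing input of raw `pmaddwd` is the one where all four lanes of a pair are -32768.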
Commit 009d86f
Commit 4da8932
Drop support for Matlab extensions (halide#6696)
* Drop support for Matlab extensions Anecdotally, this hasn't been used in ~years, and the original author (@dsharletg) had suggested dropping it a while back. I'm going to propose we go ahead and drop it for Halide 15 and see who complains. * Fixes for top-of-tree LLVM * Update force_include_types.cpp * trigger buildbots * Update CodeGen_LLVM.cpp
Commit 3d7b977
llvm no longer wants a type suffix on vst intrinsics (halide#6701)
* llvm no longer wants a type suffix on vst intrinsics * Fix silly mistake * Change 64-bit only Co-authored-by: Andrew Adams <anadams@adobe.com>
Commit 87c0cc9
Commits on Apr 13, 2022
Python: make Func implicitly convertible to Stage (halide#6702) (halide#6704)
This allows for `compute_with` and `rfactor` to work more seamlessly in Python. Also: - Move two compute_with() variant bindings from PyFunc and PyStage to PyScheduleMethods, as they are identical between the two - drive-by removal of redundant `py::implicitly_convertible<ImageParam, Func>();` call
Commit 77f7f5e
Commits on Apr 14, 2022
Commit 60a909f
Commits on Apr 19, 2022
Remove the last remaining call to getPointerElementType() (halide#6715)
* Remove the last remaining call to getPointerElementType() LLVM is moving to opaque pointers; we must have missed this one in previous work * ARM vst mangling needs to be conditional on opaque ptrs The fixes from last week regarding mangling of arm vst intrinsics need to be made conditional on whether the pointer is opaque or not; this will change based on whether `-D CLANG_ENABLE_OPAQUE_POINTERS=ON|OFF` is defined when LLVM is built, but should be sniffed via this API, according to my LLVM contact. * Revert "ARM vst mangling needs to be conditional on opaque ptrs" This reverts commit 9901314.
Commit 4df3c5d
ARM vst mangling needs to be conditional on opaque ptrs (halide#6716)
The fixes from last week regarding mangling of arm vst intrinsics need to be made conditional on whether the pointer is opaque or not; this will change based on whether `-D CLANG_ENABLE_OPAQUE_POINTERS=ON|OFF` is defined when LLVM is built, but should be sniffed via this API, according to my LLVM contact.
Commit 01ca823
Combine string constants in combine_strings() (halide#6717)
* Combine string constants in combine_strings() This is a pretty trivial optimization, but when printing (or enabling `debug`), it cuts the number of `halide_string_to_string()` calls we generate by ~half. * Update IROperator.cpp
Commit 65ba16e
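The idea behind the `combine_strings()` change can be sketched in a few lines: when a printed message is assembled from a mix of constant strings and runtime values, adjacent constants can be concatenated up front so that fewer pieces need converting at runtime. The sketch below is a standalone model, not Halide's implementation; `RuntimeValue` and `combine_string_constants` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeValue:
    """Hypothetical stand-in for an expression only known at runtime."""
    name: str


def combine_string_constants(parts):
    """Merge runs of adjacent string constants into a single constant.
    Runtime values are kept as-is and act as separators between runs."""
    merged = []
    for part in parts:
        if isinstance(part, str) and merged and isinstance(merged[-1], str):
            merged[-1] += part  # extend the previous constant
        else:
            merged.append(part)
    return merged


parts = ["x = ", RuntimeValue("x"), ", ", "y = ", RuntimeValue("y"), "\n"]
print(len(parts), "->", len(combine_string_constants(parts)))
```

How much this saves depends on how many constants happen to be adjacent in a given message; the commit reports roughly half for Halide's own print and debug output.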
Commits on Apr 20, 2022
Update CodeGen_PTX_Dev to use new PassManager (halide#6718)
* Update CodeGen_PTX_Dev to use new PassManager This was still using the LegacyPassManager for optimization, which will be going away at some point. (Code changes by @alinas; I'm just opening this PR on her behalf) * Fixes after review
Commit 460c77e
Closure functions for parallel tasks should be internal, not external (halide#6720)
Minor optimization.
Commit a07d3e4
Smarten type_of<> for fn ptrs; fix async_parallel for C backend (halide#6719)
* Smarten type_of<> for fn ptrs; fix async_parallel for C backend (Fixes halide#2093) This basically just adds the right type annotations to make the parallel code produced by the C backend compile properly. This could have been fixed by inserting some brute-force void* casting into the C backend, but this felt a lot cleaner. The one thing here I'm a little unsure about is how I extended the Type code to be able to handle function-pointer types correctly; it works but doesn't feel very elegant. * Update Makefile * Update LowerParallelTasks.cpp * FunctionTypedef
Commit 3b3e89e
Commits on Apr 21, 2022
Remove legacy::FunctionPassManager usage in Codegen_PTX_Dev (halide#6722)
Commit accc644
`get_amd_processor()`: implement detection for the rest of supported AMD CPUs (halide#6711)
I have *not* personally tested that these are detected correctly. Cross-reference between * https://github.com/llvm/llvm-project/blob/955cff803e081640e149fed0742f57ae1b84db7d/llvm/lib/Support/Host.cpp#L968-L1041 * https://github.com/llvm/llvm-project/blob/955cff803e081640e149fed0742f57ae1b84db7d/compiler-rt/lib/builtins/cpu_model.c#L520-L586 * https://github.com/gcc-mirror/gcc/blob/000c1b89d259fadb466e1f2e63c79da45fd17372/gcc/common/config/i386/cpuinfo.h#L111-L264
Commit aa384af
Add Func::output_type() method (halide#6724)
* Add Func::output_type() method * Add Python
Commit 754018b
Commit 85b9f29
Commits on Apr 25, 2022
Deprecate variadic-template version of Realization ctor (halide#6695)
* Deprecate variadic-template version of Realization ctor The variadic-template approach was useful before C++11 (!) added brace initialization, but preferring an explicit vector-of-Buffer is arguably better, and provides better symmetry with the Python bindings. Also, some drive-by tweaks to other Realization methods. * Update PyPipeline.cpp * trigger buildbots
Commit f5c77ce
Commits on Apr 26, 2022
Remove `rounding_halving_sub` and non-existent arm rhsub instructions (halide#6723)
* remove arm (s | u)rhsub instructions * remove rounding_halving_sub intrinsic entirely
Commit 86a4a59
Commits on Apr 27, 2022
Augment Halide::Func to allow for constraining Type and Dimensionality (halide#6734)
This enhances Func by allowing you to (optionally) constrain the type(s) of Exprs that the Func can contain, and/or the dimensionality of the Func. (Attempting to violate either of these will assert-fail.) There are a few goals here: - Enhanced code readability; in cases where a Func's values may not be obvious from the code flow, this can allow an in-code way of declaring it (rather than via comments) - Enhanced type enforcement; specifying constraints allows us to fail in type-mismatched compilations somewhat sooner, with somewhat better error messages. - Better symmetry for AOT/JIT code generation with ImageParam, in which the inputs (ImageParam) have a way to specify the required concrete type, but the outputs (Funcs) don't. If this is accepted, then subsequent changes will likely add uses where it makes sense (e.g., the Func associated with an ImageParam should always have both type and dimensionality specified since it will always be well-known). Note that this doesn't add any C++ template class for static declarations (e.g. `FuncT<float, 2>` -> `Func(Float(32), 2)`); these could be added later if desired.
Commit 799c546
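To make the constraint idea concrete, here is a small standalone model of a function object that optionally fixes its element type and dimensionality and rejects definitions that violate either. This is only an illustration of the behaviour the commit message describes; `ConstrainedFunc`, `ConstraintError` and the string type names are hypothetical and are not the Halide API.

```python
class ConstraintError(Exception):
    """Hypothetical stand-in for the assert-failure the commit describes."""


class ConstrainedFunc:
    """Model of a Func with optional required type and dimensionality."""

    def __init__(self, name, required_type=None, required_dims=None):
        self.name = name
        self.required_type = required_type
        self.required_dims = required_dims
        self.definition = None

    def define(self, args, value_type):
        # Both checks are skipped when the corresponding constraint is unset.
        if self.required_dims is not None and len(args) != self.required_dims:
            raise ConstraintError(
                f"{self.name}: expected {self.required_dims} dimensions, "
                f"got {len(args)}")
        if self.required_type is not None and value_type != self.required_type:
            raise ConstraintError(
                f"{self.name}: expected type {self.required_type}, "
                f"got {value_type}")
        self.definition = (tuple(args), value_type)


blur = ConstrainedFunc("blur", required_type="float32", required_dims=2)
blur.define(["x", "y"], "float32")  # satisfies both constraints
```

The benefit claimed in the commit is that a mismatch is reported at the point of definition, with the Func's name and the expected type in the message, rather than surfacing later in compilation.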
Commits on Apr 28, 2022
More typed-Func work (halide#6735)
- Allow Func output_type(s)(), outputs(), dimensions(), and output_buffer(s)() to be called on undefined Funcs if the Func has required_type and required_dimensions specified. This allows for greater flexibility in defining pipelines in which you may want to set or examine constraints on a Func that hasn't been defined yet; previously this required restructuring code or other awkwardness. - Ensure that the Funcs that are defined for ImageParams and Generator fields define the types-and-dims when known. - Add some tests.
Commit 00f4b29
Add missing #include <functional> in ThreadPool.h (halide#6738)
* Add missing #include <functional> in ThreadPool.h * Update ThreadPool.h
Commit fc0f4ed
Fix regression from halide#6734 (halide#6739)
That change inadvertently required the RHS of an update stage that used `+=` (or similar operators) to match the LHS type, which should not be required (implicit casting of the RHS is expected). Restructured to remove this requirement, but still ensure that auto-injection of a pure definition matches the required types (if any), and updated tests.
Commit 41b2d07
Commits on Apr 30, 2022
Commit e6260a8
Commit f376cbb
Commits on May 2, 2022
Commit 92dfb61
Commits on May 4, 2022
Revise PyStub calling convention for GeneratorParams (halide#6742)
This is a rethink of halide#6661, trying to make it saner in anticipation of the ongoing Python Generator work. TL;DR: instead of mixing GeneratorParams in with the rest of the keywords, segregate them into an optional `generator_params` keyword argument, which is a plain Python dict. This neatly solves a couple of problems: - synthetic params with funky names aren't a problem anymore. - error reporting is simpler because before an unknown keyword could have been intended to be a GP or an Input. - GP values are now clear and distinct from Inputs, which is IMHO a good thing. This is technically a breaking change, but I doubt anyone will notice; this is mainly here to get a sane convention in place for use with Python Generators as well. Also, a drive-by change to Func::output_types() to fix the assertion error message.
Commit 1606039
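The calling convention described above can be illustrated with a plain-Python model: GeneratorParams travel in one dedicated `generator_params` dict, and every other keyword is treated as an Input. The stub below is a hypothetical sketch (the names `call_stub`, `KNOWN_INPUTS` and `KNOWN_GENERATOR_PARAMS` and their contents are invented for illustration); it is not the PyStub code.

```python
# Hypothetical names a generated stub might know about.
KNOWN_INPUTS = ("input", "offset")
KNOWN_GENERATOR_PARAMS = ("tile_size", "vectorize")


def call_stub(generator_params=None, **inputs):
    """Model of the revised convention: GeneratorParams arrive in their own
    dict, so any other keyword can only have been meant as an Input."""
    gp = {} if generator_params is None else generator_params
    if not isinstance(gp, dict):
        raise TypeError("generator_params must be a dict")
    for name in gp:
        if name not in KNOWN_GENERATOR_PARAMS:
            raise KeyError(f"unknown GeneratorParam: {name}")
    for name in inputs:
        if name not in KNOWN_INPUTS:
            # Unambiguous: this keyword cannot have been a GeneratorParam.
            raise TypeError(f"unknown Input: {name}")
    return {"inputs": dict(inputs), "generator_params": dict(gp)}


call = call_stub(input="img", offset=3, generator_params={"tile_size": 8})
print(call)
```

Because the two namespaces no longer overlap, a GeneratorParam with an unusual name cannot collide with an Input keyword, and the error for an unrecognised keyword can say exactly which kind of argument was expected.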
Commits on May 5, 2022
Silence "may be used uninitialized" in Buffer::for_each_element() (halide#6747)
In at least one version of GCC (Debian 11.2.0-16+build1), an optimized build using `Buffer::for_each_element(int *pos)` will give (incorrect) compiler warnings/errors that "pos may be used uninitialized". From inspection of the code I feel pretty sure this is a false positive -- i.e., the optimizer is confused -- and since no other compiler we've encountered issues a similar warning (nor do we see actual misbehavior), I'm inclined not to worry -- but the warning does break some build configurations. Rather than try to fight with selectively disabling this warning, I'm going to propose inserting a memset() here to reassure the compiler that the memory really is initialized; while it's unnecessary, it's likely to be insignificant compared to the cost of usual calls to for_each_element(). (BTW, this is not a new issue, I've seen it for quite a while as this GCC is the default on one of my Linux machines... it just finally annoyed me enough to want to make it shut up.)
Commit c8531a5
Commit 557690e
Update hannk README link to hosted models page (halide#6749)
The current one is being sunsetted
Commit 6fbf203
Commits on May 6, 2022
Add a `HalideError` base class to Python bindings (halide#6750)
* Add a `HalideError` base class to Python bindings Per suggestion from @alexreinking, this remaps all exceptions thrown by the Halide Python bindings to be `halide.HalideError` (or a subclass thereof), rather than plain old `RuntimeError`. * Remove scalpel left in patient * Don't use a subclass for PyStub error handling
Commit 47d8103
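The practical effect of a dedicated base class is that callers can catch errors coming from the bindings without also swallowing unrelated failures. The sketch below models this with a locally defined `HalideError` so that it runs without Halide installed; with the real bindings the class would be `halide.HalideError`, and the helper names here are hypothetical.

```python
class HalideError(Exception):
    """Local stand-in for halide.HalideError, so this sketch runs
    without the bindings installed."""


def failing_binding_call():
    # Hypothetical call that fails inside the bindings.
    raise HalideError("Func 'f' is undefined")


def unrelated_failure():
    raise KeyError("missing config entry")


def run(fn):
    """Catch only errors that originate from the bindings."""
    try:
        fn()
        return "ok"
    except HalideError as err:
        return f"halide error: {err}"


print(run(failing_binding_call))
```

Calling `run(unrelated_failure)` lets the `KeyError` propagate, which is the point: the handler is scoped to one error family instead of a generic built-in exception type.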
Commits on May 9, 2022
Deprecate GeneratorContext getters with `get_` prefix (halide#6753)
Minor hygiene: most getters in Halide don't have a `get_` prefix. These are very rarely used (only one instance in our test suite I could find) but, hey, cleanliness.
Commit a986078
Commits on May 10, 2022
Add GeneratorFactoryProvider to generate_filter_main() (halide#6755)
* Add GeneratorFactoryProvider to generate_filter_main() This provides hooks to allow overriding the Generator(s) that generate_filter_main() can use; normally it defaults to the global registry of C++ Generators, but this allows for (e.g.) alternate-language-bindings to selectively override this (e.g. to enumerate only Generators that are visible in that language, etc). (No visible change in behavior from this PR; this is just cherry-picked from work-in-progress elsewhere to simplify review & merge) * Update Generator.cpp * Fix error handling
Commit a2e89d8
Deprecate disable_llvm_loop_opt (halide#4113) (halide#6754)
This PR proposes to (finally) deprecate disable_llvm_loop_opt: - make LLVM codegen default to no loop optimization; you must use enable_llvm_loop_opt explicitly to enable it - disable_llvm_loop_opt still exists, but does nothing (except issue a user_warning that the feature is deprecated) - Remove various uses of disable_llvm_loop_opt - Add comments everywhere that the default is different in Halide 15 and that the disable_llvm_loop_opt feature will be removed entirely in Halide 16 Note that all Halide code at Google has defaulted to having disable_llvm_loop_opt set for ~years now, so this is a well-tested codepath, and consensus on the Issue seemed to be that this was a good move.
Commit b38b661
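The new flag semantics can be summarised as a small decision function: loop optimization is off unless `enable_llvm_loop_opt` is present, and `disable_llvm_loop_opt` only triggers a deprecation warning. The sketch below models that with a plain set of feature names; the function name and the warning text are hypothetical, and the real logic lives in Halide's Target and codegen code.

```python
import warnings


def llvm_loop_opt_enabled(features):
    """Model of the Halide 15 behaviour described in the commit message:
    off by default, on only when explicitly enabled."""
    if "disable_llvm_loop_opt" in features:
        # The flag is still accepted but no longer changes anything.
        warnings.warn(
            "disable_llvm_loop_opt is deprecated and has no effect",
            DeprecationWarning,
            stacklevel=2,
        )
    return "enable_llvm_loop_opt" in features


print(llvm_loop_opt_enabled({"avx2"}))
print(llvm_loop_opt_enabled({"avx2", "enable_llvm_loop_opt"}))
```

Under this model a target string that previously relied on the old default needs `enable_llvm_loop_opt` added to keep the same behaviour, while one that already set `disable_llvm_loop_opt` behaves as before apart from the warning.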
Commits on May 13, 2022
Minor metadata-related cleanups (halide#6759)
(Harvested from halide#6757, which probably won't land) - Add clarifying comment/reference in Generator - Add assertion to compile_to_multitarget() function - Fix misleading/wrong code in correctness_compile_to_multitarget
Commit 4ab4ad9
Expand the x86 SIMD variants tested in correctness_vector_reductions (halide#6762)
A recent bug in LLVM codegen was missed because it only affected x86 architectures with earlier-than-AVX2 SIMD enabled; it didn't show up for AVX2 or later. This revamps correctness_vector_reductions to re-run multiple times when multiple SIMD architectures are available on x86 systems. (correctness_vector_reductions was chosen here because it reliably demonstrated the specific failures in this case.)
Commit 09a986e
Commits on May 16, 2022
Fix Param<T>::set_estimate for T=void (halide#6766)
* Fix Param<T>::set_estimate for T=void * Add tests
Commit cc41e65
add_python_aot_extension should use FUNCTION_NAME for the .so output … (halide#6767)
add_python_aot_extension should use FUNCTION_NAME for the .so output (otherwise you can't produce multiple aot extensions from the same Generator)
Commit 25a3272
Commits on May 18, 2022
Update the list of fused_pairs and run validate_fused_group for specialization definitions too (halide#6770)
* Update the list of fused_pairs and run validate_fused_group for specialization definitions too. Fixes halide#6763. * Address review comments * Add const to auto&
Commit 13a5470
Commits on May 19, 2022
Add Func::type()/types(), deprecate Func::output_type()/output_types() (halide#6772)
* rename GIOBase::type() and friends * Func::output_type() -> Func::type() * Add type() forwarders for inputs * Add Func::dimensions() wrapper * Update Func.h
Commit 61f6af7
Fix fundamental confusion about target/tune CPU (halide#6765)
* Fix fundamental confusion about target/tune CPU Sooo. Uh, remember when in halide#6655 we've agreed that we want to add support to precisely specify the CPU for which the code should be *tuned* for, but not *targeted* for. Aka, similar to clang's `-mtune=` option, that does not affect the ISA set selection? So guess what, that's not what we did, apparently. `CodeGen_LLVM::mcpu()` / `halide_mcpu` actually do specify the *target* CPU. It was obvious in retrospect, because e.g. `CodeGen_X86::mattrs()` does not, in fact, ever specify `+avx2`, yet we get AVX2 :) So we've unintentionally added `-march=` support. Oops. While i'd like to add `-march=` support, that was not the goal here. Fixing this is complicated by the fact that `llvm::Target::createTargetMachine()` only takes `CPU Target` string, you can't specify `CPU Tune`. But this is actually a blessing in disguise, because it allows us to fix another bug at the same time: There is a problem with halide "compile to llvm ir assembly", a lot of information from Halide Target is not //really// lowered into LLVM Module, but is embedded as a metadata, that is then extracted by halide `make_target_machine()`. While that is not a problem in itself, it makes it *impossible* to dump the LLVM IR, and manually play with it, because e.g. the CPU [Target] and Attributes (ISA set) are not actually lowered into the form LLVM understands, but are in some halide-specific metadata. So, to fix the first bug, we must lower the CPU Tune into per-function `"tune-cpu"` metadata, and while there we might as well lower `"target-cpu"` and `"target-features"` similarly. * Address review notes * Hopefully silence bogus issue reported by ancient GCC * Call `set_function_attributes_from_halide_target_options()` when JIT compiling * Fix grammar
Commit b5f024f
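The distinction the commit draws between the three pieces of information can be shown as the per-function attribute set it says should be emitted. `"target-cpu"`, `"tune-cpu"` and `"target-features"` are the attribute names quoted in the message; the helper below and its argument names are hypothetical, and the CPU and feature strings are example values only.

```python
def function_attributes(target_cpu, tune_cpu, features):
    """Build the per-function attribute map: the target CPU and feature list
    decide which instructions may be used, the tune CPU only steers
    scheduling and cost decisions."""
    attrs = {}
    if target_cpu:
        attrs["target-cpu"] = target_cpu
    if tune_cpu:
        attrs["tune-cpu"] = tune_cpu
    if features:
        # Comma-separated list, each entry prefixed with + or -.
        attrs["target-features"] = ",".join(features)
    return attrs


# Generic x86-64 baseline plus AVX2, tuned for (but not requiring) Zen 3.
print(function_attributes("x86-64", "znver3", ["+avx2", "+fma"]))
```

Keeping the tune CPU in its own attribute is what prevents it from silently widening the instruction set, which is the confusion the commit sets out to fix.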
Commit 56acc6e
Commits on May 23, 2022
Add execute_generator() API (halide#6771)
This refactors the existing `generate_filter_main()` call in two, moving the interesting implementation of how to drive AOT into the new `execute_generator()` call (reducing `generate_filter_main()` to parsing argc/argv and error reporting). The new `execute_generator()` is intended to be used (eventually) from Python, as a way to drive Generator compilation from a Python script more easily. The PR doesn't provide a Python wrapper for this call yet (that will come in a subsequent PR). Also, a drive-by removal of the "error_output" arg to generate_filter_main() -- AFAICT, no one has ever used it for anything but stderr, and the refactoring now just directs all errors to `user_error` uniformly.
Commit d973993
Allow overriding of `Generator::init_from_context()` for debug purposes (halide#6760)
* Allow overriding of `Generator::init_from_context()` for debug purposes * Update Generator.h * Attempt to clarify contract
Commit 83a90e7
Commits on May 24, 2022
Commit ad1e7f6
Commits on May 26, 2022
[miscompile] Don't de-negate and change direction of shifts-by-unsigned (halide#6782)
I'm afraid the problem is really obvious: https://github.com/halide/Halide/blob/b5f024fa83b6f1cfe5e83a459c9378b7c5bf096d/src/CodeGen_LLVM.cpp#L2628-L2649 ^ the shift direction is treated as flippable by the codegen iff the shift amount is signed :) The newly-added test fails without the fix. I've hit this when writing tests for halide#6775
Commit d0c53fa
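Why the rewrite is only valid for signed shift amounts can be seen in a small integer model. Negating an unsigned amount wraps around to a large positive number instead of producing a negative one, so `x << (-y)` is not the same as `x >> y`. The model below uses 32-bit values and 8-bit unsigned shift amounts and evaluates oversized shifts by shifting an unbounded integer and truncating; that last choice is an assumption of this sketch, not a statement about Halide's semantics, and all names are hypothetical.

```python
BITS = 32
MASK = (1 << BITS) - 1


def negate_u8(y):
    """Negation of an unsigned 8-bit value wraps modulo 256."""
    return (-y) & 0xFF


def shl_u32(x, amount):
    return (x << amount) & MASK


def shr_u32(x, amount):
    return (x & MASK) >> amount


def shl_by_signed(x, s):
    """With a signed amount, a negative shift means the other direction,
    so turning x << (-s) into x >> s is legitimate."""
    return shr_u32(x, -s) if s < 0 else shl_u32(x, s)


def shl_by_negated_unsigned(x, y):
    """What the source expression asks for when y is unsigned."""
    return shl_u32(x, negate_u8(y))


def flipped_rewrite(x, y):
    """The rewrite that is only valid for signed amounts."""
    return shr_u32(x, y)


print(shl_by_negated_unsigned(4, 1), flipped_rewrite(4, 1))
```

For `x = 4, y = 1` the two results differ in this model, which is the kind of mismatch the added test in the commit is there to catch.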
Commits on May 27, 2022
Move some options from execute_generator back to generate_filter_main (halide#6787)
Loading plugins and setting the default autoscheduler name both change global state, which isn't a desirable fit for execute_generator(), since it's not intended to mutate global state. (Mutating the state from a main() function is of course a reasonable thing to do.)
Commit 0f7d548
LLVM codegen: register AA pipeline if LLVM is older than 14 (halide#6785)
It's the default after https://reviews.llvm.org/D113210 / llvm/llvm-project@1331728, but still needs to be done for earlier LLVM versions. Refs. halide#6783 Refs. halide#6718 Partially reverts halide#6718
Commit 3ba2f94
Commits on May 31, 2022
halide_type_of<>() should always be constexpr (halide#6790)
The ones in HalideRuntime.h have been marked constexpr for a while, but the ones in Float16.h got missed
Commit 25f615d
Define an AbstractGenerator interface (halide#6637)
* AbstractGenerator (rebased, v3) * Update AbstractGenerator.h * clang-format * Update Generator.cpp * IOKind -> ArgInfoKind * Various cleanups of AbstractGenerator * clang-format * fix pystub * Update abstractgeneratortest_generator.cpp * dead code * ArgInfoDirection * cleanup * Delete PyGenerator.cpp * Update PyStubImpl.cpp * Update PyStubImpl.cpp * Fixes from review comments * Remove `get_` prefix from getters in AbstractGenerator * Missed some fixes * Fixes * Add GeneratorFactoryProvider for generate_filter_main() * Add GeneratorFactoryProvider to generate_filter_main() This provides hooks to allow overriding the Generator(s) that generate_filter_main() can use; normally it defaults to the global registry of C++ Generators, but this allows for (e.g.) alternate-language-bindings to selectively override this (e.g. to enumerate only Generators that are visible in that language, etc). (No visible change in behavior from this PR; this is just cherry-picked from work-in-progress elsewhere to simplify review & merge) * Update Generator.cpp * fixes * Update Generator.cpp * Restore build_module() and build_gradient_module() methods * Update Generator.h * fixes * Update Generator.cpp * Update AbstractGenerator.h
Commit 74d9909
-
hexagon_scatter test should run only if target has HVX (halide#6793)
It will run otherwise, but is slow on some other targets; rather than trying to (e.g.) shard it, just skip it
Commit 255ff18
-
Add Target support for architectures with implementation specific vector size. (halide#6786)
Move vector_bits_* Target support from fixed_width_vectors branch to make smaller PRs.
Zalman Stern committed May 31, 2022
Commit 76793b4
Commits on Jun 1, 2022
-
slow tests should support sharding (halide#6780)
* slow tests should support sharding The simd_op_check test suite is pretty slow (especially for wasm, where it is interpreted); at one point we tried to use ThreadPool to speed it up, but too many pieces of Halide IR aren't threadsafe and we disabled it long ago. This removes the ThreadPool usage entirely, and instead adds support for the GoogleTest 'sharded test' protocol, which uses certain env vars to allow a test to opt in for splitting its test into smaller pieces. At present our buildbot isn't attempting to make use of this feature, but it will be a big win for downstream usage in Google, where tests that run "too long" are problematic and splitting them into multiple shards makes various day-to-day activities much more pleasant.
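A minimal sketch of how a test binary might opt in to sharding via environment variables. `GTEST_TOTAL_SHARDS` and `GTEST_SHARD_INDEX` are the variable names GoogleTest documents for its sharding protocol; the struct, the helper names, and the round-robin assignment are hypothetical and are not taken from the Halide test code.

```cpp
#include <cstddef>
#include <cstdlib>

// Which slice of the work this process is responsible for.
struct ShardConfig {
    std::size_t index = 0;  // this process's shard number
    std::size_t total = 1;  // total number of shards (1 means "not sharded")
};

// Parse the two values; fall back to a single shard on missing or invalid input.
inline ShardConfig parse_shard_config(const char *total_str, const char *index_str) {
    ShardConfig cfg;
    if (total_str == nullptr || index_str == nullptr) {
        return cfg;
    }
    const long total = std::strtol(total_str, nullptr, 10);
    const long index = std::strtol(index_str, nullptr, 10);
    if (total > 0 && index >= 0 && index < total) {
        cfg.total = static_cast<std::size_t>(total);
        cfg.index = static_cast<std::size_t>(index);
    }
    return cfg;
}

// Read the configuration from the environment variables used by GoogleTest.
inline ShardConfig shard_config_from_env() {
    return parse_shard_config(std::getenv("GTEST_TOTAL_SHARDS"),
                              std::getenv("GTEST_SHARD_INDEX"));
}

// Round-robin assignment: task i runs in shard (i % total).
inline bool task_belongs_to_shard(std::size_t task, const ShardConfig &cfg) {
    return task % cfg.total == cfg.index;
}
```

A test with many independent sub-tasks would loop over them and skip every task for which `task_belongs_to_shard` returns false, so that each shard runs a disjoint subset and all shards together cover everything.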
Commit 2b29bde
-
Commit 4f2251c
-
Pacify clang-tidy (halide#6796)
* Pacify clang-tidy Newer versions can warn about "parameter 'f' shadows member inherited from type 'StubOutputBufferBase'", etc -- easy enough * Update .clang-tidy
Commit e832c4f
Commits on Jun 2, 2022
-
Silence a "possibly uninitialized" warning (halide#6797)
* Silence a "possibly uninitialized" warning At least one compiler thinks we can use this without initialization, which isn't true, but this silences it. * trigger buildbots
Commit 00b5728
-
Make all tests default to `-fvisibility=hidden` (halide#6799)
* Step 1 * still more * Export the error classes so they can be caught
Commit 8b31327
Commits on Jun 6, 2022
-
Commit f712f4f
-
Fix auto_schedule/machine_params parsing (halide#6804)
The recent refactoring that added `execute_generator` accidentally nuked setting these two GeneratorParams. Oops. Fixed.
Commit 0ec2740
Commits on Jun 14, 2022
-
Rewrite strided loads of 4 in AlignLoads (halide#6806)
* Rewrite strided loads of 4 in AlignLoads * Add a check for strided 4 load
Commit ce75862
-
Fix two minor bugs triggered by an or reduction with early-out (halide#6807)
* Fix two minor bugs triggered by an or reduction with early-out * Gotta print success * Appease clang-tidy
Commit fc0f1f7
Commits on Jun 27, 2022
-
Commit 0e17e67
-
Add support for vscale vector code generation. (halide#6802)
Add support for vscale vector code generation. Factored from the fixed_length_vectors branch to make PRs smaller and easier to review. This will be used to support the ARM SVE/SVE2 and RISC-V Vector architectures.
Zalman Stern committed Jun 27, 2022
Commit 9e5c5ce
Commits on Jun 28, 2022
-
Rework .gitignore (halide#6822)
* reorganize .gitignore * Add exclusions for CMake build * .gitignore: comment, drop stale rules * fully and precisely exclude CMake build tree * add debugging directions to .gitignore * ignore CMake install tree * Sort groups
Commit feba77c
-
Commit e0a9825
-
Commit c12f8a5
-
Tweak python apps for better Blaze/Bazel compatibility (halide#6823)
* Tweak python apps/tutorials for better Blaze/Bazel compatibility - Don't write to current directory (rely on an env var to say where to write) - Don't read from arbitrary absolute paths (again, rely on an env var) - Drive-by removal of unnecessary #include in Codegen_LLVM.cpp inside a lambda (!) * Recommended fixes * Revert all changes to tutorial * Revise apps * Remove apps_helpers.py
Commit 3e142cf
Commits on Jun 29, 2022
-
Change stub module names in Python to be _pystub rather than _stub (halide#6830)
This is a bit finicky, but making this the default nomenclature will make some downstream usages less ambiguous and a bit easier to manage. (Yes, I realize that halide#6821 removes the Makefile entirely, but until it lands, it needs fixing there too.)
Commit d36cd04
Commits on Jun 30, 2022
-
Commit ece5fb7
-
Remove Python bindings from Makefiles (halide#6821)
* Remove Python bindings from Makefiles * Restore test_li2018 in Makefile (now C++-only) * Add dummy `test_python` target for buildbots
Commit 60d2b98
-
Add a new, alternate JIT-call convention (halide#6777)
* Prototype of revised JIT-call convention Experiment to try out a way to call JIT code in C++ using the same calling conventions as AOT code. Very much experimental. * Update Pipeline.h * Add Python support for `compile_to_callable` + make empty_ucon static * Update PyCallable.cpp * Update buffer.py * wip * Update callable.py * WIP * Update custom_allocator.cpp * Update Callable.cpp * Add Generator support for Callables * Update Generator.cpp * Update PyPipeline.cpp * Fixes * Update callable.cpp * Update CMakeLists.txt * create_callable_from_generator * More cleanup * Update Generator.cpp * Fix Python bounds inference * Add Python wrapper for create_callable_from_generator() + Add kwarg support for Callable * Add set_generatorparam_values() + usage * Fix auto_schedule/machine_params parsing The recent refactoring that added `execute_generator` accidentally nuked setting these two GeneratorParams. Oops. Fixed. * Move the type-checking code into a constexpr code * Update Callable.h * clang-tidy * CLANG-TIDY * Add `make_std_function`, + more general cleanup * Update example_jittest.cpp * Update Callable.h * Update Callable.h * More tweaking, smaller CallCheckInfo * Still more cleanup * make_std_function now does Buffer type/dim checking where possible * Add tests for calling `AbstractGenerator::compile_to_callable()` directly * enable exports * Various fixes * Improve fill_slot for Halide::Buffer * kill report_if_error * Update callable_bad_arguments.cpp * Update Pipeline.cpp * Revise error handling * Update Callable.cpp * Update callable.py * Update callable_generator.cpp * Update callable.py * HALIDE_MUST_USE_RESULT -> HALIDE_FUNCTION_ATTRS for Callable
Commit fac313e
-
Commit b2771c1
-
Commit 6838db0
Commits on Jul 1, 2022
-
Disable testing for apps/linear_algebra on x86-32-linux/Make (halide#6836)
* Disable testing for apps/linear_algebra on x86-32-linux/Make This wasn't biting us before because we were disabling *all* apps/ on x86-32-linux (oops); the recent change to remove python testing under Make also re-enabled this test. TL;DR: this can probably be made to work somehow, but it's not worth debugging, since that case is both pretty niche, and already covered under CMake. It's literally not worth the time to fix. * Update Makefile
Commit 23a1fa8
-
Rearrange subdirectories in python_bindings (halide#6835)
This is intended to facilitate a few things: - Move all Generators used in tests, apps, etc to a single directory to simplify the build rules (this is especially useful for the work in halide#6764) - Put all the test and apps stuff under a single directory to facilitate adding some Python packaging that can make integration into Bazel/Blaze builds a bit less painful @alexreinking, does this look like the layout we discussed before?
Commit 23c4cf1
Commits on Jul 11, 2022
-
Better lowering of halving_sub and rounding_halving_add (halide#6827)
* Better lowering of halving_sub and rounding_halving_add Previously, lower_halving_sub and lower_rounding_halving_add both used 9 ops. This change redirects halving_sub to use rounding_halving_add, and redirects rounding_halving_add to use halving_add. In the case that none of these instructions exist natively, this reduces it to 7/8 ops for signed/unsigned halving sub and 6 ops for rounding halving add. More importantly, this lets halving_sub make use of pavgw/b on x86 to reduce it to 3 ops for u8 and u16 inputs. * Make signed rounding_halving_add on x86 use pavgb/w too * Cast result back to signed * Add explanatory comment * Fix comment * Add explanation of signed case
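To make the redirections above more concrete, here is a small scalar C++ sketch for `uint8_t` values. The two identities used are my own illustration and are not copied from the Halide source; the reference definitions assume that `halving_sub` means floor((a - b) / 2) wrapped to 8 bits and that `rounding_halving_add` means (a + b + 1) >> 1 computed without overflow. All function names are hypothetical.

```cpp
#include <cstdint>

// Reference definitions, computed in a wider type so nothing overflows.
inline uint8_t halving_add(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>((static_cast<unsigned>(a) + b) >> 1);
}
inline uint8_t rounding_halving_add_ref(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>((static_cast<unsigned>(a) + b + 1) >> 1);
}
inline uint8_t halving_sub_ref(uint8_t a, uint8_t b) {
    const int d = static_cast<int>(a) - static_cast<int>(b);
    // Floor division by 2 (plain d / 2 would round toward zero for negative d).
    const int floor_half = d >= 0 ? d / 2 : -((-d + 1) / 2);
    return static_cast<uint8_t>(floor_half);  // wraps modulo 256
}

// rounding_halving_add expressed via halving_add: add back the dropped low bit.
inline uint8_t rounding_halving_add_via_halving_add(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>(halving_add(a, b) + ((a ^ b) & 1));
}

// halving_sub expressed via rounding_halving_add:
// a + ~b + 1 == a - b + 256, so the rounded average is floor((a - b) / 2) + 128,
// and flipping the top bit removes the +128 bias modulo 256.
inline uint8_t halving_sub_via_rounding_halving_add(uint8_t a, uint8_t b) {
    const uint8_t not_b = static_cast<uint8_t>(~b);
    return static_cast<uint8_t>(rounding_halving_add_ref(a, not_b) ^ 0x80);
}

// Exhaustively compare both rewrites against the references for all u8 pairs.
inline bool identities_hold_for_all_u8() {
    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            const uint8_t x = static_cast<uint8_t>(a);
            const uint8_t y = static_cast<uint8_t>(b);
            if (rounding_halving_add_via_halving_add(x, y) != rounding_halving_add_ref(x, y)) {
                return false;
            }
            if (halving_sub_via_rounding_halving_add(x, y) != halving_sub_ref(x, y)) {
                return false;
            }
        }
    }
    return true;
}
```

The second rewrite is a not, a rounding average, and an xor, which is the kind of shape that can map onto a byte-averaging instruction such as pavgb; whether this matches the exact sequence Halide emits is not something this sketch claims.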
Commit 29ebde9
-
Commit 8159dd3
-
Remove Generator::value_tracker and friends (halide#6845)
This is an internal-to-Generator helper that is used to try to detect certain classes of errors when using GeneratorStubs. To the best of my knowledge, it has ~never found a useful error in all of its existence; combined with the very limited usage of GeneratorStubs, I think this code no longer pays for itself, and should be removed. (Note that this was never externally visible, thus no deprecation warnings should be necessary.)
Commit d266e4e
-
Deprecate/remove Generator::get_externs_map() and friends (halide#6844)
* Deprecate/remove Generator::get_externs_map() and friends This is a feature of Generator that was added years ago to allow adding external code libraries in LLVM bitcode form (rather than simply as extern "C" or similar). In theory it allows for better codegen for external code modules (since LLVM has access to all the bitcode for optimization); in practice, we only know of one project that ever used it, and that project no longer exists. Additionally, it tended to be fairly flaky in terms of actual use -- e.g., missing symbols tended to crop up unpredictably. The issues with this feature are likely fixable, but since it hasn't (AFAICT) been used in ~years, we're better off deprecating it for Halide 15 and removing for Halide 16. (If anyone out there is still relying on this feature, obviously you should speak up ASAP.) * Also remove ExternalCode.h & friends * Also remove correctness/external_code.cpp * HALIDE_ALLOW_GENERATOR_EXTERNS_MAP -> HALIDE_ALLOW_GENERATOR_EXTERNAL_CODE
Commit 708a320
Commits on Jul 12, 2022
-
Add placeholder code for bfloat16 in Python (halide#6849) (halide#6850)
* Add placeholder code for bfloat16 in Python (halide#6849) This is a no-op change; I just want to mark the place(s) in the Python bindings that need attention if/when it becomes possible to support bfloat16 in Python buffers. * Update PyBinaryOperators.h
Commit 13a43c0
Commits on Jul 13, 2022
-
Commit bdd7114
Commits on Jul 14, 2022
-
Add autoscheduling to the generator_aot_stubuser test (halide#6855)
* Add autoscheduling to the generator_aot_stubuser test * fix test_apps * fix test_apps, again
Commit f9c2cdf
Commits on Jul 15, 2022
-
Silence Adams2019 Autoscheduler (halide#6854)
* Make aslog() a proper ostream * Ensure that all `dump()` calls take and use an ostream * Progress Bar only draws as LogLevel >= 1 * clang-format * Rework all aslog(0) statements * Update ASLog.cpp * syntax * Update ASLog.cpp * Revert fancy aslog stuff * Update ASLog.h * trigger buildbots
Commit 24913eb
-
Rework autoscheduler API (halide#6788) (halide#6838)
* Rework autoscheduler API (halide#6788) * Oops * Update test_function_dag.cpp * clang-tidy * trigger buildbots * Update Generator.h * Minor cleanups * Update README_cmake.md * Check for malformed autoscheduler_params dicts * Add alias-with-autoscheduler code, plus tweaks * Update stubtest_jittest.cpp * Update Makefile * trigger buildbots * fixes * Update AbstractGenerator.cpp * Update stubtest_generator.cpp * Update Makefile * Add deprecation warning for HALIDE_ALLOW_LEGACY_AUTOSCHEDULER_API * Make AutoschedulerParams a real struct * clang-tidy
Commit b1ca334
-
[vulkan phase0] Add adts for containers and memory allocation to runtime (halide#6829)
* Cherry pick runtime internals as standalone commit (preparation work for Vulkan runtime) * Clang format/tidy fixes * Fix runtime test linkage and include paths to not include libHalide * Update test/runtime/CMakeLists.txt Fix typo mismatch for HALIDE_VERSION_PATCH Co-authored-by: Alex Reinking <reinking@google.com> * Add compiler id guard to build options for runtime tests * Avoid building runtime tests on MSVC since Halide runtime headers are not MS compatible Remove CLANG warning flag for runtime test * Change runtime test compile definitions to be PRIVATE. Remove PUBLIC_EXPORTS from runtime test definition. * Add comment about GNU warnings for 'no-builtin-declaration-mismatch' * Change to debug(user_context) for debug messages where context is valid. Wrap verbose debugging with DEBUG_RUNTIME ifdef. Style pass based on review comments. * Add note explaining why we disable the internal runtime tests on MSVC. * Cleanup cmake logic for disabling runtime internal tests for MSVC and add a status message. * Don't use strncpy for prepend since some implementations may insert a null char regardless of the length used * Workaround varying platform str implementations and handle termination directly. * Clang Tidy/Format pass Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> Co-authored-by: Alex Reinking <reinking@google.com>
Commit 2d907c4
Commits on Jul 20, 2022
-
Promote Reinterpret Intrinsic into a Reinterpret IR Node (halide#6853)
* Promote Reinterpret Intrinsic into a Reinterpret IR Node As discussed in halide#6801 (comment) I don't think this is complete, there are likely a few more places that need to be taught about it still, although I think this is mostly it. Note that this only promotes the intrinsic, this does not adjust its handling, as hinted in: halide#6801 (comment) * Silence buildbot warning * Speculative fix for Codegen C failure? * Restore comment * Delete obsolete FIXME * RegionCost: reinterpret is free * LICM: actually adjust the comment
Commit 359026a
-
Python source reorg (halide#6867)
* Move python binding sources to src/halide/halide_ * Rename native module to halide_ * Fix tests * Avoid copying Python sources * Fix installation rules * Make diff smaller * trigger buildbots * Add issue todo Co-authored-by: Steven Johnson <srj@google.com>
Commit 51c06b7
-
Fix simd_op_check for top-of-tree LLVM (halide#6874)
* Fix simd_op_check for top-of-tree LLVM * clang-format
Commit 967c3bf
Commits on Jul 21, 2022
-
Use pmaddubsw 8-bit horizontal widening adds (Fixes halide#6859) (halide#6873)
* use pmaddubsw 8-bit horizontal widening adds * add SSE3 versions too * add pmaddubsw tests
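For readers unfamiliar with the instruction, the following scalar C++ sketch models what one 128-bit `pmaddubsw` computes and why multiplying by a vector of ones turns it into a pairwise (horizontal) widening add of 8-bit values. This is an illustrative model only; the function names are hypothetical and it is not the pattern-matching code from the commit.

```cpp
#include <array>
#include <cstdint>

// Scalar model of a 128-bit pmaddubsw: multiply unsigned bytes of `a` by the
// corresponding signed bytes of `b`, then add adjacent pairs of products
// with signed saturation into 16-bit lanes.
inline std::array<int16_t, 8> pmaddubsw_model(const std::array<uint8_t, 16> &a,
                                              const std::array<int8_t, 16> &b) {
    std::array<int16_t, 8> out{};
    for (int i = 0; i < 8; i++) {
        int sum = static_cast<int>(a[2 * i]) * b[2 * i] +
                  static_cast<int>(a[2 * i + 1]) * b[2 * i + 1];
        if (sum > 32767) sum = 32767;    // saturate high
        if (sum < -32768) sum = -32768;  // saturate low
        out[i] = static_cast<int16_t>(sum);
    }
    return out;
}

// With every multiplier equal to 1, each output lane is simply the sum of two
// adjacent u8 inputs. The largest possible sum is 255 + 255 = 510, so the
// saturation never triggers in this use.
inline std::array<int16_t, 8> pairwise_widening_add_u8(const std::array<uint8_t, 16> &a) {
    std::array<int8_t, 16> ones;
    ones.fill(1);
    return pmaddubsw_model(a, ones);
}
```

For signed 8-bit inputs the same trick would put the ones in the unsigned operand and the data in the signed operand; that variant is omitted here for brevity.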
Commit 9a94756
-
[Codegen_LLVM] Radically simplify `visit(const Reinterpret *op)` (halide#6865)
1. LLVM IR `bitcast` happily bitcasts between vectors and scalars: https://godbolt.org/z/9zqx11rna
2. `ptrtoint` already implicitly truncates/zero-extends when the integer type and the pointer type differ in size: https://llvm.org/docs/LangRef.html#ptrtoint-to-instruction
3. `inttoptr` already implicitly truncates/zero-extends when the integer type and the pointer type differ in size: https://llvm.org/docs/LangRef.html#inttoptr-to-instruction
So we don't need to do any of that 'special' handling.
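As a loose C++ analogy for a same-size bit reinterpretation, including one between a small "vector" of lanes and a scalar, the sketch below copies bytes with `std::memcpy`. It is not LLVM or Halide code; the helper names are hypothetical, and the `float` example assumes an IEEE-754 32-bit `float`.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a float as a 32-bit integer (same size, no conversion).
inline uint32_t float_to_bits(float f) {
    static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 32-bit float");
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return bits;
}

// Reinterpret four 8-bit lanes as one 32-bit scalar. Which lane ends up in the
// low byte depends on the host's endianness, so only the round trip is portable.
inline uint32_t lanes_to_scalar(const std::array<uint8_t, 4> &lanes) {
    uint32_t s;
    std::memcpy(&s, lanes.data(), sizeof(s));
    return s;
}

// The inverse reinterpretation: one 32-bit scalar back into four 8-bit lanes.
inline std::array<uint8_t, 4> scalar_to_lanes(uint32_t s) {
    std::array<uint8_t, 4> lanes{};
    std::memcpy(lanes.data(), &s, sizeof(s));
    return lanes;
}
```

The point of the analogy is only that the total bit width is preserved and no value conversion happens; pointer/integer casts of differing widths are a separate case and are not modeled here.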
Commit 8b5486b
-
[Codegen] Fail to codegen `Call::undef`, just like `Call::signed_integer_overflow` (halide#6871)
See discussion in halide#6866. It's not obvious if that codepath is ever hit, let's optimistically assume that it is not. If this turns out to be not true, we'll have to deal with a more complicated question of the proper lowering for it, can it be `poison`, or must it be a `freeze poison`.
Commit 04c465b
-
Fix error in Makefile for Adams2019 on OSX (halide#6877)
We erroneously link in the dylib and also dynamically load it, causing an error. We should skip the linkage and always load dynamically.
Commit 06fcf94
-
Refactor/cleanup in Autoscheduler code (halide#6858)
* Move ASLog.cpp/.h to common/ * Add trivial Parsing utility & use it * Update ParamParser.h * fixes * fixes
Commit c904c53
Commits on Jul 22, 2022
-
Ensure $CMAKE_{lang}_OUTPUT_EXTENSION is set before using it (halide#6879)
Ensure CMAKE_{lang}_OUTPUT_EXTENSION is set before using it Co-authored-by: Shoaib Kamil <kamil@adobe.com>
Commit 4770495
-
halide#6863 - Fixes to make address sanitizer happy for internal runtime classes (halide#6880)
* Fixes to make address sanitizer happy. Fixed initialization defects in StringStorage that could cause buffer overruns Fixed memory leaks within RegionAllocator and BlockAllocator Added system memory allocation tracking to all internal runtime tests. * Clang Tidy / Format pass * Fix formatting to use braces around if statements Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com>
Commit 11a049c
Commits on Jul 25, 2022
-
[Codegen_LLVM] Define all the things (halide#6866)
Long-term plan for LLVM is to get rid of `undef`, and replace it with zero-initialization, err, `poison`, because it has nicer semantics. Everywhere we use `undef` as a placeholder in shuffle (be it either for a second operand, or undef shuffle mask element), or as a base 'empty' vector we are about to fully override via insertelement, we can just switch those to poison nowadays. The scary part is the `Call::undef` semantics/lowering, perhaps it will need to be `freeze poison`.
Commit 5e69ad9
-
Add set-host-dirty/copy-to-host to PythonExtensionGen (halide#6869)
* Add set-host-dirty/copy-to-host to PythonExtensionGen See halide#6868: Python Buffers are host-memory-only, so if the AOT-compiled halide code runs on (say) GPU, it may fail to copy the inputs to device and/or the results back to host. This fixes that. (We still need a solution that allows for lazy copies, but that will require adding another protocol that supports it.) * Update PythonExtensionGen.cpp
Commit 7821212
Commits on Jul 27, 2022
-
Rewrite PythonExtensionGen to be C++ based (halide#6888)
* Rewrite PythonExtensionGen to be C++ based This is intended as an alternative to halide#6885 -- this is even *more* gratuitous, but: - We have ~always compiled Python extensions using C++ anyway - This code is arguably terser, cleaner, and safer (the cleanups happen via dtors) - The code size difference is negligible (~300 bytes out of 160k for addconstant.cpython-39-darwin.so) * Update PythonExtensionGen.cpp
Commit e3e169d
-
Commit c8b811a
-
Commit 3859b36
-
Remove (most) of the env var usage from Adams2019 (halide#6861)
* Move ASLog.cpp/.h to common/ * Add trivial Parsing utility & use it * Update ParamParser.h * fixes * wip * fixes * Fixes * clang-format * Update Makefile * Remove may_subtile * Update Cache.cpp * Update Cache.cpp * Update AutoSchedule.cpp * Update AutoSchedule.cpp
Commit b9a3356
Commits on Jul 29, 2022
-
[vulkan phase1] Add SPIR-V IR (halide#6882)
* Import SPIRV-IR from personal branch * Refactor SPIR-V IR into separate header / source files. * Refactor SPIR-V factory methods. Fix SPIR-V interface library and header paths. Add SPIR-V internal test. * Hookup internal SPIRV IR test * Fixes and cleanups to address PR halide#6882 Refactor logic of SPIR-V dependency to make fetch dependency optional Change SPIR-V fetch dependency to avoid building and just populate contents Change SPIR-V internal test to always link against method ... only enabled if WITH_SPIRV is defined Add missing SPIRV target feature * Update src/CMakeLists.txt Co-authored-by: Alex Reinking <reinking@google.com> * Add missing iostream header when WITH_SPIRV is undefined * Fix declaration ordering for TARGET_SPIRV option so that dependencies get triggered * Turn on FETCH_SPIRV_HEADERS by default to get build to pass for now * Fix path finding logic for SPIR-V header path from populated fetch dependency * Revert back to Halide_SPIRV target name * Don't use imported interface for SPIR-V. Use Halide_SPIRV naming since target is defined before Halide itself. * Add local copy of SPIR-V header file, along with license and readme. Update CMake rules to use local include path by default. * Make SPIR-V include path a system path to avoid clang format/tidy processing * Remove SpirvIR.h header file from being included with Halide.h (since it's only used internally for CodeGen) * Add ./dependencies/spirv to clang format ignore file * Add comment about *not* including internally used headers like SpirvIR.h * Refactor is_defined() asserts into check_defined() for reuse * Add comment to SpirvIR.h header clarifying this file should not be exported. Fix formatting to avoid single line if statements. Use reserve for constructing vector components * Rename hash_* methods to make_*_key methods (since they construct a key and don't actually hash the value) Fix typo on components * Clang format/tidy pass * Fix formatting for more single-line if statements * Disable TARGET_SPIRV by default for now Co-authored-by: Derek Gerstmann <dgerstmann@adobe.com> Co-authored-by: Alex Reinking <reinking@google.com>
Commit 9c25902
-
Add `auto_schedule` label to Adams2019 and Li2018 tests in CMake (halide#6898)
* Add `auto_schedule` label to Adams2019 and Li2018 tests in CMake These were ~never getting tested on the buildbots (and still aren't, I need to update it to run `auto_schedule` tests) but conceptually these tests should be in the same group as for Mullapudi. Also, drive-by fix to broken test_apps_autoscheduler injected in halide#6861. * trigger buildbots
Commit 6cc77b2
Commits on Aug 1, 2022
-
Commit 0739045
-
[Codegen_LLVM] Annotate LLVM IR functions with `nounwind`/`mustprogress` attributes (halide#6897)
My reasoning is as follows, please correct me if I'm wrong:
1. Halide-generated code never throws exceptions
2. Halide-generated code always `call`s (as opposed to `invoke`s) the functions, there is no exception-safety RAII
3. Halide loops are meant to have a finite number of iterations, they aren't meant to be endless and side-effect free
4. Halide (IR) assertions *might* abort.
5. Likewise, external callees *might* abort. (???)
Therefore, when not in the presence of external calls, it is obvious that (1) no exception will be unwound out of the Halide-generated function, (2) none of the loops will end up being endless with no observable side-effects. ... which is the semantics that is being stated by the LLVM IR function attributes `nounwind`+`mustprogress`. I'm less clear as to what the prerequisites on the behavior of the external callees are, but I do believe that they must also at least not unwind. I guess they are also at least required to either return or abort eventually.
Commit e03b0e0
-
Don't try to fold saturating_sub of VectorReduce (halide#6896)
don't fold saturating_sub of VectorReduce
Commit e35654b
-
Commit 703a738
-
Allow AMX instructions with K dimension larger than 4 bytes (halide#6582
) * recognize the patterns used for the RHS matrix * make 1d tile matcher more robust * put getting rhs tile's index into a separate func * expand the tests used in correctness check * add exclamation mark * remove unused vars * run format and tidy * check for null before using IR in the next step * check if the broadcast was found * llvm below 13 is no longer supported * replace single pattern with commutative permutations * check if the stride is an `IntImm`, otherwise reject pattern * apply clang-format-13 * rename wild_i32 -> v2 * check if v1 could be the stride value * add more detail to a receiving a bad type * added short explanation of the right-hand matrix layout * added explanation for where the 4 comes from * provide further documentation as to the layout of AMX * add comments for expected patterns to get_3d_rhs_tile_index * Document the matched pattern Co-authored-by: Steven Johnson <srj@google.com>
Commit 8871404
Commits on Aug 2, 2022
-
Fix autoscheduling trivial lut wrappers (halide#6905)
* Fix autoscheduling trivial lut wrappers
Fixes halide#6899
* trigger buildbots
Co-authored-by: Steven Johnson <srj@google.com>
Commit 2239119
-
Fix broken Makefile rules for autoschedulers on OSX (halide#6906)
* Fix broken Makefile rules for autoschedulers on OSX
A few issues here:
- Make was building the plugins as .dylib on OSX, but they should have been .so to match Linux (and just on general principles)
- On OSX, explicitly linking libHalide.dylib into a plugin means that it will load its own copy of libHalide, which is bad, because it means the plugin doesn't share the same set of globals. We need to omit that explicit dependency and allow it to just find the exported symbols at load time.
- Add a test to verify the fix; run it everywhere even though it should only have been failing for Make-build OSX builds.
Finally, let me add that we really need to set a sunset date for supporting Make in Halide. The Makefiles aren't really maintained properly anymore, and when something subtle goes wrong, it takes an unreasonable amount of time to debug for something that is no longer our canonical build tool.
* Use order-only prerequisites
* Remove new load_plugin.cpp test
Not worth the complexity for the extra test coverage.
Commit dd391e6
-
Start developing pip package (halide#6886)
Co-authored-by: Lukas Trümper <lukas.truemper@outlook.de>
Commit 88e7229
Commits on Aug 3, 2022
-
LICENSE.txt: Include full text of Apache 2.0 license (not just the 'header' version) (halide#6912)
Commit 0072946
-
Commit a893d5e
-
Commit 857b045
Commits on Aug 4, 2022
-
Commit cc44ee5
-
Commit 3a04fc0
-
Fix two warnings found with clang 16 (halide#6918)
- variable 'count' set but not used
- warning: use of bitwise '|' with boolean operands
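The second warning concerns `|` applied to `bool` operands, which always evaluates both sides, whereas `||` short-circuits. The following is a minimal sketch of that difference, not the code touched by this commit; `probe` and both `evaluations_*` functions are hypothetical names introduced only for illustration.

```cpp
// Hypothetical helper: counts how many operands actually get evaluated.
inline bool probe(bool value, int &calls) {
    ++calls;
    return value;
}

// Logical '||' short-circuits: the right operand is skipped once the left is true.
inline int evaluations_with_logical_or() {
    int calls = 0;
    bool result = probe(true, calls) || probe(false, calls);
    (void)result;
    return calls;
}

// When both operands must run (for example, for their side effects), evaluating
// them into named bools first states that intent without using '|' on bools.
inline int evaluations_with_both_operands() {
    int calls = 0;
    const bool lhs = probe(true, calls);
    const bool rhs = probe(false, calls);
    bool result = lhs || rhs;
    (void)result;
    return calls;
}
```

In this sketch the short-circuit form evaluates one operand and the explicit form evaluates two, so choosing between them depends on whether the right-hand side has side effects that must always happen.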
Commit ffa2c36
-
Fix bug when realize condition depends on tuple call (halide#6915)
If the realization is tuple-valued, and the condition on the realization uses a tuple call (index != 0), then the condition wasn't getting resolved during the split_tuples pass. The cause was a missing mutate call.
Commit 256c4d9
Commits on Aug 5, 2022
-
Fix wrong install path for *.py files (halide#6921)
* Fix wrong install path for *.py files
We were looking in a nonexistent dir, so we never copied `__init__.py` as we should have.
* Update CMakeLists.txt
Commit 9ca7560
Commits on Aug 8, 2022
-
Make use of CMake 3.22 features (halide#6919)
* Remove AddCudaToTarget.cmake
* Remove MakeShellPath.cmake
* Use CheckLinkerFlag in TargetExportScript
* Use DEPFILE for all generators
* Use REQUIRED with find_program, where applicable
* Use REQUIRED with find_library, where applicable
* Use CMake 3.21 cache behavior in HalideTargetHelpers.cmake
* Replace uses of get_filename_component with cmake_path
* Rework BLAS detection in linear_algebra app
* Drive-by: fix autotune_loop.sh install rule.
* Fix CBLAS header in linear_algebra test_halide_blas
Commit 8794fac
-
Make saturating_cast an intrinsic (halide#6900)
* Make saturating_cast an intrinsic
* handle saturating_cast in Bounds.cpp + add bounds tests
* update saturating_cast CodeGen
* with_lanes should work on intrinsics as well
* lift to saturating_cast in FindIntrinsics
* update intrinsics test for u16_sat
* better sat_cast(widen(expr)) handling in find_intrinsics
* simplify bounds of saturating_cast + update is_monotonic
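For readers unfamiliar with the operation, a saturating cast clamps a value to the representable range of the destination type before converting, instead of wrapping. The sketch below shows that scalar behavior in plain C++; `saturating_cast_sketch` is a hypothetical helper restricted to integer destinations of at most 32 bits, and it is not Halide's implementation or its intrinsic.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical scalar model of a saturating cast (an assumption for
// illustration): clamp a signed 64-bit input to the range of To, then convert.
template <typename To>
To saturating_cast_sketch(int64_t value) {
    static_assert(std::numeric_limits<To>::is_integer && sizeof(To) <= 4,
                  "sketch only covers integer destinations of at most 32 bits");
    // Both bounds fit in int64_t because To is at most 32 bits wide.
    const int64_t lo = static_cast<int64_t>(std::numeric_limits<To>::min());
    const int64_t hi = static_cast<int64_t>(std::numeric_limits<To>::max());
    if (value < lo) return static_cast<To>(lo);
    if (value > hi) return static_cast<To>(hi);
    return static_cast<To>(value);
}
```

Under this model, converting 300 to `uint8_t` yields 255 and converting -5 yields 0, while in-range values pass through unchanged.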
Commit 1bf1599
Commits on Aug 9, 2022
-
Commit 8981861
-
Commit 3e8403a
Commits on Aug 10, 2022
-
Halide::Error should not extend std::runtime_error (halide#6927)
* Halide::Error should not extend std::runtime_error
Unfortunately, the std error/exception classes aren't marked for DLLEXPORT under MSVC; we need our Error classes to be DLLEXPORT for libHalide (and python bindings). The current situation basically causes MSVC to generate another version of `std::runtime_error` marked for DLLEXPORT, which can lead to ODR violations, which are bad. AFAICT we don't really rely on this inheritance anywhere, so this just eliminates the inheritance entirely. (Note that I can't point to a specific malfunction resulting from this, but casual googling based on the many warnings MSVC emits about the current situation has me convinced that it needs addressing.)
* noexcept
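The shape of such a change can be sketched as a standalone error type that offers a `what()` accessor marked `noexcept` without inheriting from any standard exception class. `SketchError` and `throw_and_catch` below are hypothetical names, and the class layout is an assumption rather than Halide's actual definition; a real exported class would also carry the project's export macro.

```cpp
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for an exported error type (illustrative only).
class SketchError {
public:
    explicit SketchError(std::string msg) : msg_(std::move(msg)) {}

    // Same shape as std::exception::what(), but with no std base class.
    const char *what() const noexcept { return msg_.c_str(); }

private:
    std::string msg_;
};

// The sketch type deliberately has no relationship to std::runtime_error.
static_assert(!std::is_base_of<std::runtime_error, SketchError>::value,
              "SketchError must not derive from std::runtime_error");

// Throws the sketch type and catches it by const reference, returning the message.
inline std::string throw_and_catch(const std::string &text) {
    try {
        throw SketchError(text);
    } catch (const SketchError &e) {
        return e.what();
    }
}
```

One consequence of this design, at least in the sketch, is that a handler written as `catch (const std::exception &)` no longer matches the type, so callers need to catch it by its own name.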
Commit 92de4a1
-
Rework internal PYTHONPATH maintenance (halide#6922)
* Rework PYTHONPATH
* Move pure-Python file copying logic to build time.
* Use TARGET_RUNTIME_DLLS to copy all DLLs instead of just Halide.
* Ensure that the last path component for Halide_Python is always `halide`
* Simplify __init__.py now that it's copied to build tree
* Add helper to de-duplicate PYTHONPATH test logic
Fixes halide#6870
Co-authored-by: Alex Reinking <alex.reinking@gmail.com>
Co-authored-by: Alex Reinking <reinking@google.com>
Commit 43e6a26
Commits on Aug 11, 2022
-
Tutorial 10 needs to be skipped for Python when targeting Wasm (just as non-Python does) (halide#6932)
* Tutorial 10 needs to be skipped for Python when targeting Wasm (just as non-Python does)
* fixes
* Update CMakeLists.txt
Commit 4cdc2a1
-
Commit b734957
-
Add ASAN support to CMake via toolchain file (halide#6920)
Add ASAN support
Co-authored-by: Alex Reinking <reinking@google.com>
Commit 5e8f97b
Commits on Aug 12, 2022
-
Commit f60a8fb
Commits on Aug 14, 2022
-
Add minimal useful implementation of extracting and concatenating bits (halide#6928)
* Minimal approach to making Deinterleave correct for Reinterpret
* Add minimal useful implementation of extracting and concatenating bits
* clang-tidy
* More clang-tidy fixes
* Add missing error message
* Add low-bit-depth noise test
* Add test to cmake build
* Fix power-of-two check
* Remove dead object
* Add little-endian comment to reinterpret IR node
* Simplify concat_bits of single arg
* Add missing second arg
* Fix concat_bits call
Co-authored-by: Andrew Adams <anadams@adobe.com>
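As a rough scalar illustration of what concatenating and extracting bits means, the sketch below joins two 8-bit values into a 16-bit one and pulls an 8-bit field out of a 32-bit word. The helper names are hypothetical, and the little-endian ordering (first argument supplies the low bits) is an assumption suggested by the "little-endian comment" item above, not a statement of Halide's API.

```cpp
#include <cstdint>

// Hypothetical helper: concatenate two bytes, with `low` supplying bits 0-7
// and `high` supplying bits 8-15 (assumed little-endian ordering).
inline uint16_t concat_bits_sketch(uint8_t low, uint8_t high) {
    return static_cast<uint16_t>(low | (static_cast<uint16_t>(high) << 8));
}

// Hypothetical helper: extract 8 bits starting at bit `lsb`. Bits requested
// beyond the top of the 32-bit source read as zero in this sketch.
inline uint8_t extract_bits_sketch(uint32_t value, unsigned lsb) {
    if (lsb >= 32) {
        return 0;  // avoid an out-of-range shift
    }
    return static_cast<uint8_t>((value >> lsb) & 0xFFu);
}
```

With these definitions, extracting at offsets 0 and 8 from a concatenated value recovers the original low and high bytes, which is the round-trip property one would expect from the pair of operations.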
Commit 52b91a4
Commits on Aug 15, 2022
-
Commit 6798467
Commits on Aug 19, 2022
-
Commit 2ce991a
-
Commit 510ad6e
Commits on Aug 22, 2022
-
fixed merge issue that omitted the PyEvictionKey.cpp from makefile
Petter Larsson committed Aug 22, 2022
Commit bac2a50