[x86] Separate vector instruction selection and CodeGen passes #6884

Open: wants to merge 65 commits into main

Commits (65):
85e8c69
create VectorIntrininsic node
rootjalex Jul 25, 2022
f0931c6
update IRMatch for VectorIntrinsic node
rootjalex Jul 25, 2022
ac5b6f2
implement optimize_x86_instructions
rootjalex Jul 25, 2022
09193f4
fix typo
rootjalex Jul 25, 2022
24f74a9
clang-format
rootjalex Jul 25, 2022
58ff01b
add VectorIntrinsic comment
rootjalex Jul 25, 2022
0d30b56
format
rootjalex Jul 25, 2022
e9029a2
add missing horizontal_add x86Intrinsics
rootjalex Jul 26, 2022
9d2deb4
fix bfloat16 abs issue
rootjalex Jul 26, 2022
1a51b83
fix unhandled bitwise_or in IRMatch.h
rootjalex Jul 26, 2022
614b7ea
missing paren
rootjalex Jul 26, 2022
a5b7e72
fix buildbot failures (I hope?)
rootjalex Jul 26, 2022
c58f85e
clang-format
rootjalex Jul 26, 2022
c9efd32
add empty Expr return to Deinterleaver::visic(const VectorIntrinsic*)
rootjalex Jul 26, 2022
0e94961
fix horizontal_add references
rootjalex Jul 26, 2022
fb538e3
fix bfloat16 abs issue (again)
rootjalex Jul 26, 2022
c2a6175
fix instruction selection location
rootjalex Jul 26, 2022
78edb81
clang format
rootjalex Jul 26, 2022
53c560b
fix virtual function hidden error
rootjalex Jul 26, 2022
2cfc0c1
fix absd codegen bug
rootjalex Jul 26, 2022
0675e86
attempt to fix x86 vector-reduction splitting
rootjalex Jul 26, 2022
6471226
clang tidy
rootjalex Jul 26, 2022
fb82166
fix MSVC templating bug
rootjalex Jul 26, 2022
f092606
implement Andrew's requested changes
rootjalex Jul 27, 2022
d660816
Merge branch 'main' of github.com:halide/Halide into rootjalex/x86-op…
rootjalex Jul 27, 2022
339b6b7
undef -> poison
rootjalex Jul 27, 2022
17c9924
fully remove saturating_pmulhrs
rootjalex Jul 27, 2022
6c74a63
clang format
rootjalex Jul 27, 2022
11690d7
disable UB for VectorInstruction node
rootjalex Jul 28, 2022
3648ca6
implement a base class for instruction selection
rootjalex Jul 28, 2022
870af00
fix merge conflict + implement psadbw
rootjalex Jul 28, 2022
c21bec5
clang format
rootjalex Jul 28, 2022
e6502f8
fix last remnants of vector intrinsic -> vector instruction renaming
rootjalex Jul 28, 2022
6d2bfd1
fix virtual func hidden error
rootjalex Jul 28, 2022
fa2d4e2
remove 'implement VI visitor' error msg
rootjalex Jul 28, 2022
ec2cd4e
address nits
rootjalex Jul 28, 2022
0e5cfcf
temporary HVX/CSE fix
rootjalex Jul 28, 2022
b3b3551
fix case without WITH_X86
rootjalex Jul 28, 2022
40f575c
fix x86 saturating_narrow pattern mistake
rootjalex Jul 29, 2022
545fbe8
lower mod in InstructionSelector too
rootjalex Aug 2, 2022
cd0fe8a
clang format
rootjalex Aug 2, 2022
6e67ddf
implement pattern matching for SapphireRapids
rootjalex Aug 8, 2022
7cc3b64
Merge branch 'main' of github.com:halide/Halide into rootjalex/x86-op…
rootjalex Aug 8, 2022
9da91da
Merge branch 'rootjalex/x86-optimize' of github.com:halide/Halide int…
rootjalex Aug 8, 2022
19b2c5e
rm stray 'protected'
rootjalex Aug 8, 2022
258b72c
merge conflict
rootjalex Aug 8, 2022
a98f268
update x86 saturating_cast rules using intrinsic
rootjalex Aug 8, 2022
22d17e7
fix namespace issue
rootjalex Aug 8, 2022
bbfefd2
Merge branch 'main' of github.com:halide/Halide into rootjalex/x86-op…
rootjalex Aug 22, 2022
e2045bf
place Expr constants on the stack
rootjalex Aug 23, 2022
dc4d1f7
i8 -> u8 bugfix
rootjalex Aug 23, 2022
9a5327c
add better type checking in IRMatch for SpecificExpr cases
rootjalex Aug 24, 2022
292d8e5
clang format
rootjalex Aug 24, 2022
5258627
Merge branch 'main' of github.com:halide/Halide into rootjalex/x86-op…
rootjalex Aug 24, 2022
3b0dc43
missing &&
rootjalex Aug 24, 2022
1eb0e94
clang format
rootjalex Aug 24, 2022
f6eb2bf
update SpecificExpr comment + remove dangling TODO comments
rootjalex Aug 24, 2022
a57e022
Merge branch 'main' of github.com:halide/Halide into rootjalex/x86-op…
rootjalex Sep 1, 2022
6a342ba
Merge branch 'main' of github.com:halide/Halide into rootjalex/x86-op…
rootjalex Sep 8, 2022
95e5070
fix signed absd lowering on x86
rootjalex Sep 12, 2022
7aaf9a7
add type assertion to Optimize_X86::mutate
rootjalex Sep 13, 2022
b4e2e42
use shuffle for deinterleave on VectorInstruction
rootjalex Sep 13, 2022
15e1a8c
do not try to extract when a vector is a simple extract_element
rootjalex Sep 13, 2022
cc2fac7
don't call 'simplify' in deinterleave on extract_lane
rootjalex Sep 13, 2022
f36d75d
resolve merge conflicts
rootjalex Sep 13, 2022
8 changes: 6 additions & 2 deletions Makefile
@@ -481,6 +481,7 @@ SOURCE_FILES = \
 InjectHostDevBufferCopies.cpp \
 Inline.cpp \
 InlineReductions.cpp \
+InstructionSelector.cpp \
 IntegerDivisionTable.cpp \
 Interval.cpp \
 Introspection.cpp \
@@ -579,7 +580,8 @@ SOURCE_FILES = \
 Var.cpp \
 VectorizeLoops.cpp \
 WasmExecutor.cpp \
-WrapCalls.cpp
+WrapCalls.cpp \
+X86Optimize.cpp

 # The externally-visible header files that go into making Halide.h.
 # Don't include anything here that includes llvm headers.
@@ -662,6 +664,7 @@ HEADER_FILES = \
 InjectHostDevBufferCopies.h \
 Inline.h \
 InlineReductions.h \
+InstructionSelector.h \
 IntegerDivisionTable.h \
 Interval.h \
 Introspection.h \
@@ -745,7 +748,8 @@ HEADER_FILES = \
 Util.h \
 Var.h \
 VectorizeLoops.h \
-WrapCalls.h
+WrapCalls.h \
+X86Optimize.h

 OBJECTS = $(SOURCE_FILES:%.cpp=$(BUILD_DIR)/%.o)
 HEADERS = $(HEADER_FILES:%.h=$(SRC_DIR)/%.h)
5 changes: 5 additions & 0 deletions src/Bounds.cpp
@@ -1111,6 +1111,11 @@ class Bounds : public IRVisitor {
 op->value.accept(this);
 }

+void visit(const VectorInstruction *op) override {
+// TODO(rootjalex): we may need to implement bounds queries.
+internal_error << "Unexpected VectorInstruction in bounds query: " << Expr(op) << "\n";
+}
+
 void visit(const Call *op) override {
 TRACK_BOUNDS_INTERVAL;
 TRACK_BOUNDS_INFO("name:", op->name);
4 changes: 4 additions & 0 deletions src/CMakeLists.txt
@@ -82,6 +82,7 @@ set(HEADER_FILES
 InjectHostDevBufferCopies.h
 Inline.h
 InlineReductions.h
+InstructionSelector.h
 IntegerDivisionTable.h
 Interval.h
 Introspection.h
@@ -166,6 +167,7 @@ set(HEADER_FILES
 VectorizeLoops.h
 WasmExecutor.h
 WrapCalls.h
+X86Optimize.h
 )

 set(SOURCE_FILES
@@ -245,6 +247,7 @@ set(SOURCE_FILES
 InjectHostDevBufferCopies.cpp
 Inline.cpp
 InlineReductions.cpp
+InstructionSelector.cpp
 IntegerDivisionTable.cpp
 Interval.cpp
 Introspection.cpp
@@ -344,6 +347,7 @@ set(SOURCE_FILES
 VectorizeLoops.cpp
 WasmExecutor.cpp
 WrapCalls.cpp
+X86Optimize.cpp
 )

 ##
5 changes: 5 additions & 0 deletions src/CodeGen_C.cpp
@@ -2829,6 +2829,11 @@ Expr CodeGen_C::scalarize_vector_reduce(const VectorReduce *op) {
 return Shuffle::make_concat(lanes);
 }

+void CodeGen_C::visit(const VectorInstruction *op) {
+internal_error << "CodeGen_C should never receive a VectorInstruction, received:\n"
+<< Expr(op) << "\n";
+}
+
 void CodeGen_C::visit(const VectorReduce *op) {
 stream << get_indent() << "// Vector reduce: " << op->op << "\n";

1 change: 1 addition & 0 deletions src/CodeGen_C.h
@@ -235,6 +235,7 @@ class CodeGen_C : public IRPrinter {
 void visit(const Fork *) override;
 void visit(const Acquire *) override;
 void visit(const Atomic *) override;
+void visit(const VectorInstruction *) override;
 void visit(const VectorReduce *) override;

 void visit_binop(Type t, const Expr &a, const Expr &b, const char *op);
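The Bounds.cpp and CodeGen_C.cpp hunks both add a visit(const VectorInstruction *) override that only reports an internal error: with instruction selection as a separate pass, the architecture-specific node should never reach those visitors, so seeing one indicates a bug upstream. A minimal sketch of that pattern on a hypothetical toy IR follows. Add, VectorInstruction, Visitor and ToyCBackend are illustrative names rather than Halide's classes, and a thrown std::logic_error stands in for Halide's internal_error.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical toy IR and visitor; these are not Halide's classes.
struct Add;
struct VectorInstruction;

struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(const Add &op) = 0;
    virtual void visit(const VectorInstruction &op) = 0;
};

struct Node {
    virtual ~Node() = default;
    virtual void accept(Visitor &v) const = 0;
};

struct Add : Node {
    void accept(Visitor &v) const override { v.visit(*this); }
};

// Architecture-specific node that only an instruction-selection pass creates.
struct VectorInstruction : Node {
    std::string name;
    explicit VectorInstruction(std::string n) : name(std::move(n)) {}
    void accept(Visitor &v) const override { v.visit(*this); }
};

// A backend that never runs instruction selection. Receiving a
// VectorInstruction means an earlier pass misbehaved, so it reports an
// internal error instead of guessing how to lower the node.
struct ToyCBackend : Visitor {
    int adds_emitted = 0;
    void visit(const Add &) override { ++adds_emitted; }
    void visit(const VectorInstruction &op) override {
        throw std::logic_error(
            "ToyCBackend should never receive a VectorInstruction, received: " + op.name);
    }
};
```

Because every backend has to override the new visit method anyway, failing loudly is cheaper than silently producing wrong code if the pass ordering is ever broken.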
34 changes: 19 additions & 15 deletions src/CodeGen_LLVM.cpp
@@ -4022,11 +4022,16 @@ void CodeGen_LLVM::visit(const Shuffle *op) {
 }
 }

+void CodeGen_LLVM::visit(const VectorInstruction *op) {
+internal_error << "CodeGen_LLVM received VectorInstruction node, should be handled by architecture-specific CodeGen class:\n"
+<< Expr(op) << "\n";
+}
+
 void CodeGen_LLVM::visit(const VectorReduce *op) {
 codegen_vector_reduce(op, Expr());
 }

-void CodeGen_LLVM::codegen_vector_reduce(const VectorReduce *op, const Expr &init) {
+Expr CodeGen_LLVM::split_vector_reduce(const VectorReduce *op, const Expr &init) const {
 Expr val = op->value;
 const int output_lanes = op->type.lanes();
 const int native_lanes = native_vector_bits() / op->type.bits();
@@ -4066,8 +4071,7 @@ void CodeGen_LLVM::codegen_vector_reduce(const VectorReduce *op, const Expr &ini
 equiv = max(equiv, init);
 }
 equiv = cast(op->type, equiv);
-equiv.accept(this);
-return;
+return equiv;
 }

 if (op->type.is_bool() && op->op == VectorReduce::And) {
@@ -4078,8 +4082,7 @@ void CodeGen_LLVM::codegen_vector_reduce(const VectorReduce *op, const Expr &ini
 if (init.defined()) {
 equiv = min(equiv, init);
 }
-equiv.accept(this);
-return;
+return equiv;
 }

 if (elt == Float(16) && upgrade_type_for_arithmetic(elt) != elt) {
@@ -4089,8 +4092,7 @@ void CodeGen_LLVM::codegen_vector_reduce(const VectorReduce *op, const Expr &ini
 equiv = binop(equiv, init);
 }
 equiv = cast(op->type, equiv);
-equiv.accept(this);
-return;
+return equiv;
 }

 if (output_lanes == 1) {
@@ -4189,8 +4191,7 @@ void CodeGen_LLVM::codegen_vector_reduce(const VectorReduce *op, const Expr &ini
 if (initial_value.defined()) {
 equiv = binop(initial_value, equiv);
 }
-equiv.accept(this);
-return;
+return equiv;
 }
 }

@@ -4213,8 +4214,7 @@ void CodeGen_LLVM::codegen_vector_reduce(const VectorReduce *op, const Expr &ini
 equiv = binop(equiv, init);
 }
 equiv = common_subexpression_elimination(equiv);
-equiv.accept(this);
-return;
+return equiv;
 }

 if (factor > 2 && ((factor & 1) == 0)) {
@@ -4246,8 +4246,7 @@ void CodeGen_LLVM::codegen_vector_reduce(const VectorReduce *op, const Expr &ini
 equiv = binop(equiv, init);
 }
 equiv = common_subexpression_elimination(equiv);
-codegen(equiv);
-return;
+return equiv;
 }

 // Extract each slice and combine
@@ -4261,8 +4260,13 @@ void CodeGen_LLVM::codegen_vector_reduce(const VectorReduce *op, const Expr &ini
 }
 }
 equiv = common_subexpression_elimination(equiv);
-codegen(equiv);
-} // namespace Internal
+return equiv;
+}
+
+void CodeGen_LLVM::codegen_vector_reduce(const VectorReduce *op, const Expr &init) {
+Expr equiv = split_vector_reduce(op, init);
+equiv.accept(this);
+}

 void CodeGen_LLVM::visit(const Atomic *op) {
 if (!op->mutex_name.empty()) {
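The CodeGen_LLVM.cpp change turns the old body of codegen_vector_reduce into a const split_vector_reduce that returns the lowered Expr instead of emitting it, and leaves codegen_vector_reduce as a thin wrapper that calls split_vector_reduce and then visits the result. That separation lets a pass that is not a code generator reuse the lowering. Below is a minimal sketch of the same "rewrite, then emit" split on a hypothetical toy model. Expressions are plain strings, only full sum reductions are handled, and the halve-until-native strategy is an illustrative assumption, not the PR's splitting logic, which per the hunks above also covers boolean And reductions, Float(16) upgrades and factor-based splits.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model, not Halide's IR: a full sum reduction over
// input lanes [first_lane, first_lane + lanes), printed as a string.
struct ToySumReduce {
    int first_lane;
    int lanes;
};

// Rewrite step (cf. split_vector_reduce): returns an equivalent
// expression and emits nothing. A reduction wider than the native vector
// is split in half, each half is reduced, and the two results are added.
std::string split_sum_reduce(const ToySumReduce &op, int native_lanes) {
    assert(native_lanes > 0 && op.lanes > 0);
    if (op.lanes <= native_lanes) {
        return "vreduce_add(x[" + std::to_string(op.first_lane) + ":" +
               std::to_string(op.first_lane + op.lanes) + "])";
    }
    const int half = op.lanes / 2;
    const ToySumReduce lo{op.first_lane, half};
    const ToySumReduce hi{op.first_lane + half, op.lanes - half};
    return "add(" + split_sum_reduce(lo, native_lanes) + ", " +
           split_sum_reduce(hi, native_lanes) + ")";
}

// Emission step (cf. the new codegen_vector_reduce): lower first, then
// hand the result to the emitter, here just a list of emitted expressions.
void codegen_sum_reduce(const ToySumReduce &op, int native_lanes,
                        std::vector<std::string> &emitted) {
    emitted.push_back(split_sum_reduce(op, native_lanes));
}
```

With an 8-lane native width, split_sum_reduce({0, 16}, 8) returns add(vreduce_add(x[0:8]), vreduce_add(x[8:16])). Since the rewrite step has no side effects, a selector can inspect or pattern-match that result before anything is emitted.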
14 changes: 14 additions & 0 deletions src/CodeGen_LLVM.h
@@ -48,6 +48,8 @@ struct ExternSignature;

 namespace Internal {

+class InstructionSelector;
+
 /** A code generator abstract base class. Actual code generators
 * (e.g. CodeGen_X86) inherit from this. This class is responsible
 * for taking a Halide Stmt and producing llvm bitcode, machine
@@ -361,6 +363,7 @@ class CodeGen_LLVM : public IRVisitor {
 void visit(const IfThenElse *) override;
 void visit(const Evaluate *) override;
 void visit(const Shuffle *) override;
+void visit(const VectorInstruction *) override;
 void visit(const VectorReduce *) override;
 void visit(const Prefetch *) override;
 void visit(const Atomic *) override;
@@ -514,6 +517,11 @@ class CodeGen_LLVM : public IRVisitor {
 * across backends. */
 virtual void codegen_vector_reduce(const VectorReduce *op, const Expr &init);

+/** Split up a VectorReduce node if possible, or generate LLVM

Review comment (Member): Please elaborate on this comment. Also, nit: lines in the block comment should start with *

+intrinsics for full reductions. This is used in
+`codegen_vector_reduce`. **/
+virtual Expr split_vector_reduce(const VectorReduce *op, const Expr &init) const;
+
 /** Are we inside an atomic node that uses mutex locks?
 This is used for detecting deadlocks from nested atomics & illegal vectorization. */
 bool inside_atomic_mutex_node;
@@ -621,6 +629,12 @@ class CodeGen_LLVM : public IRVisitor {
 * represents a unique struct type created by a closure or similar.
 */
 std::map<llvm::Value *, llvm::Type *> struct_type_recovery;
+
+/** Instruction selection uses `split_vector_reduce` and
+* `upgrade_type_for_arithmetic`, so needs access to those
+* methods.
+*/
+friend class InstructionSelector;
 };

 } // namespace Internal
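The CodeGen_LLVM.h hunks forward-declare InstructionSelector and make it a friend so that the separate selection pass can call split_vector_reduce and upgrade_type_for_arithmetic without being a CodeGen_LLVM subclass. A minimal sketch of the same access pattern, assuming hypothetical ToyCodeGen and ToySelector classes and a made-up "widen 16-bit floats to 32 bits" rule standing in for upgrade_type_for_arithmetic:

```cpp
#include <cassert>

class ToySelector;  // cf. the forward declaration `class InstructionSelector;`

// Hypothetical stand-in for CodeGen_LLVM: its helpers stay private, but
// one named pass is allowed to call them.
class ToyCodeGen {
public:
    explicit ToyCodeGen(int native_vector_bits)
        : native_vector_bits_(native_vector_bits) {}

private:
    // Lanes that fit in one native vector for a given element width.
    int native_lanes(int element_bits) const {
        assert(element_bits > 0);
        return native_vector_bits_ / element_bits;
    }

    // Hypothetical analogue of upgrade_type_for_arithmetic: do 16-bit
    // float math in 32 bits, leave other widths unchanged.
    int upgrade_float_bits(int bits) const {
        return bits == 16 ? 32 : bits;
    }

    int native_vector_bits_;

    // Grants ToySelector access to every private member above.
    friend class ToySelector;
};

class ToySelector {
public:
    explicit ToySelector(const ToyCodeGen &cg) : cg_(cg) {}

    // Lanes per native vector once a float type of `bits` bits is upgraded.
    int lanes_for_float(int bits) const {
        return cg_.native_lanes(cg_.upgrade_float_bits(bits));
    }

private:
    const ToyCodeGen &cg_;
};
```

Friendship is all-or-nothing: it exposes every private and protected member, not only the two helpers the header comment names, so that comment is what documents the intended surface.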