There are two directories used by backends in Glow:

- `tools/ClassGen/Backends/`: Each backend directory here contains any backend-specific Nodes and Instructions for that backend. If a backend provides its own backend-specific nodes/instructions, they should be included in `NodeGen`/`InstrGen`.
- `lib/Backends/`: The implementation of the backend is contained here. This includes derived classes for `Backend` and `CompiledFunction`.
All backends in Glow derive from the abstract base class `Backend`. There
are two pure virtual functions all backends must implement (a minimal sketch
of a derived backend follows the list):
- `virtual std::unique_ptr<CompiledFunction> compile(Function *F, const Context &ctx) const;`
  - This function takes a `Function *F` to compile. `Context &ctx` maps the graph to the concrete execution environment for a specific function. It should return a unique pointer to the `CompiledFunction` of `F`. If the backend uses Glow low-level IR, it can call `generateAndOptimizeIR()` to generate an optimized `IRFunction`.
- `virtual bool isOpSupported(Kinded::Kind opKind, ElemKind elementTy) const;`
  - Returns whether the backend supports the given operation `opKind` with the given `ElemKind elementTy`. For example, a backend may not support a specific bit-width quantization kind (e.g. `Int16QTy`) at all, or may only support it for certain operations (e.g. `ConvolutionNodeKind`). Any `(opKind, elementTy)` pair passed in that returns true must be supported during `compile()`.
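As a rough illustration, a minimal derived backend might look like the sketch below. `MyBackend` and `MyCompiledFunction` are hypothetical names (the latter is sketched later in this document), and the exact signatures and helpers such as `generateAndOptimizeIR()` should be verified against `include/glow/Backends/Backend.h`:

```cpp
#include "glow/Backends/Backend.h"
#include "glow/Backends/CompiledFunction.h"

#include "llvm/ADT/STLExtras.h"

using namespace glow;

class MyBackend : public Backend {
public:
  std::unique_ptr<CompiledFunction> compile(Function *F,
                                            const Context &ctx) const override {
    // This backend consumes Glow's low-level IR, so generate and optimize
    // an IRFunction for F. A backend compiling straight from the graph
    // would instead walk F's nodes directly.
    auto IR = generateAndOptimizeIR(F, shouldShareBuffers());
    // A real backend would also use `ctx` here to set up memory for the
    // function's Placeholders.
    return llvm::make_unique<MyCompiledFunction>(std::move(IR));
  }

  bool isOpSupported(Kinded::Kind opKind, ElemKind elementTy) const override {
    // Example policy: no 16-bit quantized support anywhere; everything else
    // is accepted. A real backend would enumerate the (opKind, elementTy)
    // pairs it can actually execute.
    return elementTy != ElemKind::Int16QTy;
  }

  // Optional overrides, sketched further below.
  bool shouldLower(const Node *N) const override;
  bool shouldShareBuffers() const override;
};
```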
Additionally, there are virtual functions that backends can override (two of these overrides are sketched after the list):
- `virtual bool transformPreLowering(Function *F, CompilationMode mode) const;`
  - Allow the backend to transform the `Function *F` before node lowering occurs, given some `CompilationMode mode`. For example, a backend may prefer to replace a ConvolutionNode followed by a ReluNode with a backend-specific fused ConvReluNode. This should be done prior to node lowering, as otherwise the ReluNode will already have been lowered to a MaxNode and may have been transformed by other optimization passes. Returns true if the Function was modified at all. See below for more information.
- `virtual bool transformPostLowering(Function *F, CompilationMode mode) const;`
  - Allow the backend to transform the `Function *F` after node lowering occurs, given some `CompilationMode mode`. For example, the CPU backend prefers to transform MaxNodes that take a SplatNode as an input into a backend-specific CPUMaxSplatNode, which takes a scalar value as a member input instead of a SplatNode. This should be done after node lowering, as ReluNodes are lowered into MaxNodes. See below for more information.
- `virtual bool shouldLower(const Node *N) const;`
  - Allow the backend to prevent lowering for some `Node *N`. For example, if a backend supports executing a FullyConnected operator, it would want to prevent lowering for it and provide a backend-specific Instruction for the FullyConnectedNode to be IRGen'd into. Note that IRGen for a Node can be specified via the ClassGen `autoIRGen("NodeName")` call. See below for more information. Returns true if `N` should be lowered.
- `virtual bool shouldShareBuffers() const;`
  - Allow the backend to disable the buffer-sharing optimization. This may be preferred by backends that would like to do their own memory optimizations. Returns true by default.
- `virtual void save(Function *F, llvm::StringRef outputDir, llvm::StringRef networkName) const;`
  - Save a standalone executable bundle, where the provided `Function *F` is compiled and then saved to `outputDir` with main entry name `networkName`.
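Continuing the hypothetical `MyBackend` sketch from above, a backend that executes FullyConnected natively and does its own memory planning might override two of these hooks as follows:

```cpp
// Keep FullyConnectedNodes intact rather than lowering them, so that they
// can be IRGen'd into a backend-specific instruction.
bool MyBackend::shouldLower(const Node *N) const {
  return N->getKind() != Kinded::Kind::FullyConnectedNodeKind;
}

// Disable Glow's buffer-sharing optimization; this backend performs its own
// memory optimizations instead.
bool MyBackend::shouldShareBuffers() const { return false; }
```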
`CompiledFunction` is an abstract class that represents the result of
compilation of a `Function`. Backends must implement their own class derived
from `CompiledFunction`, which must be returned as the result of
`Backend::compile()`. `CompiledFunction` contains a single pure virtual
function that must be implemented: `virtual void execute();`. This function is
responsible for copying inputs to the device from all input Placeholders,
executing the function, and copying outputs back from the device to output
Placeholders. Thus, after the function returns, all Placeholders for the
outputs of the function should have had their backing tensors updated.
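A bare-bones sketch of such a derived class, continuing the hypothetical `MyCompiledFunction` from above; the numbered steps are placeholders, since the copy and launch mechanics are entirely backend-specific:

```cpp
#include "glow/Backends/CompiledFunction.h"
#include "glow/IR/IR.h"

using namespace glow;

class MyCompiledFunction : public CompiledFunction {
public:
  explicit MyCompiledFunction(std::unique_ptr<IRFunction> IR)
      : IR_(std::move(IR)) {}

  void execute() override {
    // 1. Copy the backing tensor of every input Placeholder to the device.
    // 2. Run the compiled function on the device.
    // 3. Copy the results back into the backing tensors of the output
    //    Placeholders, so they are up to date when execute() returns.
  }

private:
  std::unique_ptr<IRFunction> IR_; // Low-level IR produced by compile().
};
```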
Different backends may prefer to transform or optimize the graph differently for
their own specialized architecture. For example, Glow lowers ReLU down to a Max
node, taking as inputs the original tensor and a "Splat" tensor of matching
dimensions, filled with all `0`s. Glow's CPU JIT backend prefers to replace this
pattern -- a Max with a Splat input and another non-Splat input -- with a single
"CPUMaxSplat" operation that takes a scalar Splat value as input in place of an
entire Splat tensor.
Backends have the opportunity to perform their own analysis and transformations
before or after lowering, depending on their requirements. This is exposed via
the `transformPreLowering()` and `transformPostLowering()` hooks, during which a
backend can transform the graph however it desires. For example, the backend
could use `transformPostLowering()` to search the graph for the above
`CPUMaxSplat` pattern.
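A sketch of what that search might look like; the traversal below and the `CPUMaxSplatNode` constructor arguments are illustrative rather than the CPU backend's exact code:

```cpp
bool CPUBackend::transformPostLowering(Function *F,
                                       CompilationMode mode) const {
  bool changed = false;
  for (auto &node : F->getNodes()) {
    auto *MN = llvm::dyn_cast<MaxNode>(&node);
    if (!MN) {
      continue;
    }
    // Accept the Splat on either input of the Max.
    auto *SN = llvm::dyn_cast<SplatNode>(MN->getLHS().getNode());
    NodeValue other = MN->getRHS();
    if (!SN) {
      SN = llvm::dyn_cast<SplatNode>(MN->getRHS().getNode());
      other = MN->getLHS();
    }
    if (!SN) {
      continue;
    }
    // Replace the matched Max with a CPUMaxSplat that carries the splat's
    // scalar value as a member instead of taking a whole Splat tensor.
    auto *CMS = F->addNode(
        new CPUMaxSplatNode(MN->getName(), other, SN->getValue()));
    MN->getResult().replaceAllUsesOfWith(CMS);
    changed = true;
  }
  return changed;
}
```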
A backend may create its own custom Nodes and Instructions which it can insert
into the IR. This is done via ClassGen and included in
`tools/ClassGen/NodeGen.cpp`. For example, the CPU backend defines `CPUMaxSplat`
in `tools/ClassGen/Backends/CPU/CPUSpecificNodes.h`:
```cpp
BB.newBackendSpecificNode("CPUMaxSplat")
    .addInput("Input")
    .addResult("Input.getType()")
    .addMember(MemberType::Float, "SplatValue")
    .setDocstring("A Max node with one splat input; CPU specific.");
```
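From this definition, ClassGen generates a `CPUMaxSplatNode` class with accessors for its input and member (e.g. `getInput()` and `getSplatValue()`), which the backend can construct and insert during its transformations.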
During `transformPostLowering()`, this `CPUMaxSplat` node replaces the
aforementioned pattern. However, there must be a corresponding instruction for
this Node to be lowered to during the IRGen phase. Thus, we need a corresponding
backend-specific CPUMaxSplat instruction, defined in
`tools/ClassGen/Backends/CPU/CPUSpecificInstrs.h`:
```cpp
BB.newBackendSpecificInstr("CPUMaxSplat")
    .addOperand("Dest", OperandKind::Out)
    .addOperand("Src", OperandKind::In)
    .addMember(MemberType::Float, "SplatValue")
    .inplaceOperand({"Dest", "Src"})
    .dataParallel()
    .autoIRGen();
```
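The `autoIRGen()` property tells ClassGen to automatically generate the IRGen logic that maps a `CPUMaxSplatNode` onto this instruction, so no hand-written IRGen case is needed for the pair.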
These instructions will appear in the instruction stream sent to the CPU
backend JIT; its standard library has a kernel for executing this
`CPUMaxSplat` instruction. You can see such instructions in the LeNet MNIST
example.
Note that backend-specific nodes and instructions can be treated just like any
other node or instruction defined in `tools/ClassGen/NodeGen.cpp` or
`tools/ClassGen/InstrGen.cpp`. For example, the `CPUMaxSplat` instruction
definition includes the `dataParallel()` property, allowing data parallel
optimizations to take place.
The `tools/ClassGen/Backends/CPU/CPUSpecificNodes.h` and
`tools/ClassGen/Backends/CPU/CPUSpecificInstrs.h` files are included in
`tools/ClassGen/NodeGen.cpp` and `tools/ClassGen/InstrGen.cpp`, respectively.
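A sketch of how that inclusion works: because these headers consist of executable `Builder` calls rather than declarations, they are included at the point in the generator where the builder object (`BB` above) is in scope. The surrounding code here is illustrative, not a verbatim copy of `NodeGen.cpp`:

```cpp
// Inside tools/ClassGen/NodeGen.cpp (sketch):
int main(int argc, char **argv) {
  // ... open the output streams and construct the ClassGen builder ...
  Builder BB(hFile, cFile, dFile);
  // ... core Glow node definitions via BB ...
#include "Backends/CPU/CPUSpecificNodes.h"
  return 0;
}
```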