-
Notifications
You must be signed in to change notification settings - Fork 81
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[RFC] Create LLVM scope class for use with LLVM libraries (#83)
* [RFC] Encapsulate LLVM target for use with LLVM libraries * Add the RFC number * Clarify the objective of the RFC Clarify that this RFC proposes a scope object that can be used later to save/restore LLVM's options. The saving and restoring itself is not a part of the RFC, since the `llvm` target in TVM does not yet allow passing LLVM command line flags. * Rename `LLVMTarget` to `LLVMScope` * Clarification about actual saving/restoring of llvm flags as future step There was another passage about it that was ommitted from previous commits. * Reflect the latest status of the discussion * Add descriptions for ParseIR and LoadIR
- Loading branch information
Krzysztof Parzyszek
authored
Jul 15, 2022
1 parent
0f30c94
commit 22d1d11
Showing
1 changed file
with
199 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,199 @@ | ||
- Feature Name: Create LLVM scope class for use with LLVM libraries | ||
- Start Date: May 13, 2022 | ||
- RFC PR: [apache/tvm-rfcs#0083](https://github.com/apache/tvm-rfcs/pull/83) | ||
- GitHub Issue: None | ||
|
||
# Summary | ||
|
||
1. Create an object `LLVMScope` whose lifetime determines the scope of | ||
availability of LLVM functions (except serializing/deserializing LLVM IR). | ||
2. Enapsulate all information related to a compilation target in LLVM into a | ||
single object `LLVMTarget`. | ||
|
||
This will allow extending the `llvm` target in TVM to contain LLVM command | ||
line flags. | ||
The `LLVMTarget` could them be used to save/restore LLVM's command line | ||
options based on the flags contained in the `llvm` target. | ||
|
||
# Motivation | ||
|
||
For more details, see [discussion](https://discuss.tvm.apache.org/t/modularizing-llvm-codegen-jit/12764) | ||
on discourse. | ||
|
||
The main issue with using statically linked LLVM libraries is that the LLVM | ||
code has, and depends on a global state. First of all, LLVM needs to be | ||
initialized by registering all targets before any use. Another (and most | ||
problematic) example of that are command line flags (implemented via `cl::opt` | ||
in LLVM). Many LLVM components use them to tune their behavior, provide | ||
debugging or tracing facilities, or simply as on/off switches. In LLVM | ||
sources they are global variables, and once set they maintain their values. | ||
|
||
Since TVM uses LLVM to generate code for multiple different targets, each | ||
specific code generator in TVM may want to use its own set of tuning flags | ||
without affecting code generation for other targets. Similarly, using debug | ||
flag to investigate an issue with a code generation should not affect | ||
unrelated uses of LLVM. | ||
|
||
Luckily, LLVM does provide an interface into the command option registry, | ||
which allows clients to query and set the values of these options. TVM | ||
could utilize this to set and restore LLVM's options for the duration of | ||
code generation for each target. This could be done by having a single | ||
"entry point" into LLVM, that each LLVM client would need to use. This | ||
RFC proposes a class that would serve as such entry point. | ||
|
||
Another consideration is the LLVM context (`llvm::LLVMContext`), which is | ||
a common source of a number of LLVM IR constructs (like types, or constants). | ||
LLVM context is required for creating LLVM IR, and its lifetime must be | ||
enough to contain the lifetimes of any LLVM IR (`llvm::Module` in particular). | ||
|
||
Uses of LLVM in TVM generally fall into two categories: (1) loading/writing | ||
LLVM IR (`llvm::Module`), and (2) target-specific functionality like | ||
optimization, or code generation. This RFC proposes two classes: | ||
1. `LLVMScope` class that would initialize LLVM as a whole, and maintain | ||
the LLVM context. | ||
2. `LLVMTarget` class that would be a unified bridge between the `llvm` | ||
target in TVM and target representation in LLVM, and eventually handle | ||
the saving and restoration of LLVM command line flags. | ||
|
||
# Guide-level explanation | ||
|
||
The idea of this RFC is to implement a common class `LLVMScope` that would | ||
manage LLVM intialization and the LLVM context. It would be able to create | ||
LLVM modules (`llvm::Module`)[1], but not be associated with any specific | ||
target. | ||
|
||
The target object `LLVMTarget` would require a scope object, and the lifetime | ||
of the scope object must entirely contain any target object. The target object | ||
would be a common location to access LLVM data structures associated with | ||
compilation target, e.g. target machine (`llvm::TargetMachine`), fast math | ||
flags (`llvm::FastMathFlags`), optimization level (`llvm::CodeGenOpt::Level`), | ||
and so on. Once LLVM flags are added to the `llvm` target, the `LLVMTarget` | ||
object would also save/restore the original values when necessary. | ||
|
||
A typical use would follow this pattern: | ||
```C++ | ||
{ | ||
// Initialize LLVM. | ||
LLVMScope llvm_scope; | ||
// Let's see the LLVM IR and MIR after each transformation, i.e. use | ||
// -print-after-all in codegen. | ||
my_target = Target("llvm -mtriple myarch-unknown-elf -llvm-options=print-after-all"); | ||
With<LLVMTarget> llvm_target(llvm_scope, my_target); | ||
// [...] | ||
// Some uses of llvm_target | ||
const llvm::Target& t = llvm_target.target_machine->getTarget(); | ||
std::cout << "name: " << t.getName() << "\n"; | ||
std::cout << "description: " << t.getShortDescription() << "\n"; | ||
// [...] | ||
// Create codegen | ||
auto cg = new CodeGenMyArch(); | ||
cg->Init(llvm_target); | ||
// add functions, optimize, save output, etc. | ||
// [...] | ||
// Done using LLVM. llvm_target's destructor does the cleanup. | ||
} | ||
``` | ||
|
||
[1] Unless indicated otherwise, the term "LLVM module" in the text of the RFC | ||
refers to `llvm::Module`. | ||
|
||
# Reference-level explanation | ||
|
||
## Design considerations | ||
|
||
One of the potential further developments could be loading LLVM support | ||
dynamically. Similarly to the saving of LLVM command line options, the call | ||
to dlopen could happen in the constructor of `LLVMScope`, and the call to | ||
dlclose in its destructor. | ||
This obviously precludes any uses of LLVM outside of the lifetime of the | ||
`LLVMScope` object, and making it so (or at least coming as close as | ||
possible) was one of the design goals. | ||
|
||
Another consideration is the extent of the impact of command line options | ||
in LLVM. Since they are represented as global variables, they are acccessible | ||
nearly anywhere in the LLVM code (including LLVM IR deserialization). | ||
To completely contain any uses of LLVM flags in the scope of saving/restoring | ||
their default values one would have to save them before making any calls to | ||
LLVM code. This is unfortunately impossible, since LLVM command line flags | ||
will eventually become an attribute of `llvm` target, which in certain cases | ||
can only be created once an LLVM module has been deserialized: LLVM modules | ||
store target string as metadata. | ||
|
||
Because of that, saving and restoring of LLVM flags will not apply to | ||
serialization or deserialization of LLVM IR. | ||
|
||
Another design consideration was not imposing any limitations on using LLVM, | ||
once the prerequisites were met. In particular, the programmer should be | ||
able to use any LLVM functions or data structures that were available to | ||
them before this proposal. | ||
|
||
## Implementation | ||
|
||
One of the more important structures in LLVM, in particular when dealing | ||
with LLVM IR, is `LLVMContext`. A LLVM module needs a context, but it does | ||
not own one. `LLVMContext` should be managed by `LLVMScope` (in principle, | ||
by anything that outlives the rest of LLVM's objects). | ||
|
||
At the minimum, the designed interface would contain: | ||
|
||
```C++ | ||
class LLVMScope { | ||
public: | ||
LLVMScope(); | ||
~LLVMScope(); | ||
|
||
std::shared_ptr<llvm::LLVMContext> GetContext() const { return ctx_; } | ||
|
||
// Assume the "llvm_ir" parameter contains serialized textual LLVM IR. | ||
// Parse the IR and return the resulting llvm::Module. | ||
std::unique_ptr<llvm::Module> ParseIR(const std::string& llvm_ir) const; | ||
// Load LLVM IR from file given by "file_name", and return the created | ||
// llvm::Module. The file can contain either the bitcode (i.e. "bc"), or | ||
// text (i.e. "ll"). | ||
std::unique_ptr<llvm::Module> LoadIR(const std::string& file_name) const; | ||
|
||
private: | ||
std::shared_ptr<llvm::LLVMContext> ctx_; | ||
}; | ||
``` | ||
Since the LLVM state is global, there should only be one `LLVMTarget` object | ||
live at any given time if it attempts to modify the state. There can be | ||
arbitrarily many of such objects live simultaneously as long as none of them | ||
modify the state. | ||
# Drawbacks | ||
There is no way to effectively enforce the creation of `LLVMScope` or | ||
`LLVMTarget` objects before using LLVM inside TVM. At the same time adding | ||
these objects to common code (e.g. `CodeGenLLVM`) should prevent accidental | ||
misuse of LLVM. | ||
# Rationale and alternatives | ||
Having `LLVMScope` as a prerequisite for using LLVM APIs was intended | ||
to allow its constructor/destructor serve as the "setup"/"cleanup" functions, | ||
similarly to Python's `__enter__` and `__exit__`. Following Python's `with` | ||
idiom was actually suggested by @tqchen on the discussion forum (thread | ||
linked above). | ||
An alternative way to ensure that LLVM is only used within a certain scope | ||
would be to implement a thin wrapper on top of LLVM, and make all of its APIs | ||
available as members of that wrapper. While adding all available functions | ||
from LLVM as members would be infasible, adding them on an as-needed basis | ||
could actually work. The main reason this approach was not taken is that | ||
one's experience with using LLVM in applications should be directly usable | ||
within TVM, without having to consider additional software layers. | ||
# Prior art | ||
# Unresolved questions | ||
# Future possibilities | ||
With static linking, there is no way to fully save/restore LLVM's global | ||
state. While command line options are the most pressing issue, if there is | ||
a need for further isolation, dynamic loading can be considered. This could | ||
be either loading LLVM libraries built into a shared object, or making the | ||
LLVM as a TVM backend into a shared library. | ||