[RFC] Create LLVM scope class for use with LLVM libraries #83

kparzysz-quic · 2022-06-27T22:15:46Z

Follow-up to https://discuss.tvm.apache.org/t/modularizing-llvm-codegen-jit/12764

kparzysz-quic · 2022-06-27T22:17:59Z

Rendered (updated to include 8a739f8).

tqchen · 2022-06-29T17:47:38Z

cc @junrushao1994

tqchen · 2022-06-29T17:52:30Z

rfcs/0080-llvm-target.md

+  // Let's see the LLVM IR and MIR after each transformation, i.e. use
+  // -print-after-all in codegen.
+  my_target = Target("llvm -mtriple myarch-unknown-elf -llvm-options=print-after-all");
+  LLVMTarget llvm_target(my_target);


Thanks @kparzysz-quic. I wonder whether there is potential for misuse if a user creates multiple LLVMTarget objects, since a target reads more like a config than something that establishes a scope.

A different name might help (e.g. something that implies scope). Alternatively, we can align with our current With convention, which clearly indicates scoping:

With<LLVMTarget> llvm_scope(llvm_target);

I think renaming LLVMTarget to LLVMScope is probably technically a better approach at the moment, although personally I like With a bit better (since it resembles the Python idiom, both visually and functionally).

The problem is when we want to initialize the LLVM scope by deserializing an llvm::Module (either from a file, or from a string). Since the target string is stored in the module, the llvm::Module needs to be created first (which means that it would be created before the scope object itself). To hide the gap between creating the llvm::Module and creating the scope object, the LLVMTarget (using the old naming) implements both steps in a single function call. This call then returns both the llvm::Module and the scope object.

The issue is that With doesn't apply very nicely to this case. The return type, currently named ModuleData (for lack of a better idea), is

std::pair<std::unique_ptr<llvm::Module>, std::unique_ptr<LLVMTarget>>

With With, it would become

std::pair<std::unique_ptr<llvm::Module>, std::unique_ptr<With<LLVMTarget>>>

which separates the "with" from its scope. This is not formally wrong, but loses the visual impact of "with".

areusch

broadly LGTM, but let's let TQ/Junru and any others comment on this/approve as well. unifying the interface between TVM Target and LLVM options seems like a net win for code complexity. my one question, which i think is mainly open-ended now, is how we handle LLVM arch that have unique flags that, right now, would show up in TVM as target-specific llvm flags.

kparzysz-quic · 2022-07-05T20:01:11Z

> my one question, which i think is mainly open-ended now, is how we handle LLVM arch that have unique flags that, right now, would show up in TVM as target-specific llvm flags.

I'd have to see a practical example of that to get an idea of what's expected. For Hexagon we'd want to automatically append certain flags (that enable auto-vectorization in LLVM, for example). I had one idea for that, but it may need a revision.

kparzysz-quic · 2022-07-06T15:37:25Z

@tqchen @junrushao1994 Do you have any additional comments?

tqchen · 2022-07-08T13:21:21Z

Thanks @kparzysz-quic. Sorry for the delayed reply; we are on a break here.

Overall I like the direction we are going. I just want to figure out the spectrum of possible APIs.

My main question is how we are going to interact with multiple ParseIR calls. Some examples would be helpful. For instance, is it OK for us to have nested LLVMScope objects?

void Example() {
     LLVMScope scope1(target);
     {
          // what is the effect here, seems mod_data1 is immediate in scope
          auto mod_data1 = LLVMScope::ParseIR(name);
     }
}

I also wonder if there is a way to "defer" the scope initialization, e.g. can the LLVMModule be created and stored in the LLVMScope, with the target not taking effect until we enter the scope? We would call InitializeLLVM in the constructor of LLVMTarget, but do other things like option setting in the enter stage. Something like

void Example() {
     LLVMTarget target1(target);
     auto mod_data = LLVMTarget::ParseIR(name);
     // enter target1 scope
     With<LLVMTarget> scope1(target1);
     {
           // entering target in mod_data
          With<LLVMTarget> scope2(mod_data);
     }
}

I think the main question is how coupled the operations related to LLVM are.

kparzysz-quic · 2022-07-08T17:11:19Z

Both LoadIR and ParseIR take an optional pre-existing LLVMContext and return a pair {llvm::Module, LLVMScope}, where the scope object in the pair is newly constructed and contains the given LLVMContext (or a newly created one if none was given). In the first call to ParseIR, the caller can take the second element of the pair and just use it as the scope. In a subsequent call, the caller can simply discard the second element, which will then be automatically destroyed.
An example of calling ParseIR or LoadIR when an LLVM scope already exists is in HandleImport in codegen_llvm.cc.

You bring up a good point about when the scope "takes effect", or binds to LLVM. I propose that the scope becomes active when a new LLVMContext is created (i.e. the scope is really determined by the LLVMContext, not by the LLVMScope). We could also define "takes effect" as the point at which locks are acquired (or at which creating the object fails). That would mean that ParseIR and LoadIR return an active scope. Also, if they are called with a context argument (meaning that they are called from an active scope), they wouldn't need to create a new context, and so they wouldn't block (if we were to add locks). In that case they would return another scope object with the same LLVMContext. This shouldn't cause any problems even if both scope objects were used, but it could make the code confusing to read. Following the convention that the "extra" scope is ignored would avoid the confusion. There may be details to sort out about releasing locks, etc., but those are implementation details.
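The convention described above (keep the scope from the first ParseIR call, pass its context to later calls, ignore the extra scope) can be sketched without LLVM. This is a minimal illustration under stated assumptions: `FakeContext`, `FakeModule`, `ScopeSketch`, and `ExtraScopeSharesContext` are hypothetical stand-ins, not LLVM or TVM types, and no locking is modeled.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins: these are NOT LLVM or TVM types.
struct FakeContext {};
struct FakeModule {
  std::string ir;
  std::shared_ptr<FakeContext> ctx;  // context the module was created in
};

class ScopeSketch {
 public:
  explicit ScopeSketch(std::shared_ptr<FakeContext> ctx) : ctx_(std::move(ctx)) {}

  // Reuse `ctx` when the caller is already inside a scope; otherwise
  // create a fresh context. The returned scope always holds that context.
  static std::pair<FakeModule, ScopeSketch> ParseIR(
      const std::string& ir, std::shared_ptr<FakeContext> ctx = nullptr) {
    if (!ctx) ctx = std::make_shared<FakeContext>();
    FakeModule mod{ir, ctx};
    return {std::move(mod), ScopeSketch(ctx)};
  }

  const std::shared_ptr<FakeContext>& context() const { return ctx_; }

 private:
  std::shared_ptr<FakeContext> ctx_;
};

// First call: keep the returned scope. Later call: pass the existing
// context; the "extra" scope in the result shares it and can be ignored.
inline bool ExtraScopeSharesContext() {
  auto first = ScopeSketch::ParseIR("; module A");
  ScopeSketch& scope = first.second;
  assert(scope.context() != nullptr);
  auto second = ScopeSketch::ParseIR("; module B", scope.context());
  return second.second.context() == scope.context() &&
         second.first.ctx == scope.context();
}
```

Because the context is held through a `std::shared_ptr`, discarding the extra scope object does not destroy the context that the first scope still uses.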

Side note: Ideally, prior to the activation of the scope, none of LLVM's code or data types would be exposed to the user, but this RFC doesn't try to go that far. The main concern here is a platform on which saving/restoring LLVM flags could be done.

I also suggest creating two classes: LLVMScope and LLVMTarget. The LLVMTarget would contain all the LLVM target flags and options, while LLVMScope would deal with activation, ParseIR and LoadIR. The LLVMTarget object could be passed around, copied, etc. (without concerns about locking or activating anything). Its purpose would be to have all the target flags/options in one place, and it could only exist once a scope has been activated.

Something like

class LLVMTarget {
public:
  LLVMTarget(const LLVMTarget&);  // copy ctor, etc.

private:
  friend class LLVMScope;
  LLVMTarget(ctx, values extracted from tvm::Target);

  std::shared_ptr<llvm::LLVMContext> ctx_;    // set via the private ctor, passed from LLVMScope
  llvm::TargetMachine *tm;
  ...
};

class LLVMScope {
public:
  LLVMScope(const Target& target) {
    // Parse the target into individual attributes, e.g. mtriple, etc. This would allow users
    // to create LLVMScope from Target and modify it before creating LLVMTarget.
    //
    // If the target has non-empty list of LLVM flags, it would be considered "modifying"
    // the global state, otherwise it would be "read-only".  This would allow creating
    // multiple "read-only" concurrent scopes, but only one "modifying" one.
  }

  void activate() {
     ctx_ = std::shared_ptr<llvm::LLVMContext>(CreateNewContext(is_read_only));
  }

  static std::pair<llvm::Module, LLVMScope> ParseIR(std::string ir) {   // same for LoadIR
    auto ctx = std::shared_ptr<llvm::LLVMContext>(CreateNewContext(false /*assume modifying*/));
    llvm::Module m = llvm::parse_string_etc(ctx, ...);
    LLVMScope scp(ctx, get_target_string(m));   // create an already active scope
    return {m, scp};
  }

  LLVMTarget getTargetInfo() {
    ICHECK(ctx_ != nullptr) << "must activate first";
    ...
  }

private:
  static llvm::LLVMContext* CreateNewContext(bool is_read_only) {
     // Create context with appropriate locking
  }
 
  LLVMScope(std::shared_ptr<llvm::LLVMContext> ctx, const Target& target);
  std::shared_ptr<llvm::LLVMContext> ctx_;
};

kparzysz-quic · 2022-07-08T17:15:33Z

To address the second question: we shouldn't allow nesting of LLVM scopes. The reason is that if the target doesn't specify any command line flags, the assumption is that it will rely on LLVM's defaults. If we were to nest scopes, and the outer scope did modify flags, then the inner scope (without flags) would not have a way of knowing that the global state has been altered.
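One way to enforce the "no nesting" rule is to have the scope constructor fail when another scope is already active. The sketch below is only an illustration of that idea: `NonNestingScope` and `NestingIsRejected` are hypothetical names, the single process-wide flag stands in for LLVM's global option state, and thread safety is deliberately not handled.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical sketch: not the RFC's LLVMScope and not an LLVM API.
class NonNestingScope {
 public:
  NonNestingScope() {
    if (Active()) throw std::logic_error("LLVM scopes must not be nested");
    Active() = true;
  }
  ~NonNestingScope() { Active() = false; }
  NonNestingScope(const NonNestingScope&) = delete;
  NonNestingScope& operator=(const NonNestingScope&) = delete;

  static bool IsActive() { return Active(); }

 private:
  // One process-wide flag, mirroring the fact that the state being
  // protected is global. Not thread-safe; a real version would need locks.
  static bool& Active() {
    static bool active = false;
    return active;
  }
};

// Returns true when constructing an inner scope inside an outer one throws.
inline bool NestingIsRejected() {
  NonNestingScope outer;
  assert(NonNestingScope::IsActive());
  try {
    NonNestingScope inner;  // throws: a scope is already active
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

Since the inner constructor throws before the object exists, its destructor never runs, so the outer scope remains marked active until it is destroyed.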

tqchen · 2022-07-08T17:53:57Z

Just to follow up: why does ParseIR require activation of a scope? Is it possible for ParseIR to return a tuple of Target and Module, with activation done separately?

I am asking this mainly to see if we can get an explicit With-style API.

kparzysz-quic · 2022-07-08T18:50:50Z

> Just to follow up: why does ParseIR require activation of a scope? Is it possible for ParseIR to return a tuple of Target and Module, with activation done separately?

The reason is that ParseIR needs to make calls to LLVM functions to create the llvm::Module.

Edit: I like the explicit With, but it doesn't work well with ParseIR/LoadIR. See my earlier comment.

tqchen · 2022-07-11T00:32:00Z

> The reason is that ParseIR needs to make calls to LLVM functions to create the llvm::Module.

I get this part. My main question is whether we can call InitializeLLVM in the Target constructor but do the option resetting in enter/exit. I could be missing something here.

kparzysz-quic · 2022-07-11T13:26:20Z

> > The reason is that ParseIR needs to make calls to LLVM functions to create the llvm::Module.

> I get this part. My main question is whether we can call InitializeLLVM in the Target constructor but do the option resetting in enter/exit. I could be missing something here.

I see what you mean. My concern is that since command line options can be used almost everywhere in LLVM (not just optimizations or code generation), we should try to do the saving as early as possible (and restoring as late as possible). Ideally all the LLVM calls in the program would be enclosed in the save/restore functions (even the bitcode reader registers a couple of command line flags).

At the same time we could make a deliberate decision to only apply the save/restore to a part of LLVM functionality, i.e. allow some kinds of LLVM function calls to happen outside of the save/restore scope. Let me know what you think.
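The "save as early as possible, restore as late as possible" idea maps naturally onto an RAII guard. The following is a sketch under assumptions, not the RFC's implementation: `OptionTable`, `GlobalOptions`, `OptionSaveRestore`, `ReadOption`, and `DemoSaveRestore` are hypothetical, and a plain `std::map` stands in for LLVM's process-wide command line option state.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for LLVM's process-wide command line option state.
using OptionTable = std::map<std::string, std::string>;
inline OptionTable& GlobalOptions() {
  static OptionTable table;
  return table;
}

// Saves the global table on entry, applies the overrides, and restores
// the saved table on exit.
class OptionSaveRestore {
 public:
  explicit OptionSaveRestore(const OptionTable& overrides) : saved_(GlobalOptions()) {
    for (const auto& kv : overrides) GlobalOptions()[kv.first] = kv.second;
  }
  ~OptionSaveRestore() { GlobalOptions() = std::move(saved_); }
  OptionSaveRestore(const OptionSaveRestore&) = delete;
  OptionSaveRestore& operator=(const OptionSaveRestore&) = delete;

 private:
  OptionTable saved_;
};

// Stand-in for any library call that consults a global option.
inline std::string ReadOption(const std::string& name) {
  auto it = GlobalOptions().find(name);
  return it == GlobalOptions().end() ? "<default>" : it->second;
}

inline void DemoSaveRestore() {
  OptionTable flags;
  flags["print-after-all"] = "true";
  {
    OptionSaveRestore guard(flags);
    assert(ReadOption("print-after-all") == "true");  // inside the scope
  }
  assert(ReadOption("print-after-all") == "<default>");  // restored on exit
}
```

Any call made before the guard is constructed sees the unmodified global state, which is the gap the discussion above is concerned with.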

tqchen · 2022-07-11T13:36:31Z

Logically, there are two kinds of operations involved:

K0: operations that serialize/deserialize the data structure (llvm::Module)
K1: operations that transform the llvm::Module

We are certainly in agreement that all K1 operations should be enclosed in an option save/restore scope. Parse/Save seems to belong to K0. Logically, operations in K0 should not be affected by command line options, hence my comment.

But I think what you mean, @kparzysz-quic, is that command line options might reach into K0 as well. If that is really the case, it would indeed be helpful to do the setting/restoring at the load/store level. It would be great to reason about this and confirm, since the ability to decouple K0 and K1 is nice from an overall code structuring point of view.

kparzysz-quic · 2022-07-11T13:53:12Z

It does, yes: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L91-L99

The flags above come from the latest development branch, so in order to handle all LLVM versions we'd need to check all of them since 4.0 (which I'm ok doing), and we will need to keep this code up to date with the latest developments in LLVM.

On the other hand, these aren't options that are very likely to be used by TVM users, and the issue with command line flags only really applies to deserialization (since serialization would happen after the flags had been saved). I don't think it's unreasonable to state that we only handle (K1), but we'd need to ensure that (K1) also included creation of any data structures that assist in modifying or in analysis of llvm::Module.

tqchen · 2022-07-12T16:27:39Z

In that case, I think it would be nice to state that we handle K1, and leave ONLY serialization/deserialization out. This way we can have a With-style API, which indicates that we are explicitly entering the scope for any processing.

auto mod_data = LLVMTarget::ParseIR(file_name);
With<LLVMTarget> scope(mod_data);
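The split between K0 (outside the scope) and K1 (inside it) can be shown with a simplified With-style wrapper. This is a sketch only: `WithSketch` is a reduced stand-in rather than TVM's actual `With` class, and `TargetSketch`, `ParseIRSketch`, and `RunExample` are hypothetical helpers that record events instead of calling LLVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for a With-style wrapper: enter the scope in the
// constructor, exit it in the destructor.
template <typename ContextType>
class WithSketch {
 public:
  explicit WithSketch(ContextType& ctx) : ctx_(ctx) { ctx_.EnterScope(); }
  ~WithSketch() { ctx_.ExitScope(); }
  WithSketch(const WithSketch&) = delete;
  WithSketch& operator=(const WithSketch&) = delete;

 private:
  ContextType& ctx_;
};

// Hypothetical target object; it logs events instead of touching LLVM.
struct TargetSketch {
  std::vector<std::string> log;
  void EnterScope() { log.push_back("save+apply options"); }
  void ExitScope() { log.push_back("restore options"); }
  void Transform() { log.push_back("K1 transform"); }
};

// K0: deserialization happens before any scope is entered.
inline TargetSketch ParseIRSketch(const std::string& /*file_name*/) {
  TargetSketch t;
  t.log.push_back("K0 parse");
  return t;
}

inline std::vector<std::string> RunExample() {
  TargetSketch mod_data = ParseIRSketch("module.ll");
  {
    WithSketch<TargetSketch> scope(mod_data);
    assert(mod_data.log.back() == "save+apply options");
    mod_data.Transform();  // K1: only runs inside the scope
  }
  return mod_data.log;
}
```

The recorded order (parse, enter, transform, exit) is the property this style makes visible at the call site.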

tqchen · 2022-07-08T12:51:38Z

rfcs/0080-llvm-target.md

+  LLVMScope(const Target& target);
+  ~LLVMScope();
+
+  std::pair<llvm::Module, LLVMScope> LoadIR(const std::string& file_name);


What are the semantics of this function? Does it create a new scope (besides the current one)? A code example that demonstrates LoadIR would be helpful.

Added comments with descriptions.

kparzysz-quic · 2022-07-13T23:16:48Z

I updated the text of the RFC, the rendered link above, and the prototype draft PR.

kparzysz-quic · 2022-07-15T17:37:03Z

Should we still wait for review from Junru?

tqchen · 2022-07-15T17:53:58Z

I think we can go ahead and merge

This implements RFC 80. See apache/tvm-rfcs#83.

Summary of changes:
- Created an `LLVMInstance` class. Uses of LLVM functions and data structures should be contained within the lifetime of an object of this class. LLVMInstance object contains LLVMContext, and implements member functions to deserialize an llvm::Module.
- Created an `LLVMTarget` class. Once an LLVMInstance object has been created, an object of LLVMTarget class can be created from TVM target string, or Target object for "llvm" target. Once LLVM command line flags are added to the "llvm" target, one of the goals of this object will be to save/restore relevant LLVM global state. Another objective for the LLVMTarget object is to be a single location for all LLVM-related compilation structures and options (such as TargetMachine, FastMathFlags, etc.)

kparzysz-quic mentioned this pull request Jun 28, 2022

[LLVM] Encapsulate LLVM target for use with LLVM libraries apache/tvm#11933

Closed

areusch requested review from areusch and tqchen June 28, 2022 23:11

tqchen reviewed Jun 29, 2022

Krzysztof Parzyszek added 3 commits June 29, 2022 12:36

[RFC] Encapsulate LLVM target for use with LLVM libraries

2f0417c

Add the RFC number

b61d2ca

Clarify the objective of the RFC

5e7398e

Clarify that this RFC proposes a scope object that can be used later to save/restore LLVM's options. The saving and restoring itself is not a part of the RFC, since the `llvm` target in TVM does not yet allow passing LLVM command line flags.

kparzysz-quic force-pushed the llvm-target branch from e71d114 to 5e7398e Compare June 29, 2022 19:37

Krzysztof Parzyszek added 2 commits June 29, 2022 12:40

Rename LLVMTarget to LLVMScope

96ead35

Clarification about actual saving/restoring of llvm flags as future step

ebfb68f

There was another passage about it that was omitted from previous commits.

areusch approved these changes Jul 1, 2022

kparzysz-quic requested a review from junrushao July 5, 2022 17:23

kparzysz-quic changed the title ~~[RFC] Encapsulate LLVM target for use with LLVM libraries~~ [RFC] Create LLVM scope class for use with LLVM libraries Jul 7, 2022

Reflect the latest status of the discussion

85e552f

tqchen approved these changes Jul 13, 2022

Add descriptions for ParseIR and LoadIR

8a739f8

tqchen merged commit 22d1d11 into apache:main Jul 15, 2022

kparzysz-quic deleted the llvm-target branch July 15, 2022 19:03

kparzysz-quic mentioned this pull request Jul 19, 2022

[LLVM] Create LLVM scope object for use with LLVM libraries apache/tvm#12140

Merged
