Provide an API to dump the machine code generated by LLVM for a given function. #31

Closed · romix opened this issue Apr 25, 2015 · 34 comments

romix commented Apr 25, 2015

It would be nice to see the final machine code produced by LLVM after JITting a given function.

romix commented Apr 25, 2015

LLVM also provides options to dump LLVM IR before/after all/some LLVM passes or to generate debug output during a processing performed by a certain LLVM pass. It could be useful, if this logging/debugging functionality would be possible to trigger via the API. It would allow for better understanding of optimizations and transformations performed by LLVM.

dibyendumajumdar (Owner) commented

I would very much like to provide a facility to dump machine code - I just haven't figured out how to do it. I haven't found any documentation on how to do it; if you know of any docs, please point me to them.

Dumping IR between passes is also possible, but right now I am using the standard PassManagerBuilder so that means I get the standard Clang /O1 /O2 /O3 passes etc.
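
Roughly, that setup looks like this (a minimal sketch using the legacy LLVM 3.x API; `module` is assumed to have been created elsewhere):

```cpp
#include "llvm/IR/Module.h"
#include "llvm/PassManager.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"

// Minimal sketch: let PassManagerBuilder populate the standard O2
// pipeline, then run it over an existing module.
void runStandardPasses(llvm::Module *module) {
  llvm::PassManagerBuilder pmb;
  pmb.OptLevel = 2;  // selects the standard Clang-style O2 pass set

  llvm::PassManager mpm;  // legacy module pass manager
  pmb.populateModulePassManager(mpm);
  mpm.run(*module);
}
```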

romix commented Apr 25, 2015

I think you can use LLVMTargetMachineEmitToMemoryBuffer:
http://llvm.org/docs/doxygen/html/TargetMachineC_8cpp.html#aaa9ce583969eb8754512e70ec4b80061

Just specify that you want an LLVMAssemblyFile.
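
Something along these lines (an untested sketch; `tm` and `mod` are whatever target machine and module you already have):

```cpp
#include <stdio.h>
#include <llvm-c/TargetMachine.h>

// Emit an assembly listing for the module via the LLVM-C API and
// print it. LLVMTargetMachineEmitToMemoryBuffer returns nonzero on
// failure and sets an error message we must dispose of.
void dumpAssembly(LLVMTargetMachineRef tm, LLVMModuleRef mod) {
  char *err = NULL;
  LLVMMemoryBufferRef buf = NULL;
  if (LLVMTargetMachineEmitToMemoryBuffer(tm, mod, LLVMAssemblyFile,
                                          &err, &buf)) {
    fprintf(stderr, "emit failed: %s\n", err);
    LLVMDisposeMessage(err);
    return;
  }
  fwrite(LLVMGetBufferStart(buf), 1, LLVMGetBufferSize(buf), stdout);
  LLVMDisposeMemoryBuffer(buf);
}
```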

> Dumping IR between passes is also possible, but right now I am using the standard PassManagerBuilder so that means I get the standard Clang /O1 /O2 /O3 passes etc.

It can still be useful to see what each pass does. If possible, provide an option for dumping between/during LLVM passes.

dibyendumajumdar (Owner) commented

Cool - thanks! Will look into implementing this.

dibyendumajumdar self-assigned this Apr 26, 2015
dibyendumajumdar (Owner) commented

Hi - I tried the above approach. Unfortunately it does not disassemble the JITted code; rather, it generates machine code from scratch. I haven't checked it in because of that. It seems that to disassemble I need to use a different approach.

romix commented Apr 28, 2015

You mean it does not take the machine code that you already generated, but generates the same machine code again and emits it in disassembled form?

dibyendumajumdar (Owner) commented

That's right - but it is not the same machine code, e.g. it doesn't take the optimization options into account.

romix commented Apr 28, 2015

Well, I understand that this is not ideal, but it is supposed to be used only for debugging, where performance is not so important, right? So even though it is pretty inefficient, it still produces the same code, and in that sense the disassembly is correct, no?

dibyendumajumdar (Owner) commented

Well, I guess I want to see the actual machine code ... rather than a regenerated version, as I can't be sure that it reflects what will be executed. In my view it is not so useful. There is a way to disassemble the actual code, so that would be better.

dibyendumajumdar (Owner) commented

I can check this in for now, but will probably rewrite it.

romix commented Apr 28, 2015

Sure. I'm not saying that it should be the final solution. It is only for the time being, until a proper solution for disassembly is found.

romix commented Apr 28, 2015

BTW, I think doing a real disassembly may turn out to be pretty hard, because most of the symbolic information would eventually be lost. It is probably easier to force LLVM to emit both the machine code and the assembly at the same time from the same input, i.e. the pipeline should contain both native code generation and assembly generation.

dibyendumajumdar (Owner) commented

Yes I need to find out how to do that - I think the other link you posted might work:

http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-October/054033.html

dibyendumajumdar (Owner) commented

I have added two types of dump.

ravi.compile(f [,b]) - b is an optional boolean which, if set to true, causes LLVM to dump the code generation (the output is very verbose)

ravi.dumpllvmasm(f) - this dumps the assembly code output, but also emits a warning that the generated assembly is not a disassembly of the JIT code

romix commented Apr 28, 2015

Thanks a lot! I checked on OS X and it works just fine!

romix commented Apr 28, 2015

Actually, if possible I'd like an option to produce even more debug information. Right now LLVM dumps the IR after each pass, but it does not show the debug output from each pass as it tries to transform the code. I think that can be very useful, because it may hint at why certain optimizations are not applied (e.g. it could not prove that two pointers do not alias, or it could not hoist a load because something was preventing it).
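
For example (a sketch, assuming the host forwards LLVM's own command-line flags; `licm` is just an example pass name, and -debug-only needs an assertions-enabled LLVM build):

```cpp
#include "llvm/Support/CommandLine.h"

// LLVM's per-pass debugging is driven by its own command-line flags,
// which an embedding application can forward explicitly:
//   -print-after-all  dumps the IR after every pass
//   -debug-only=licm  prints the debug trace from one pass (LICM here)
void enableLLVMPassDebugOutput() {
  const char *args[] = {"ravi", "-print-after-all", "-debug-only=licm"};
  llvm::cl::ParseCommandLineOptions(3, args);
}
```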

romix commented Apr 28, 2015

I have one more question: have you thought about building a CFG (control-flow graph) from the Lua VM bytecode? I don't mean the LLVM CFG - I mean a high-level Lua VM CFG. If you had it, you could do certain kinds of analysis that LLVM cannot do, since it operates at a lower level.

E.g. you could try to detect dead stores, you could detect whether a given variable is used in a given basic block or by a given Lua VM instruction, you could perform escape analysis, etc.
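
Just to illustrate the idea, such a CFG node could be as simple as (purely a sketch, not Ravi code):

```cpp
#include <vector>

// Purely illustrative: one CFG node per straight-line run of Lua VM
// instructions; edges come from jumps (OP_JMP, OP_FORLOOP, ...) and
// from fallthrough.
struct BytecodeBlock {
  int startPc;                              // index of first instruction in the proto's code array
  int endPc;                                // one past the last instruction
  std::vector<BytecodeBlock *> successors;  // outgoing control-flow edges
};
```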

dibyendumajumdar (Owner) commented

Re your question about a CFG from Lua code - maybe at some point.

There are several conflicting goals for Ravi:
a) Keep the code base simple and clean so that I can get others to more easily contribute
b) Maintain compatibility with Lua as far as possible - especially merge upstream changes
c) Achieve better performance - if possible close to LuaJIT when static typing is used - but not at all costs. The reason Ravi exists is that I wasn't able to decipher the LuaJIT code, so I don't want Ravi to end up like that.

Some optimizations will only be possible if I implement a full AST for Lua and change the compiler. But then upstream merges will be very difficult.

romix commented Apr 28, 2015

I absolutely understand your intention, really. That is why my idea was to build the CFG not from the AST or directly in the parser, but from the bytecode. This way it is completely decoupled from the Lua VM sources and lives in your Ravi-specific source files. But of course, doing so would increase complexity.

I ask these questions because I've got the impression that you'll soon run into a wall with the current "context-free" approach, where each Lua opcode is translated into LLVM IR one by one. This makes many optimizations impossible. A "context-sensitive" translation into LLVM IR, i.e. doing some analysis and transformation at the Lua bytecode level, could give much better performance, but at the expense of making the mapping from Lua VM opcodes to LLVM IR more complex.

romix commented Apr 28, 2015

> The reason Ravi exists is that I wasn't able to decipher LuaJIT code - so I don't want Ravi to end up like that

I know this feeling, as I also tried to decipher the LuaJIT code ;-) IMHO, LuaJIT's code looks the way it does intentionally - the author did not want it to be easily understood by others. But as long as you pay attention and write the Ravi code in an understandable way, there is no danger. After all, LLVM is a huge project, but its source code is far more understandable than LuaJIT's.

dibyendumajumdar (Owner) commented

I will implement the optimizations that are possible starting from the Lua bytecode. One of the first is eliminating the overhead of updating the fornum "external" index - this can be done by checking whether the variable is written to or captured as an upvalue.

Another area is expression evaluation. Right now each node sets the type, but actually the type could be set after the entire expression has been evaluated.

I think LuaJIT 1.1.8 had some bytecode optimizations (I could be mistaken), so I could lift those.

romix commented Apr 28, 2015

Sounds like a good plan.

romix commented Apr 28, 2015

BTW, FWIW, you could try to reuse the LLVM classes for basic blocks, instruction lists, etc. You just subclass them and define your own blocks, instructions, etc. in a form that you like. This would give you things like iteration over all basic blocks, iteration over all instructions, computation of dominance information, etc. for free. You could even reuse their pass manager for your own passes working on your own high-level internal representation. The price is that you become more dependent on LLVM. Since I've seen you're also playing with the idea of using gccjit, I don't know how much you want to depend on LLVM.

romix commented Apr 29, 2015

I looked at your code for assembly dumping. One thing I don't understand is: why do you create a new PassManager there instead of reusing the one created in RaviJITFunctionImpl::compile? Reusing that one would ensure that the same optimizations are applied to the code before the assembly is produced, no?

dibyendumajumdar (Owner) commented

To be honest, I don't understand how the assembly generation passes are hooked together and how this fits in with the pass managers in compile().

romix commented Apr 29, 2015

Maybe it is worth asking on the LLVM mailing lists? I think they would be able to explain how to do it pretty quickly.

romix commented Apr 30, 2015

This is my attempt to make sure that the same pipeline is used to produce the machine code and the disassembly:

```diff
diff --git a/include/ravi_llvmcodegen.h b/include/ravi_llvmcodegen.h
index 0fefbd5..8d31d2f 100644
--- a/include/ravi_llvmcodegen.h
+++ b/include/ravi_llvmcodegen.h
@@ -303,6 +303,12 @@ class RAVI_API RaviJITFunctionImpl : public RaviJITFunction {

   // The llvm Function definition
   llvm::Function *function_;
+  
+  // LLVM Module Pass Manager
+  std::unique_ptr<llvm::PassManager> MPM;
+  
+  // LLVM Function Pass Manager
+  std::unique_ptr<llvm::FunctionPassManager> FPM;

   // Pointer to compiled function - this is only set when
   // the function
diff --git a/src/ravijit.cpp b/src/ravijit.cpp
index 8cd17cb..ecd88ef 100644
--- a/src/ravijit.cpp
+++ b/src/ravijit.cpp
@@ -149,8 +149,6 @@ RaviJITFunctionImpl::RaviJITFunctionImpl(
     fprintf(stderr, "Could not create ExecutionEngine: %s\n", errStr.c_str());
     return;
   }
-}
-
 RaviJITFunctionImpl::~RaviJITFunctionImpl() {
   // Remove this function from parent
   owner_->deleteFunction(name_);
@@ -189,8 +187,6 @@ static void addMemorySanitizerPass(const llvm::PassManagerBuilder &Builder,
 }
 #endif

-void *RaviJITFunctionImpl::compile(bool doDump) {
-
   // We use the PassManagerBuilder to setup optimization
   // passes - the PassManagerBuilder allows easy configuration of
   // typical C/C++ passes corresponding to O0, O1, O2, and O3 compiler options
@@ -214,8 +210,8 @@ void *RaviJITFunctionImpl::compile(bool doDump) {
 #endif
   {
     // Create a function pass manager for this engine
-    std::unique_ptr<llvm::FunctionPassManager> FPM(
-        new llvm::FunctionPassManager(module_));
+    FPM = std::unique_ptr<llvm::FunctionPassManager>(
+            new llvm::FunctionPassManager(module_));

 // Set up the optimizer pipeline.  Start with registering info about how the
 // target lays out data structures.
@@ -231,20 +227,22 @@ void *RaviJITFunctionImpl::compile(bool doDump) {
 #endif
     pmb.populateFunctionPassManager(*FPM);
     FPM->doInitialization();
-    FPM->run(*function_);
   }

   {
-    std::unique_ptr<llvm::PassManager> MPM(new llvm::PassManager());
+    MPM = std::unique_ptr<llvm::PassManager>(new llvm::PassManager());
 #if LLVM_VERSION_MINOR > 5
     MPM->add(new llvm::DataLayoutPass());
 #else
     MPM->add(new llvm::DataLayoutPass(*engine_->getDataLayout()));
 #endif
     pmb.populateModulePassManager(*MPM);
-    MPM->run(*module_);
   }

+}
+
+void *RaviJITFunctionImpl::compile(bool doDump) {
+
   if (ptr_)
     return ptr_;
   if (!function_ || !engine_)
@@ -257,6 +255,10 @@ void *RaviJITFunctionImpl::compile(bool doDump) {
     TM->Options.PrintMachineCode = 1;
   }

+  // Run required passes.
+  FPM->run(*function_);
+  MPM->run(*module_);
+
   // Upon creation, MCJIT holds a pointer to the Module object
   // that it received from EngineBuilder but it does not immediately
   // generate code for this module. Code generation is deferred
@@ -300,13 +302,16 @@ void RaviJITFunctionImpl::dumpAssembly() {
   }
   if (!ptr_)
     module_->setDataLayout(engine_->getDataLayout());
-  llvm::legacy::PassManager pass;
-  if (TM->addPassesToEmitFile(pass, formatted_stream,
+  //llvm::legacy::PassManager pass;
+  if (TM->addPassesToEmitFile(*MPM.get(), formatted_stream,
                               llvm::TargetMachine::CGFT_AssemblyFile)) {
     llvm::errs() << "unable to add passes for generating assemblyfile\n";
     return;
   }
-  pass.run(*module_);
+  // Run the same passes as during the usual compilation.
+  FPM->run(*function_);
+  MPM->run(*module_);
+  engine_->finalizeObject();
   formatted_stream.flush();
   llvm::errs() << codestr << "\n";
   llvm::errs()
```

dibyendumajumdar (Owner) commented

Thanks. I don't think holding the pass managers in the function impl is a good idea though - I would rather refactor the pass manager calls into a common function that both can call.
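
Something like this, perhaps (a hypothetical shape for that refactoring, using the legacy LLVM 3.x API):

```cpp
#include "llvm/IR/Module.h"
#include "llvm/PassManager.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"

// Hypothetical shared helper: both compile() and dumpAssembly() would
// call this, so both always see identically optimized IR.
void runOptimizationPasses(llvm::Module *module, llvm::Function *function,
                           llvm::PassManagerBuilder &pmb) {
  llvm::FunctionPassManager fpm(module);  // legacy function pass manager
  pmb.populateFunctionPassManager(fpm);
  fpm.doInitialization();
  fpm.run(*function);
  fpm.doFinalization();

  llvm::PassManager mpm;  // legacy module pass manager
  pmb.populateModulePassManager(mpm);
  mpm.run(*module);
}
```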

romix commented Apr 30, 2015

> I don't think holding the pass managers in the function impl is a good idea though

Well, in LLVM you usually create them once and then reuse them. Since you generate a new pass manager per module/function, you don't do that yet - but maybe you should. Pass managers are pretty heavyweight, and creating them every time is not such a good idea.

dibyendumajumdar (Owner) commented

There is not much guidance in the LLVM docs, and the examples I have seen do not do as you suggest. For example:

http://cs.swan.ac.uk/~csdavec/FOSDEM12/compiler.cc.html

Do you have a doc reference or an example you could point me to? That would be very helpful. I do not yet understand how to hook up pass managers properly - I am just going by examples.

romix commented Apr 30, 2015

I don't have docs at hand. http://llvm.org/docs/WritingAnLLVMPass.html seems to imply that PassManagers are rather expensive to create. The LLVM compiler itself uses only one global pass manager, AFAIK.

And, BTW, you are actually using the wrong PassManager. You include <llvm/PassManager.h>, which falls back to <llvm/IR/LegacyPassManager.h>. But the newer PassManager APIs are in <llvm/IR/PassManager.h>. The interesting thing about the new ones is that they are more reusable: they do not take a Module or Function in the constructor, so you can reuse the same manager for multiple modules or multiple functions.
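
For illustration, this is what that reuse looks like with the new pass manager in present-day LLVM (the 3.6-era API differed in detail; sketch only):

```cpp
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Passes/PassBuilder.h"

// The analysis and pass managers are created once; run(...) can then
// be invoked for any number of modules.
void optimizeModule(llvm::Module &m) {
  llvm::PassBuilder pb;
  llvm::LoopAnalysisManager lam;
  llvm::FunctionAnalysisManager fam;
  llvm::CGSCCAnalysisManager cgam;
  llvm::ModuleAnalysisManager mam;
  pb.registerModuleAnalyses(mam);
  pb.registerCGSCCAnalyses(cgam);
  pb.registerFunctionAnalyses(fam);
  pb.registerLoopAnalyses(lam);
  pb.crossRegisterProxies(lam, fam, cgam, mam);

  llvm::ModulePassManager mpm =
      pb.buildPerModuleDefaultPipeline(llvm::OptimizationLevel::O2);
  mpm.run(m, mam);  // reusable for another module as well
}
```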

dibyendumajumdar (Owner) commented

I have checked in an implementation.

dibyendumajumdar (Owner) commented

Resolved
