Provide an API to dump the machine code generated by LLVM for a given function. #31

Closed · romix opened this issue Apr 25, 2015 · 34 comments

romix commented Apr 25, 2015

It would be nice to see the final machine code produced by LLVM after JITting a given function.

romix commented Apr 25, 2015

LLVM also provides options to dump LLVM IR before/after all/some LLVM passes or to generate debug output during a processing performed by a certain LLVM pass. It could be useful, if this logging/debugging functionality would be possible to trigger via the API. It would allow for better understanding of optimizations and transformations performed by LLVM.

dibyendumajumdar (Owner) commented

I would very much like to provide a facility to dump machine code - I just haven't figured out how to do it. I haven't found any documentation on how to do it; if you know of any docs, please point me to them.

Dumping IR between passes is also possible, but right now I am using the standard PassManagerBuilder so that means I get the standard Clang /O1 /O2 /O3 passes etc.
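
Roughly, that setup looks like this (a minimal sketch using the legacy LLVM 3.x API; `module` is assumed to have been created elsewhere):

```cpp
#include "llvm/IR/Module.h"
#include "llvm/PassManager.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"

// Minimal sketch: let PassManagerBuilder populate the standard O2
// pipeline, then run it over an existing module.
void runStandardPasses(llvm::Module *module) {
  llvm::PassManagerBuilder pmb;
  pmb.OptLevel = 2;  // selects the standard Clang-style O2 pass set

  llvm::PassManager mpm;  // legacy module pass manager
  pmb.populateModulePassManager(mpm);
  mpm.run(*module);
}
```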

romix commented Apr 25, 2015

I think you can use LLVMTargetMachineEmitToMemoryBuffer:
http://llvm.org/docs/doxygen/html/TargetMachineC_8cpp.html#aaa9ce583969eb8754512e70ec4b80061

Just specify that you want an LLVMAssemblyFile.
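
Something along these lines (an untested sketch; `tm` and `mod` are whatever target machine and module you already have):

```cpp
#include <stdio.h>
#include <llvm-c/TargetMachine.h>

// Emit an assembly listing for the module via the LLVM-C API and
// print it. LLVMTargetMachineEmitToMemoryBuffer returns nonzero on
// failure and sets an error message we must dispose of.
void dumpAssembly(LLVMTargetMachineRef tm, LLVMModuleRef mod) {
  char *err = NULL;
  LLVMMemoryBufferRef buf = NULL;
  if (LLVMTargetMachineEmitToMemoryBuffer(tm, mod, LLVMAssemblyFile,
                                          &err, &buf)) {
    fprintf(stderr, "emit failed: %s\n", err);
    LLVMDisposeMessage(err);
    return;
  }
  fwrite(LLVMGetBufferStart(buf), 1, LLVMGetBufferSize(buf), stdout);
  LLVMDisposeMemoryBuffer(buf);
}
```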

> Dumping IR between passes is also possible, but right now I am using the standard PassManagerBuilder so that means I get the standard Clang /O1 /O2 /O3 passes etc.

It can still be useful to see what each pass does. If possible, provide an option for dumping between/during LLVM passes.

dibyendumajumdar (Owner) commented

Cool - thanks! Will look into implementing this.

dibyendumajumdar self-assigned this Apr 26, 2015
dibyendumajumdar (Owner) commented

Hi - I tried the above approach. Unfortunately it does not disassemble the JITted code; rather, it generates machine code from scratch. I haven't checked it in because of that. It seems that to disassemble I need to use a different approach.

romix commented Apr 28, 2015

You mean it does not take the machine code that you already generated, but generates the same machine code again and emits it in disassembled form?

dibyendumajumdar (Owner) commented

That's right - but it is not the same machine code, e.g. it doesn't take the optimization options into account.

romix commented Apr 28, 2015

Well, I understand that this is not ideal, but it is supposed to be used only for debugging, where performance is not so important, right? So even though it is pretty inefficient, it still produces the same code, and in that sense the disassembly is correct, no?

dibyendumajumdar (Owner) commented

Well, I guess I want to see the actual machine code ... rather than a regenerated version, as I can't be sure that it reflects what will be executed. In my view it is not so useful. There is a way to disassemble the actual code, so that would be better.

dibyendumajumdar (Owner) commented

I can check this in for now, but will probably rewrite it.

romix commented Apr 28, 2015

Sure. I'm not saying that it should be the final solution. It is only for the time being, until a proper solution for disassembly is found.

romix commented Apr 28, 2015

BTW, I think doing a real disassembly may turn out to be pretty hard, because most of the symbolic information would eventually be lost. It is probably easier to force LLVM to emit both the machine code and the assembly at the same time from the same input, i.e. the pipeline should contain both native code generation and assembly generation.

dibyendumajumdar (Owner) commented

Yes I need to find out how to do that - I think the other link you posted might work:

http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-October/054033.html

dibyendumajumdar (Owner) commented

I have added two types of dump.

ravi.compile(f [,b]) - b is an optional boolean which, if set to true, causes LLVM to dump the code generation (the output is very verbose)

ravi.dumpllvmasm(f) - this dumps the assembly code output, but also emits a warning that the generated assembly is not a disassembly of the JIT code

romix commented Apr 28, 2015

Thanks a lot! I checked on OS X and it works just fine!

romix commented Apr 28, 2015

Actually, if possible I'd like an option to produce even more debug information. Right now LLVM dumps the IR after each pass, but it does not show the debug output from each pass as it tries to transform the code. I think that can be very useful, because it may hint at why certain optimizations are not applied (e.g. it could not prove that two pointers do not alias, or it could not hoist a load because something was preventing it).
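
For example (a sketch, assuming the host forwards LLVM's own command-line flags; `licm` is just an example pass name, and -debug-only needs an assertions-enabled LLVM build):

```cpp
#include "llvm/Support/CommandLine.h"

// LLVM's per-pass debugging is driven by its own command-line flags,
// which an embedding application can forward explicitly:
//   -print-after-all  dumps the IR after every pass
//   -debug-only=licm  prints the debug trace from one pass (LICM here)
void enableLLVMPassDebugOutput() {
  const char *args[] = {"ravi", "-print-after-all", "-debug-only=licm"};
  llvm::cl::ParseCommandLineOptions(3, args);
}
```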

romix commented Apr 28, 2015

I have one more question: have you thought about building a CFG (control-flow graph) from the Lua VM bytecode? I don't mean the LLVM CFG - I mean a high-level Lua VM CFG. If you had it, you could do certain kinds of analysis that LLVM cannot do, since it operates at a lower level.

E.g. you could try to detect dead stores, you could detect whether a given variable is used in a given basic block or by a given Lua VM instruction, you could perform escape analysis, etc.
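
Just to illustrate the idea, such a CFG node could be as simple as (purely a sketch, not Ravi code):

```cpp
#include <vector>

// Purely illustrative: one CFG node per straight-line run of Lua VM
// instructions; edges come from jumps (OP_JMP, OP_FORLOOP, ...) and
// from fallthrough.
struct BytecodeBlock {
  int startPc;                              // index of first instruction in the proto's code array
  int endPc;                                // one past the last instruction
  std::vector<BytecodeBlock *> successors;  // outgoing control-flow edges
};
```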

dibyendumajumdar (Owner) commented

Re your question about a CFG from Lua code - maybe at some point.

There are several conflicting goals for Ravi:
a) Keep the code base simple and clean so that I can get others to more easily contribute
b) Maintain compatibility with Lua as far as possible - especially merge upstream changes
c) Achieve better performance - if possible close to LuaJIT when static typing is used - but not at all costs. The reason Ravi exists is that I wasn't able to decipher the LuaJIT code, so I don't want Ravi to end up like that.

Some optimizations will only be possible if I implement a full AST for Lua and change the compiler. But then upstream merges will be very difficult.

romix commented Apr 28, 2015

I absolutely understand your intention, really. That is why my idea was to build the CFG not from the AST or directly in the parser, but from the bytecode. This way it is completely decoupled from the Lua VM sources and lives in your Ravi-specific source files. But of course, doing so would increase complexity.

I ask these questions because I've got the impression that you'll soon run into a wall with the current "context-free" approach, where each Lua opcode is translated into LLVM IR one by one. This makes many optimizations impossible. A "context-sensitive" translation into LLVM IR, i.e. doing some analysis and transformation at the Lua bytecode level, could give much better performance, but at the expense of making the mapping from Lua VM opcodes to LLVM IR more complex.

romix commented Apr 28, 2015

> The reason Ravi exists is that I wasn't able to decipher LuaJIT code - so I don't want Ravi to end up like that

I know this feeling, as I also tried to decipher the LuaJIT code ;-) IMHO, LuaJIT's code looks the way it does intentionally - the author did not want it to be easily understood by others. But as long as you pay attention and write the Ravi code in an understandable way, there is no danger. After all, LLVM is a huge project, but its source code is far more understandable than LuaJIT's.

dibyendumajumdar (Owner) commented

I will implement the optimizations that are possible starting from the Lua bytecode. One of the first is eliminating the overhead of updating the fornum "external" index - this can be done by checking whether the variable is written to or captured as an upvalue.

Another area is expression evaluation. Right now each node sets the type, but actually the type could be set after the entire expression has been evaluated.

I think LuaJIT 1.1.8 had some bytecode optimizations (I could be mistaken), so I could lift those.

romix commented Apr 28, 2015

Sounds like a good plan.

romix commented Apr 28, 2015

BTW, FWIW, you could try to reuse the LLVM classes for basic blocks, instruction lists, etc. You just subclass them and define your own blocks, instructions, etc. in a form that you like. This would give you things like iteration over all basic blocks, iteration over all instructions, computation of dominance information, etc. for free. You could even reuse their pass manager for your own passes working on your own high-level internal representation. The price is that you become more dependent on LLVM. Since I've seen you're also playing with the idea of using gccjit, I don't know how much you want to depend on LLVM.

romix commented Apr 29, 2015

I looked at your code for assembly dumping. One thing I don't understand is: why do you create a new PassManager there instead of reusing the one created in RaviJITFunctionImpl::compile? Reusing that one would ensure that the same optimizations are applied to the code before the assembly is produced, no?

dibyendumajumdar (Owner) commented

To be honest, I don't understand how the assembly generation passes are hooked together and how this fits in with the pass managers in compile().

romix commented Apr 29, 2015

Maybe it is worth asking on the LLVM mailing lists? I think they would be able to explain how to do it pretty quickly.

romix commented Apr 30, 2015

This is my attempt to make sure that the same pipeline is used to produce the machine code and the disassembly:

```diff
diff --git a/include/ravi_llvmcodegen.h b/include/ravi_llvmcodegen.h
index 0fefbd5..8d31d2f 100644
--- a/include/ravi_llvmcodegen.h
+++ b/include/ravi_llvmcodegen.h
@@ -303,6 +303,12 @@ class RAVI_API RaviJITFunctionImpl : public RaviJITFunction {

   // The llvm Function definition
   llvm::Function *function_;
+  
+  // LLVM Module Pass Manager
+  std::unique_ptr<llvm::PassManager> MPM;
+  
+  // LLVM Function Pass Manager
+  std::unique_ptr<llvm::FunctionPassManager> FPM;

   // Pointer to compiled function - this is only set when
   // the function
diff --git a/src/ravijit.cpp b/src/ravijit.cpp
index 8cd17cb..ecd88ef 100644
--- a/src/ravijit.cpp
+++ b/src/ravijit.cpp
@@ -149,8 +149,6 @@ RaviJITFunctionImpl::RaviJITFunctionImpl(
     fprintf(stderr, "Could not create ExecutionEngine: %s\n", errStr.c_str());
     return;
   }
-}
-
 RaviJITFunctionImpl::~RaviJITFunctionImpl() {
   // Remove this function from parent
   owner_->deleteFunction(name_);
@@ -189,8 +187,6 @@ static void addMemorySanitizerPass(const llvm::PassManagerBuilder &Builder,
 }
 #endif

-void *RaviJITFunctionImpl::compile(bool doDump) {
-
   // We use the PassManagerBuilder to setup optimization
   // passes - the PassManagerBuilder allows easy configuration of
   // typical C/C++ passes corresponding to O0, O1, O2, and O3 compiler options
@@ -214,8 +210,8 @@ void *RaviJITFunctionImpl::compile(bool doDump) {
 #endif
   {
     // Create a function pass manager for this engine
-    std::unique_ptr<llvm::FunctionPassManager> FPM(
-        new llvm::FunctionPassManager(module_));
+    FPM = std::unique_ptr<llvm::FunctionPassManager>(
+            new llvm::FunctionPassManager(module_));

 // Set up the optimizer pipeline.  Start with registering info about how the
 // target lays out data structures.
@@ -231,20 +227,22 @@ void *RaviJITFunctionImpl::compile(bool doDump) {
 #endif
     pmb.populateFunctionPassManager(*FPM);
     FPM->doInitialization();
-    FPM->run(*function_);
   }

   {
-    std::unique_ptr<llvm::PassManager> MPM(new llvm::PassManager());
+    MPM = std::unique_ptr<llvm::PassManager>(new llvm::PassManager());
 #if LLVM_VERSION_MINOR > 5
     MPM->add(new llvm::DataLayoutPass());
 #else
     MPM->add(new llvm::DataLayoutPass(*engine_->getDataLayout()));
 #endif
     pmb.populateModulePassManager(*MPM);
-    MPM->run(*module_);
   }

+}
+
+void *RaviJITFunctionImpl::compile(bool doDump) {
+
   if (ptr_)
     return ptr_;
   if (!function_ || !engine_)
@@ -257,6 +255,10 @@ void *RaviJITFunctionImpl::compile(bool doDump) {
     TM->Options.PrintMachineCode = 1;
   }

+  // Run required passes.
+  FPM->run(*function_);
+  MPM->run(*module_);
+
   // Upon creation, MCJIT holds a pointer to the Module object
   // that it received from EngineBuilder but it does not immediately
   // generate code for this module. Code generation is deferred
@@ -300,13 +302,16 @@ void RaviJITFunctionImpl::dumpAssembly() {
   }
   if (!ptr_)
     module_->setDataLayout(engine_->getDataLayout());
-  llvm::legacy::PassManager pass;
-  if (TM->addPassesToEmitFile(pass, formatted_stream,
+  //llvm::legacy::PassManager pass;
+  if (TM->addPassesToEmitFile(*MPM.get(), formatted_stream,
                               llvm::TargetMachine::CGFT_AssemblyFile)) {
     llvm::errs() << "unable to add passes for generating assemblyfile\n";
     return;
   }
-  pass.run(*module_);
+  // Run the same passes as during the usual compilation.
+  FPM->run(*function_);
+  MPM->run(*module_);
+  engine_->finalizeObject();
   formatted_stream.flush();
   llvm::errs() << codestr << "\n";
   llvm::errs()
```

dibyendumajumdar (Owner) commented

Thanks. I don't think holding the pass managers in the function impl is a good idea though - I would rather refactor the pass manager calls into a common function that both can call.
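
Something like this, perhaps (a hypothetical shape for that refactoring, using the legacy LLVM 3.x API):

```cpp
#include "llvm/IR/Module.h"
#include "llvm/PassManager.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"

// Hypothetical shared helper: both compile() and dumpAssembly() would
// call this, so both always see identically optimized IR.
void runOptimizationPasses(llvm::Module *module, llvm::Function *function,
                           llvm::PassManagerBuilder &pmb) {
  llvm::FunctionPassManager fpm(module);  // legacy function pass manager
  pmb.populateFunctionPassManager(fpm);
  fpm.doInitialization();
  fpm.run(*function);
  fpm.doFinalization();

  llvm::PassManager mpm;  // legacy module pass manager
  pmb.populateModulePassManager(mpm);
  mpm.run(*module);
}
```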

romix commented Apr 30, 2015

> I don't think holding the pass managers in the function impl is a good idea though

Well, in LLVM you usually create them once and then reuse them. Since you generate a new pass manager per module/function, you don't do that yet - but maybe you should. Pass managers are pretty heavyweight, and creating them every time is not such a good idea.

dibyendumajumdar (Owner) commented

There is not much guidance in the LLVM docs, and the examples I have seen do not do as you suggest. For example:

http://cs.swan.ac.uk/~csdavec/FOSDEM12/compiler.cc.html

Do you have a doc reference or an example you could point me to? That would be very helpful. I do not yet understand how to hook up pass managers properly - I am just going by examples.

romix commented Apr 30, 2015

I don't have docs at hand. http://llvm.org/docs/WritingAnLLVMPass.html seems to imply that PassManagers are rather expensive to create. The LLVM compiler itself uses only one global pass manager, AFAIK.

And, BTW, you are actually using the wrong PassManager. You include <llvm/PassManager.h>, which falls back to <llvm/IR/LegacyPassManager.h>. But the newer PassManager APIs are in <llvm/IR/PassManager.h>. The interesting thing about the new ones is that they are more reusable: they do not take a Module or Function in the constructor, so you can reuse the same manager for multiple modules or multiple functions.
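
For illustration, this is what that reuse looks like with the new pass manager in present-day LLVM (the 3.6-era API differed in detail; sketch only):

```cpp
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Passes/PassBuilder.h"

// The analysis and pass managers are created once; run(...) can then
// be invoked for any number of modules.
void optimizeModule(llvm::Module &m) {
  llvm::PassBuilder pb;
  llvm::LoopAnalysisManager lam;
  llvm::FunctionAnalysisManager fam;
  llvm::CGSCCAnalysisManager cgam;
  llvm::ModuleAnalysisManager mam;
  pb.registerModuleAnalyses(mam);
  pb.registerCGSCCAnalyses(cgam);
  pb.registerFunctionAnalyses(fam);
  pb.registerLoopAnalyses(lam);
  pb.crossRegisterProxies(lam, fam, cgam, mam);

  llvm::ModulePassManager mpm =
      pb.buildPerModuleDefaultPipeline(llvm::OptimizationLevel::O2);
  mpm.run(m, mam);  // reusable for another module as well
}
```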

dibyendumajumdar (Owner) commented

I have checked in an implementation.

dibyendumajumdar (Owner) commented

Resolved
