Read support for remaining LLVM IR language concepts #15

Closed
mewmew opened this issue Aug 24, 2016 · 13 comments

@mewmew
Member

mewmew commented Aug 24, 2016

The intention is to provide read support for LLVM IR assembly using a Gocc-generated lexer and parser, built from a BNF grammar of the LLVM IR assembly language.

The BNF grammar is located at ast/internal/ll.bnf. The grammar is kept in an internal directory because the lexer and parser packages generated by Gocc are considered internal packages and should not be used by end users directly. Instead, high-level libraries will use these internal packages to parse LLVM IR assembly into the data structures of the llir/llvm/ir package.

Since LLVM IR makes use of unnamed local variables and basic blocks, a context is required to keep track of local IDs and map them to their associated values. Unfortunately, this means we cannot use syntax-directed translation to go directly from LLVM IR assembly to the data structures of the ir package: a local ID may be referenced before the value it names has been created, for example by a phi instruction or by a branch to a later basic block. Instead, we must introduce an intermediate step that keeps the necessary information around so that this context can be built and used. The current approach is therefore to define an ast package for LLVM IR assembly; the AST is later traversed to create the aforementioned context and to translate AST nodes into their corresponding ir data types.
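To make the need for a context concrete, here is a minimal sketch of the two-pass idea in Go. The types `astInst` and `irInst` and the function `resolve` are hypothetical stand-ins invented for this illustration; they are not the actual types of the ast or ir packages. The first pass creates a value for every local and records it in the context, and the second pass replaces each operand identifier with a pointer to the value it names.

```go
package main

import "fmt"

// astInst is a hypothetical AST node: operands are known only by local ID.
type astInst struct {
	Name     string   // local ID of the result, e.g. "1"
	Op       string   // opcode, e.g. "phi" or "add"
	Operands []string // local IDs of the operands
}

// irInst is a hypothetical resolved instruction: operands point to values.
type irInst struct {
	Name     string
	Op       string
	Operands []*irInst
}

// resolve translates AST instructions into resolved instructions in two
// passes, using a context that maps local IDs to the values they name.
func resolve(insts []astInst) ([]*irInst, error) {
	ctx := make(map[string]*irInst, len(insts))
	out := make([]*irInst, 0, len(insts))
	// Pass 1: create every value up front, so that forward references
	// can be resolved in pass 2.
	for _, in := range insts {
		if _, dup := ctx[in.Name]; dup {
			return nil, fmt.Errorf("local %%%s defined twice", in.Name)
		}
		v := &irInst{Name: in.Name, Op: in.Op}
		ctx[in.Name] = v
		out = append(out, v)
	}
	// Pass 2: look up each operand ID in the context.
	for i, in := range insts {
		for _, id := range in.Operands {
			v, ok := ctx[id]
			if !ok {
				return nil, fmt.Errorf("use of undefined local %%%s", id)
			}
			out[i].Operands = append(out[i].Operands, v)
		}
	}
	return out, nil
}

func main() {
	insts := []astInst{
		{Name: "1", Op: "phi", Operands: []string{"2"}}, // forward reference to %2
		{Name: "2", Op: "add", Operands: []string{"1", "1"}},
	}
	resolved, err := resolve(insts)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, in := range resolved {
		fmt.Printf("%%%s = %s with %d resolved operand(s)\n", in.Name, in.Op, len(in.Operands))
	}
}
```

A single-pass, syntax-directed translation would fail on the first instruction in this example, since `%2` does not exist yet when the phi instruction is reduced.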

To get a feel for what the production action expressions of Gocc look like, see the following example.

FuncDef
    : "define" OptFuncLinkage
      FuncHeader FuncBody                         << irx.NewFuncDef($2, $3) >>
;
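For readers unfamiliar with Gocc, the following is a rough sketch of what a function called from such an action expression might look like. It assumes that Gocc passes each `$N` attribute as an `interface{}` and expects the action to yield an `(interface{}, error)` pair. The types `FuncHeader`, `FuncBody` and `FuncDef`, and the body of `NewFuncDef`, are hypothetical and written only for illustration; the real irx.NewFuncDef may differ.

```go
package main

import "fmt"

// Hypothetical AST node types, invented for this sketch.
type FuncHeader struct{ Name string }
type FuncBody struct{ Blocks []string }
type FuncDef struct {
	Header *FuncHeader
	Body   *FuncBody
}

// NewFuncDef mimics the shape of a Gocc production action: untyped
// attributes in, an untyped node and an error out. Each attribute is
// type-asserted before use, since the parser only sees interface{}.
func NewFuncDef(header, body interface{}) (interface{}, error) {
	h, ok := header.(*FuncHeader)
	if !ok {
		return nil, fmt.Errorf("invalid function header type; expected *FuncHeader, got %T", header)
	}
	b, ok := body.(*FuncBody)
	if !ok {
		return nil, fmt.Errorf("invalid function body type; expected *FuncBody, got %T", body)
	}
	return &FuncDef{Header: h, Body: b}, nil
}

func main() {
	node, err := NewFuncDef(&FuncHeader{Name: "add"}, &FuncBody{Blocks: []string{"entry"}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	def := node.(*FuncDef)
	fmt.Println(def.Header.Name, len(def.Body.Blocks))
}
```

Because every attribute arrives untyped, each action has to validate its inputs at run time, which is part of why keeping the actions in sync with the grammar takes effort.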

Help wanted

If anyone figures out a clean way to skip this intermediate step (i.e. to translate directly from the BNF grammar to the ir package data types using production action expressions, rather than from BNF grammar to AST and then from AST to ir data types), please let us know. This would greatly improve the maintainability and ease the future development of this package!

@mewmew mewmew added this to the v0.2 milestone Aug 24, 2016
@mewmew
Member Author

mewmew commented Aug 24, 2016

@quarnster Do you have any idea how we might get rid of the intermediate step of translating into and out of the AST representation? I really wish we could figure out a way, as this extra step raises the threshold for keeping in sync with updates to the LLVM IR assembly language as LLVM develops upstream.

mewmew added a commit that referenced this issue Aug 24, 2016
The intention is to keep the AST package minimal for now,
and support only the LLVM IR assembly directives currently
defined by the llir/llvm/ir package.

Updates #15.
@mewmew
Member Author

mewmew commented Sep 6, 2016

Tracking of read support for individual LLVM IR instructions.

Last update: 2016-09-20

This comment has been superseded by #15 (comment)

@sangisos
Contributor

@mewmew Is there anything specific I can try to help with?

@mewmew
Member Author

mewmew commented Sep 20, 2016

@mewmew Is there anything specific I can try to help with?

While not a contrived example, a94bf85 implemented support for fixing the dummy values of icmp, phi, select and call instructions. It may be used as a template for implementing support for the remaining instructions.

Ping me before starting any larger work, so we don't duplicate our efforts. The good thing is, we can easily split the work based on the categories of instructions. I've recently been working on the other operators category.

@chrisbdaemon

Is this particular topic still under active development? Is there any kind of an ETA on when it will be useable?

@mewmew
Member Author

mewmew commented Nov 29, 2016

Is this particular topic still under active development? Is there any kind of an ETA on when it will be useable?

Hi @chrisbdaemon,

It very much is.

Current status

Status as of 2016-12-10

All instructions covered by the LLVM IR subset defined in [https://godoc.org/github.com/llir/llvm/ir] are now supported by the parser.

The parser branch has now been merged with master.

Types

  • Function types
  • Named types
    • Identified struct types
    • Type aliases

Instructions

Binary instructions

  • Add
  • FAdd
  • Sub
  • FSub
  • Mul
  • FMul
  • UDiv
  • SDiv
  • FDiv
  • URem
  • SRem
  • FRem

Bitwise instructions

  • Shl
  • LShr
  • AShr
  • And
  • Or
  • Xor

Vector instructions

  • ExtractElement
  • InsertElement
  • ShuffleVector

Aggregate instructions

  • ExtractValue
  • InsertValue

Memory instructions

  • Alloca
  • Load
  • Store
  • Fence
  • CmpXchg
  • AtomicRMW
  • GetElementPtr

Conversion instructions

  • Trunc
  • ZExt
  • SExt
  • FPTrunc
  • FPExt
  • FPToUI
  • FPToSI
  • UIToFP
  • SIToFP
  • PtrToInt
  • IntToPtr
  • BitCast
  • AddrSpaceCast

Other instructions

  • ICmp
  • FCmp
  • Phi
  • Select
  • Call

Terminators

  • Ret
  • Br
  • CondBr
  • Switch
  • IndirectBr
  • Unreachable

Constants

  • Int
  • Float
  • Pointer
  • Vector
  • Array
  • String
  • Struct
  • ZeroInitializer
  • Metadata nodes

Constant expressions

Binary expressions

  • Add
  • FAdd
  • Sub
  • FSub
  • Mul
  • FMul
  • UDiv
  • SDiv
  • FDiv
  • URem
  • SRem
  • FRem

Bitwise expressions

  • Shl
  • LShr
  • AShr
  • And
  • Or
  • Xor

Vector expressions

  • ExtractElement
  • InsertElement
  • ShuffleVector

Aggregate expressions

  • ExtractValue
  • InsertValue

Memory expressions

  • GetElementPtr

Conversion expressions

  • Trunc
  • ZExt
  • SExt
  • FPTrunc
  • FPExt
  • FPToUI
  • FPToSI
  • UIToFP
  • SIToFP
  • PtrToInt
  • IntToPtr
  • BitCast
  • AddrSpaceCast

Other expressions

  • ICmp
  • FCmp
  • Select

@mewmew
Member Author

mewmew commented Dec 1, 2016

Before the v0.2 release, we should try to implement support for the subset of LLVM IR produced by Clang for regular C programs. In particular, anything that we currently strip before parsing should be handled natively by the BNF specification.

This includes

  • source_filename specifiers
  • target specifiers
  • function attributes
  • function attribute IDs (i.e. #[0-9]+)
  • inbounds
  • nsw, nuw, signext
  • linkage (e.g. common)

Help fill out this list as you encounter more. Our goal is to remove the need for strip.sh.

Note, the intention is to provide support for parsing source files including these directives, not to define the data types holding information about these directives, for the v0.2 release of the LLVM IR library.

Future releases of the LLVM IR library will track the development of a sane data representation for these directives. For now, we simply want to be able to parse complex LLVM IR files, but only want to retain the most relevant information.

mewmew added a commit that referenced this issue Dec 1, 2016
@mewmew
Member Author

mewmew commented Dec 5, 2016

Add support for unnamed function parameters. Example:

  • unnamed function parameters
; Function Attrs: nounwind uwtable
define i32 @add(i32, i32) #0 {
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  store i32 %0, i32* %3, align 4
  store i32 %1, i32* %4, align 4
  %5 = load i32, i32* %3, align 4
  %6 = load i32, i32* %4, align 4
  %7 = add nsw i32 %5, %6
  ret i32 %7
}
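The numbering in the example above follows LLVM's rule that unnamed values share one counter per function: the two unnamed parameters become %0 and %1, the unnamed entry basic block takes %2, and the first instruction with a result therefore becomes %3. Below is a minimal sketch of that assignment in Go. The types `function`, `block` and `inst`, and the helpers `assignIDs` and `exampleAdd`, are hypothetical and are not part of the llir/llvm API.

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical, simplified function representation; an empty name
// means that the parameter, block or instruction result is unnamed.
type inst struct {
	Name      string
	Op        string
	HasResult bool // false for e.g. store and ret
}

type block struct {
	Name  string
	Insts []*inst
}

type function struct {
	Params []string
	Blocks []*block
}

// assignIDs gives every unnamed parameter, basic block and instruction
// result the next free local ID, all drawn from one shared counter.
func assignIDs(f *function) {
	next := 0
	name := func(s *string) {
		if *s == "" {
			*s = strconv.Itoa(next)
			next++
		}
	}
	for i := range f.Params {
		name(&f.Params[i])
	}
	for _, b := range f.Blocks {
		name(&b.Name)
		for _, in := range b.Insts {
			if in.HasResult {
				name(&in.Name)
			}
		}
	}
}

// exampleAdd models the @add function from the comment above.
func exampleAdd() *function {
	return &function{
		Params: []string{"", ""},
		Blocks: []*block{{Insts: []*inst{
			{Op: "alloca", HasResult: true},
			{Op: "alloca", HasResult: true},
			{Op: "store"},
			{Op: "store"},
			{Op: "load", HasResult: true},
			{Op: "load", HasResult: true},
			{Op: "add", HasResult: true},
			{Op: "ret"},
		}}},
	}
}

func main() {
	f := exampleAdd()
	assignIDs(f)
	fmt.Println("params:", f.Params, "entry block:", f.Blocks[0].Name)
	for _, in := range f.Blocks[0].Insts {
		if in.HasResult {
			fmt.Printf("%%%s = %s\n", in.Name, in.Op)
		}
	}
}
```

Instructions without a result, such as store and ret, do not consume an ID, which is why the loads in the example are numbered %5 and %6 directly after the two allocas.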

mewmew added a commit that referenced this issue Mar 29, 2017
mewmew added a commit that referenced this issue Mar 30, 2017
mewmew added a commit that referenced this issue Mar 30, 2017
Now capable of parsing c4.ll.

Now capable of parsing sqlite3.ll, except for fence instructions.

Updates #15.
@mewmew
Member Author

mewmew commented Mar 30, 2017

The read support of the LLVM IR library is moving closer to being usable for real-world LLVM IR assembly files.

For now, we've been playing with adding read support for c4.ll and sqlite3.ll, produced by Clang from c4.c and sqlite3.c, respectively.

The Gocc-generated parser is now very close to handling these files.

For c4.ll, when optimized with opt --mem2reg, support for undef values has not yet been added. Comments on how to do so would be appreciated. There has been a proposal to potentially remove undef from LLVM IR. I still haven't digested it, so I have no gut feeling as to whether I agree with the proposed solution, but I still wish to follow its progress: http://llvm.org/devmtg/2016-11/#talk13

For sqlite3.ll a single instruction remains to be implemented, namely fence.

These are exciting times!

This work is tracked in tandem with the development of an LLVM IR to Go decompiler, which uses these files as a first semi-real world test case : )

@mewmew
Member Author

mewmew commented Apr 1, 2017

Left to do as of 2017-05-06:

Intended for version 0.3

mewmew added a commit that referenced this issue Apr 2, 2017
mewmew added a commit that referenced this issue Apr 4, 2017
mewmew added a commit that referenced this issue Apr 4, 2017
mewmew referenced this issue in cznic/ir May 6, 2017
* Add Switch operation.

* Finish switch branch.

Squashed commit of the following:

commit 49d5280dab8a4dbd13419360bc0bf992da444b4e
Author: Jan Mercl <0xjnml@gmail.com>
Date:   Sat Apr 29 16:07:46 2017 +0200

    WIPS

commit 127b35dd82c0980c591848280619ef6f6de5dd3f
Author: Jan Mercl <0xjnml@gmail.com>
Date:   Sat Apr 29 15:31:35 2017 +0200

    WIPS

commit 0205c8b377ff506fce92e1c128ede521b3d63d8e
Author: Jan Mercl <0xjnml@gmail.com>
Date:   Sat Apr 29 15:19:27 2017 +0200

    WIPS

commit 3acb0a215a1c61adc715d8b275f264756891cb83
Author: Jan Mercl <0xjnml@gmail.com>
Date:   Sat Apr 29 15:14:38 2017 +0200

    WIPS

commit 98f520c9c30ad150affecda390e52e13c906e344
Author: Jan Mercl <0xjnml@gmail.com>
Date:   Sat Apr 29 03:04:06 2017 +0200

    WIPS

commit 09a19e92dbc2142a94c3dd5a5ea491d410990260
Author: Jan Mercl <0xjnml@gmail.com>
Date:   Sat Apr 29 02:37:28 2017 +0200

    WIPS

commit 7ed7ce62f9f0381673b75e766ac4e96c109e6029
Author: Jan Mercl <0xjnml@gmail.com>
Date:   Sat Apr 29 00:20:55 2017 +0200

    Try to fix "..\cc\testdata\tcc-0.9.26\tests\tests2\40_stdio.c:5:14: undefined fopen" on Windows.

commit 401ecab6670c777da18610f170a0621d4638a6b7
Author: Jan Mercl <0xjnml@gmail.com>
Date:   Fri Apr 28 22:27:25 2017 +0200

    WIPS

commit f409c2dadc61ffef89d13f2c81ea6bdb17b6335c
Author: Jan Mercl <0xjnml@gmail.com>
Date:   Fri Apr 28 22:05:38 2017 +0200

    WIPS
mewmew added a commit to llir/grammar that referenced this issue Jun 14, 2017
mewmew added a commit that referenced this issue Jun 14, 2017
@mewmew
Member Author

mewmew commented Jun 16, 2017

@chrisbdaemon

Is this particular topic still under active development? Is there any kind of an ETA on when it will be useable?

We are currently preparing for the v0.2 release of llir/llvm. Within the next few days, we will check the source code of the Gocc-generated parser into the source tree, thus making the project go-getable.

#15 (comment) tracks the last few pieces of the LLVM IR assembly language that are not yet part of the BNF grammar. A few of those language concepts have been intentionally postponed to a future release, as they occur only rarely in LLVM IR source files.

Cheers /u & i

@mewmew
Member Author

mewmew commented Jun 24, 2017

@chrisbdaemon The source code of the Gocc-generated parser has now been checked into the source tree, thus making the llir/llvm packages go-getable.

The remaining LLVM IR language concepts are tracked for the version 0.3 release (#15 (comment)).

@mewmew mewmew modified the milestones: v0.3, v0.2 Jun 24, 2017
@mewmew mewmew changed the title from "Read support of LLVM IR assembly" to "Read support for remaining LLVM IR language concepts" Jun 24, 2017
mewmew added a commit that referenced this issue Feb 10, 2018
@mewmew
Member Author

mewmew commented Nov 13, 2018

Read and write support for all LLVM IR constructs as of LLVM 7.0 has now been implemented. Therefore, we may now close this issue. If you find any constructs that are not supported, feel free to comment here or open a new issue.

@mewmew mewmew closed this as completed Nov 13, 2018
mewmew added a commit that referenced this issue Nov 30, 2018
The intention is to keep the AST package minimal for now,
and support only the LLVM IR assembly directives currently
defined by the llir/llvm/ir package.

Updates #15.


Former-commit-id: b217ce8
mewmew added a commit that referenced this issue Nov 30, 2018
mewmew added a commit that referenced this issue Nov 30, 2018
mewmew added a commit that referenced this issue Nov 30, 2018
mewmew added a commit that referenced this issue Nov 30, 2018
mewmew added a commit that referenced this issue Nov 30, 2018
Now capable of parsing c4.ll.

Now capable of parsing sqlite3.ll, except for fence instructions.

Updates #15.


Former-commit-id: b6f515f
mewmew added a commit that referenced this issue Nov 30, 2018
mewmew added a commit that referenced this issue Nov 30, 2018
mewmew added a commit that referenced this issue Nov 30, 2018
mewmew added a commit that referenced this issue Nov 30, 2018
mewmew added a commit that referenced this issue Nov 30, 2018