Create JSON or YAML Files with Loop Features/Configurations #557

Merged


mostafaelhoushi
Contributor

Added an option to the opt_loops tool to dump a YAML loop features/configuration file. The objectives of this YAML file are:

  • log features of each loop, to provide CompilerGym with per-loop observation
  • [in the future] could act as a configuration file to control optimizations of each loop

Here is an example of what the YAML file looks like for the add.c benchmark:

---
ID:              '!2'
Function:        add
Module:          '/tmp/add.ll'
MetadataID:      4
Name:            '<unnamed loop>'
Depth:           1
HeaderName:      loop_0
llvm.loop.unroll.enable: true
llvm.loop.unroll.disable: false
llvm.loop.unroll.count: 32
llvm.loop.isunrolled: false
llvm.loop.vectorize.enable: false
llvm.loop.vectorize.disable: false
llvm.loop.isvectorized: false
llvm:            '  Loop at depth 1 containing: 
<header><exiting>
loop_0:                                           ; preds = %19, %1
  %4 = load i32, i32* %3, align 4
  %5 = icmp slt i32 %4, 1000000
  br i1 %5, label %6, label %22


6:                                                ; preds = %loop_0
  %7 = load i32, i32* %3, align 4
  %8 = sext i32 %7 to i64
  %9 = getelementptr inbounds [1000000 x i32], [1000000 x i32]* @A, i64 0, i64 %8
  %10 = load i32, i32* %9, align 4
  %11 = load i32, i32* %3, align 4
  %12 = sext i32 %11 to i64
  %13 = getelementptr inbounds [1000000 x i32], [1000000 x i32]* @B, i64 0, i64 %12
  %14 = load i32, i32* %13, align 4
  %15 = add nsw i32 %10, %14
  %16 = load i32, i32* %3, align 4
  %17 = sext i32 %16 to i64
  %18 = getelementptr inbounds [1000000 x i32], [1000000 x i32]* @A, i64 0, i64 %17
  store i32 %15, i32* %18, align 4
  br label %19

<latch>
19:                                               ; preds = %6
  %20 = load i32, i32* %3, align 4
  %21 = add nsw i32 %20, 1
  store i32 %21, i32* %3, align 4
  br label %loop_0, !llvm.loop !2
'
...
---
ID:              '!5'
Function:        main
Module:          '/tmp/add.ll'
MetadataID:      4
Name:            '<unnamed loop>'
Depth:           1
HeaderName:      loop_1
llvm.loop.unroll.enable: true
llvm.loop.unroll.disable: false
llvm.loop.unroll.count: 32
llvm.loop.isunrolled: false
llvm.loop.vectorize.enable: false
llvm.loop.vectorize.disable: false
llvm.loop.isvectorized: false
llvm:            '  Loop at depth 1 containing: 
<header><exiting>
loop_1:                                           ; preds = %18, %2
  %15 = load i32, i32* %9, align 4
  %16 = icmp slt i32 %15, 100
  br i1 %16, label %17, label %21


17:                                               ; preds = %loop_1
  call void @add(i32* %6)
  br label %18

<latch>
18:                                               ; preds = %17
  %19 = load i32, i32* %9, align 4
  %20 = add nsw i32 %19, 1
  store i32 %20, i32* %9, align 4
  br label %loop_1, !llvm.loop !2
'
...

Some issues that we need to fix/improve in the YAML file:

  • need to find a unique ID or name for a loop that persists across multiple passes on the LLVM IR file: currently, when I run optimizations like loop unroll or loop vectorize, fields like name, id, etc. change
  • create a single list of YAML entries, rather than a separate YAML list for each loop
  • add an option to hide the loops of the main() function (or to be generic, hide/show loops of specific functions)
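To illustrate the last two items, here is a minimal Python sketch (not the opt_loops implementation, which is C++). It assumes each loop is represented as a dictionary whose keys are copied from the YAML example above; `select_loops` and `dump_loops` are hypothetical helper names. It emits every loop as one JSON array rather than one document per loop, and drops loops that belong to hidden functions.

```python
import json

# Hypothetical per-loop records; key names are copied from the YAML example.
loops = [
    {"ID": "!2", "Function": "add", "Module": "/tmp/add.ll",
     "Depth": 1, "HeaderName": "loop_0", "llvm.loop.unroll.count": 32},
    {"ID": "!5", "Function": "main", "Module": "/tmp/add.ll",
     "Depth": 1, "HeaderName": "loop_1", "llvm.loop.unroll.count": 32},
]


def select_loops(records, hidden_functions=frozenset()):
    """Return the records whose enclosing function is not hidden."""
    return [r for r in records if r["Function"] not in hidden_functions]


def dump_loops(records):
    """Serialize all loops as a single JSON array."""
    return json.dumps(records, indent=2)


# Hide the loops of main(), keep everything else.
visible = select_loops(loops, hidden_functions={"main"})
text = dump_loops(visible)
```

Because the output is a single array, a consumer can parse the whole file in one call instead of splitting it into per-loop documents first.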

For next steps, I plan to:

  • focus on extracting features like AutoPhase, ir2vec, and ProGraml from both the module IR as well as loop IR

@mostafaelhoushi mostafaelhoushi added the Enhancement New feature or request label Jan 31, 2022
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 31, 2022
@codecov-commenter

codecov-commenter commented Jan 31, 2022

Codecov Report

Merging #557 (1476ab0) into development (9d7977c) will increase coverage by 0.81%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##           development     #557      +/-   ##
===============================================
+ Coverage        87.42%   88.24%   +0.81%     
===============================================
  Files              113      114       +1     
  Lines             6411     6693     +282     
===============================================
+ Hits              5605     5906     +301     
+ Misses             806      787      -19     
Impacted Files Coverage Δ
compiler_gym/datasets/uri.py 93.18% <0.00%> (-6.82%) ⬇️
compiler_gym/envs/llvm/datasets/llvm_stress.py 91.66% <0.00%> (-5.00%) ⬇️
compiler_gym/spaces/sequence.py 92.30% <0.00%> (-4.76%) ⬇️
compiler_gym/service/proto/py_converters.py 96.63% <0.00%> (-3.37%) ⬇️
compiler_gym/views/observation_space_spec.py 89.28% <0.00%> (-2.30%) ⬇️
compiler_gym/envs/llvm/datasets/clgen.py 89.55% <0.00%> (-2.26%) ⬇️
compiler_gym/envs/llvm/datasets/csmith.py 86.66% <0.00%> (-1.74%) ⬇️
compiler_gym/datasets/datasets.py 92.92% <0.00%> (-0.75%) ⬇️
compiler_gym/datasets/dataset.py 85.61% <0.00%> (-0.11%) ⬇️
compiler_gym/bin/manual_env.py 80.73% <0.00%> (-0.06%) ⬇️
... and 30 more

@mostafaelhoushi
Contributor Author

I refactored the code and added another option to dump to JSON. So now, -emit-json logs to a JSON file while -emit-yaml logs to a YAML file.
I can also add an option to log to protobuf.
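As a rough illustration of how a Python environment might consume the JSON dump for per-loop observations, here is a minimal sketch. The JSON schema is not shown in this thread, so the sketch assumes the file holds an array of objects with the same keys as the YAML example; `loop_observation` and `load_observations` are hypothetical names.

```python
import json

# Assumed flag keys, taken from the YAML example in this thread.
FLAG_KEYS = [
    "llvm.loop.unroll.enable",
    "llvm.loop.unroll.disable",
    "llvm.loop.isunrolled",
    "llvm.loop.vectorize.enable",
    "llvm.loop.vectorize.disable",
    "llvm.loop.isvectorized",
]


def loop_observation(record):
    """Flatten one loop record into ints: depth, unroll count, then flags."""
    features = [
        int(record.get("Depth", 0)),
        int(record.get("llvm.loop.unroll.count", 0)),
    ]
    features += [int(bool(record.get(key, False))) for key in FLAG_KEYS]
    return features


def load_observations(json_text):
    """Map each loop's HeaderName to its feature vector."""
    records = json.loads(json_text)
    return {r["HeaderName"]: loop_observation(r) for r in records}


# Stand-in for the contents of a dumped file (assumed layout).
sample = json.dumps([
    {"HeaderName": "loop_0", "Depth": 1,
     "llvm.loop.unroll.enable": True, "llvm.loop.unroll.count": 32},
])
observations = load_observations(sample)
```

Keying by HeaderName is only a placeholder here, since the thread notes that loop names and IDs may change after passes such as loop unroll.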

I also added IRCanonicalizer as a third-party directory, and made it optional to run its passes in our opt_loops tool. However, on a first try it didn't seem to solve the problem. Perhaps I can take a closer look.

Cool!

  • create a single list of YAML entries, rather than a separate YAML list for each loop

This should probably be fixed prior to merging, as I think this means the current generated YAML is malformed. Do you care about YAML? If not, you could use JSON. There is a nice library for working with JSON in C++. Here is an example of it being used in CompilerGym.

Or is the idea that this output will be parsed by a Python environment to extract the features? If so, you may want to use protobufs, as they provide significantly faster deserialization than JSON/YAML, while still providing a text format for human-readable debugging.

  • need to find a unique ID or name for a loop that persists across multiple passes on the LLVM IR file: currently, when I run optimizations like loop unroll or loop vectorize, fields like name, id, etc. change

Could LLVM-Canon help here?

Contributor

@ChrisCummins ChrisCummins left a comment

Hey @mostafaelhoushi, thanks for taking a look at my suggestions and sorry for the extra hassle!

I also added IRCanonicalizer as a third-party directory, and made it optional to run its passes in our opt_loops tool. However, it seems from the first try that it didn't solve the problem. But perhaps I can take a closer look.

Ah, that's a pity. It will be useful to have the canonicalizer in our codebase; I think it would make a useful addition to the Python APIs.

The README needs a few extra details (see inline comment). Aside from that, and some minor nitpicks, LGTM! Please merge when you're happy.

Cheers,
Chris

compiler_gym/third_party/LLVM-Canon/BUILD (outdated, resolved)
compiler_gym/third_party/LLVM-Canon/BUILD (outdated, resolved)
compiler_gym/third_party/LLVM-Canon/README.md (outdated, resolved)
examples/loop_optimizations_service/opt_loops/opt_loops.cc (outdated, resolved)
examples/loop_optimizations_service/opt_loops/opt_loops.cc (outdated, resolved)
mostafaelhoushi and others added 7 commits February 7, 2022 22:28
Co-authored-by: Chris Cummins <chrisc.101@gmail.com>
Co-authored-by: Chris Cummins <chrisc.101@gmail.com>
Co-authored-by: Chris Cummins <chrisc.101@gmail.com>
Co-authored-by: Chris Cummins <chrisc.101@gmail.com>
@mostafaelhoushi
Copy link
Contributor Author

Thanks Chris. There was no hassle at all :)
I have also made sure that it builds with CMake on Linux.

@mostafaelhoushi mostafaelhoushi changed the title Create YAML File with Loop Features/Configurations Create JSON or YAML Files with Loop Features/Configurations Feb 9, 2022
@mostafaelhoushi mostafaelhoushi merged commit 11784d4 into facebookresearch:development Feb 9, 2022