
Ensure loops optimize right at the LLVM level #26902

Closed
Gankra opened this issue Jul 8, 2015 · 8 comments
Labels
A-codegen Area: Code generation E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@Gankra
Contributor

Gankra commented Jul 8, 2015

This involves 4 steps:

  1. Gather up several different loop/iterator idioms that should optimize to e.g. memset/memcpy, and run them through rustc -C opt-level=0 --emit=llvm-ir (a sketch of such idioms follows below this list)
  2. Feed the IR from step 1 into opt -O2 -print-before-all and grab the IR dump from just before the "Recognize loop idioms" pass
  3. Write (or extend) a LoopIdiomRecognize pass so that it handles the IR from step 2
  4. Write codegen tests for rustc that verify that the idioms are being optimized to the right IR
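
As a concrete starting point, here is a minimal sketch (not taken from the issue) of the kind of idiom step 1 is asking for: simple element loops over slices that LLVM's loop-idiom recognition should be able to collapse into memset/memcpy calls.

```rust
// Hypothetical examples of loop idioms that should become memset/memcpy.
// Compile with: rustc --crate-type=lib -C opt-level=0 --emit=llvm-ir idioms.rs

// Should lower to memset(buf.as_mut_ptr(), 0, buf.len()).
pub fn zero_fill(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        *byte = 0;
    }
}

// Should lower to a memcpy of min(dst.len(), src.len()) bytes.
pub fn copy_into(dst: &mut [u8], src: &[u8]) {
    for (d, s) in dst.iter_mut().zip(src.iter()) {
        *d = *s;
    }
}
```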

CC @pcwalton

@Gankra Gankra added A-codegen Area: Code generation I-slow Issue: Problems and improvements with respect to performance of generated code. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. labels Jul 8, 2015
@Gankra
Contributor Author

Gankra commented Jul 8, 2015

@pcwalton has said they're willing to mentor anyone who wants to take a crack at this.

@pcwalton
Contributor

Also check the benchmarks game benchmarks. I've heard things about LoopIdiomRecognize badness causing problems there.

@Gankra
Contributor Author

Gankra commented Jul 22, 2015

A lot of the things we want to guarantee optimize right:

  • Vec::extend with a vec/slice or common iterator idioms
  • Vec::from_iter (i.e. collect) with a vec/slice or common iterator idioms
  • zipping two iterators, particularly those over a vec/slice, should ideally reduce to something good

This is an old gist I have of some test cases from when I was looking at solving this with TrustedLen (which is a bad idea IMO). Not all of these benches can clearly optimize well (e.g. some aren't a memcpy/memset, but rather create (0..n) as a vec).

https://gist.github.com/Gankro/fd2543302dade5992e0e
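
As a rough illustration (these one-liners are stand-ins, not the gist's actual benches), the bullet points above correspond to code shaped roughly like this:

```rust
// Illustrative stand-ins for the idiom families listed above.

// Vec::extend from a slice: ideally a reserve plus a single memcpy.
pub fn extend_from(dst: &mut Vec<u8>, src: &[u8]) {
    dst.extend(src.iter().cloned());
}

// Building a Vec from an iterator via collect: ideally one allocation plus a memcpy.
pub fn collect_from(src: &[u8]) -> Vec<u8> {
    src.iter().cloned().collect()
}

// Zipping two slice iterators: should become a single counted loop with no
// bounds checks, even though it can't be a memcpy/memset.
pub fn zip_add(a: &[u32], b: &[u32]) -> Vec<u32> {
    a.iter().zip(b.iter()).map(|(x, y)| x + y).collect()
}
```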

@jnordwick

I'll take a crack at this this weekend. I'll start with Gankra's benches and add anything from the benchmarks game that isn't already there.

@Gankra
Contributor Author

Gankra commented Jul 30, 2015

I've made a repo for minimally experimenting with Vec, RawVec, and some basic benches: https://github.com/Gankro/vec-perf-test

Note that the vec! macro still uses std's Vec, but that's fine since it always optimizes anyway.

Note that not all of these benches are expected to optimize well; in particular skip-take and take-skip produce [1, 2, 3, ..], which can't be turned into a memcpy or memset to my knowledge.
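
For reference, a hypothetical reconstruction (not the repo's actual code) of what a skip-take shaped bench looks like, and why no memset/memcpy applies: the collected values are consecutive integers rather than a repeated byte pattern or a copy of an existing buffer.

```rust
// Hypothetical skip-take bench: produces [1, 2, 3, ...], so the best case is a
// tight counted loop, never a memset or memcpy.
pub fn skip_take(n: usize) -> Vec<usize> {
    (0..n).skip(1).take(n.saturating_sub(1)).collect()
}
```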

@jnordwick

I'm sending the scripts/command lines I use to generate the LLVM IR to pcwalton this weekend, along with the output of a run, so he can make sure I'm on the right path.

One of the problems I encountered was the deep inlining required when Rust monomorphized a static lambda. It made it very difficult to pull out just a few things, and it took considerable time to hunt down all the generated functions.

If anybody knows of a way to get llvm-extract to also pull out the functions that the target depends on, that would be really helpful. -func="target" only gives you exactly what you ask for, and running over the entire output just gives too much.

So I've taken to putting a single function in main(), dumping that, and then running opt with -print-before-all and looking at the "before loop idiom recognition" sections that are generated.

@jnordwick

Last note on this for now. I sent pcwalton the LLVM IR opt output for one of the cases, and he said that LLVM wasn't getting enough hints to do anything with the loops (e.g., it couldn't turn them into bounded loops and optimize from there). So we need to push more hints to LLVM, and that's the big problem. People can email me if they want to see the IR dumps and how I generated them.
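
To illustrate what "more hints" can mean in practice (an illustrative example, not taken from the IR dumps discussed above): giving LLVM an explicit, loop-invariant bound up front lets it prove the per-element bounds checks redundant and see a plain counted loop, which is the shape LoopIdiomRecognize can then match.

```rust
// Without the assert, each dst[i]/src[i] access carries its own bounds check,
// and the potential panic edges keep LLVM from treating this as a simple
// counted loop. With the assert, the checks are provably redundant and the
// loop has a clean trip count that loop-idiom recognition can turn into a memcpy.
pub fn copy_indexed(dst: &mut [u8], src: &[u8]) {
    assert!(dst.len() >= src.len());
    for i in 0..src.len() {
        dst[i] = src[i];
    }
}
```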

@brson
Contributor

brson commented Jul 19, 2016

This is quite a speculative wishlist task, not updated in nearly a year. Closing. Performance improvements always welcome.

@brson brson closed this as completed Jul 19, 2016