Offer attributes for controlling loop optimizations #2219

hanna-kruppe · 2017-11-15T22:08:39Z

For example, Clang has #pragma clang loop, which allows the programmer to guide loop unrolling, vectorization, and other optimizations. This is useful in high-performance code because heuristics are often fallible, and nudging the optimizer in the right direction can sometimes squeeze out some more performance (for a particular optimizer version, of course).

In Rust, the natural replacement for a pragma would probably be an attribute.


hanna-kruppe · 2017-11-15T22:17:00Z

Previous discussion: https://internals.rust-lang.org/t/loop-unrolling-on-request/3091 (pointed out in #rust-internals by lqd)

hanna-kruppe · 2017-11-15T22:23:43Z

In that discussion, the possibility of a (likely procedural) macro specifically for unrolling was brought up. That would be a rather blunt tool, as it wouldn't integrate with the optimizer. It also wouldn't cover use cases for limiting the optimizer (i.e., preventing unrolling that would normally occur).

It also doesn't address knobs for optimizations other than loop unrolling.
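To make the "blunt tool" point concrete, here is a minimal sketch of purely syntactic unrolling, assuming a hypothetical `unroll!` macro written with `macro_rules!` (not an existing crate's API). It pastes the loop body once per listed index, so the optimizer only ever sees straight-line code: it cannot be asked to unroll a loop it never sees, and there is no way to express "do not unroll".

```rust
// Hypothetical `unroll!` macro: repeats `$body` once per listed literal,
// binding `$i` to that literal each time. This is source-level duplication
// only; no loop (and no hint about a loop) ever reaches the optimizer.
macro_rules! unroll {
    ($i:ident in [$($n:literal),* $(,)?] $body:block) => {
        $( { let $i: usize = $n; $body } )*
    };
}

/// Example use: sums the first four elements with the body pasted 4 times.
fn sum_first_four(xs: &[u32; 8]) -> u32 {
    let mut acc = 0;
    // Expands to four blocks with i = 0, 1, 2, 3.
    unroll!(i in [0, 1, 2, 3] {
        acc += xs[i];
    });
    acc
}
```

For example, `sum_first_four(&[1, 2, 3, 4, 5, 6, 7, 8])` evaluates to 10. Note that the index list has to be spelled out as literals, which is exactly the limitation that a compiler-integrated attribute would avoid.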

leonardo-m · 2017-11-16T10:27:34Z

My proposal was meant to integrate with the optimizer. In that proposal, #[unroll(never)] is for limiting unrolling; it is equivalent to:
#pragma clang loop unroll(disable)

#[unroll]
That's equal to:
#pragma clang loop unroll(enable)

#[unroll(8)]
That's equal to:
#pragma clang loop unroll_count(8)

#[unroll(try_full)]
That's equal to:
#pragma clang loop unroll(full)

Regarding the knobs for other optimizations, my proposal doesn't rule them out. Other attributes can be added later if wanted:

#[vectorize_width(2)]
Similar to:
#pragma clang loop vectorize_width(2)

And:
#[interleave_count(2)]
Similar to:
#pragma clang loop interleave_count(2)
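As an illustration only, the proposed attributes and the Clang pragmas listed above could be modelled as a small Rust enum. `LoopHint` and `clang_pragma` are hypothetical names; none of these attributes exist in Rust, and the pragma strings are simply the correspondences given in this comment.

```rust
/// Hypothetical model of the proposed loop attributes (not real Rust syntax).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LoopHint {
    UnrollNever,          // #[unroll(never)]
    Unroll,               // #[unroll]
    UnrollCount(u32),     // #[unroll(8)]
    UnrollTryFull,        // #[unroll(try_full)]
    VectorizeWidth(u32),  // #[vectorize_width(2)]
    InterleaveCount(u32), // #[interleave_count(2)]
}

impl LoopHint {
    /// Returns the `#pragma clang loop` spelling this hint corresponds to.
    fn clang_pragma(self) -> String {
        let clause = match self {
            LoopHint::UnrollNever => "unroll(disable)".to_string(),
            LoopHint::Unroll => "unroll(enable)".to_string(),
            LoopHint::UnrollCount(n) => format!("unroll_count({n})"),
            LoopHint::UnrollTryFull => "unroll(full)".to_string(),
            LoopHint::VectorizeWidth(n) => format!("vectorize_width({n})"),
            LoopHint::InterleaveCount(n) => format!("interleave_count({n})"),
        };
        format!("#pragma clang loop {clause}")
    }
}
```

For instance, `LoopHint::UnrollCount(8).clang_pragma()` yields `#pragma clang loop unroll_count(8)`. Keeping "never" and "try_full" as distinct variants mirrors the point that the hints must be able to restrict the optimizer as well as push it.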

BatmanAoD · 2018-05-28T19:17:29Z

The llvm.loop metadata interface itself actually seems like a pretty well-named set of traits. In particular, I like the idea of the names having a sort of namespacing using ., so that the loop-related traits all start llvm.loop. Exposing these directly would introduce an undesirable connection between the frontend and the backend (making alternate backends less viable), but why not simply base the frontend attributes on the LLVM names for now, using namespacing in a similar fashion?

I.e., something like:

#[optimization_hint.loop.<LLVM metadata trait>]

So, for example:

#[optimization_hint.loop.interleave(4)]
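A minimal sketch of the decoupling this suggests, assuming the attribute is spelled `optimization_hint.<path>(<n>)` as proposed here (rustc accepts nothing like it today): the frontend parses a backend-neutral path, and only an LLVM-specific table translates it into `llvm.loop.*` metadata keys. `parse_hint`, `llvm_loop_metadata`, `lower_to_llvm` and the frontend paths on the left are hypothetical; the keys on the right are LLVM's documented loop metadata names.

```rust
/// Parses `optimization_hint.<path>(<n>)` into (`<path>`, Some(n)),
/// or `optimization_hint.<path>` into (`<path>`, None).
fn parse_hint(attr: &str) -> Option<(&str, Option<u32>)> {
    let rest = attr.strip_prefix("optimization_hint.")?;
    match rest.split_once('(') {
        Some((path, arg)) => {
            let n = arg.strip_suffix(')')?.trim().parse::<u32>().ok()?;
            Some((path, Some(n)))
        }
        None => Some((rest, None)),
    }
}

/// LLVM-backend table: backend-neutral path -> `llvm.loop.*` metadata key.
/// A different backend would supply its own table instead.
fn llvm_loop_metadata(path: &str) -> Option<&'static str> {
    match path {
        "loop.unroll.disable" => Some("llvm.loop.unroll.disable"),
        "loop.unroll.enable" => Some("llvm.loop.unroll.enable"),
        "loop.unroll.count" => Some("llvm.loop.unroll.count"),
        "loop.unroll.full" => Some("llvm.loop.unroll.full"),
        "loop.vectorize.width" => Some("llvm.loop.vectorize.width"),
        "loop.interleave" => Some("llvm.loop.interleave.count"),
        _ => None,
    }
}

/// Lowers a full attribute string to (LLVM metadata key, optional operand).
fn lower_to_llvm(attr: &str) -> Option<(&'static str, Option<u32>)> {
    let (path, arg) = parse_hint(attr)?;
    Some((llvm_loop_metadata(path)?, arg))
}
```

With this split, `lower_to_llvm("optimization_hint.loop.interleave(4)")` gives `("llvm.loop.interleave.count", Some(4))`, while the user-facing spelling never mentions LLVM.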

BatmanAoD · 2018-05-29T16:22:56Z

I believe this is potentially something for @rust-lang/wg-codegen to weigh in on?

eddyb · 2018-05-29T17:25:56Z

(small nit) In Rust we'd probably use #[optimization_hint(loop(interleave(4)))] instead.
Or #[optimization_hint::loop::interleave(4)], but it'd be a first: currently we don't use the path syntax for built-in attributes, so we ended up with #[repr(align(4))] instead of #[repr::align(4)].

BatmanAoD · 2018-05-29T20:09:28Z

Personally, I think #[foo::bar::baz::<etc>(arg)] would be much better than #[foo(bar(baz<etc>(arg)))<many parens>)))]. After all, we may have quite a few functional features, but this isn't Lisp! Two layers of paren-nesting, as in repr(align(4)), seems fine, but three is pushing it, in my opinion.

The unroll crate only supports unrolling loops over literal constant ranges, like:

```
for i in 0..10 {
    // body
}
```

I spent some time looking into adding support for const-generic bounds, since those bounds are still constant at compile time, but proc macros operate on source tokens (i.e., before monomorphization, which is when the const generic becomes a known value).

There was some light discussion about offering compiler-level tools for better control over more kinds of loop unrolling in rust-lang/rfcs#2219, but that seems to have stalled out for now.

The suggestion from [1] to use a proc macro to unroll the const-generic loop into a series of const branches that can be pruned after monomorphization seems promising, but as no such macro exists yet, I did it manually.

[1]: 0xPolygonZero/plonky#80 (comment)
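The manual workaround described in that comment (one constant-condition branch per possible iteration, pruned after monomorphization) might look roughly like the following sketch. `unrolled_for`, `sum` and the `MAX_N = 4` cap are assumptions for illustration, not the code from plonky#80. Because `N` is a const generic, each `if N > k` is a compile-time constant in every monomorphized copy, so the optimizer can drop the untaken branches; whether it actually does so is up to the optimizer, not guaranteed by the language.

```rust
// Assumed upper bound on the trip count: the sketch writes one branch per index.
const MAX_N: usize = 4;

/// Hypothetical helper: calls `body(i)` for each `i` in `0..N`, written as
/// straight-line `if`s instead of a loop. After monomorphization `N` is a
/// known constant, so every condition folds to `true` or `false`.
fn unrolled_for<const N: usize, F: FnMut(usize)>(mut body: F) {
    assert!(N <= MAX_N, "this sketch only handles N <= MAX_N");
    if N > 0 {
        body(0);
    }
    if N > 1 {
        body(1);
    }
    if N > 2 {
        body(2);
    }
    if N > 3 {
        body(3);
    }
}

/// Example use: sums an array whose length is a const generic.
fn sum<const N: usize>(xs: &[u64; N]) -> u64 {
    let mut acc = 0;
    unrolled_for::<N, _>(|i| acc += xs[i]);
    acc
}
```

For example, `sum(&[1, 2, 3])` evaluates to 6. A proc macro, as suggested in [1], would mainly save writing out the `if N > k` ladder by hand; the pruning itself still relies on monomorphization and constant folding.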

paulabrudanandrei · 2024-06-24T09:00:26Z

I would really love to see this implemented

marshallpierce mentioned this issue Nov 15, 2017

Making a function generic (but not using the parameter at all) causes ~12% slowdown rust-lang/rust#46019 (Open)

Centril added T-compiler Relevant to the compiler team, which will review and decide on the RFC. T-lang Relevant to the language team, which will review and decide on the RFC. labels Dec 6, 2017

durka mentioned this issue Mar 23, 2018

The binary size - performance tradeoff rust-embedded/wg#69 (Closed)

japaric mentioned this issue Aug 10, 2018

The binary size - performance tradeoff rust-embedded/book#11 (Closed)

dlubarov mentioned this issue Sep 23, 2020

Issue 68: Refactor existing modular arithmetic code 0xPolygonZero/plonky#80 (Merged)
