Skip to content

Alt optimizations #1897

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jruderman opened this issue Feb 24, 2012 · 5 comments
Closed

Alt optimizations #1897

jruderman opened this issue Feb 24, 2012 · 5 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@jruderman
Copy link
Contributor

Let's take advantage of enum types and alt-expressions to optimize code that uses alt. Some of these optimizations might require teaching LLVM about the restrictions on enums, or doing the work ourselves in trans_alt.

Algorithm selection

Based on the cases, pick a map structure:

  • For dense enums, use i = (x - smallest)
  • For sparse enums, use i = perfecthash(x)
  • For strings, look up i in a hash table
  • For int ranges, consider a binary search
  • Otherwise (e.g. pattern matching) fall back on testing each case (if/elif/else)

Based on the branches, pick an output type:

  • For constant values, use i as an index into an array of values
  • For similar-length blocks, pad the shorter blocks with nops, and jump to i * block_length
  • For differing-length blocks, use i as an index into an array of jump offsets

In this example, the cases are a dense enum and the branches are values:

y = alt x { red { 9 } green { 1 } blue { 4 } }

So it can use an array of values, with no branching (even for a bounds check):

const_table = [9, 1, 4]
y = const_table[x as uint]

This is something Mozilla programmers do in C++ all the time, manually, often messing it up.

Optimizations for if/elif/else compilation

Omit the last "else if" check when the alt is exhaustive, turning these into a simple "if":

alt x { one {} _ {} }
alt x { one {} two {} }

Reorder cases in order to put a difficult case at the end (where it can be omitted):

alt { one, three { 5 } two { 7 } }

Collapse adjacent cases into a single less-than check:

alt x { one, two, three { 7 } four, five, six { 9 } }

"Pass through" optimization

Compile as "pass through" if the arms all match the values. Useful when translating between two enum types.

alt x { one { 1 } two { 2 } }
alt x { one { one_ } two { two_ } }
@ghost ghost assigned catamorphism Feb 24, 2012
@catamorphism
Copy link
Contributor

I'd like to work on this, but if someone else gets there first, feel free to steal it :-)

@marijnh
Copy link
Contributor

marijnh commented Feb 24, 2012

I'm not very enthusiastic about most of these. I'll give my reasons point-by-point below:

  • For dense enums, use i = (x - smallest)

Dense enums tend to start at zero (or is it one?), so I don't think this'll help in a lot of cases.

  • For sparse enums, use i = perfecthash(x)
  • For strings, look up i in a hash table

Unless there's a ton of cases, the overhead of hashing will be more than the cost of just running down the list of cases.

  • Otherwise (e.g. pattern matching) fall back on testing each case (if/elif/else)

Nested pattern matches are reduced to a nested set of direct tag matches by a simple algorithm.

  • For constant values, use i as an index into an array of values

We should first check whether LLVM doesn't already manage this. I believe we use Phi nodes in a way that would make this possible (given enough cleverness on LLVM's side).

  • For similar-length blocks, pad the shorter blocks with nops, and jump to i * block_length

Would need an (invasive) LLVM patch to do this. Probably not worth it.

  • For differing-length blocks, use i as an index into an array of jump offsets

This is what an LLVM switch instruction will do, I believe. We're already using those.

@pcwalton
Copy link
Contributor

The only one I'm enthusiastic about is string comparisons, but I think the most efficient way to do that is to basically encode a trie. For example:

alt x {
    "foo" { ... }
    "bar" { ... }
    "baz" { ... }
}

Could become:

if (x[0] == 'b' && x[1] == 'a') {
    if (x[2] == 'r') { ... }
    else if (x[2] == 'z') { ... }
} else if (!strcmp(x, "foo")) {
    ...
} else {
    ...
}

But I think there's far more low-hanging optimization fruit...

@graydon
Copy link
Contributor

graydon commented Feb 29, 2012

Agreed, O(N) with small constant factors and small N does, in practice, often beat O(lg(N)) or O(1) with more substantial constant factors. Switches are usually small N, integer and string compare are the lowest-overhead operations known, and when N is large users often have their own cleverness in mind anyways.

Just stick with the simple thing for now. The backend authors have already got the table-switching logic done, and they employ it when there's reason to believe it's a win. If we start seeing real world code that has this sort of thing hot in real profiles, then investigate. I'll bet it never comes up.

@catamorphism
Copy link
Contributor

I'm closing this since there seems to be consensus that LLVM is better suited to do these optimizations than our middle-end is.

Feel free to open a new, more specific issue if necessary.

flip1995 pushed a commit to flip1995/rust that referenced this issue Apr 7, 2022
Add lints `drop_non_drop` and `forget_non_drop`

fixes rust-lang#1897

changelog: Add lints `drop_non_drop` and `forget_non_drop`
celinval added a commit to celinval/rust-dev that referenced this issue Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

5 participants