[proposal] fast path for general integer division for x64

#### Feature

On most x64 microarchitectures, the latency and reciprocal throughput of integer division depend on the operand width. Clang exploits this with an interesting trick that may speed up 64-bit division when the upper 32 bits of both operands are zero. In code, the idea is:
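```cpp
#include <cstdint>

// Division by zero and INT64_MIN / -1 stay undefined, as with a plain `/`.
int64_t idiv(int64_t a, int64_t b) {
  // Any bit set in the upper halves (including a sign bit): full 64-bit division.
  if (static_cast<uint64_t>(a | b) >> 32) return a / b;
  // Both operands are non-negative and fit in 32 bits: unsigned 32-bit division.
  return static_cast<int64_t>(static_cast<uint32_t>(a) / static_cast<uint32_t>(b));
}
```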

godbolt: https://godbolt.org/z/Tqqzs1

Does it make sense to apply the same optimization in Cranelift, for the x64 backend only?

#### Benefit

It may speed up div / rem by more than 2x for operands whose upper 32 bits are zero, at a small constant overhead, according to this table:
<img width="525" alt="comparision" src="https://user-images.githubusercontent.com/1301959/99916476-0cf5dc80-2d13-11eb-9ffe-7dcfd04eab8e.png">

However, it is worth excluding the **Zen 1, 2 and 3** architectures, because they use a more modern division scheme whose cost does not depend on register width. The optimization is also not needed on ARM.
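Since the benefit covers both div and rem, here is a minimal sketch of how the same upper-bits check could carry over to signed remainder and unsigned division. The names `fast_rem` and `fast_udiv` are hypothetical, chosen for illustration only; they do not come from Clang or Cranelift. The narrow path uses unsigned 32-bit arithmetic, because an operand such as 4000000000 has zero upper bits but does not fit in a signed 32-bit integer.

```cpp
#include <cstdint>

// Hypothetical helper (illustrative name). Same preconditions as plain `%`:
// b != 0, and not (a == INT64_MIN && b == -1).
int64_t fast_rem(int64_t a, int64_t b) {
  // A negative operand has its sign bit set, so it always takes the wide path.
  if (static_cast<uint64_t>(a | b) >> 32) return a % b;
  // Both operands are in [0, 2^32): unsigned 32-bit remainder is exact.
  return static_cast<int64_t>(static_cast<uint32_t>(a) % static_cast<uint32_t>(b));
}

// Hypothetical helper (illustrative name). Precondition: b != 0.
uint64_t fast_udiv(uint64_t a, uint64_t b) {
  // Any bit set in either upper half: full 64-bit division.
  if ((a | b) >> 32) return a / b;
  // Both operands fit in 32 bits: narrow division, zero-extended result.
  return static_cast<uint32_t>(a) / static_cast<uint32_t>(b);
}
```

Whether a compiler should emit this branch unconditionally is a separate question: the check costs an `or`, a shift and a branch on every division, which only pays off on targets where the 32-bit divide is meaningfully cheaper.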


