Skip to content

[proposal] fast path for general integer division for x64 #2439

@MaxGraey

Description

@MaxGraey

Feature

Latency and RCP for division depends on register sizes for most of x64 architectures. So Clang do one interesting trick which may speedup 64-bit division if high parts of operands equal to zero. Pseudocode:

idiv(a: i64, b: i64) -> i64 {
  if ((a | b) >> 32) return a / b; // full 64-bit division
  return i32(a) / i32(b) as i64; // 32-bit division
}

godbolt: https://godbolt.org/z/Tqqzs1

Is it make sense apply same optimization for cranelift only for x64 architecture?

Benefit

it may speedup div / rem over 2x for arguments without high parts with small constant overhead according to this table:
comparision

But it is worth excluding Zen1,2,3 architecture due to it uses a more modern scheme for division which doesn't dependent on register's width. Also it doesn't need for ARM.

Metadata

Metadata

Assignees

No one assigned

    Labels

    craneliftIssues related to the Cranelift code generatorcranelift:area:x64Issues related to x64 codegencranelift:goal:optimize-speedFocus area: the speed of the code produced by Cranelift.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions