Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

impl Ord for bool: optimization #66780

Closed
carado opened this issue Nov 26, 2019 · 3 comments · Fixed by #66881
Closed

impl Ord for bool: optimization #66780

carado opened this issue Nov 26, 2019 · 3 comments · Fixed by #66881
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@carado
Copy link

carado commented Nov 26, 2019

Currently, impl Ord for bool converts the bools to u8s and then compares them.

On the playground,

pub fn using_ord(x: bool, y: bool) -> Ordering {
    x.cmp(&y)
}

compiles to

playground::using_ord: # @playground::using_ord
# %bb.0:
	movl	%edi, %ecx
	xorl	%esi, %ecx
	testl	%esi, %esi
	movl	$255, %eax
	cmovel	%ecx, %eax
	testl	%edi, %edi
	cmovnel	%ecx, %eax
                                        # kill: def $al killed $al killed $eax
	retq
                                        # -- End function

A much shorter (in instruction count) and almost certainly faster version taking advantage of the fact that orderings are represented by i8 values -1, 0, and 1, is:

pub fn minus_transmute(x: bool, y: bool) -> Ordering {
    unsafe { std::mem::transmute(x as i8 - y as i8) }
}

which on the playground compiles to

playground::minus_transmute: # @playground::minus_transmute
# %bb.0:
	movl	%edi, %eax
	subb	%sil, %al
                                        # kill: def $al killed $al killed $eax
	retq
                                        # -- End function
@jonas-schievink jonas-schievink added I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Nov 26, 2019
@alex
Copy link
Member

alex commented Nov 27, 2019

FWIW, you get identical codegen with a match statement (+ an unreachable_unchecked), which I find slightly more readable than the transmute:

pub fn minus_match(x: bool, y: bool) -> Ordering {
    match x as i8 - y as i8 {
        -1 => Ordering::Less,
        0 => Ordering::Equal,
        1 => Ordering::Greater,
        _ => unsafe { std::hint::unreachable_unchecked() }
    }
}

@carado
Copy link
Author

carado commented Nov 27, 2019

Fair enough; that does seem like it relies on less assumptions about "the outside", too.

@krishna-veerareddy
Copy link
Contributor

I would like to work on this issue unless somebody else already started working on it.

@bors bors closed this as completed in 830b4ee Dec 11, 2019
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this issue Aug 15, 2023
…r=cuviper

Optimizing the rest of bool's Ord implementation

After coming across issue rust-lang#66780, I realized that the other functions provided by Ord (`min`, `max`, and `clamp`) were similarly inefficient for bool. This change provides implementations for them in terms of boolean operators, resulting in much simpler assembly and faster code.
Fixes issue rust-lang#114653

[Comparison on Godbolt](https://rust.godbolt.org/z/5nb5P8e8j)

`max` assembly before:
```assembly
example::max:
        mov     eax, edi
        mov     ecx, eax
        neg     cl
        mov     edx, esi
        not     dl
        cmp     dl, cl
        cmove   eax, esi
        ret
```
`max` assembly after:
```assembly
example::max:
        mov     eax, edi
        or      eax, esi
        ret
```
`clamp` assembly before:
```assembly
example::clamp:
        mov     eax, esi
        sub     al, dl
        inc     al
        cmp     al, 2
        jae     .LBB1_1
        mov     eax, edi
        sub     al, sil
        movzx   ecx, dil
        sub     dil, dl
        cmp     dil, 1
        movzx   edx, dl
        cmovne  edx, ecx
        cmp     al, -1
        movzx   eax, sil
        cmovne  eax, edx
        ret
.LBB1_1:
        ; identical assert! code
```
`clamp` assembly after:
```assembly
example::clamp:
        test    edx, edx
        jne     .LBB1_2
        test    sil, sil
        jne     .LBB1_3
.LBB1_2:
        or      dil, sil
        and     dil, dl
        mov     eax, edi
        ret
.LBB1_3:
        ; identical assert! code
```
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 16, 2023
…r=cuviper

Optimizing the rest of bool's Ord implementation

After coming across issue rust-lang#66780, I realized that the other functions provided by Ord (`min`, `max`, and `clamp`) were similarly inefficient for bool. This change provides implementations for them in terms of boolean operators, resulting in much simpler assembly and faster code.
Fixes issue rust-lang#114653

[Comparison on Godbolt](https://rust.godbolt.org/z/5nb5P8e8j)

`max` assembly before:
```assembly
example::max:
        mov     eax, edi
        mov     ecx, eax
        neg     cl
        mov     edx, esi
        not     dl
        cmp     dl, cl
        cmove   eax, esi
        ret
```
`max` assembly after:
```assembly
example::max:
        mov     eax, edi
        or      eax, esi
        ret
```
`clamp` assembly before:
```assembly
example::clamp:
        mov     eax, esi
        sub     al, dl
        inc     al
        cmp     al, 2
        jae     .LBB1_1
        mov     eax, edi
        sub     al, sil
        movzx   ecx, dil
        sub     dil, dl
        cmp     dil, 1
        movzx   edx, dl
        cmovne  edx, ecx
        cmp     al, -1
        movzx   eax, sil
        cmovne  eax, edx
        ret
.LBB1_1:
        ; identical assert! code
```
`clamp` assembly after:
```assembly
example::clamp:
        test    edx, edx
        jne     .LBB1_2
        test    sil, sil
        jne     .LBB1_3
.LBB1_2:
        or      dil, sil
        and     dil, dl
        mov     eax, edi
        ret
.LBB1_3:
        ; identical assert! code
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants