Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

better implementation of signed div_floor/ceil #130002

Merged
merged 1 commit into from
Sep 8, 2024

Conversation

orlp
Copy link
Contributor

@orlp orlp commented Sep 5, 2024

Tracking issue for signed div_floor/div_ceil: #88581.

This PR improves the implementation of those two functions by adding a better branchless algorithm. Side-by-side comparison of i32::div_floor on x86-64:

div_floor_new:                               div_floor_old:                                 
        push    rax                                  push    rax                            
        test    esi, esi                             test    esi, esi                       
        je      .LBB0_3                              je      .LBB1_6                        
        mov     eax, esi                             mov     eax, esi                       
        not     eax                                  not     eax                            
        lea     ecx, [rdi - 2147483648]              lea     ecx, [rdi - 2147483648]        
        or      ecx, eax                             or      ecx, eax                       
        je      .LBB0_2                              je      .LBB1_7                        
        mov     eax, edi                             mov     eax, edi                       
        cdq                                          cdq                                    
        idiv    esi                                  idiv    esi                            
        xor     esi, edi                             test    edx, edx                       
        sar     esi, 31                              setg    cl                             
        test    edx, edx                             test    esi, esi                       
        cmove   esi, edx                             sets    dil                            
        add     eax, esi                             test    dil, cl                        
        pop     rcx                                  jne     .LBB1_4                        
        ret                                          test    edx, edx                       
.LBB0_3:                                             setns   cl                             
        lea     rdi, [rip + .L__unnamed_1]           test    esi, esi                       
        call    qword ptr [rip + panic...]          setle   dl                             
.LBB0_2:                                             or      dl, cl                         
        lea     rdi, [rip + .L__unnamed_1]           jne     .LBB1_5                        
        call    qword ptr [rip + panic...]   .LBB1_4:                                       
                                                     dec     eax                            
                                             .LBB1_5:                                       
                                                     pop     rcx                            
                                                     ret                                    
                                             .LBB1_6:                                       
                                                     lea     rdi, [rip + .L__unnamed_2]     
                                                     call    qword ptr [rip + panic...]
                                             .LBB1_7:                                       
                                                     lea     rdi, [rip + .L__unnamed_2]     
                                                     call    qword ptr [rip + panic...]

And on Aarch64:

_div_floor_new:                                   _div_floor_old:                               
        stp     x29, x30, [sp, #-16]!                     stp     x29, x30, [sp, #-16]!         
        mov     x29, sp                                   mov     x29, sp                       
        cbz     w1, LBB0_4                                cbz     w1, LBB1_9                    
        mov     w8, #-2147483648                          mov     x8, x0                        
        cmp     w0, w8                                    mov     w9, #-2147483648              
        b.ne    LBB0_3                                    cmp     w0, w9                        
        cmn     w1, #1                                    b.ne    LBB1_3                        
        b.eq    LBB0_5                                    cmn     w1, #1                        
LBB0_3:                                                   b.eq    LBB1_10                       
        sdiv    w8, w0, w1                        LBB1_3:                                       
        msub    w9, w8, w1, w0                            sdiv    w0, w8, w1                    
        eor     w10, w1, w0                               msub    w8, w0, w1, w8                
        asr     w10, w10, #31                             tbz     w1, #31, LBB1_5               
        cmp     w9, #0                                    cmp     w8, #0                        
        csel    w9, wzr, w10, eq                          b.gt    LBB1_7                        
        add     w0, w9, w8                        LBB1_5:                                       
        ldp     x29, x30, [sp], #16                       cmp     w1, #1                        
        ret                                               b.lt    LBB1_8                        
LBB0_4:                                                   tbz     w8, #31, LBB1_8               
        adrp    x0, l___unnamed_1@PAGE            LBB1_7:                                       
        add     x0, x0, l___unnamed_1@PAGEOFF             sub     w0, w0, #1                    
        bl      panic...                          LBB1_8:                                       
LBB0_5:                                                   ldp     x29, x30, [sp], #16           
        adrp    x0, l___unnamed_1@PAGE                    ret                                   
        add     x0, x0, l___unnamed_1@PAGEOFF     LBB1_9:                                       
        bl      panic...                                  adrp    x0, l___unnamed_2@PAGE        
                                                          add     x0, x0, l___unnamed_2@PAGEOFF 
                                                          bl      panic...                      
                                                  LBB1_10:                                      
                                                          adrp    x0, l___unnamed_2@PAGE        
                                                          add     x0, x0, l___unnamed_2@PAGEOFF 
                                                          bl      panic...                      

@rustbot
Copy link
Collaborator

rustbot commented Sep 5, 2024

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Sep 5, 2024
@orlp
Copy link
Contributor Author

orlp commented Sep 5, 2024

r? @thomcc

@rustbot rustbot assigned thomcc and unassigned Mark-Simulacrum Sep 5, 2024
@thomcc
Copy link
Member

thomcc commented Sep 7, 2024

Very nice. I doubt the compiler uses this at all but lets avoid rollup just to be safe.

@bors r+ rollup=never

@bors
Copy link
Contributor

bors commented Sep 7, 2024

📌 Commit 6b4ff51 has been approved by thomcc

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 7, 2024
@bors
Copy link
Contributor

bors commented Sep 8, 2024

⌛ Testing commit 6b4ff51 with merge adf8d16...

@bors
Copy link
Contributor

bors commented Sep 8, 2024

☀️ Test successful - checks-actions
Approved by: thomcc
Pushing adf8d16 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 8, 2024
@bors bors merged commit adf8d16 into rust-lang:master Sep 8, 2024
7 checks passed
@rustbot rustbot added this to the 1.83.0 milestone Sep 8, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (adf8d16): comparison URL.

Overall result: ❌ regressions - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
0.6% [0.6%, 0.6%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results (secondary -2.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.3% [-2.3%, -2.3%] 1
All ❌✅ (primary) - - 0

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 754.738s -> 756.685s (0.26%)
Artifact size: 341.16 MiB -> 341.18 MiB (0.01%)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants