Fix for MMX instructions being generated without emms #172

Merged (2 commits), Nov 12, 2024

Conversation

sterrettm2
Contributor

Adds a call to the _mm_empty() intrinsic so that emms is generated when MMX is enabled in argsort/argselect. This should resolve the potential issues described in #154.
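For readers unfamiliar with the pattern, here is a minimal sketch of the idea, not the code in this PR. The helper name clear_mmx_state and the stand-in argsort_sketch are hypothetical, and the __MMX__ guard is an assumption about how "MMX is enabled" would be detected (GCC and Clang define that macro). Only _mm_empty() from <mmintrin.h> is a real intrinsic; it emits emms, which resets the x87 tag word so later x87 floating-point code does not see stale MMX state.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

#if defined(__MMX__)
#include <mmintrin.h> // declares _mm_empty()
#endif

// Hypothetical helper: emit emms only when the compiler targets MMX.
// On targets without MMX this compiles to nothing.
inline void clear_mmx_state()
{
#if defined(__MMX__)
    _mm_empty(); // generates the emms instruction
#endif
}

// Stand-in for an argsort routine (NOT the library's implementation):
// returns the indices that would sort `keys` in ascending order.
std::vector<std::size_t> argsort_sketch(const std::vector<double> &keys)
{
    std::vector<std::size_t> idx(keys.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::stable_sort(idx.begin(), idx.end(),
                     [&keys](std::size_t a, std::size_t b) {
                         return keys[a] < keys[b];
                     });
    // Clear any MMX state before returning to the caller, in case the
    // compiler chose MMX instructions somewhere in the code above.
    clear_mmx_state();
    return idx;
}
```

Placing the call once at the end of the public entry point keeps the cost to a single instruction per sort, which fits the small differences in the benchmark numbers below.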

As far as I can determine, this has no significant performance impact: every relative change in the comparison below is within about ±2.3%.

Performance Comparison
Comparing simdargsort (from /home/msterrett/xss/newperf/x86-simd-sort/.bench/main/builddir/benchexe) to simdargsort (from /home/msterrett/xss/newperf/x86-simd-sort/.bench/mmx_fix/builddir/benchexe)
Benchmark                                                                Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------
[simdargsort vs. simdargsort]/random_128/int64_t                      +0.0012         +0.0012           708           709           708           709
[simdargsort vs. simdargsort]/random_256/int64_t                      -0.0023         -0.0023          1772          1768          1772          1768
[simdargsort vs. simdargsort]/random_512/int64_t                      -0.0016         -0.0016          4784          4776          4784          4776
[simdargsort vs. simdargsort]/random_1k/int64_t                       +0.0010         +0.0010         10499         10509         10498         10509
[simdargsort vs. simdargsort]/random_5k/int64_t                       -0.0025         -0.0025         67099         66934         67098         66932
[simdargsort vs. simdargsort]/random_100k/int64_t                     -0.0088         -0.0088       2013707       1995957       2013556       1995818
[simdargsort vs. simdargsort]/random_1m/int64_t                       -0.0002         -0.0002      35245624      35238625      35243076      35237101
[simdargsort vs. simdargsort]/random_10m/int64_t                      +0.0156         +0.0156     977124766     992322555     977039270     992235295
[simdargsort vs. simdargsort]/random_100m/int64_t                     +0.0139         +0.0139   18232700790   18485483779   18231141204   18483676752
[simdargsort vs. simdargsort]/smallrange_128/int64_t                  -0.0066         -0.0066           712           708           712           707
[simdargsort vs. simdargsort]/smallrange_256/int64_t                  -0.0007         -0.0007          1769          1768          1769          1768
[simdargsort vs. simdargsort]/smallrange_512/int64_t                  +0.0036         +0.0036          4726          4743          4726          4743
[simdargsort vs. simdargsort]/smallrange_1k/int64_t                   +0.0005         +0.0005         10482         10487         10482         10487
[simdargsort vs. simdargsort]/smallrange_5k/int64_t                   +0.0008         +0.0008         36405         36433         36404         36432
[simdargsort vs. simdargsort]/smallrange_100k/int64_t                 -0.0046         -0.0046        598561        595794        598523        595772
[simdargsort vs. simdargsort]/smallrange_1m/int64_t                   +0.0006         +0.0006      12853449      12860812      12851648      12859441
[simdargsort vs. simdargsort]/smallrange_10m/int64_t                  +0.0154         +0.0153     254696519     258610489     254687134     258578062
[simdargsort vs. simdargsort]/sorted_10k/int64_t                      +0.0013         +0.0012        133022        133190        133018        133182
[simdargsort vs. simdargsort]/constant_10k/int64_t                    -0.0077         -0.0078         10208         10129         10208         10129
[simdargsort vs. simdargsort]/reverse_10k/int64_t                     +0.0049         +0.0050        131248        131897        131240        131891
[simdargsort vs. simdargsort]/random_128/uint64_t                     +0.0011         +0.0011           708           708           708           708
[simdargsort vs. simdargsort]/random_256/uint64_t                     -0.0025         -0.0025          1774          1770          1774          1770
[simdargsort vs. simdargsort]/random_512/uint64_t                     -0.0008         -0.0008          4780          4776          4780          4776
[simdargsort vs. simdargsort]/random_1k/uint64_t                      +0.0012         +0.0012         10499         10512         10499         10512
[simdargsort vs. simdargsort]/random_5k/uint64_t                      -0.0026         -0.0026         67187         67011         67186         67010
[simdargsort vs. simdargsort]/random_100k/uint64_t                    -0.0033         -0.0033       1982152       1975621       1982110       1975532
[simdargsort vs. simdargsort]/random_1m/uint64_t                      +0.0102         +0.0102      34620257      34974263      34618430      34971369
[simdargsort vs. simdargsort]/random_10m/uint64_t                     +0.0174         +0.0173     973052241     989962388     972931691     989806861
[simdargsort vs. simdargsort]/random_100m/uint64_t                    +0.0131         +0.0131   18180572120   18419348100   18179060199   18417524574
[simdargsort vs. simdargsort]/smallrange_128/uint64_t                 -0.0005         -0.0005           709           708           709           708
[simdargsort vs. simdargsort]/smallrange_256/uint64_t                 -0.0005         -0.0006          1771          1770          1771          1770
[simdargsort vs. simdargsort]/smallrange_512/uint64_t                 +0.0040         +0.0040          4725          4744          4724          4743
[simdargsort vs. simdargsort]/smallrange_1k/uint64_t                  +0.0046         +0.0046         10450         10498         10450         10498
[simdargsort vs. simdargsort]/smallrange_5k/uint64_t                  -0.0007         -0.0007         36421         36396         36420         36394
[simdargsort vs. simdargsort]/smallrange_100k/uint64_t                -0.0026         -0.0026        567075        565605        567055        565570
[simdargsort vs. simdargsort]/smallrange_1m/uint64_t                  +0.0082         +0.0081      12659654      12763813      12658847      12761470
[simdargsort vs. simdargsort]/smallrange_10m/uint64_t                 +0.0147         +0.0147     254257559     258002559     254236796     257979946
[simdargsort vs. simdargsort]/sorted_10k/uint64_t                     +0.0019         +0.0019        133128        133379        133120        133373
[simdargsort vs. simdargsort]/constant_10k/uint64_t                   +0.0033         +0.0033         10189         10222         10188         10222
[simdargsort vs. simdargsort]/reverse_10k/uint64_t                    -0.0002         -0.0002        132130        132105        132129        132099
[simdargsort vs. simdargsort]/random_128/double                       +0.0003         +0.0003           593           593           593           593
[simdargsort vs. simdargsort]/random_256/double                       -0.0050         -0.0050          1685          1677          1685          1677
[simdargsort vs. simdargsort]/random_512/double                       -0.0011         -0.0011          4449          4444          4449          4444
[simdargsort vs. simdargsort]/random_1k/double                        -0.0005         -0.0005          9390          9385          9390          9385
[simdargsort vs. simdargsort]/random_5k/double                        +0.0002         +0.0002         62240         62252         62237         62250
[simdargsort vs. simdargsort]/random_100k/double                      -0.0019         -0.0019       1932422       1928737       1932342       1928645
[simdargsort vs. simdargsort]/random_1m/double                        +0.0025         +0.0025      34069904      34153722      34064397      34149306
[simdargsort vs. simdargsort]/random_10m/double                       +0.0144         +0.0143     977139131     991171143     977076623     991055474
[simdargsort vs. simdargsort]/random_100m/double                      +0.0141         +0.0141   18011347407   18264792514   18009602270   18262952817
[simdargsort vs. simdargsort]/smallrange_128/double                   +0.0000         +0.0000           593           593           593           593
[simdargsort vs. simdargsort]/smallrange_256/double                   -0.0046         -0.0046          1686          1678          1686          1678
[simdargsort vs. simdargsort]/smallrange_512/double                   -0.0014         -0.0014          4451          4445          4451          4445
[simdargsort vs. simdargsort]/smallrange_1k/double                    -0.0004         -0.0004          9391          9388          9391          9388
[simdargsort vs. simdargsort]/smallrange_5k/double                    +0.0014         +0.0014         62221         62308         62218         62307
[simdargsort vs. simdargsort]/smallrange_100k/double                  -0.0029         -0.0029       1931648       1926055       1931579       1925915
[simdargsort vs. simdargsort]/smallrange_1m/double                    -0.0088         -0.0087      34204188      33904811      34200604      33902125
[simdargsort vs. simdargsort]/smallrange_10m/double                   +0.0171         +0.0171     975591167     992318358     975525280     992164654
[simdargsort vs. simdargsort]/sorted_10k/double                       -0.0052         -0.0052        126130        125470        126124        125464
[simdargsort vs. simdargsort]/constant_10k/double                     -0.0091         -0.0091          9855          9766          9855          9765
[simdargsort vs. simdargsort]/reverse_10k/double                      -0.0059         -0.0059        124230        123495        124225        123491
[simdargsort vs. simdargsort]/random_128/int32_t                      +0.0086         +0.0086           548           553           548           553
[simdargsort vs. simdargsort]/random_256/int32_t                      -0.0120         -0.0120          1538          1520          1538          1520
[simdargsort vs. simdargsort]/random_512/int32_t                      -0.0059         -0.0059          4142          4118          4142          4117
[simdargsort vs. simdargsort]/random_1k/int32_t                       +0.0039         +0.0040         10364         10405         10363         10405
[simdargsort vs. simdargsort]/random_5k/int32_t                       -0.0012         -0.0012         56696         56627         56695         56625
[simdargsort vs. simdargsort]/random_100k/int32_t                     -0.0029         -0.0029       1713254       1708278       1713174       1708167
[simdargsort vs. simdargsort]/random_1m/int32_t                       +0.0015         +0.0015      27559498      27600790      27556444      27596865
[simdargsort vs. simdargsort]/random_10m/int32_t                      +0.0153         +0.0153     784234644     796196890     784109142     796116316
[simdargsort vs. simdargsort]/random_100m/int32_t                     +0.0144         +0.0143   15861032786   16088806891   15859602036   16087053301
[simdargsort vs. simdargsort]/smallrange_128/int32_t                  +0.0081         +0.0081           548           553           548           553
[simdargsort vs. simdargsort]/smallrange_256/int32_t                  -0.0120         -0.0120          1539          1520          1539          1520
[simdargsort vs. simdargsort]/smallrange_512/int32_t                  -0.0081         -0.0081          4056          4024          4056          4023
[simdargsort vs. simdargsort]/smallrange_1k/int32_t                   +0.0074         +0.0074          8944          9010          8944          9010
[simdargsort vs. simdargsort]/smallrange_5k/int32_t                   -0.0027         -0.0028         31977         31889         31976         31888
[simdargsort vs. simdargsort]/smallrange_100k/int32_t                 -0.0144         -0.0143        500798        493600        500750        493588
[simdargsort vs. simdargsort]/smallrange_1m/int32_t                   +0.0062         +0.0061       8686812       8740271       8686393       8739170
[simdargsort vs. simdargsort]/smallrange_10m/int32_t                  +0.0109         +0.0109     188496481     190556551     188472473     190535144
[simdargsort vs. simdargsort]/sorted_10k/int32_t                      -0.0016         -0.0016        116197        116016        116191        116007
[simdargsort vs. simdargsort]/constant_10k/int32_t                    -0.0114         -0.0114          9772          9661          9771          9660
[simdargsort vs. simdargsort]/reverse_10k/int32_t                     -0.0002         -0.0002        115242        115222        115236        115218
[simdargsort vs. simdargsort]/random_128/uint32_t                     +0.0122         +0.0122           547           554           547           554
[simdargsort vs. simdargsort]/random_256/uint32_t                     -0.0168         -0.0168          1538          1512          1538          1512
[simdargsort vs. simdargsort]/random_512/uint32_t                     -0.0072         -0.0071          4142          4112          4142          4112
[simdargsort vs. simdargsort]/random_1k/uint32_t                      +0.0046         +0.0046         10359         10406         10358         10406
[simdargsort vs. simdargsort]/random_5k/uint32_t                      -0.0043         -0.0043         56750         56504         56747         56503
[simdargsort vs. simdargsort]/random_100k/uint32_t                    -0.0059         -0.0059       1721643       1711483       1721531       1711345
[simdargsort vs. simdargsort]/random_1m/uint32_t                      -0.0035         -0.0035      27632647      27537051      27629361      27533426
[simdargsort vs. simdargsort]/random_10m/uint32_t                     +0.0162         +0.0162     784189413     796893113     784097807     796814678
[simdargsort vs. simdargsort]/random_100m/uint32_t                    +0.0146         +0.0146   15834571253   16066239419   15833056033   16064600550
[simdargsort vs. simdargsort]/smallrange_128/uint32_t                 +0.0140         +0.0141           546           554           546           554
[simdargsort vs. simdargsort]/smallrange_256/uint32_t                 -0.0169         -0.0169          1538          1512          1538          1512
[simdargsort vs. simdargsort]/smallrange_512/uint32_t                 -0.0064         -0.0063          4047          4021          4047          4021
[simdargsort vs. simdargsort]/smallrange_1k/uint32_t                  +0.0084         +0.0084          8934          9009          8934          9009
[simdargsort vs. simdargsort]/smallrange_5k/uint32_t                  -0.0007         -0.0007         31897         31874         31895         31873
[simdargsort vs. simdargsort]/smallrange_100k/uint32_t                -0.0128         -0.0128        499109        492723        499090        492700
[simdargsort vs. simdargsort]/smallrange_1m/uint32_t                  +0.0019         +0.0019       8716062       8732262       8715415       8731796
[simdargsort vs. simdargsort]/smallrange_10m/uint32_t                 +0.0119         +0.0119     188475240     190726231     188459995     190704960
[simdargsort vs. simdargsort]/sorted_10k/uint32_t                     -0.0013         -0.0012        116088        115942        116079        115938
[simdargsort vs. simdargsort]/constant_10k/uint32_t                   -0.0101         -0.0101          9729          9631          9729          9631
[simdargsort vs. simdargsort]/reverse_10k/uint32_t                    +0.0014         +0.0014        115172        115330        115168        115325
[simdargsort vs. simdargsort]/random_128/float                        +0.0101         +0.0101           588           594           588           594
[simdargsort vs. simdargsort]/random_256/float                        -0.0100         -0.0101          1646          1630          1646          1630
[simdargsort vs. simdargsort]/random_512/float                        -0.0071         -0.0071          4349          4318          4349          4318
[simdargsort vs. simdargsort]/random_1k/float                         +0.0070         +0.0070          9661          9729          9661          9729
[simdargsort vs. simdargsort]/random_5k/float                         -0.0012         -0.0012         55115         55049         55113         55047
[simdargsort vs. simdargsort]/random_100k/float                       +0.0095         +0.0094       1796926       1813938       1796870       1813840
[simdargsort vs. simdargsort]/random_1m/float                         +0.0088         +0.0089      28188680      28436729      28184750      28434581
[simdargsort vs. simdargsort]/random_10m/float                        +0.0219         +0.0219     794429482     811811981     794336341     811712266
[simdargsort vs. simdargsort]/random_100m/float                       +0.0175         +0.0175   15901863954   16180183840   15900407092   16178671682
[simdargsort vs. simdargsort]/smallrange_128/float                    +0.0053         +0.0053           591           594           591           594
[simdargsort vs. simdargsort]/smallrange_256/float                    -0.0142         -0.0142          1648          1624          1648          1624
[simdargsort vs. simdargsort]/smallrange_512/float                    -0.0073         -0.0073          4352          4320          4351          4320
[simdargsort vs. simdargsort]/smallrange_1k/float                     +0.0086         +0.0086          9666          9749          9666          9748
[simdargsort vs. simdargsort]/smallrange_5k/float                     -0.0014         -0.0014         55137         55063         55136         55059
[simdargsort vs. simdargsort]/smallrange_100k/float                   +0.0125         +0.0125       1791101       1813575       1791058       1813450
[simdargsort vs. simdargsort]/smallrange_1m/float                     +0.0101         +0.0101      28174881      28458671      28172645      28455836
[simdargsort vs. simdargsort]/smallrange_10m/float                    +0.0224         +0.0224     798710356     816587769     798640299     816509722
[simdargsort vs. simdargsort]/sorted_10k/float                        -0.0008         -0.0008        122370        122276        122365        122270
[simdargsort vs. simdargsort]/constant_10k/float                      -0.0080         -0.0081          9789          9710          9789          9710
[simdargsort vs. simdargsort]/reverse_10k/float                       +0.0041         +0.0041        120900        121399        120894        121393

Contributor

r-devulap left a comment


LGTM.

r-devulap merged commit d6e0d49 into intel:main on Nov 12, 2024
11 checks passed