Benchmark
`dfttest(gray16_1080p, slocation=[0.0, 1.0, 1.0, 10.0])`, AmusementClub Tools 2023H1p
data format: fps
cpu: amd epyc zen4 16c @ 3.40GHz
ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
---|---|---|---|---|---|
1 | 189.1 | 50.8 | 30.9 | 22.4 | vs-dfttest r7 (avx2) |
2 | 355.1 | 135.7 | 80.7 | 47.4 | v5 cpu (avx2) |
3 | 470.8 | 172.2 | 100.1 | 66.1 | v5 cpu (avx-512) |
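
The fps figures are easier to compare as speedups over the vs-dfttest r7 baseline. Below is a minimal sketch using the Zen 4 numbers from the table above. The helper name `speedups` is ours for illustration and is not part of dfttest2.

```python
# fps values copied from the Zen 4 table above, ordered tbsize = 1, 3, 5, 7
r7_avx2 = [189.1, 50.8, 30.9, 22.4]       # vs-dfttest r7 (avx2)
v5_avx2 = [355.1, 135.7, 80.7, 47.4]      # v5 cpu (avx2)
v5_avx512 = [470.8, 172.2, 100.1, 66.1]   # v5 cpu (avx-512)


def speedups(new, baseline):
    """Return new/baseline per tbsize, rounded to two decimals."""
    return [round(n / b, 2) for n, b in zip(new, baseline)]


print(speedups(v5_avx2, r7_avx2))    # [1.88, 2.67, 2.61, 2.12]
print(speedups(v5_avx512, r7_avx2))  # [2.49, 3.39, 3.24, 2.95]
```

The same helper applies to any other table on this page whose rows share a baseline.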
cpu: intel xeon sapphire rapids 16c @ 2.70GHz
ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
---|---|---|---|---|---|
1 | 138.3 | 45.4 | 27.5 | 18.4 | vs-dfttest r7 (avx2) |
2 | 294.2 | 116.6 | 62.0 | 38.3 | v5 cpu (avx2) |
3 | 423.7 | 151.9 | 90.9 | 59.9 | v5 cpu (avx-512) |
ARM64 cpus (32c, clang-16-20221005052905 `-march=native -ffast-math`, ubuntu, clang-17-20230622042329 on graviton3e)
ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | device |
---|---|---|---|---|---|
1 | 304.1 | 115.1 | 57.5 | 38.1 | Graviton 3E |
2 | 257.2 | 93.3 | 52.4 | 31.5 | Graviton 3 |
3 | 215.8 | 77.3 | 44.2 | 27.6 | Yitian 710 |
4 | 192.5 | 72.5 | 40.6 | 25.8 | Ampere Altra |
5 | 175.3 | 64.2 | 36.9 | 23.5 | Graviton 2 |
6 | 127.8 | 47.3 | 26.2 | 15.3 | Kunpeng 920 |
7 | 37.2 | 12.9 | 5.7 | 2.9 | Graviton (16c) |
cpu: hygon c86 7390 32c @ 2.70GHz
(L1i: 32 x 64 KB, L1d: 32 x 32 KB, L2: 32 x 512 KB, L3: 8 x 8 MB)
ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
---|---|---|---|---|---|
1 | 188.40 | 75.03 | 44.30 | 29.40 | v5 cpu (avx2) |
`dfttest(gray16_1080p, slocation=[0.0, 1.0, 1.0, 10.0])`, AmusementClub Tools 2022H2b3p
data format: fps
cpu: intel xeon sapphire rapids 16c @ 1.90GHz (no hyper-threading, DDR4-4000)
ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
---|---|---|---|---|---|
1 | 106.5 | 32.5 | 18.8 | 12.6 | vs-dfttest r7 (avx2) |
2 | 217.8 | 87.2 | 38.4 | 23.4 | v4 cpu (avx2) |
3 | 234.2 | 97.2 | 59.0 | 40.6 | v4 cpu (avx-512) native build |
cpu: intel xeon ice lake 16c @ 2.90GHz
ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
---|---|---|---|---|---|
1 | 118.1 | 38.2 | 22.1 | 14.9 | vs-dfttest r7 (avx2) |
2 | 243.0 | 99.5 | 54.1 | 34.4 | v4 cpu (avx2) |
3 | 346.6 | 132.0 | 79.9 | 55.8 | v4 cpu (avx-512) native build |
cpu: amd epyc zen3 16c @ 2.65GHz
ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
---|---|---|---|---|---|
1 | 147.9 | 42.9 | 26.1 | 19.7 | vs-dfttest r7 (avx2) |
2 | 205.8 | 89.2 | 51.6 | 34.0 | v4 cpu (avx2) |
3 | 187.8 | 85.6 | 50.9 | 33.6 | v4 cpu (avx2) native build |
cpu: intel xeon cooper lake 16c @ 3.30GHz
ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
---|---|---|---|---|---|
1 | 123.1 | 36.7 | 21.2 | 13.8 | vs-dfttest r7 |
2 | 213.7 | 82.0 | 44.5 | 28.6 | v4 cpu (avx2) |
3 | 362.3 | 137.4 | 74.6 | 46.7 | v4 cpu (avx-512) native build |
cpu: amd epyc zen2 16c @ 2.80GHz
ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
---|---|---|---|---|---|
1 | 128.7 | 37.7 | 23.5 | 15.5 | vs-dfttest r7 (avx2) |
2 | 164.4 | 73.3 | 44.8 | 30.0 | v4 cpu (avx2) |
3 | 174.1 | 85.5 | 50.1 | 32.4 | v4 cpu (avx2) native build |
cpu: intel xeon broadwell 16c @ 2.30GHz
ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
---|---|---|---|---|---|
1 | 78.6 | 23.7 | 13.5 | 8.8 | vs-dfttest r7 (avx2) |
2 | 150.0 | 58.3 | 31.1 | 16.1 | v4 cpu (avx2) |
3 | 149.5 | 57.7 | 31.0 | 17.7 | v4 cpu (avx2) native build |
cpu: intel xeon ivy bridge 16c (8c/socket x 2 sockets) @ 2.80GHz
ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
---|---|---|---|---|---|
1 | 81.4 | 23.8 | 14.1 | 9.6 | vs-dfttest r7 (sse2) |
2 | 117.8 | 44.0 | 24.3 | 14.3 | v4 cpu (avx) native build |
3 | 72.6 | 27.1 | 15.2 | 9.3 | v4 cpu (sse4.1) native build |
cpu: amd epyc zen1 16c @ 2.20GHz
ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
---|---|---|---|---|---|
1 | 76.2 | 21.1 | 14.2 | 9.8 | vs-dfttest r7 (avx2) |
2 | 93.8 | 37.9 | 21.8 | 14.3 | v4 cpu (avx2) |
3 | 93.8 | 37.0 | 21.7 | 14.0 | v4 cpu (avx2) native build |
4 | 82.5 | 30.6 | 18.1 | 11.6 | v4 cpu (sse4.1) native build |
`dfttest(gray16_1080p, slocation=[0.0, 1.0, 1.0, 10.0])`, AmusementClub Tools 2022H2b3p
data format: fps / memory
Comparing to v2 (`backend=Backend.NVRTC()`):
ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | device |
---|---|---|---|---|---|
1 | 1133/552 -> 1010/551 | 423/821 -> 463/819 | 246/1089 -> 287/1087 | 118/1356 -> 190/1354 | A100 80GB |
2 | 690/400 -> 653/408 | 316/668 -> 352/677 | 206/ 936 -> 242/ 945 | 56/1213 -> 161/1211 | 3090 |
3 | 661/386 -> 596/384 | 304/655 -> 320/653 | 186/ 923 -> 212/ 921 | 44/1190 -> 111/1188 | A10G |
4 | 430/422 -> 397/415 | 194/689 -> 213/689 | 111/ 957 -> 124/ 957 | 37/1223 -> 74/1223 | 2080 Ti |
5 | 359/238 -> 371/236 | 134/507 -> 168/505 | 74/ 775 -> 92/ 773 | 20/1042 -> 47/1040 | T4 |
6 | 359/428 -> 330/424 | 109/694 -> 153/692 | 48/ 963 -> 56/ 960 | 14/1229 -> 18/1226 | 1080 Ti |
7 | 437/291 -> 417/291 | 105/560 -> 161/560 | 45/ 828 -> 53/ 828 | 11/1095 -> 15/1095 | P40 |
ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | device |
---|---|---|---|---|---|
1 | 1995/824 -> 1880/823 | 636/1629 -> 723/1627 | 349/2433 -> 434/2431 | 135/3232 -> 267/2320 | A100 80GB (3 streams) |
2 | 1654/688 -> 1537/687 | 596/1225 -> 668/1223 | 331/1761 -> 404/1759 | 130/2294 -> 250/2292 | A100 80GB (2 streams) |
3 | 839/547 -> 838/545 | 427/1083 -> 495/1011 | 263/1619 -> 344/1617 | 61/2151 -> 205/2149 | 3090 (2 streams) |
4 | 970/522 -> 980/520 | 448/1059 -> 498/1057 | 258/1595 -> 326/1593 | 40/2128 -> 136/2126 | A10G (2 streams) |
5 | 568/550 -> 518/553 | 328/1086 -> 397/1085 | 172/1622 -> 216/1621 | 45/2154 -> 101/2153 | 2080 Ti (2 streams) |
6 | 445/374 -> 468/372 | 170/ 911 -> 225/ 909 | 82/1447 -> 109/1445 | 21/1980 -> 52/1978 | T4 (2 streams) |
7 | 373/563 -> 365/560 | 141/1099 -> 205/1096 | 56/1635 -> 68/1632 | 14/2167 -> 20/2164 | 1080 Ti (2 streams) |
8 | 631/437 -> 606/427 | 119/ 964 -> 201/ 964 | 48/1150 -> 57/1500 | 11/2033 -> 15/2033 | P40 (2 streams) |
- NVIDIA A100-SXM4-80GB, driver 516.94, windows server 2022, dfttest2 v3 `NVRTC`
- NVIDIA RTX 3090, driver 516.94, windows server 2019, dfttest2 v3 `NVRTC`
- NVIDIA A10G, driver 516.94, windows server 2022, dfttest2 v3 `NVRTC`
- NVIDIA T4, driver 516.94, windows server 2022, dfttest2 v3 `NVRTC`
- NVIDIA P40, driver 516.94, windows server 2019, dfttest2 v3 `NVRTC`
- NVIDIA RTX 2080 Ti, driver 516.94, windows 10 ltsc 2021, dfttest2 v3 `NVRTC`
- NVIDIA GTX 1080 Ti, driver 516.94, windows 10 ltsc 2021, dfttest2 v3 `NVRTC`
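
Each cell in the two comparison tables above packs four numbers in the form `fps/memory -> fps/memory`. The value before the arrow appears to be the older release and the value after it the newer one. The following is a minimal sketch for splitting a cell into a throughput ratio and a memory difference. The names `CELL` and `parse_cell` are hypothetical helpers, and the memory unit is left as-is because the tables do not state it.

```python
import re

# Matches cells such as "206/ 936 -> 242/ 945" (fps/memory -> fps/memory).
CELL = re.compile(r"\s*(\d+)\s*/\s*(\d+)\s*->\s*(\d+)\s*/\s*(\d+)\s*")


def parse_cell(cell: str) -> dict:
    """Split one table cell into an fps ratio and a memory delta."""
    match = CELL.fullmatch(cell)
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    fps_old, mem_old, fps_new, mem_new = map(int, match.groups())
    return {
        "fps_ratio": fps_new / fps_old,   # > 1 means the newer side is faster
        "mem_delta": mem_new - mem_old,   # < 0 means the newer side uses less
    }


# Single-stream RTX 3090 row, copied from the first table above.
row_3090 = [
    "690/400 -> 653/408",
    "316/668 -> 352/677",
    "206/ 936 -> 242/ 945",
    "56/1213 -> 161/1211",
]
for tbsize, cell in zip((1, 3, 5, 7), row_3090):
    result = parse_cell(cell)
    print(f"tbsize={tbsize}: fps x{result['fps_ratio']:.2f}, "
          f"memory {result['mem_delta']:+d}")
```

Read this way, the 3090 row shows a small fps drop at `tbsize=1` and the largest gain at `tbsize=7`, with memory use nearly unchanged.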
`dfttest(gray16_1080p, slocation=[0.0, 1.0, 1.0, 10.0])`, AmusementClub Tools 2022H2b3p
data format: fps / memory
Comparing to v1 (`backend=Backend.cuFFT(in_place=False)`):
ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | device |
---|---|---|---|---|---|
1 | 651/848 -> 1133/552 | 196/1693 -> 423/821 | 120/2537 -> 246/1089 | 86/3380 -> 118/1356 | A100 80GB |
2 | 297/689 -> 690/400 | 83/1546 -> 316/668 | 52/2390 -> 206/ 936 | 39/3232 -> 56/1213 | 3090 |
3 | 214/682 -> 661/386 | 60/1527 -> 304/655 | 36/2371 -> 186/ 923 | 26/3214 -> 44/1190 | A10G |
4 | 293/740 -> 709/444 | 86/1585 -> 276/713 | 53/2429 -> 151/ 981 | 38/3272 -> 35/1248 | V100 32GB |
5 | 184/589 -> 430/422 | 54/1433 -> 194/689 | 33/2276 -> 111/ 957 | 24/3018 -> 37/1223 | 2080 Ti |
6 | 104/535 -> 359/238 | 28/1379 -> 134/507 | 17/2223 -> 74/ 775 | 12/3066 -> 20/1042 | T4 |
7 | 113/578 -> 359/428 | 36/1422 -> 109/694 | 22/2266 -> 48/ 963 | 16/3107 -> 14/1229 | 1080 Ti |
8 | 82/583 -> 437/291 | 24/1428 -> 105/560 | 14/2272 -> 45/ 828 | 10/3115 -> 11/1095 | P40 |
ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | device |
---|---|---|---|---|---|
1 | 1995/824 | 636/1629 | 349/2433 | 135/3232 | A100 80GB (3 streams) |
2 | 1654/688 | 596/1225 | 331/1761 | 130/2294 | A100 80GB (2 streams) |
3 | 839/547 | 427/1083 | 263/1619 | 61/2151 | 3090 (2 streams) |
4 | 970/522 | 448/1059 | 258/1595 | 40/2128 | A10G (2 streams) |
5 | 1041/580 | 371/1117 | 208/1653 | 37/2186 | V100 32GB (2 streams) |
6 | 568/550 | 328/1086 | 172/1622 | 45/2154 | 2080 Ti (2 streams) |
7 | 445/374 | 170/ 911 | 82/1447 | 21/1980 | T4 (2 streams) |
8 | 373/563 | 141/1099 | 56/1635 | 14/2167 | 1080 Ti (2 streams) |
9 | 631/437 | 119/ 964 | 48/1150 | 11/2033 | P40 (2 streams) |
- NVIDIA A100-SXM4-80GB, driver 516.94, windows server 2022, dfttest2 v2 `NVRTC`
- NVIDIA V100-SXM2-32GB, driver 516.94, windows server 2022, dfttest2 v2 `NVRTC`
- NVIDIA RTX 3090, driver 516.94, windows server 2019, dfttest2 v2 `NVRTC`
- NVIDIA A10G, driver 516.94, windows server 2022, dfttest2 v2 `NVRTC`
- NVIDIA T4, driver 516.94, windows server 2022, dfttest2 v2 `NVRTC`
- NVIDIA P40, driver 516.94, windows server 2019, dfttest2 v2 `NVRTC`
- NVIDIA RTX 2080 Ti, driver 516.94, windows 10 ltsc 2021, dfttest2 v2 `NVRTC`
- NVIDIA GTX 1080 Ti, driver 516.94, windows 10 ltsc 2021, dfttest2 v2 `NVRTC`
`dfttest(gray16_1080p, slocation=[0.0, 1.0, 1.0, 10.0])`, AmusementClub Tools 2022H2b3p
data format: out-of-place fps / memory (in-place fps / memory)
ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | device |
---|---|---|---|---|---|
1 | 651/ 848 (587/718) | 196/1693 (177/1303) | 120/2537 (107/1887) | 86/3380 (77/2472) | A100 80GB |
2 | 317/ 658 (297/528) | 95/1503 ( 89/1113) | 58/2347 ( 54/1697) | 42/3190 (39/2282) | A30 |
3 | 293/ 740 (272/610) | 86/1585 ( 80/1195) | 53/2429 ( 47/1779) | 38/3272 (34/2364) | V100 32GB |
4 | 297/ 689 (253/572) | 83/1546 ( 83/1156) | 52/2390 ( 53/1740) | 39/3232 (39/2324) | 3090 |
5 | 257/ 698 (230/568) | 78/1542 ( 73/1152) | 48/2386 ( 45/1736) | 35/3228 (32/2320) | A6000 |
6 | 251/ 639 (235/511) | 76/1485 ( 76/1095) | 47/2227 ( 47/1679) | 35/3171 (34/2263) | A5000 |
7 | 237/ 694 (212/564) | 68/1539 ( 62/1149) | 41/2383 ( 37/1733) | 30/3226 (27/2318) | A40 |
8 | 214/ 682 (193/552) | 60/1527 ( 55/1137) | 36/2371 ( 33/1721) | 26/3214 (23/2306) | A10G |
9 | 184/ 589 (162/458) | 54/1433 ( 50/1042) | 33/2276 ( 31/1626) | 24/3018 (22/2210) | 2080 Ti |
10 | 113/ 578 ( 97/447) | 36/1422 ( 29/1021) | 22/2266 ( 18/1605) | 16/3107 (13/2199) | 1080 Ti |
11 | 104/ 535 (100/404) | 28/1379 ( 27/ 989) | 17/2223 ( 16/1573) | 12/3066 (12/2158) | T4 |
12 | 276/3980 | 87/3854 | 53/4513 | 38/4919 | Zen3 |
13 | 251/1615 | 72/2157 | 47/2705 | 33/3317 | Zen3 |
14 | 234/1593 | 70/2175 | 40/2791 | 29/3322 | IceLake |
15 | 218/3194 | 61/3805 | 39/4411 | 27/4974 | IceLake |
16 | 180/1170 | 56/1626 | 30/2059 | 21/2521 | CooperLake |
17 | 180/2554 | 53/2849 | 32/3263 | 23/3716 | CooperLake |
- NVIDIA A100-SXM4-80GB, driver 516.31, windows server 2022, dfttest2 v1
- NVIDIA A30, driver 516.31, windows 10 ltsc 2021, dfttest2 v1
- NVIDIA V100-SXM2-32GB, driver 516.31, windows server 2022, dfttest2 v1
- NVIDIA RTX 3090, driver 516.59, windows 10 ltsc 2021, dfttest2 v1
- NVIDIA A40, driver 516.31, windows 10 ltsc 2021, dfttest2 v1
- NVIDIA A6000, driver 516.59, windows 10 ltsc 2021, dfttest2 v1
- NVIDIA A5000, driver 516.59, windows 10 ltsc 2021, dfttest2 v1
- NVIDIA A10G, driver 516.31, windows server 2022, dfttest2 v1
- NVIDIA RTX 2080 Ti, driver 516.59, windows 10 ltsc 2021, dfttest2 v1
- NVIDIA GTX 1080 Ti, driver 516.59, windows 10 ltsc 2021, dfttest2 v1
- NVIDIA T4, driver 516.31, windows server 2022, dfttest2 v1
- AMD EPYC Zen3 32C, windows server 2022, neo_DFTTest r7
- AMD EPYC Zen3 32C, windows server 2022, VapourSynth DFTTest r7
- Intel Ice Lake 32C, windows server 2022, VapourSynth DFTTest r7
- Intel Ice Lake 32C, windows server 2022, neo_DFTTest r7
- Intel Cooper Lake 24C, windows server 2022, VapourSynth DFTTest r7
- Intel Cooper Lake 24C, windows server 2022, neo_DFTTest r7