Benchmark

WolframRhodium edited this page Nov 2, 2023 · 20 revisions

v5

dfttest(gray16_1080p, slocation=[0.0, 1.0, 1.0, 10.0]), AmusementClub Tools 2023H1p

data format: fps

case 1

cpu: amd epyc zen4 16c @ 3.40GHz

| ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
| --- | --- | --- | --- | --- | --- |
| 1 | 189.1 | 50.8 | 30.9 | 22.4 | vs-dfttest r7 (avx2) |
| 2 | 355.1 | 135.7 | 80.7 | 47.4 | v5 cpu (avx2) |
| 3 | 470.8 | 172.2 | 100.1 | 66.1 | v5 cpu (avx-512) |
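The fps figures are easier to compare across machines when expressed as a speedup over the vs-dfttest r7 row. A minimal sketch using the case 1 numbers above; the dictionary layout and the `speedups` helper are illustrative only and not part of dfttest2:

```python
# fps figures copied from v5 case 1 (amd epyc zen4), one value per tbsize
TBSIZES = (1, 3, 5, 7)
FPS = {
    "vs-dfttest r7 (avx2)": (189.1, 50.8, 30.9, 22.4),
    "v5 cpu (avx2)": (355.1, 135.7, 80.7, 47.4),
    "v5 cpu (avx-512)": (470.8, 172.2, 100.1, 66.1),
}


def speedups(fps, baseline):
    """Return {row name: tuple of fps ratios relative to the baseline row}."""
    base = fps[baseline]
    return {
        name: tuple(round(value / ref, 2) for value, ref in zip(row, base))
        for name, row in fps.items()
        if name != baseline
    }


for name, ratios in speedups(FPS, "vs-dfttest r7 (avx2)").items():
    # Pair each ratio with its tbsize for readability
    print(name, dict(zip(TBSIZES, ratios)))
```

With these inputs the avx-512 build comes out at roughly 2.5x to 3.4x the baseline, depending on tbsize.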

case 2

cpu: intel xeon sapphire rapids 16c @ 2.70GHz

| ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
| --- | --- | --- | --- | --- | --- |
| 1 | 138.3 | 45.4 | 27.5 | 18.4 | vs-dfttest r7 (avx2) |
| 2 | 294.2 | 116.6 | 62.0 | 38.3 | v5 cpu (avx2) |
| 3 | 423.7 | 151.9 | 90.9 | 59.9 | v5 cpu (avx-512) |

case 3 (arm)

cpu: arm64, 32c (built with clang-16-20221005052905 -march=native -ffast-math on ubuntu; clang-17-20230622042329 on Graviton 3E)

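| ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | device |
| --- | --- | --- | --- | --- | --- |
| 1 | 304.1 | 115.1 | 57.5 | 38.1 | Graviton 3E |
| 2 | 257.2 | 93.3 | 52.4 | 31.5 | Graviton 3 |
| 3 | 215.8 | 77.3 | 44.2 | 27.6 | Yitian 710 |
| 4 | 192.5 | 72.5 | 40.6 | 25.8 | Ampere Altra |
| 5 | 175.3 | 64.2 | 36.9 | 23.5 | Graviton 2 |
| 6 | 127.8 | 47.3 | 26.2 | 15.3 | Kunpeng 920 |
| 7 | 37.2 | 12.9 | 5.7 | 2.9 | Graviton (16c) |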

case 4

cpu: hygon c86 7390 32c @ 2.70GHz

(L1i: 32 x 64 KB, L1d: 32 x 32 KB, L2: 32 x 512 KB, L3: 8 x 8 MB)

| ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
| --- | --- | --- | --- | --- | --- |
| 1 | 188.40 | 75.03 | 44.30 | 29.40 | v5 cpu (avx2) |

v4

dfttest(gray16_1080p, slocation=[0.0, 1.0, 1.0, 10.0]), AmusementClub Tools 2022H2b3p

data format: fps

case 1

cpu: intel xeon sapphire rapids 16c @ 1.90GHz (no hyper-threading, DDR4-4000)

| ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
| --- | --- | --- | --- | --- | --- |
| 1 | 106.5 | 32.5 | 18.8 | 12.6 | vs-dfttest r7 (avx2) |
| 2 | 217.8 | 87.2 | 38.4 | 23.4 | v4 cpu (avx2) |
| 3 | 234.2 | 97.2 | 59.0 | 40.6 | v4 cpu (avx-512) native build |

case 2

cpu: intel xeon ice lake 16c @ 2.90GHz

| ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
| --- | --- | --- | --- | --- | --- |
| 1 | 118.1 | 38.2 | 22.1 | 14.9 | vs-dfttest r7 (avx2) |
| 2 | 243.0 | 99.5 | 54.1 | 34.4 | v4 cpu (avx2) |
| 3 | 346.6 | 132.0 | 79.9 | 55.8 | v4 cpu (avx-512) native build |

case 3

cpu: amd epyc zen3 16c @ 2.65GHz

| ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
| --- | --- | --- | --- | --- | --- |
| 1 | 147.9 | 42.9 | 26.1 | 19.7 | vs-dfttest r7 (avx2) |
| 2 | 205.8 | 89.2 | 51.6 | 34.0 | v4 cpu (avx2) |
| 3 | 187.8 | 85.6 | 50.9 | 33.6 | v4 cpu (avx2) native build |

case 4

cpu: intel xeon cooper lake 16c @ 3.30GHz

| ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
| --- | --- | --- | --- | --- | --- |
| 1 | 123.1 | 36.7 | 21.2 | 13.8 | vs-dfttest r7 |
| 2 | 213.7 | 82.0 | 44.5 | 28.6 | v4 cpu (avx2) |
| 3 | 362.3 | 137.4 | 74.6 | 46.7 | v4 cpu (avx-512) native build |

case 5

cpu: amd epyc zen2 16c @ 2.80GHz

| ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
| --- | --- | --- | --- | --- | --- |
| 1 | 128.7 | 37.7 | 23.5 | 15.5 | vs-dfttest r7 (avx2) |
| 2 | 164.4 | 73.3 | 44.8 | 30.0 | v4 cpu (avx2) |
| 3 | 174.1 | 85.5 | 50.1 | 32.4 | v4 cpu (avx2) native build |

case 6

cpu: intel xeon broadwell 16c @ 2.30GHz

| ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
| --- | --- | --- | --- | --- | --- |
| 1 | 78.6 | 23.7 | 13.5 | 8.8 | vs-dfttest r7 (avx2) |
| 2 | 150.0 | 58.3 | 31.1 | 16.1 | v4 cpu (avx2) |
| 3 | 149.5 | 57.7 | 31.0 | 17.7 | v4 cpu (avx2) native build |

case 7

cpu: intel xeon ivy bridge 16c (8c/socket x 2 sockets) @ 2.80GHz

| ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
| --- | --- | --- | --- | --- | --- |
| 1 | 81.4 | 23.8 | 14.1 | 9.6 | vs-dfttest r7 (sse2) |
| 2 | 117.8 | 44.0 | 24.3 | 14.3 | v4 cpu (avx) native build |
| 3 | 72.6 | 27.1 | 15.2 | 9.3 | v4 cpu (sse4.1) native build |

case 8

cpu: amd epyc zen1 16c @ 2.20GHz

| ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | description |
| --- | --- | --- | --- | --- | --- |
| 1 | 76.2 | 21.1 | 14.2 | 9.8 | vs-dfttest r7 (avx2) |
| 2 | 93.8 | 37.9 | 21.8 | 14.3 | v4 cpu (avx2) |
| 3 | 93.8 | 37.0 | 21.7 | 14.0 | v4 cpu (avx2) native build |
| 4 | 82.5 | 30.6 | 18.1 | 11.6 | v4 cpu (sse4.1) native build |

v3

dfttest(gray16_1080p, slocation=[0.0, 1.0, 1.0, 10.0]), AmusementClub Tools 2022H2b3p

data format: fps / memory

Comparing to v2 (backend=Backend.NVRTC()):

single stream

| ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | device |
| --- | --- | --- | --- | --- | --- |
| 1 | 1133/552 -> 1010/551 | 423/821 -> 463/819 | 246/1089 -> 287/1087 | 118/1356 -> 190/1354 | A100 80GB |
| 2 | 690/400 -> 653/408 | 316/668 -> 352/677 | 206/936 -> 242/945 | 56/1213 -> 161/1211 | 3090 |
| 3 | 661/386 -> 596/384 | 304/655 -> 320/653 | 186/923 -> 212/921 | 44/1190 -> 111/1188 | A10G |
| 4 | 430/422 -> 397/415 | 194/689 -> 213/689 | 111/957 -> 124/957 | 37/1223 -> 74/1223 | 2080 Ti |
| 5 | 359/238 -> 371/236 | 134/507 -> 168/505 | 74/775 -> 92/773 | 20/1042 -> 47/1040 | T4 |
| 6 | 359/428 -> 330/424 | 109/694 -> 153/692 | 48/963 -> 56/960 | 14/1229 -> 18/1226 | 1080 Ti |
| 7 | 437/291 -> 417/291 | 105/560 -> 161/560 | 45/828 -> 53/828 | 11/1095 -> 15/1095 | P40 |
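Each cell above pairs the v2 result with the v3 result as `fps/memory -> fps/memory`. A small sketch for turning such a cell into a relative fps change; `parse_cell` and `fps_change_percent` are hypothetical helper names introduced here for illustration, and the regular expression simply tolerates optional whitespace around the slash and the arrow:

```python
import re

# "fps/memory -> fps/memory", with optional whitespace after "/" and around "->"
CELL = re.compile(r"(\d+)/\s*(\d+)\s*->\s*(\d+)/\s*(\d+)")


def parse_cell(cell):
    """Split one cell into ((old_fps, old_mem), (new_fps, new_mem))."""
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    old_fps, old_mem, new_fps, new_mem = map(int, match.groups())
    return (old_fps, old_mem), (new_fps, new_mem)


def fps_change_percent(cell):
    """Relative fps change from the left-hand result to the right-hand one."""
    (old_fps, _), (new_fps, _) = parse_cell(cell)
    return round(100.0 * (new_fps - old_fps) / old_fps, 1)


# A100 80GB row of the single-stream table
a100 = (
    "1133/552 -> 1010/551",
    "423/821 -> 463/819",
    "246/1089 -> 287/1087",
    "118/1356 -> 190/1354",
)
for tbsize, cell in zip((1, 3, 5, 7), a100):
    print(f"tbsize={tbsize}: {fps_change_percent(cell):+.1f}% fps")
```

For the A100 row this shows a drop of about 11% at tbsize=1 and gains at every larger tbsize, the largest being about 61% at tbsize=7.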

multiple streams

| ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | device |
| --- | --- | --- | --- | --- | --- |
| 1 | 1995/824 -> 1880/823 | 636/1629 -> 723/1627 | 349/2433 -> 434/2431 | 135/3232 -> 267/2320 | A100 80GB (3 streams) |
| 2 | 1654/688 -> 1537/687 | 596/1225 -> 668/1223 | 331/1761 -> 404/1759 | 130/2294 -> 250/2292 | A100 80GB (2 streams) |
| 3 | 839/547 -> 838/545 | 427/1083 -> 495/1011 | 263/1619 -> 344/1617 | 61/2151 -> 205/2149 | 3090 (2 streams) |
| 4 | 970/522 -> 980/520 | 448/1059 -> 498/1057 | 258/1595 -> 326/1593 | 40/2128 -> 136/2126 | A10G (2 streams) |
| 5 | 568/550 -> 518/553 | 328/1086 -> 397/1085 | 172/1622 -> 216/1621 | 45/2154 -> 101/2153 | 2080 Ti (2 streams) |
| 6 | 445/374 -> 468/372 | 170/911 -> 225/909 | 82/1447 -> 109/1445 | 21/1980 -> 52/1978 | T4 (2 streams) |
| 7 | 373/563 -> 365/560 | 141/1099 -> 205/1096 | 56/1635 -> 68/1632 | 14/2167 -> 20/2164 | 1080 Ti (2 streams) |
| 8 | 631/437 -> 606/427 | 119/964 -> 201/964 | 48/1150 -> 57/1500 | 11/2033 -> 15/2033 | P40 (2 streams) |

config

  1. NVIDIA A100-SXM4-80GB, driver 516.94, windows server 2022, dfttest2 v3 NVRTC
  2. NVIDIA RTX 3090, driver 516.94, windows server 2019, dfttest2 v3 NVRTC
  3. NVIDIA A10G, driver 516.94, windows server 2022, dfttest2 v3 NVRTC
  4. NVIDIA T4, driver 516.94, windows server 2022, dfttest2 v3 NVRTC
  5. NVIDIA P40, driver 516.94, windows server 2019, dfttest2 v3 NVRTC
  6. NVIDIA RTX 2080 Ti, driver 516.94, windows 10 ltsc 2021, dfttest2 v3 NVRTC
  7. NVIDIA GTX 1080 Ti, driver 516.94, windows 10 ltsc 2021, dfttest2 v3 NVRTC

v2

dfttest(gray16_1080p, slocation=[0.0, 1.0, 1.0, 10.0]), AmusementClub Tools 2022H2b3p

data format: fps / memory

single stream

Comparing to v1 (backend=Backend.cuFFT(in_place=False)):

| ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | device |
| --- | --- | --- | --- | --- | --- |
| 1 | 651/848 -> 1133/552 | 196/1693 -> 423/821 | 120/2537 -> 246/1089 | 86/3380 -> 118/1356 | A100 80GB |
| 2 | 297/689 -> 690/400 | 83/1546 -> 316/668 | 52/2390 -> 206/936 | 39/3232 -> 56/1213 | 3090 |
| 3 | 214/682 -> 661/386 | 60/1527 -> 304/655 | 36/2371 -> 186/923 | 26/3214 -> 44/1190 | A10G |
| 4 | 293/740 -> 709/444 | 86/1585 -> 276/713 | 53/2429 -> 151/981 | 38/3272 -> 35/1248 | V100 32GB |
| 5 | 184/589 -> 430/422 | 54/1433 -> 194/689 | 33/2276 -> 111/957 | 24/3018 -> 37/1223 | 2080 Ti |
| 6 | 104/535 -> 359/238 | 28/1379 -> 134/507 | 17/2223 -> 74/775 | 12/3066 -> 20/1042 | T4 |
| 7 | 113/578 -> 359/428 | 36/1422 -> 109/694 | 22/2266 -> 48/963 | 16/3107 -> 14/1229 | 1080 Ti |
| 8 | 82/583 -> 437/291 | 24/1428 -> 105/560 | 14/2272 -> 45/828 | 10/3115 -> 11/1095 | P40 |

multiple streams

| ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | device |
| --- | --- | --- | --- | --- | --- |
| 1 | 1995/824 | 636/1629 | 349/2433 | 135/3232 | A100 80GB (3 streams) |
| 2 | 1654/688 | 596/1225 | 331/1761 | 130/2294 | A100 80GB (2 streams) |
| 3 | 839/547 | 427/1083 | 263/1619 | 61/2151 | 3090 (2 streams) |
| 4 | 970/522 | 448/1059 | 258/1595 | 40/2128 | A10G (2 streams) |
| 5 | 1041/580 | 371/1117 | 208/1653 | 37/2186 | V100 32GB (2 streams) |
| 6 | 568/550 | 328/1086 | 172/1622 | 45/2154 | 2080 Ti (2 streams) |
| 7 | 445/374 | 170/911 | 82/1447 | 21/1980 | T4 (2 streams) |
| 8 | 373/563 | 141/1099 | 56/1635 | 14/2167 | 1080 Ti (2 streams) |
| 9 | 631/437 | 119/964 | 48/1150 | 11/2033 | P40 (2 streams) |
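To see how much extra streams help, the multiple-stream fps can be divided by the single-stream v2 fps for the same device. A minimal sketch for the A100 80GB, with fps values copied from the two v2 tables; the `stream_scaling` helper is a hypothetical name used only for this illustration:

```python
# v2 fps on the A100 80GB, one value per tbsize (1, 3, 5, 7)
SINGLE_STREAM = (1133, 423, 246, 118)
MULTI_STREAM = {
    2: (1654, 596, 331, 130),  # 2 streams
    3: (1995, 636, 349, 135),  # 3 streams
}


def stream_scaling(single, multi):
    """Return {stream count: fps ratios relative to the single-stream run}."""
    return {
        streams: tuple(round(m / s, 2) for m, s in zip(row, single))
        for streams, row in multi.items()
    }


for streams, ratios in stream_scaling(SINGLE_STREAM, MULTI_STREAM).items():
    print(f"{streams} streams:", ratios)
```

On this device the gain from extra streams shrinks as tbsize grows: about 1.46x at tbsize=1 with 2 streams, but only about 1.10x at tbsize=7.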

config

  1. NVIDIA A100-SXM4-80GB, driver 516.94, windows server 2022, dfttest2 v2 NVRTC
  2. NVIDIA V100-SXM2-32GB, driver 516.94, windows server 2022, dfttest2 v2 NVRTC
  3. NVIDIA RTX 3090, driver 516.94, windows server 2019, dfttest2 v2 NVRTC
  4. NVIDIA A10G, driver 516.94, windows server 2022, dfttest2 v2 NVRTC
  5. NVIDIA T4, driver 516.94, windows server 2022, dfttest2 v2 NVRTC
  6. NVIDIA P40, driver 516.94, windows server 2019, dfttest2 v2 NVRTC
  7. NVIDIA RTX 2080 Ti, driver 516.94, windows 10 ltsc 2021, dfttest2 v2 NVRTC
  8. NVIDIA GTX 1080 Ti, driver 516.94, windows 10 ltsc 2021, dfttest2 v2 NVRTC

v1

dfttest(gray16_1080p, slocation=[0.0, 1.0, 1.0, 10.0]), AmusementClub Tools 2022H2b3p

data format: (out-of-place fps / memory (in-place fps / memory))

| ID | tbsize=1 | tbsize=3 | tbsize=5 | tbsize=7 | device |
| --- | --- | --- | --- | --- | --- |
| 1 | 651/848 (587/718) | 196/1693 (177/1303) | 120/2537 (107/1887) | 86/3380 (77/2472) | A100 80GB |
| 2 | 317/658 (297/528) | 95/1503 (89/1113) | 58/2347 (54/1697) | 42/3190 (39/2282) | A30 |
| 3 | 293/740 (272/610) | 86/1585 (80/1195) | 53/2429 (47/1779) | 38/3272 (34/2364) | V100 32GB |
| 4 | 297/689 (253/572) | 83/1546 (83/1156) | 52/2390 (53/1740) | 39/3232 (39/2324) | 3090 |
| 5 | 257/698 (230/568) | 78/1542 (73/1152) | 48/2386 (45/1736) | 35/3228 (32/2320) | A6000 |
| 6 | 251/639 (235/511) | 76/1485 (76/1095) | 47/2227 (47/1679) | 35/3171 (34/2263) | A5000 |
| 7 | 237/694 (212/564) | 68/1539 (62/1149) | 41/2383 (37/1733) | 30/3226 (27/2318) | A40 |
| 8 | 214/682 (193/552) | 60/1527 (55/1137) | 36/2371 (33/1721) | 26/3214 (23/2306) | A10G |
| 9 | 184/589 (162/458) | 54/1433 (50/1042) | 33/2276 (31/1626) | 24/3018 (22/2210) | 2080 Ti |
| 10 | 113/578 (97/447) | 36/1422 (29/1021) | 22/2266 (18/1605) | 16/3107 (13/2199) | 1080 Ti |
| 11 | 104/535 (100/404) | 28/1379 (27/989) | 17/2223 (16/1573) | 12/3066 (12/2158) | T4 |
| 12 | 276/3980 | 87/3854 | 53/4513 | 38/4919 | Zen3 |
| 13 | 251/1615 | 72/2157 | 47/2705 | 33/3317 | Zen3 |
| 14 | 234/1593 | 70/2175 | 40/2791 | 29/3322 | IceLake |
| 15 | 218/3194 | 61/3805 | 39/4411 | 27/4974 | IceLake |
| 16 | 180/1170 | 56/1626 | 30/2059 | 21/2521 | CooperLake |
| 17 | 180/2554 | 53/2849 | 32/3263 | 23/3716 | CooperLake |
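The GPU cells in this table carry two measurements, out-of-place first and in-place in parentheses, so the cost of the in-place mode can be read off as an fps change and a memory change. A sketch under that reading of the data format line; `parse_v1_cell` and `in_place_tradeoff` are hypothetical helpers, and CPU rows without a parenthesized part yield `None`:

```python
import re

# "fps/memory" optionally followed by "(fps/memory)", with optional padding
CELL = re.compile(r"(\d+)/\s*(\d+)(?:\s*\(\s*(\d+)/\s*(\d+)\s*\))?")


def parse_v1_cell(cell):
    """Return (out_of_place, in_place); in_place is None when absent."""
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    out_fps, out_mem, in_fps, in_mem = match.groups()
    out_of_place = (int(out_fps), int(out_mem))
    in_place = None if in_fps is None else (int(in_fps), int(in_mem))
    return out_of_place, in_place


def in_place_tradeoff(cell):
    """Percent change in (fps, memory) when switching to in-place."""
    out_of_place, in_place = parse_v1_cell(cell)
    if in_place is None:
        return None
    fps_pct = 100.0 * (in_place[0] - out_of_place[0]) / out_of_place[0]
    mem_pct = 100.0 * (in_place[1] - out_of_place[1]) / out_of_place[1]
    return round(fps_pct, 1), round(mem_pct, 1)


# A100 80GB at tbsize=1 and tbsize=7, then a CPU cell with no in-place data
for cell in ("651/ 848 (587/718)", "86/3380 (77/2472)", "276/3980"):
    print(cell, "->", in_place_tradeoff(cell))
```

For the A100 at tbsize=1 this gives about 10% lower fps for about 15% less memory; at tbsize=7 the memory saving grows to about 27%.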

config

  1. NVIDIA A100-SXM4-80GB, driver 516.31, windows server 2022, dfttest2 v1
  2. NVIDIA A30, driver 516.31, windows 10 ltsc 2021, dfttest2 v1
  3. NVIDIA V100-SXM2-32GB, driver 516.31, windows server 2022, dfttest2 v1
  4. NVIDIA RTX 3090, driver 516.59, windows 10 ltsc 2021, dfttest2 v1
  5. NVIDIA A6000, driver 516.59, windows 10 ltsc 2021, dfttest2 v1
  6. NVIDIA A5000, driver 516.59, windows 10 ltsc 2021, dfttest2 v1
  7. NVIDIA A40, driver 516.31, windows 10 ltsc 2021, dfttest2 v1
  8. NVIDIA A10G, driver 516.31, windows server 2022, dfttest2 v1
  9. NVIDIA RTX 2080 Ti, driver 516.59, windows 10 ltsc 2021, dfttest2 v1
  10. NVIDIA GTX 1080 Ti, driver 516.59, windows 10 ltsc 2021, dfttest2 v1
  11. NVIDIA T4, driver 516.31, windows server 2022, dfttest2 v1
  12. AMD EPYC Zen3 32C, windows server 2022, neo_DFTTest r7
  13. AMD EPYC Zen3 32C, windows server 2022, VapourSynth DFTTest r7
  14. Intel Ice Lake 32C, windows server 2022, VapourSynth DFTTest r7
  15. Intel Ice Lake 32C, windows server 2022, neo_DFTTest r7
  16. Intel Cooper Lake 24C, windows server 2022, VapourSynth DFTTest r7
  17. Intel Cooper Lake 24C, windows server 2022, neo_DFTTest r7