Skip to content

Conversation

@nrspruit
Copy link
Contributor

@nrspruit nrspruit commented Aug 7, 2024

  • Enabled Immediate Command list usage per queue given Intel DG2 HW.
  • Removed default setting of false on windows.

@nrspruit nrspruit requested a review from a team as a code owner August 7, 2024 21:59
@github-actions
Copy link
Contributor

github-actions bot commented Aug 7, 2024

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/10292370779

@github-actions github-actions bot added the level-zero L0 adapter specific issues label Aug 7, 2024
@github-actions
Copy link
Contributor

github-actions bot commented Aug 7, 2024

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/10292370779
Job status: success. Test status: success.

Summary

Benchmark This PR baseline
api_overhead_benchmark_sycl SubmitKernel out of order 26.281 **23.082**
api_overhead_benchmark_sycl SubmitKernel in order 25.33 **22.972**
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 330.433 **298.574**
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 **204.698** 222.377
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 6.699 **6.408**
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 **3.091** 3.116
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 2.875 **2.806**
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 2.403 **2.322**
miscellaneous_benchmark_sycl VectorSum **858.246** 859.353
Velocity-Bench Hashtable 330.956108 **328.705328**
Velocity-Bench Bitcracker **35.6949** 35.7419
Velocity-Bench CudaSift **218.517** 218.846
Velocity-Bench Easywave **239** 246.0
Velocity-Bench QuickSilver 117.23 **117.06**
Velocity-Bench Sobel Filter 612.759 **610.354**

Charts

api_overhead_benchmark_sycl SubmitKernel out of order
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title api_overhead_benchmark_sycl SubmitKernel out of order
    todayMarker off
    dateFormat  X
    axisFormat %s

    section SubmitKernel(api=sycl<br>Profiling=0<br>Ioq=0<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)

        This PR (26.281 μs)   : crit, 0, 26

        baseline (23.082 μs)   :  0, 23

    -   : 0, 0

    -   : 0, 0

Loading
api_overhead_benchmark_sycl SubmitKernel in order
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title api_overhead_benchmark_sycl SubmitKernel in order
    todayMarker off
    dateFormat  X
    axisFormat %s

    section SubmitKernel(api=sycl<br>Profiling=0<br>Ioq=1<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)

        This PR (25.33 μs)   : crit, 0, 25

        baseline (22.972 μs)   :  0, 22

    -   : 0, 0

    -   : 0, 0

Loading
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024
    todayMarker off
    dateFormat  X
    axisFormat %s

    section QueueInOrderMemcpy(api=sycl<br>IsCopyOnly=0<br>sourcePlacement=Device<br>destinationPlacement=Device<br>size=1KB<br>count=100)

        This PR (330.433 μs)   : crit, 0, 330

        baseline (298.574 μs)   :  0, 298

    -   : 0, 0

    -   : 0, 0

Loading
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024
    todayMarker off
    dateFormat  X
    axisFormat %s

    section QueueInOrderMemcpy(api=sycl<br>IsCopyOnly=0<br>sourcePlacement=Host<br>destinationPlacement=Device<br>size=1KB<br>count=100)

        This PR (204.698 μs)   : crit, 0, 204

        baseline (222.377 μs)   :  0, 222

    -   : 0, 0

    -   : 0, 0

Loading
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024
    todayMarker off
    dateFormat  X
    axisFormat %s

    section QueueMemcpy(api=sycl<br>sourcePlacement=Device<br>destinationPlacement=Device<br>size=1KB)

        This PR (6.699 μs)   : crit, 0, 6

        baseline (6.408 μs)   :  0, 6

    -   : 0, 0

    -   : 0, 0

Loading
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240
    todayMarker off
    dateFormat  X
    axisFormat %s

    section StreamMemory(api=sycl<br>type=Triad<br>size=10KB<br>useEvents=0<br>contents=Zeros<br>memoryPlacement=Device)

        This PR (3.091 μs)   : crit, 0, 3

        baseline (3.116 μs)   :  0, 3

    -   : 0, 0

    -   : 0, 0

Loading
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024
    todayMarker off
    dateFormat  X
    axisFormat %s

    section ExecImmediateCopyQueue(api=sycl<br>IsCopyOnly=1<br>MeasureCompletionTime=0<br>src=Device<br>dst=Device<br>size=1KB<br>ioq=0)

        This PR (2.875 μs)   : crit, 0, 2

        baseline (2.806 μs)   :  0, 2

    -   : 0, 0

    -   : 0, 0

Loading
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024
    todayMarker off
    dateFormat  X
    axisFormat %s

    section ExecImmediateCopyQueue(api=sycl<br>IsCopyOnly=1<br>MeasureCompletionTime=0<br>src=Host<br>dst=Host<br>size=1KB<br>ioq=1)

        This PR (2.403 μs)   : crit, 0, 2

        baseline (2.322 μs)   :  0, 2

    -   : 0, 0

    -   : 0, 0

Loading
miscellaneous_benchmark_sycl VectorSum
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title miscellaneous_benchmark_sycl VectorSum
    todayMarker off
    dateFormat  X
    axisFormat %s

    section VectorSum(api=sycl<br>numberOfElementsX=512<br>numberOfElementsY=256<br>numberOfElementsZ=256)

        This PR (858.246 μs)   : crit, 0, 858

        baseline (859.353 μs)   :  0, 859

    -   : 0, 0

    -   : 0, 0

Loading
Velocity-Bench Hashtable
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Hashtable
    todayMarker off
    dateFormat  X
    axisFormat %s

    section hashtable

        This PR (330.956108 M keys/sec)   : crit, 0, 330

        baseline (328.705328 M keys/sec)   :  0, 328

    -   : 0, 0

    -   : 0, 0

Loading
Velocity-Bench Bitcracker
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Bitcracker
    todayMarker off
    dateFormat  X
    axisFormat %s

    section bitcracker

        This PR (35.6949 s)   : crit, 0, 35

        baseline (35.7419 s)   :  0, 35

    -   : 0, 0

    -   : 0, 0

Loading
Velocity-Bench CudaSift
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench CudaSift
    todayMarker off
    dateFormat  X
    axisFormat %s

    section cudaSift

        This PR (218.517 ms)   : crit, 0, 218

        baseline (218.846 ms)   :  0, 218

    -   : 0, 0

    -   : 0, 0

Loading
Velocity-Bench Easywave
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Easywave
    todayMarker off
    dateFormat  X
    axisFormat %s

    section easywave

        This PR (239 ms)   : crit, 0, 239

        baseline (246.0 ms)   :  0, 246

    -   : 0, 0

    -   : 0, 0

Loading
Velocity-Bench QuickSilver
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench QuickSilver
    todayMarker off
    dateFormat  X
    axisFormat %s

    section QuickSilver

        This PR (117.23 MMS/CTT)   : crit, 0, 117

        baseline (117.06 MMS/CTT)   :  0, 117

    -   : 0, 0

    -   : 0, 0

Loading
Velocity-Bench Sobel Filter
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Sobel Filter
    todayMarker off
    dateFormat  X
    axisFormat %s

    section sobel_filter

        This PR (612.759 ms)   : crit, 0, 612

        baseline (610.354 ms)   :  0, 610

    -   : 0, 0

    -   : 0, 0

Loading

Details

SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),26.281,26.341,5.63%,23.607,460.934,[CPU],[us]

SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),25.330,25.568,6.15%,22.364,433.367,[CPU],[us]

QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100)

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),330.433,332.998,2.91%,300.883,732.033,[CPU],[us]

QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100)

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),204.698,197.360,17.86%,190.390,577.874,[CPU],[us]

QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB)

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),6.699,6.406,19.81%,6.019,116.584,[CPU],[us]

StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device)

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device),3.091,3.097,2.84%,0.686,3.368,[CPU],[GB/s]

ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0)

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),2.875,2.866,7.42%,2.668,60.993,[CPU],[us]

ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1)

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),2.403,2.397,6.06%,2.280,43.808,[CPU],[us]

VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256)

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256),858.246,858.902,0.45%,821.607,871.997,[GPU],bw [GB/s]

hashtable

Environment Variables:

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.405545 s
330.956108 million keys/second

bitcracker

Environment Variables:

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00403515 s
bitcracker - total time for whole calculation: 35.6949 s

cudaSift

Environment Variables:

Command:

/home/test-user/bench_workdir/cudaSift/cudaSift

Output:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1235 1272 33.5324% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1264 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1263 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1201 1262 32.6093% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1200 1259 32.5821% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1263 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1269 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1270 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1215 1271 32.9894% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1235 1268 33.5324% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1266 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1174 1267 31.8762% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1218 1254 33.0709% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1268 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1031 1261 27.9935% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1259 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1266 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1265 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1265 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1261 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1134 1250 30.7901% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1259 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1261 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1257 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1256 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1083 1259 29.4054% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1058 1264 28.7266% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1214 1262 32.9623% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1091 1247 29.6226% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1228 1261 33.3424% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1231 1264 33.4238% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1110 1255 30.1385% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1270 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1136 1259 30.8444% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1256 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1258 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1120 1273 30.41% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1129 1261 30.6544% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1261 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1240 1274 33.6682% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1266 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1110 1272 30.1385% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1259 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1241 1275 33.6954% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1199 1279 32.555% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1212 1259 32.908% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1055 1267 28.6451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1082 1264 29.3782% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1217 1264 33.0437% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1220 1253 33.1252% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 218.517 ms

easywave

Environment Variables:

Command:

/home/test-user/bench_workdir/easywave/easyWave_sycl -grid /home/test-user/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/test-user/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.3.29735+27)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.283490e-01 6.218600e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.604760e-01 7.619230e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.576670e-01 7.764070e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.666110e-01 8.343250e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.296530e-01 7.981060e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.294980e-01 7.726620e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.260260e-01 7.726120e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.259680e-01 7.911310e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.279000e-01 7.911720e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.248170e-01 7.645080e-01 0.000000e+00

Timer Cumulative Cumulative Cumulative Cumulative Cumulative Cumulative
Name number microSecs microSecs microSecs microSecs Efficiency
of calls min avg max stddev Rating
main 1 1.116e+07 1.116e+07 1.116e+07 0.000e+00 100.00
cycleInit 10 3.477e+06 3.477e+06 3.477e+06 0.000e+00 100.00
cycleTracking 10 7.685e+06 7.685e+06 7.685e+06 0.000e+00 100.00
cycleTracking_Kernel 104 4.944e+06 4.944e+06 4.944e+06 0.000e+00 100.00
cycleTracking_MPI 117 2.200e+05 2.200e+05 2.200e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 4.170e+02 4.170e+02 4.170e+02 0.000e+00 100.00
Figure Of Merit 117.23 [Num Mega Segments / Cycle Tracking Time]

sobel_filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.45873 s
sobelfilter - total time for whole calculation: 0.612759 s

@nrspruit nrspruit marked this pull request as draft August 7, 2024 22:22
@nrspruit nrspruit force-pushed the dg2_immediate_default branch from 0c07008 to 2a381d5 Compare August 8, 2024 23:53
@github-actions github-actions bot added the conformance Conformance test suite issues. label Aug 8, 2024
@nrspruit nrspruit added the v0.10.x Include in the v0.10.x release label Aug 9, 2024
@nrspruit nrspruit force-pushed the dg2_immediate_default branch from 2a381d5 to 1912fb4 Compare August 9, 2024 16:37
@nrspruit nrspruit marked this pull request as ready for review August 9, 2024 16:39
@nrspruit nrspruit requested a review from a team as a code owner August 9, 2024 16:39
@nrspruit
Copy link
Contributor Author

nrspruit commented Aug 9, 2024

After rebase, usage of immediate command lists is exposing a bug running the new tests in ac7eb17 by @EwanC .

EwanC pushed a commit to Bensuo/unified-runtime that referenced this pull request Aug 12, 2024
As discovered in
oneapi-src#1951 (comment)
there is an issue running the command-buffer CTS tests on L0 with
command-lists enabled.

After investigating, this was because the CTS tests we're not using the
output event parameter to command-buffer enqueue. In the L0 adapter
code a path for registering an event with the queue to waiting on
the workload was guarded by this event being set. Therefore
`urQueueFinish` was not working as expected, as the queue was not aware
of this work to wait on.

Fixed by always registering the work to wait on with the queue, and
miss out on propagation of the created event when user doesn't pass
an output event, rather than not creating it at all.
@nrspruit nrspruit force-pushed the dg2_immediate_default branch from 1912fb4 to e2cc90c Compare August 13, 2024 15:42
nrspruit added a commit to intel/llvm that referenced this pull request Aug 13, 2024
-pre-commit PR for oneapi-src/unified-runtime#1951

Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>
@nrspruit nrspruit marked this pull request as draft August 14, 2024 00:19
@nrspruit
Copy link
Contributor Author

Failures seen in intel/llvm#15054 for memcpy 2d, until that is resolved this PR is in draft.

@omarahmed1111
Copy link
Contributor

@nrspruit Is this PR a WIP?

@nrspruit
Copy link
Contributor Author

@nrspruit Is this PR a WIP?

Hello @omarahmed1111 , yes, this patch exposes a problem in a L0 Driver so this is pending and will most likely need to be updated before it can be merged. I will remove the 0.10.x until we can get this resolved.

@nrspruit nrspruit removed the v0.10.x Include in the v0.10.x release label Aug 20, 2024
@nrspruit nrspruit force-pushed the dg2_immediate_default branch from e2cc90c to a6d049f Compare August 26, 2024 18:03
- Enabled Immediate Command list usage per queue given Intel DG2 HW.
- Removed default setting of false on windows.
- Added check to only enable this default given a minimum driver
  version.

Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>
@nrspruit nrspruit force-pushed the dg2_immediate_default branch from a6d049f to 6cd6338 Compare August 26, 2024 19:08
@nrspruit nrspruit marked this pull request as ready for review August 26, 2024 20:05
@nrspruit nrspruit added the v0.10.x Include in the v0.10.x release label Aug 26, 2024
@nrspruit
Copy link
Contributor Author

E2E failure is unrelated to this change.

nrspruit added a commit to nrspruit/llvm that referenced this pull request Aug 26, 2024
-pre-commit PR for oneapi-src/unified-runtime#1951

Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>
@nrspruit nrspruit added the ready to merge Added to PR's which are ready to merge label Aug 26, 2024
@nrspruit
Copy link
Contributor Author

Based on CI results here: intel/llvm#15031 && intel/llvm#15054 there are no failures caused by this change. The Jenkins jobs are randomly failing.

@nrspruit nrspruit removed the ready to merge Added to PR's which are ready to merge label Aug 26, 2024
@nrspruit
Copy link
Contributor Author

awaiting one more re-review before ready to merge.

@nrspruit nrspruit added the ready to merge Added to PR's which are ready to merge label Aug 27, 2024
@omarahmed1111 omarahmed1111 merged commit a4e2219 into oneapi-src:main Aug 27, 2024
omarahmed1111 pushed a commit to nrspruit/llvm that referenced this pull request Aug 27, 2024
-pre-commit PR for oneapi-src/unified-runtime#1951

Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>
omarahmed1111 pushed a commit to omarahmed1111/unified-runtime that referenced this pull request Sep 10, 2024
[L0] Enable Immediate Command List by default given Intel DG2
kbenzie pushed a commit that referenced this pull request Nov 4, 2024
[L0] Enable Immediate Command List by default given Intel DG2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

conformance Conformance test suite issues. level-zero L0 adapter specific issues ready to merge Added to PR's which are ready to merge v0.10.x Include in the v0.10.x release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants