@bratpiorka
Contributor

Bump the UMF tag to the 0.9.0-rc3 release; this branch contains all the latest fixes.

@bratpiorka bratpiorka marked this pull request as ready for review August 1, 2024 08:34
@bratpiorka bratpiorka requested a review from a team as a code owner August 1, 2024 08:34
Contributor

@lukaszstolarczuk lukaszstolarczuk left a comment

👍

@github-actions
Contributor

github-actions bot commented Aug 1, 2024

Compute Benchmarks level_zero run:
https://github.com/oneapi-src/unified-runtime/actions/runs/10195136649
Job status: success. Test status: success.

Summary

| Benchmark | This PR | baseline |
|---|---|---|
| api_overhead_benchmark_sycl SubmitKernel out of order | 25.896 | **23.082** |
| api_overhead_benchmark_sycl SubmitKernel in order | 24.96 | **22.972** |
| memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | 320.328 | **298.574** |
| memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | **214.715** | 222.377 |
| memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | 6.691 | **6.408** |
| memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | **3.063** | 3.116 |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 | 3.106 | **2.806** |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 | 2.339 | **2.322** |
| miscellaneous_benchmark_sycl VectorSum | **859.309** | 859.353 |
| Velocity-Bench Hashtable | 331.430759 | **328.705328** |
| Velocity-Bench Bitcracker | **35.7344** | 35.7419 |
| Velocity-Bench CudaSift | **217.873** | 218.846 |
| Velocity-Bench Easywave | **243** | 246.0 |
| Velocity-Bench QuickSilver | 117.12 | **117.06** |
| Velocity-Bench Sobel Filter | 613.329 | **610.354** |
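
To put the summary numbers in perspective, the relative change of "This PR" against the baseline can be computed per row. The snippet below is a minimal sketch, not part of the benchmark tooling: the dictionary keys are shortened labels chosen for illustration, and only three rows are copied from the table. Whether a positive change is a regression depends on the metric (time per operation versus throughput), which the table itself does not state.

```python
# Values copied from the summary table: (this PR, baseline).
# The keys are shortened, illustrative labels, not names used by the bot.
results = {
    "SubmitKernel out of order": (25.896, 23.082),
    "SubmitKernel in order": (24.96, 22.972),
    "QueueInOrderMemcpy Host to Device": (214.715, 222.377),
}


def relative_change(pr: float, baseline: float) -> float:
    """Return the change of `pr` relative to `baseline`, in percent."""
    return (pr - baseline) / baseline * 100.0


changes = {name: relative_change(pr, base) for name, (pr, base) in results.items()}

for name, pct in changes.items():
    # A leading "+" means the PR value is higher than the baseline value.
    print(f"{name}: {pct:+.1f}%")
```

A single run like this one says little on its own; the per-test StdDev reported in the Details section is worth checking before reading a difference as a real change.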

Charts

api_overhead_benchmark_sycl SubmitKernel out of order
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title api_overhead_benchmark_sycl SubmitKernel out of order
    todayMarker off
    dateFormat  X
    axisFormat %s

    section SubmitKernel(api=sycl<br>Profiling=0<br>Ioq=0<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)

        This PR (25.896 μs)   : crit, 0, 25

        baseline (23.082 μs)   :  0, 23

    -   : 0, 0

    -   : 0, 0

api_overhead_benchmark_sycl SubmitKernel in order
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title api_overhead_benchmark_sycl SubmitKernel in order
    todayMarker off
    dateFormat  X
    axisFormat %s

    section SubmitKernel(api=sycl<br>Profiling=0<br>Ioq=1<br>DiscardEvents=0<br>NumKernels=10<br>KernelExecTime=1<br>MeasureCompletion=0)

        This PR (24.96 μs)   : crit, 0, 24

        baseline (22.972 μs)   :  0, 22

    -   : 0, 0

    -   : 0, 0

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024
    todayMarker off
    dateFormat  X
    axisFormat %s

    section QueueInOrderMemcpy(api=sycl<br>IsCopyOnly=0<br>sourcePlacement=Device<br>destinationPlacement=Device<br>size=1KB<br>count=100)

        This PR (320.328 μs)   : crit, 0, 320

        baseline (298.574 μs)   :  0, 298

    -   : 0, 0

    -   : 0, 0

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024
    todayMarker off
    dateFormat  X
    axisFormat %s

    section QueueInOrderMemcpy(api=sycl<br>IsCopyOnly=0<br>sourcePlacement=Host<br>destinationPlacement=Device<br>size=1KB<br>count=100)

        This PR (214.715 μs)   : crit, 0, 214

        baseline (222.377 μs)   :  0, 222

    -   : 0, 0

    -   : 0, 0

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024
    todayMarker off
    dateFormat  X
    axisFormat %s

    section QueueMemcpy(api=sycl<br>sourcePlacement=Device<br>destinationPlacement=Device<br>size=1KB)

        This PR (6.691 μs)   : crit, 0, 6

        baseline (6.408 μs)   :  0, 6

    -   : 0, 0

    -   : 0, 0

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240
    todayMarker off
    dateFormat  X
    axisFormat %s

    section StreamMemory(api=sycl<br>type=Triad<br>size=10KB<br>useEvents=0<br>contents=Zeros<br>memoryPlacement=Device)

        This PR (3.063 μs)   : crit, 0, 3

        baseline (3.116 μs)   :  0, 3

    -   : 0, 0

    -   : 0, 0

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024
    todayMarker off
    dateFormat  X
    axisFormat %s

    section ExecImmediateCopyQueue(api=sycl<br>IsCopyOnly=1<br>MeasureCompletionTime=0<br>src=Device<br>dst=Device<br>size=1KB<br>ioq=0)

        This PR (3.106 μs)   : crit, 0, 3

        baseline (2.806 μs)   :  0, 2

    -   : 0, 0

    -   : 0, 0

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024
    todayMarker off
    dateFormat  X
    axisFormat %s

    section ExecImmediateCopyQueue(api=sycl<br>IsCopyOnly=1<br>MeasureCompletionTime=0<br>src=Host<br>dst=Host<br>size=1KB<br>ioq=1)

        This PR (2.339 μs)   : crit, 0, 2

        baseline (2.322 μs)   :  0, 2

    -   : 0, 0

    -   : 0, 0

miscellaneous_benchmark_sycl VectorSum
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title miscellaneous_benchmark_sycl VectorSum
    todayMarker off
    dateFormat  X
    axisFormat %s

    section VectorSum(api=sycl<br>numberOfElementsX=512<br>numberOfElementsY=256<br>numberOfElementsZ=256)

        This PR (859.309 μs)   : crit, 0, 859

        baseline (859.353 μs)   :  0, 859

    -   : 0, 0

    -   : 0, 0

Velocity-Bench Hashtable
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Hashtable
    todayMarker off
    dateFormat  X
    axisFormat %s

    section hashtable

        This PR (331.430759 M keys/sec)   : crit, 0, 331

        baseline (328.705328 M keys/sec)   :  0, 328

    -   : 0, 0

    -   : 0, 0

Velocity-Bench Bitcracker
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Bitcracker
    todayMarker off
    dateFormat  X
    axisFormat %s

    section bitcracker

        This PR (35.7344 s)   : crit, 0, 35

        baseline (35.7419 s)   :  0, 35

    -   : 0, 0

    -   : 0, 0

Velocity-Bench CudaSift
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench CudaSift
    todayMarker off
    dateFormat  X
    axisFormat %s

    section cudaSift

        This PR (217.873 ms)   : crit, 0, 217

        baseline (218.846 ms)   :  0, 218

    -   : 0, 0

    -   : 0, 0

Velocity-Bench Easywave
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Easywave
    todayMarker off
    dateFormat  X
    axisFormat %s

    section easywave

        This PR (243 ms)   : crit, 0, 243

        baseline (246.0 ms)   :  0, 246

    -   : 0, 0

    -   : 0, 0

Velocity-Bench QuickSilver
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench QuickSilver
    todayMarker off
    dateFormat  X
    axisFormat %s

    section QuickSilver

        This PR (117.12 MMS/CTT)   : crit, 0, 117

        baseline (117.06 MMS/CTT)   :  0, 117

    -   : 0, 0

    -   : 0, 0

Velocity-Bench Sobel Filter
---
config:
    gantt:
        rightPadding: 10
        leftPadding: 120
        sectionFontSize: 10
        numberSectionStyles: 2
---
gantt
    title Velocity-Bench Sobel Filter
    todayMarker off
    dateFormat  X
    axisFormat %s

    section sobel_filter

        This PR (613.329 ms)   : crit, 0, 613

        baseline (610.354 ms)   :  0, 610

    -   : 0, 0

    -   : 0, 0


Details

SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),25.896,26.165,6.03%,22.159,417.970,[CPU],[us]
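
The CSV rows in this section carry one more field than the header names: the header ends at `Type`, while each row ends with two bracketed fields such as `[CPU],[us]`. The sketch below shows one way to parse such a row with Python's standard `csv` module. The column name `Unit` for the trailing field is an assumption made here for illustration; the benchmark output does not name it.

```python
import csv

# Header and row copied verbatim from the output above.
header = "TestCase,Mean,Median,StdDev,Min,Max,Type"
row = (
    "SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 "
    "KernelExecTime=1 MeasureCompletion=0),"
    "25.896,26.165,6.03%,22.159,417.970,[CPU],[us]"
)

# The row has one field more than the header; "Unit" is an assumed name
# for that trailing field, not something the tool prints.
names = header.split(",") + ["Unit"]
fields = next(csv.reader([row]))
record = dict(zip(names, fields))

mean = float(record["Mean"])
median = float(record["Median"])
stddev_pct = float(record["StdDev"].rstrip("%"))  # reported as a percentage
max_over_median = float(record["Max"]) / median   # size of the worst outlier

print(record["TestCase"])
print(f"mean={mean} median={median} stddev={stddev_pct}% unit={record['Unit']}")
print(f"max is {max_over_median:.1f}x the median")
```

Comparing the maximum against the median is a quick way to see how far the slowest iteration sits from a typical one, which matters when the mean is the value carried into the summary.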

SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),24.960,25.021,5.77%,21.921,440.977,[CPU],[us]

QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100)

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),320.328,320.504,1.50%,302.537,728.410,[CPU],[us]

QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100)

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),214.715,201.361,22.94%,194.134,946.381,[CPU],[us]

QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB)

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),6.691,6.618,11.90%,6.163,72.608,[CPU],[us]

StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device)

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device),3.063,3.068,3.38%,0.392,3.372,[CPU],[GB/s]

ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0)

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),3.106,3.102,10.95%,2.719,73.133,[CPU],[us]

ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1)

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),2.339,2.332,3.17%,2.221,9.739,[CPU],[us]

VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256)

Environment Variables:

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256),859.309,859.782,0.41%,818.933,871.393,[GPU],bw [GB/s]

hashtable

Environment Variables:

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.404965 s
331.430759 million keys/second
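
The two figures printed by the hashtable run can be cross-checked against each other: multiplying the total time by the reported throughput gives the number of keys implied by the log. This is a small arithmetic sketch only. It assumes the throughput was derived from the same "total time" value, and the comparison with 2**27 is an inference from the numbers, not something the log states.

```python
# Values copied from the hashtable output above.
total_time_s = 0.404965
throughput_mkeys_per_s = 331.430759  # "million keys/second"

# Key count implied by the two printed figures.
implied_keys = total_time_s * throughput_mkeys_per_s * 1e6

print(f"implied key count: {implied_keys:,.0f}")
# Ratio to 2**27; a value very close to 1.0 suggests (but does not prove)
# that the workload processes 2**27 keys.
print(f"ratio to 2**27:    {implied_keys / 2**27:.6f}")
```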

bitcracker

Environment Variables:

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

==================================
Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range:
npknpByH7N2m3OnLNH1X9DJxLrzIFWk
.....
dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.00422884 s
bitcracker - total time for whole calculation: 35.7344 s

cudaSift

Environment Variables:

Command:

/home/test-user/bench_workdir/cudaSift/cudaSift

Output:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1263 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1176 1273 31.9305% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1214 1260 32.9623% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1232 1264 33.451% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1131 1263 30.7087% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1115 1262 30.2742% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1220 1252 33.1252% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1261 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1263 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1134 1278 30.7901% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1271 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1109 1257 30.1113% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1116 1266 30.3014% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1237 1276 33.5868% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1175 1252 31.9033% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1261 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1225 1260 33.2609% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1038 1255 28.1835% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1122 1272 30.4643% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1268 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1263 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1230 1263 33.3967% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1124 1269 30.5186% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1109 1261 30.1113% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1240 1275 33.6682% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1221 1253 33.1523% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1268 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1104 1261 29.9756% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1227 1262 33.3152% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1233 1269 33.4781% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1084 1259 29.4325% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1214 1266 32.9623% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1074 1268 29.161% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1137 1263 30.8716% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1068 1259 28.9981% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1236 1269 33.5596% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1176 1266 31.9305% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1114 1269 30.2471% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1123 1269 30.4914% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1060 1261 28.7809% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1234 1269 33.5053% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1157 1260 31.4146% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1223 1257 33.2066% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1229 1261 33.3695% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1226 1261 33.2881% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1242 1277 33.7225% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1116 1273 30.3014% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1092 1243 29.6497% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1224 1257 33.2338% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 1218 1253 33.0709% 1 2

Performing data verification
Data verification is SUCCESSFUL.

Avg workload time = 217.873 ms

easywave

Environment Variables:

Command:

/home/test-user/bench_workdir/easywave/easyWave_sycl -grid /home/test-user/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/test-user/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.3.29735+27)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016
Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params
Simulation:
dt: 1e-08
fMax: 0.1
inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
energySpectrum:
boundaryCondition: octant
loadBalance: 1
cycleTimers: 0
debugThreads: 0
lx: 100
ly: 100
lz: 100
nParticles: 10000000
batchSize: 0
nBatches: 10
nSteps: 10
nx: 10
ny: 10
nz: 10
seed: 1029384756
xDom: 0
yDom: 0
zDom: 0
eMax: 20
eMin: 1e-09
nGroups: 230
lowWeightCutoff: 0.001
bTally: 1
fTally: 1
cTally: 1
coralBenchmark: 0
crossSectionsOut:

Geometry:
material: sourceMaterial
shape: brick
xMax: 100
xMin: 0
yMax: 100
yMin: 0
zMax: 100
zMin: 0

Material:
name: sourceMaterial
mass: 1000
nIsotopes: 10
nReactions: 9
sourceRate: 1e+10
totalCrossSection: 0.1
absorptionCrossSection: flat
fissionCrossSection: flat
scatteringCrossSection: flat
absorptionCrossSectionRatio: 0
fissionCrossSectionRatio: 0
scatteringCrossSectionRatio: 1

CrossSection:
name: flat
A: 0
B: 0
C: 0
D: 0
E: 1
nuBar: 2.4
setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 3.721360e-01 6.177350e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.650110e-01 7.592740e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 3.602990e-01 7.777950e-01 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 3.846310e-01 8.323910e-01 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 3.459540e-01 7.954090e-01 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.410340e-01 7.721720e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 3.405390e-01 7.705490e-01 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 3.443270e-01 7.967460e-01 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 3.438110e-01 7.907660e-01 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 3.436370e-01 7.791930e-01 0.000000e+00

Timer Cumulative Cumulative Cumulative Cumulative Cumulative Cumulative
Name number microSecs microSecs microSecs microSecs Efficiency
of calls min avg max stddev Rating
main 1 1.123e+07 1.123e+07 1.123e+07 0.000e+00 100.00
cycleInit 10 3.541e+06 3.541e+06 3.541e+06 0.000e+00 100.00
cycleTracking 10 7.692e+06 7.692e+06 7.692e+06 0.000e+00 100.00
cycleTracking_Kernel 104 4.945e+06 4.945e+06 4.945e+06 0.000e+00 100.00
cycleTracking_MPI 117 2.057e+05 2.057e+05 2.057e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 4.100e+02 4.100e+02 4.100e+02 0.000e+00 100.00
Figure Of Merit 117.12 [Num Mega Segments / Cycle Tracking Time]
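
The figure of merit is labelled "Num Mega Segments / Cycle Tracking Time". Assuming this means the sum of the `num_seg` column (in millions) divided by the cumulative `cycleTracking` time (converted from microseconds to seconds), the value can be reproduced from the log above. The sketch below does that arithmetic; the interpretation of the formula is an assumption, and the timer value is only printed to four significant digits, so agreement is expected only to about two decimals.

```python
# num_seg per cycle, copied from the per-cycle table in the log (cycles 0..9).
num_seg = [
    55527935, 94633679, 95010375, 94953591, 94599337,
    94148236, 93689264, 93216931, 92768273, 92324678,
]

# Cumulative cycleTracking time from the timer table, in microseconds.
cycle_tracking_us = 7.692e6

mega_segments = sum(num_seg) / 1e6           # segments, in millions
cycle_tracking_s = cycle_tracking_us / 1e6   # microseconds -> seconds

# Assumed reading of "Num Mega Segments / Cycle Tracking Time".
figure_of_merit = mega_segments / cycle_tracking_s

print(f"{figure_of_merit:.2f}")
```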

sobel_filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 7.4503 s
sobelfilter - total time for whole calculation: 0.613329 s

@pbalcer pbalcer merged commit 4f2ce7f into oneapi-src:main Aug 1, 2024
@github-actions github-actions bot added the common Changes or additions to common utilities label Aug 1, 2024
@pbalcer pbalcer added the v0.10.x Include in the v0.10.x release label Aug 1, 2024
kbenzie pushed a commit that referenced this pull request Aug 6, 2024
bump UMF tag to switch to rc3 release
@kbenzie kbenzie mentioned this pull request Aug 6, 2024
53 tasks
AllanZyne added a commit to AllanZyne/unified-runtime that referenced this pull request Aug 26, 2024
commit fe18b4a
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Thu Aug 15 18:01:28 2024 +0800

    fix reviews

commit 74d30dc
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Wed Aug 14 14:38:57 2024 +0800

    fix comments

commit e264cc1
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Wed Aug 14 14:37:46 2024 +0800

    address comments

commit 3e3bd51
Merge: 864da64 e02d78b
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Tue Aug 13 00:46:39 2024 -0700

    Merge branch 'llvm' into review/yang/fix_dsan_destruction

commit e02d78b
Merge: e50a4dd c12957b
Author: Omar Ahmed <omar.ahmed@codeplay.com>
Date:   Fri Aug 9 15:41:55 2024 +0100

    Merge pull request oneapi-src#1933 from nrspruit/fix_driver_version_check

    [L0] Fix Driver Version check to use extension and tuple check

commit e50a4dd
Merge: 3c12bbc 6b373e3
Author: Omar Ahmed <omar.ahmed@codeplay.com>
Date:   Fri Aug 9 14:34:49 2024 +0100

    Merge pull request oneapi-src#1923 from sarnex/buildlog

    [L0] Return the build log on compilation failure

commit 3c12bbc
Merge: 83f7ad9 ac7eb17
Author: Omar Ahmed <omar.ahmed@codeplay.com>
Date:   Fri Aug 9 10:51:05 2024 +0100

    Merge pull request oneapi-src#1910 from Bensuo/sync_point

    [CUDA][HIP] Improve command-buffer sync points

commit 83f7ad9
Merge: ab9baf5 8fb6824
Author: Omar Ahmed <omar.ahmed@codeplay.com>
Date:   Thu Aug 8 11:11:13 2024 +0100

    Merge pull request oneapi-src#1860 from PietroGhg/pietro/fill

    [NATIVECPU] Fix pointer arithmetic in USMfill

commit ab9baf5
Merge: 1fef4e2 c571ec4
Author: Omar Ahmed <omar.ahmed@codeplay.com>
Date:   Thu Aug 8 11:09:15 2024 +0100

    Merge pull request oneapi-src#1911 from ProGTX/peter/xpti-static

    [CUDA] Don't import XPTI symbols in the plugin library

commit 1fef4e2
Merge: 2d3524e ca68aca
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Wed Aug 7 17:46:52 2024 +0200

    Merge pull request oneapi-src#1949 from pbalcer/ci-benches

    add info how to run benchmarks in CI

commit ca68aca
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Wed Aug 7 17:44:52 2024 +0200

    add info how to run benchmarks in CI

commit 2d3524e
Merge: 6b2e678 d6e93fa
Author: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Date:   Wed Aug 7 14:23:09 2024 +0100

    Merge pull request oneapi-src#1930 from oneapi-src/benie/no-import-in-pragma-region

    Make pragma region names joined by _

commit 6b2e678
Merge: d8058ed 6e295e1
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Wed Aug 7 13:58:53 2024 +0200

    Merge pull request oneapi-src#1944 from ldorau/CI_Add_possibility_to_start_manually_the_Nightly_GHA_workflow

    [CI] Add possibility to start manually the Nightly GHA workflow

commit d8058ed
Merge: 1445b66 4e4b04c
Author: Omar Ahmed <omar.ahmed@codeplay.com>
Date:   Wed Aug 7 12:30:47 2024 +0100

    Merge pull request oneapi-src#1843 from AllanZyne/review/yang/invalid_arguments

    [DeviceSanitizer] Support check invalid kernel argument

commit 1445b66
Merge: a89657c 355c4c3
Author: Omar Ahmed <omar.ahmed@codeplay.com>
Date:   Wed Aug 7 12:25:52 2024 +0100

    Merge pull request oneapi-src#1850 from Bensuo/native_enqueue_cosmetic

    Cosmetic tweaks to native enqueue spec

commit a89657c
Merge: 2355a7d be7057c
Author: Omar Ahmed <omar.ahmed@codeplay.com>
Date:   Wed Aug 7 12:01:33 2024 +0100

    Merge pull request oneapi-src#1699 from PietroGhg/pietro/usm_fixes

    [NATIVECPU] Implement urUSMGetMemAllocInfo and aligned alloc

commit 2355a7d
Merge: 450be81 b112525
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Wed Aug 7 13:00:51 2024 +0200

    Merge pull request oneapi-src#1945 from pbalcer/suppress-failures

    Suppress e2e test failures in L0 and OpenCL

commit d6e93fa
Author: Kenneth Benzie (Benie) <kenneth.benzie@intel.com>
Date:   Mon Aug 5 08:20:53 2024 -0700

    Make pragma region names joined by _

    On Windows the region name `usm import release (experimental)` causes
    compile errors in certain situations which look like this:

    ```
    error C7586: a 'import' directive must end with a ';' on the same line
    ```

    This patch replaces spaces with `_` in the region names to avoid this
    compile error.
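
To illustrate the renaming described in this commit message, here is a minimal Python sketch that joins the words of a region title with `_`. The helper name `region_name` is hypothetical and is not taken from the repository; the sketch only shows the string transformation, not how the project generates its headers.

```python
def region_name(title: str) -> str:
    """Join the whitespace-separated words of a region title with underscores.

    Hypothetical helper for illustration only.
    """
    return "_".join(title.split())


# With no spaces left in the name, `import` no longer appears as a separate token.
print(f"#pragma region {region_name('usm import release (experimental)')}")
```

Splitting on whitespace before joining also collapses accidental double spaces, so the resulting name never contains a blank.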

commit b112525
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Wed Aug 7 10:40:21 2024 +0200

    Suppress e2e test failures in L0 and OpenCL

commit 450be81
Merge: 7f65917 b33c0e7
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Wed Aug 7 09:49:18 2024 +0200

    Merge pull request oneapi-src#1943 from kbenzie/benie/fix-coverity-issues

    Fix various Coverity defects

commit 6e295e1
Author: Lukasz Dorau <lukasz.dorau@intel.com>
Date:   Wed Aug 7 09:02:09 2024 +0200

    [CI] Add possibility to start manually the Nightly GHA workflow

    Add the possibility to manually start the Nightly GHA workflow
    in order to check it on demand.

    Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>

commit 864da64
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Tue Aug 6 21:48:22 2024 -0700

    fix test

commit b33c0e7
Author: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Date:   Tue Aug 6 18:07:42 2024 +0100

    Coverity: Fix 14 instances of Resource leak

    Addresses the following defect CIDs; 1594026, 1594028, 1594029, 1594030,
    1594031, 1594032, 1594033, 1594034, 1594035, 1594036, 1594037, 1595372,
    1595373, and 1598546.

commit 9f2166c
Author: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Date:   Tue Aug 6 17:49:38 2024 +0100

    Coverity: Fix 1598473 Resource leak

commit ac7eb17
Author: Ewan Crawford <ewan@codeplay.com>
Date:   Wed Jul 31 12:54:45 2024 +0100

    [CUDA][HIP] Improve command-buffer sync points

    Several improvements to sync-point implementation
    in HIP and CUDA command-buffer adapters with
    additional CTS coverage to back it up.

    * In the CUDA/HIP adapters we assume that there is always
      a return sync-point passed by the user. However, this is not
      required by the UR API, so we should check that
      the return value is non-null before dereferencing.
    * The Fill helper function can implement a fill as several commands
      for certain pattern sizes, and we were creating a sync point for every
      internal command. This is not required: these commands form a linear
      dependency chain, so only the leaf command is required to be a sync
      point for future commands to depend on.
    * Remove `shared_ptr` from `CUgraphNode` objects stored for sync-points.
      `CUgraphNode` is a pointer type, and is managed by the CUDA driver
      runtime rather than us.
    * Simplify handling of return results. We don't always use the helper
      macro for returning the `ur_result_t` value when a function call fails,
      and also often unnecessarily use a variable to store the return code.
    * Use `hipMemcpyDefault` for USM memcopy
    * Remove error from prefetch & advise
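
The sync-point change for multi-command fills can be pictured with a toy model. The Python sketch below is not the adapter code: `Graph`, `add_node`, and `append_fill` are hypothetical names, and graph nodes are plain integers. It only shows that when a fill is split into a linear chain of commands, registering the last (leaf) node as the single sync point is enough for later commands to depend on the whole chain.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy command graph: maps each node id to the node ids it depends on."""

    deps: dict[int, list[int]] = field(default_factory=dict)
    sync_points: list[int] = field(default_factory=list)

    def add_node(self, depends_on: list[int]) -> int:
        node = len(self.deps)
        self.deps[node] = list(depends_on)
        return node


def append_fill(graph: Graph, num_steps: int, wait_on: list[int]) -> int:
    """Add a fill made of `num_steps` chained commands and return one sync point."""
    if num_steps < 1:
        raise ValueError("a fill needs at least one command")
    prev = list(wait_on)
    for _ in range(num_steps):
        node = graph.add_node(prev)
        prev = [node]  # the next internal command depends only on this one
    # Only the leaf of the chain becomes a sync point.
    graph.sync_points.append(node)
    return len(graph.sync_points) - 1
```

A later command that waits on the returned sync point transitively waits on every internal command, because each one depends on its predecessor.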

commit 7f65917
Merge: d2ffcce 8de9747
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Tue Aug 6 17:57:35 2024 +0200

    Merge pull request oneapi-src#1941 from pbalcer/cuda-runner-timeout

    add 1 hour time limit for e2e tests

commit e150934
Author: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Date:   Tue Aug 6 16:29:21 2024 +0100

    Coverity: Fix 1595225 Data race condition

commit d51935e
Author: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Date:   Tue Aug 6 15:43:29 2024 +0100

    Coverity: Fix 1594597 Dereference after null check

commit 8de9747
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Tue Aug 6 15:51:03 2024 +0200

    add 1 hour time limit for e2e tests

commit 132349c
Author: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Date:   Tue Aug 6 15:36:16 2024 +0100

    Coverity: Fix 1595785 Use of auto that causes a copy

commit 7a370a4
Author: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Date:   Tue Aug 6 15:15:19 2024 +0100

    Coverity: Fix 1595594 Copy instead of move

commit ee749e4
Author: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Date:   Tue Aug 6 14:46:03 2024 +0100

    Coverity: Fix 1595568, 1595570 Use of auto that causes a copy

    Use `const auto &` instead of `auto` in the mock parameter struct
    accesses.

commit d08fc6a
Author: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Date:   Tue Aug 6 13:39:01 2024 +0100

    Coverity: Fix 1594027 Uncaught exception

    The `UR_CHECK_ERROR()` utility macro in the CUDA adapter calls the
    `checkErrorUR()` utility function, which throws a `ur_result_t` that was
    not being caught.

commit 669797f
Author: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Date:   Tue Aug 6 12:55:26 2024 +0100

    Coverity: Fix 1574354 Uninitialized scalar field

    Always zero initialize the `ArrayDesc` data member of `SurfaceMem` in
    the CUDA adapter. Simplify other construction logic.

commit d2ffcce
Merge: 9024918 c5d8106
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Tue Aug 6 13:40:15 2024 +0200

    Merge pull request oneapi-src#1913 from igchor/separate_adapter

    [L0 v2] Make L0 v2 implementation a seperate adapter

commit 9024918
Merge: 2233030 b93ecbb
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Tue Aug 6 13:37:58 2024 +0200

    Merge pull request oneapi-src#1912 from igchor/latency_tracker_histogram_hdr

    [common] Histogram-based latency tracker

commit 2233030
Merge: 9deaabc b6454e4
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Tue Aug 6 13:36:16 2024 +0200

    Merge pull request oneapi-src#1932 from igchor/raii_l0

    [L0 v2] Add raii wrapper for L0 handles

commit 250f759
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Tue Aug 6 00:16:03 2024 -0700

    add mutex for adapter

commit fbecf2a
Merge: d67cfec c5d2175
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Tue Aug 6 00:02:09 2024 -0700

    Merge branch 'llvm' into review/yang/fix_dsan_destruction

commit d67cfec
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Mon Aug 5 23:52:19 2024 -0700

    update test

commit 982667e
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Mon Aug 5 19:24:03 2024 -0700

    fix repeat hold adapter handle

commit c12957b
Author: Neil R. Spruit <neil.r.spruit@intel.com>
Date:   Mon Aug 5 16:37:45 2024 -0700

    [L0] Fix Driver Version check to use extension and tuple check

    - Fixed isDriverVersionNewerOrSimilar to use the new Intel driver
      version string if it exists and to use a tuple to compare the minimum
      and existing versions.
    - Moved version check within the platform handle.
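
As a rough illustration of why a tuple comparison is used for versions, the Python sketch below compares dotted version strings component by component. The dotted numeric format and the names `parse_version` and `is_newer_or_similar` are assumptions made for this example; they are not the adapter's actual version string format or API.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted numeric string such as '1.3.29000' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_newer_or_similar(current: str, minimum: str) -> bool:
    """Return True when `current` is at least `minimum`.

    Tuples compare element by element, so 1.10.0 correctly ranks above 1.9.5,
    which a plain string comparison would get wrong.
    """
    return parse_version(current) >= parse_version(minimum)
```

Comparing the parsed tuples avoids the usual pitfall of lexicographic string comparison, where `"1.10.0" < "1.9.5"` evaluates to true.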

    Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>

commit 9deaabc
Merge: 84f5e70 ca2916e
Author: Omar Ahmed <omar.ahmed@codeplay.com>
Date:   Mon Aug 5 21:02:44 2024 +0100

    Merge pull request oneapi-src#1929 from oneapi-src/revert-1880-l0-native-enqueue

    Revert "[L0] L0 impl for enqueue native command"

commit b6454e4
Author: Igor Chorazewicz <igor.chorazewicz@intel.com>
Date:   Thu Jul 11 19:50:50 2024 +0000

    [L0 v2] Add raii wrapper for L0 handles

    that encapsulate lifetime management logic (including
    support for ownZeHandle).

commit b93ecbb
Author: Igor Chorazewicz <igor.chorazewicz@intel.com>
Date:   Thu May 9 02:21:53 2024 +0000

    [common] add latency tracker based on hdr_histogram

    This tracker allows for tracking min,max,mean,stdev and arbitrary percentile values.

    Calling TRACK_SCOPE_LATENCY(name) registers a latency tracker for a given scope.
    All latency measurements are collected to a per-thread histogram instance.
    When the program exits, all per-thread histograms (for the same scope) are
    aggregated into a single histogram and all statistics are printed.
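
The per-thread collection and final aggregation described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: it stores exact values in a `Counter` instead of using hdr_histogram buckets, and `record`, `aggregate`, and `percentile` are hypothetical names, not the tracker's API.

```python
import threading
from collections import Counter

_local = threading.local()   # holds each thread's own histograms
_all_histograms = []         # every per-thread dict, kept for the final merge
_lock = threading.Lock()


def record(scope, micros):
    """Record one latency sample in the calling thread's histogram."""
    hists = getattr(_local, "hists", None)
    if hists is None:
        hists = _local.hists = {}
        with _lock:  # registration happens once per thread
            _all_histograms.append(hists)
    hists.setdefault(scope, Counter())[micros] += 1


def aggregate(scope):
    """Merge all per-thread histograms for `scope` into one."""
    total = Counter()
    with _lock:
        for hists in _all_histograms:
            total.update(hists.get(scope, Counter()))
    return total


def percentile(hist, pct):
    """Smallest recorded value with at least `pct` percent of samples at or below it."""
    count = sum(hist.values())
    if count == 0:
        raise ValueError("empty histogram")
    threshold = pct / 100 * count
    seen = 0
    for value in sorted(hist):
        seen += hist[value]
        if seen >= threshold:
            return value
    return max(hist)
```

Because each thread writes only to its own histogram, the hot path needs no locking; the lock is taken only when a thread registers and when the histograms are merged.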

commit c5d8106
Author: Igor Chorazewicz <igor.chorazewicz@intel.com>
Date:   Wed Jul 31 23:41:26 2024 +0000

    [L0 v2] Make L0 v2 implementation a seperate adapter

    Initially, L0 v2 adapter was supposed to reside in a separate
    namespace but be a part of legacy L0 adapter (with runtime option
    to switch between executing on legacy or v2). However, this
    turns out to require a lot of changes in the legacy code to
    allow for function dispatching to legacy/v2 implementations of
    queue, event, etc.

    This approach allows us to keep the implementations separate while
    still reusing files when appropriate (e.g. for adapter.cpp or
    platform.cpp).

commit 6b373e3
Author: Sarnie, Nick <nick.sarnie@intel.com>
Date:   Fri Aug 2 08:32:55 2024 -0700

    [L0] Return the build log on compilation failure

    Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>

commit ca2916e
Author: Omar Ahmed <omarpiratee2010@gmail.com>
Date:   Mon Aug 5 15:42:34 2024 +0100

    Revert "[L0] L0 impl for enqueue native command"

commit 84f5e70
Merge: b5cd44c 721d523
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Mon Aug 5 15:58:40 2024 +0200

    Merge pull request oneapi-src#1927 from pbalcer/fix-scorecard

    fix scorecard job

commit 721d523
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Mon Aug 5 15:54:46 2024 +0200

    fix scorecard job

    The scorecard action must run on the official GitHub-hosted
    ubuntu runners...

commit b5cd44c
Merge: a25fc21 a2e35c0
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Mon Aug 5 15:30:00 2024 +0200

    Merge pull request oneapi-src#1922 from lukaszstolarczuk/bump-umf

    Bump UMF version with latest fixes

commit a25fc21
Merge: 65b4922 ae594ba
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Mon Aug 5 15:29:52 2024 +0200

    Merge pull request oneapi-src#1926 from oneapi-src/benie/force-libstdc++

    Add option to force use of libstdc++ on Linux

commit c571ec4
Author: Peter Žužek <peter@codeplay.com>
Date:   Mon Aug 5 14:27:54 2024 +0100

    [CUDA] Don't import XPTI symbols in the plugin library

    The CUDA plugin builds an XPTI file directly.
    By default the symbol visibility in that XPTI file is presumed
    to import symbols, but there are no XPTI symbols being exported,
    since XPTI is not built as a separate library.

    This causes a compilation failure on Windows.
    The fix is to define `XPTI_STATIC_LIBRARY`,
    which changes the visibility of symbols:
    on Windows this means no longer using `dllimport`
    (nor `dllexport`).

commit 65b4922
Merge: 9b93cb1 bcda0f8
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Mon Aug 5 15:27:16 2024 +0200

    Merge pull request oneapi-src#1921 from pbalcer/switch-runners

    switch ubuntu runners to a shared pool

commit a2e35c0
Author: Łukasz Stolarczuk <lukasz.stolarczuk@intel.com>
Date:   Fri Aug 2 16:57:04 2024 +0200

    Bump UMF version with latest fixes

commit ae594ba
Author: Kenneth Benzie (Benie) <kenneth.benzie@intel.com>
Date:   Mon Aug 5 05:07:46 2024 -0700

    Add option to force use of libstdc++ on Linux

    The UR_FORCE_LIBSTDCXX option, which defaults to OFF, can be used in
    situations where the build is configured to use libc++ but the libstdc++
    ABI is required for stability reasons.

commit bcda0f8
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Fri Aug 2 12:40:51 2024 +0200

    switch ubuntu runners to a shared pool

commit 9b93cb1
Merge: 96ae6b3 d7ea11f
Author: Omar Ahmed <omar.ahmed@codeplay.com>
Date:   Fri Aug 2 22:15:59 2024 +0100

    Merge pull request oneapi-src#1812 from nrspruit/fix_l0_program

    Fix L0 Program CTS failures

commit 96ae6b3
Merge: 27135eb 3972690
Author: Omar Ahmed <omar.ahmed@codeplay.com>
Date:   Fri Aug 2 18:56:44 2024 +0100

    Merge pull request oneapi-src#1810 from nrspruit/fix_l0_kernel_cts

    [L0] Fix kernel error handling and enumeration checking

commit d7ea11f
Author: Neil R. Spruit <neil.r.spruit@intel.com>
Date:   Wed Jul 10 13:28:21 2024 -0700

    Fix return value for multi device

    Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>

commit 7436827
Author: Neil R. Spruit <neil.r.spruit@intel.com>
Date:   Tue Jul 9 17:59:36 2024 -0700

    Fix Native Device Init

    Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>

commit cd4b111
Author: Neil R. Spruit <neil.r.spruit@intel.com>
Date:   Tue Jul 9 17:40:27 2024 -0700

    Fix multi device module/kernel access

    Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>

commit fa3a6a9
Author: Neil R. Spruit <neil.r.spruit@intel.com>
Date:   Tue Jul 2 12:41:00 2024 -0700

    [L0] Fix Get info Binaries And source and handle/pointer checks

    Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>

commit 64ad451
Author: Neil R. Spruit <neil.r.spruit@intel.com>
Date:   Tue Jul 2 11:01:22 2024 -0700

    [L0] Fix program get info

    Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>

commit 3972690
Author: Neil R. Spruit <neil.r.spruit@intel.com>
Date:   Tue Jul 2 09:40:40 2024 -0700

    [L0] Fix kernel error handling and enumeration checking

    - Fixed kernel create to free memory and close with nullptr
    - Fixed argument index checking for kernels and argument size checks
    - UR_KERNEL_INFO_NUM_REGS to be reported same as UR_KERNEL_INFO_NUM_ARGS

    Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>

commit 27135eb
Merge: a69e1b5 bfc7536
Author: Omar Ahmed <omar.ahmed@codeplay.com>
Date:   Fri Aug 2 15:08:27 2024 +0100

    Merge pull request oneapi-src#1896 from omarahmed1111/change-opencl-sampler-info-size

    Map ur_bool_t to cl_bool in sampler getinfo for opencl adapter

commit a69e1b5
Merge: 6539561 b816700
Author: Omar Ahmed <omar.ahmed@codeplay.com>
Date:   Fri Aug 2 14:24:18 2024 +0100

    Merge pull request oneapi-src#1906 from nrspruit/flex_gpu_copy_engine

    [L0] Add check for Intel Flex/Arc for disabling use of copy engines.

commit 6539561
Merge: 90b381c d3faf1a
Author: Kenneth Benzie (Benie) <k.benzie@codeplay.com>
Date:   Fri Aug 2 13:28:43 2024 +0100

    Merge pull request oneapi-src#1917 from oneapi-src/benie/mock-init-callbacks-earlier

    Initalize mock callbacks earlier

commit 90b381c
Merge: 4ae5a92 9b16bfc
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Fri Aug 2 14:18:33 2024 +0200

    Merge pull request oneapi-src#1797 from lukaszstolarczuk/update-badges

    Update badges (for active workflows) in README

commit 4ae5a92
Merge: 509035d 728fac6
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Fri Aug 2 12:48:55 2024 +0200

    Merge pull request oneapi-src#1918 from pbalcer/fix-pvc-feature

    update L0 e2e workflow

commit 728fac6
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Fri Jul 26 11:02:57 2024 +0200

    update L0 e2e workflow

    suppressing the latest failing tests

commit 5859e3c
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Fri Aug 2 01:40:59 2024 -0700

    fix crash

commit 509035d
Merge: c1d8162 cb5cb6e
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Fri Aug 2 09:09:30 2024 +0200

    Merge pull request oneapi-src#1883 from aarongreig/aaron/asanObjectLifetimeIssues

    Don't retain device handle references in sanitizer layer.

commit 56ed0b8
Merge: 9e6923f 3e762e0
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Thu Aug 1 23:30:05 2024 -0700

    Merge branch 'llvm' into review/yang/fix_dsan_destruction

commit cb5cb6e
Author: Aaron Greig <aaron.greig@codeplay.com>
Date:   Mon Jul 29 16:09:59 2024 +0100

    Add comment denoting change as a temporary fix.

commit 55539ac
Author: Aaron Greig <aaron.greig@codeplay.com>
Date:   Fri Jul 19 14:29:24 2024 +0100

    Don't retain device handle references in sanitizer layer.

commit c1d8162
Merge: 4f2ce7f 7ce7387
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Fri Aug 2 07:44:20 2024 +0200

    Merge pull request oneapi-src#1920 from zhaomaosu/devsan-add-missing-lib

    [DeviceSanitizer] Add missing required library

commit 7ce7387
Author: Maosu Zhao <maosu.zhao@intel.com>
Date:   Fri Aug 2 11:08:46 2024 +0800

    [DeviceSanitizer] Add missing required library

    Fix syclos post commit failure:
    https://github.com/intel/llvm/actions/runs/10196353773/job/28206962107

commit d3faf1a
Author: Kenneth Benzie (Benie) <kenneth.benzie@intel.com>
Date:   Thu Aug 1 04:35:19 2024 -0700

    Initalize mock callbacks earlier

    Avoid use after static destruction in sycl unittests by moving the
    initialization of `mock::callbacks` from static function scope to static
    global scope.

commit 4f2ce7f
Merge: 90180f4 ae03bf6
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Thu Aug 1 12:25:01 2024 +0200

    Merge pull request oneapi-src#1915 from bratpiorka/rrudnick_umf_rc3

    bump UMF tag to switch to rc3 release

commit ae03bf6
Author: Rafal Rudnicki <rafal.rudnicki@intel.com>
Date:   Thu Aug 1 10:25:15 2024 +0200

    bump UMF tag to switch to rc3 release

commit 90180f4
Merge: c5d2175 1ff321c
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Thu Aug 1 10:30:30 2024 +0200

    Merge pull request oneapi-src#1902 from pbalcer/benchmark-automation-2

    improve benchmarks automation

commit 4e4b04c
Merge: 7b04b92 bc1a28e
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Thu Aug 1 00:09:59 2024 -0700

    Merge branch 'llvm' into review/yang/invalid_arguments

commit 7b04b92
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Thu Aug 1 00:07:03 2024 -0700

    default enable

commit c5d2175
Merge: 99489ad c86beb6
Author: Omar Ahmed <omar.ahmed@codeplay.com>
Date:   Wed Jul 31 14:52:26 2024 +0100

    Merge pull request oneapi-src#1882 from przemektmalon/przemek/interop-map-memory

    [Bindless][Exp] Add interop memory mapping to USM.

commit 8fb6824
Merge: a4510ac 99489ad
Author: uwedolinsky <uwe@codeplay.com>
Date:   Wed Jul 31 13:27:42 2024 +0100

    Merge branch 'main' into pietro/fill

commit 99489ad
Merge: 3e762e0 3f13f69
Author: Omar Ahmed <omar.ahmed@codeplay.com>
Date:   Wed Jul 31 13:23:29 2024 +0100

    Merge pull request oneapi-src#1880 from hdelan/l0-native-enqueue

    [L0] L0 impl for enqueue native command

commit a4510ac
Merge: 385cd05 3e762e0
Author: Uwe Dolinsky <uwe@codeplay.com>
Date:   Wed Jul 31 12:46:38 2024 +0100

    Merge remote-tracking branch 'upstream/main' into pietro/fill

commit 3e762e0
Merge: c805a71 a2a053d
Author: Omar Ahmed <omar.ahmed@codeplay.com>
Date:   Wed Jul 31 12:26:34 2024 +0100

    Merge pull request oneapi-src#1884 from callumfare/callum/fix_printtrace

    Enable PrintTrace when SYCL UR tracing is enabled

commit 3f13f69
Merge: 716ee15 c805a71
Author: Hugh Delaney <hugh.delaney@codeplay.com>
Date:   Wed Jul 31 11:10:25 2024 +0100

    Merge branch 'main' into l0-native-enqueue

commit c805a71
Merge: 24d3e68 f566e5b
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Wed Jul 31 11:48:18 2024 +0200

    Merge pull request oneapi-src#1142 from lukaszstolarczuk/dockers-adapters

    Update and extend dockers

commit c86beb6
Author: Duncan Brawley <duncan.brawley@codeplay.com>
Date:   Tue Jul 30 15:44:27 2024 +0100

    Remove LegacyMessage and small formatting fix

commit b816700
Author: Neil R. Spruit <neil.r.spruit@intel.com>
Date:   Fri Jul 26 10:32:24 2024 -0700

    [L0] Add check for Intel Flex/Arc for disabling use of copy engines.

    Signed-off-by: Neil R. Spruit <neil.r.spruit@intel.com>

commit bfc7536
Author: omarahmed1111 <omar.ahmed@codeplay.com>
Date:   Thu Jul 25 11:58:18 2024 +0100

    Map ur_bool_t to cl_bool in opencl sampler getinfo

commit 6935b17
Author: Duncan Brawley <duncan.brawley@codeplay.com>
Date:   Tue Jul 30 13:20:36 2024 +0100

    Remote 'interop' keyword

commit b9bd031
Merge: c3baef7 47ab963
Author: Duncan Brawley <duncan.brawley@codeplay.com>
Date:   Tue Jul 30 12:59:42 2024 +0100

    merge 'origin/sycl' into przemek/interop-map-memory

commit a2a053d
Author: Callum Fare <callum@codeplay.com>
Date:   Tue Jul 23 16:30:13 2024 +0100

    Enable PrintTrace when SYCL UR tracing is enabled

commit 716ee15
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Tue Jul 30 11:00:24 2024 +0200

    always execute the command list between ops in native enqueue

commit 1528f4c
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Tue Jul 30 10:42:58 2024 +0200

    fix ordering between operations in native enqueue

commit 1ff321c
Author: Piotr Balcer <piotr.balcer@intel.com>
Date:   Fri Jul 26 14:15:34 2024 +0200

    improve benchmarks automation

    This patch:
     - adds an option to run a benchmark a few times to pick a median value
     - adds a timeout for benchmarks, set at 10 minutes by default.
     - adds an option to filter out benchmarks by name
     - adds an option to pick a specific compiler commit to test with
     - adds more compute benchmarks
     - fixes cudaSift
     - uses upstream Velocity Bench
     - adds a simple summary table with results
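
Two of the options listed above, repeating a benchmark to take the median and selecting benchmarks by name, can be sketched as follows. This is not the project's benchmark script: `select_benchmarks` and `median_result` are hypothetical names, and treating the filter as a regular expression that keeps matching names is an assumption.

```python
import re
import statistics


def select_benchmarks(names, pattern):
    """Keep only the benchmark names that match `pattern` (assumed to be a regex)."""
    rx = re.compile(pattern)
    return [name for name in names if rx.search(name)]


def median_result(run_once, iterations):
    """Call `run_once` `iterations` times and return the median measurement."""
    if iterations < 1:
        raise ValueError("need at least one iteration")
    return statistics.median(run_once() for _ in range(iterations))
```

Taking the median instead of the mean keeps a single slow outlier run from skewing the reported value.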

commit 352015f
Author: Hugh Delaney <hugh.delaney@codeplay.com>
Date:   Mon Jul 29 14:36:11 2024 +0100

    Update comment

    Clarify wording in comment.

commit 071223f
Author: Hugh Delaney <hugh.delaney@codeplay.com>
Date:   Mon Jul 29 12:17:59 2024 +0100

    Add extra synchronization

    Enqueue things to L0 before calling queueFinish.

commit 38d10ec
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Thu Jul 25 20:10:59 2024 -0700

    argument index start from 1

commit 5e1195e
Author: Hugh Delaney <hugh.delaney@codeplay.com>
Date:   Thu Jul 25 15:05:14 2024 +0100

    Update source/adapters/level_zero/enqueue_native.cpp

    Co-authored-by: Piotr Balcer <piotr.balcer@intel.com>

commit 632ba6b
Author: Hugh Delaney <hugh.delaney@codeplay.com>
Date:   Thu Jul 25 13:57:48 2024 +0100

    Update matchfile

commit 5b12e29
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Wed Jul 24 20:05:23 2024 -0700

    change log message

commit ef0e07f
Merge: 1391baa e161516
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Wed Jul 24 19:59:51 2024 -0700

    Merge branch 'llvm' into review/yang/invalid_arguments

commit 6111fb2
Author: Hugh Delaney <hugh.delaney@codeplay.com>
Date:   Wed Jul 24 12:46:37 2024 +0100

    For out of order queues call queue finish

    We can't use normal synchronization for out of order queues, so use
    brute force queueFinish.

commit 382325d
Author: Hugh Delaney <hugh.delaney@codeplay.com>
Date:   Wed Jul 24 12:43:42 2024 +0100

    Remove comment

commit 245afb3
Author: Hugh Delaney <hugh.delaney@codeplay.com>
Date:   Wed Jul 24 12:33:29 2024 +0100

    Update source/adapters/level_zero/enqueue_native.cpp

    Co-authored-by: Piotr Balcer <piotr.balcer@intel.com>

commit 7fbc58b
Author: Hugh Delaney <hugh.delaney@codeplay.com>
Date:   Wed Jul 24 11:35:26 2024 +0100

    Remove lock

commit d76742e
Author: Hugh Delaney <hugh.delaney@codeplay.com>
Date:   Wed Jul 24 11:33:19 2024 +0100

    Use ScopedCommandList to get thread local CL

    Same as the CUDA implementation. This means that any CommandList
    obtained through urQueueGetNativeHandle will be the same CommandList
    that is synchronized with before the interop func call.

commit 8020612
Author: Hugh Delaney <hugh.delaney@codeplay.com>
Date:   Tue Jul 23 11:02:37 2024 +0100

    Add match files

    Add empty match files for level_zero.

commit 7d14d84
Author: Hugh Delaney <hugh.delaney@codeplay.com>
Date:   Mon Jul 22 16:46:49 2024 +0100

    Update entry point

    Thanks pbalcer for suggestion.

commit f2afed2
Author: Hugh Delaney <hugh.delaney@codeplay.com>
Date:   Mon Jul 22 14:21:58 2024 +0100

    Try L0 impl for enqueue native command

    Draft impl for discussion.

commit f566e5b
Author: Łukasz Stolarczuk <lukasz.stolarczuk@intel.com>
Date:   Wed Jul 24 11:07:23 2024 +0200

    [CI] Add more docker recipes

    and update the existing ones.

commit 1391baa
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Tue Jul 23 20:27:03 2024 -0700

    default disable

commit 237a4af
Merge: 88f2156 f11caf9
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Tue Jul 23 20:24:40 2024 -0700

    Merge branch 'llvm' into review/yang/invalid_arguments

commit 9e6923f
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Tue Jul 23 19:54:50 2024 -0700

    wip

commit c3baef7
Author: Przemek Malon <przemek.malon@codeplay.com>
Date:   Fri May 31 16:42:51 2024 +0100

    [Bindless][Exp] Add interop memory mapping to USM.

    This patch introduces `urBindlessImagesMapExternalLinearMemoryExp` to
    allow mapping interop memory to USM regions.

commit ae7dea6
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Mon Jul 22 01:40:27 2024 -0700

    using unordered_set

commit 6449148
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Sun Jul 21 22:42:41 2024 -0700

    Add UR_CALL

commit df5fd8b
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Sun Jul 21 22:36:23 2024 -0700

    fix destruction

commit 88f2156
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Fri Jul 19 00:07:31 2024 -0700

    fix crash

commit 0a916a1
Merge: cc40e85 38a575b
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Thu Jul 18 22:27:47 2024 -0700

    Merge branch 'main' into review/yang/invalid_arguments

commit be7057c
Author: PietroGhg <pietro.ghiglio@codeplay.com>
Date:   Mon Jun 3 16:30:29 2024 +0100

    Use pointer metadata

commit be3ed4c
Author: PietroGhg <pietro.ghiglio@codeplay.com>
Date:   Wed May 29 08:28:39 2024 +0100

    Implement urUSMGetMemAllocInfo and aligned alloc

commit cc40e85
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Wed Jul 17 03:30:33 2024 -0700

    fix lit

commit 4949b1a
Merge: 70dc457 6c2329e
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Tue Jul 16 20:03:15 2024 -0700

    Merge branch 'main' into review/yang/invalid_arguments

commit 70dc457
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Tue Jul 16 05:02:29 2024 -0700

    fix build

commit d2e4949
Merge: 5ba3170 7e38af7
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Mon Jul 15 22:58:49 2024 -0700

    Merge branch 'main' into review/yang/invalid_arguments

commit 385cd05
Author: PietroGhg <pietro.ghiglio@codeplay.com>
Date:   Mon Jul 8 13:24:38 2024 +0100

    Fix pointer arithmetic in USMfill

commit 355c4c3
Author: Ewan Crawford <ewan@codeplay.com>
Date:   Wed Jul 10 16:03:47 2024 +0100

    Cosmetic tweaks to native enqueue spec

    Pedantic things I noticed while reading spec.

commit 5ba3170
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Wed Jul 10 01:08:35 2024 -0700

    fix build

commit ee2a5f1
Author: Zhao, Yang2 <yang2.zhao@intel.com>
Date:   Wed Jul 10 01:07:02 2024 -0700

    chack invalid arg in kernel

commit 9b16bfc
Author: Łukasz Stolarczuk <lukasz.stolarczuk@intel.com>
Date:   Thu Jun 27 16:44:41 2024 +0200

    Update badges (for active workflows) in README

    E2E workflows now run as part of the "Build and test" workflow.
    Add the other missing workflows to track whether they are green.

Labels

common Changes or additions to common utilities v0.10.x Include in the v0.10.x release
