carnaval/IACA.jl: a wrapper around Intel's Architecture Code Analyzer (IACA) for Julia

rough work in progress

steps

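use a version of Julia at least as new as commit 7897a9131bf
install IACA from https://software.intel.com/en-us/articles/intel-architecture-code-analyzer-download
wrap the loop or straight-line code you want analyzed in the @iaca macro
run analyze(f, sig), where f is the function and sig is its argument signature as a tuple type, e.g. Tuple{Float64}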

options

analyze(..., iaca_path = "/path/to/iaca.sh")
analyze(..., arch = :haswell | :ivy_bridge | :sandy_bridge | :westmere | :nehalem)
analyze(..., analysis = :throughput | :latency)
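
In the option lists above, `|` separates the allowed values; each option takes exactly one of them. The sketch below makes that explicit with a hypothetical helper, `iaca_options`, which is not part of IACA.jl. It assumes a Julia 1.x session (named tuples), and it assumes the defaults are `:haswell` and `:throughput` only because the example reports further down show "HSW" and "Throughput" when no options are passed.

```julia
# Hypothetical helper, NOT part of IACA.jl: checks option values against the
# lists documented in this README and returns them as a named tuple.
const ARCHS = (:haswell, :ivy_bridge, :sandy_bridge, :westmere, :nehalem)
const ANALYSES = (:throughput, :latency)

# Defaults are an assumption inferred from the example reports in this README.
function iaca_options(; arch::Symbol = :haswell, analysis::Symbol = :throughput)
    arch in ARCHS ||
        throw(ArgumentError("unknown arch $arch; expected one of $ARCHS"))
    analysis in ANALYSES ||
        throw(ArgumentError("unknown analysis $analysis; expected one of $ANALYSES"))
    return (arch = arch, analysis = analysis)
end

opts = iaca_options(arch = :sandy_bridge, analysis = :latency)

# With IACA.jl loaded and IACA installed, the checked options could then be
# splatted into the call as keyword arguments:
#   println(analyze(f, Tuple{Float64}; opts...))
```

Validating up front gives a readable error for a misspelled symbol before any external tool is launched; how IACA.jl itself reacts to an unknown value is not documented here.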

julia> using IACA

julia> function f(y::Float64)
    x = 0.0
    @iaca for i=1:100
        x += 2*y*i
    end
    x
end

julia> function g(y::Float64)
    x1 = x2 = x3 = x4 = x5 = x6 = x7 = 0.0
    @iaca for i=1:7:100
        x1 += 2*y*i
        x2 += 2*y*(i+1)
        x3 += 2*y*(i+2)
        x4 += 2*y*(i+3)
        x5 += 2*y*(i+4)
        x6 += 2*y*(i+5)
        x7 += 2*y*(i+6)
    end
    x1 + x2 + x3 + x4 + x5 + x6 + x7
end

julia> println(analyze(f, Tuple{Float64}))
Intel(R) Architecture Code Analyzer Version - 2.1
Analyzed File - /tmp/tmplWkO5l
Binary Format - 64Bit
Architecture  - HSW
Analysis Type - Throughput

Throughput Analysis Report
--------------------------
Block Throughput: 12.00 Cycles       Throughput Bottleneck: InterIteration

Port Binding In Cycles Per Iteration:
---------------------------------------------------------------------------------------
|  Port  |  0   -  DV  |  1   |  2   -  D   |  3   -  D   |  4   |  5   |  6   |  7   |
---------------------------------------------------------------------------------------
| Cycles | 1.0    0.0  | 2.0  | 0.0    0.0  | 0.0    0.0  | 0.0  | 1.0  | 2.0  | 0.0  |
---------------------------------------------------------------------------------------

N - port number or number of cycles resource conflict caused delay, DV - Divider pipe (on port 0)
D - Data fetch pipe (on ports 2 and 3), CP - on a critical path
F - Macro Fusion with the previous instruction occurred
* - instruction micro-ops not bound to a port
^ - Micro Fusion happened
# - ESP Tracking sync uop was issued
@ - SSE instruction followed an AVX256 instruction, dozens of cycles penalty is expected
! - instruction not supported, was not accounted in Analysis

| Num Of |                    Ports pressure in cycles                     |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |  6  |  7  |    |
---------------------------------------------------------------------------------
|   2    |           | 1.0 |           |           |     | 1.0 |     |     | CP | vcvtsi2sd xmm2, xmm0, rax
|   1    |           |     |           |           |     |     | 1.0 |     |    | add rax, 0x1
|   1    | 1.0       |     |           |           |     |     |     |     | CP | vmulsd xmm2, xmm1, xmm2
|   1    |           | 1.0 |           |           |     |     |     |     | CP | vaddsd xmm0, xmm0, xmm2
|   1    |           |     |           |           |     |     | 1.0 |     |    | cmp rax, 0x65
|   0F   |           |     |           |           |     |     |     |     |    | jnz 0xffffffffffffffe1
Total Num Of Uops: 6

julia> println(analyze(g, Tuple{Float64}))
Intel(R) Architecture Code Analyzer Version - 2.1
Analyzed File - /tmp/tmpLPZdgg
Binary Format - 64Bit
Architecture  - HSW
Analysis Type - Throughput

Throughput Analysis Report
--------------------------
Block Throughput: 14.00 Cycles       Throughput Bottleneck: Port1, Port5

Port Binding In Cycles Per Iteration:
---------------------------------------------------------------------------------------
|  Port  |  0   -  DV  |  1   |  2   -  D   |  3   -  D   |  4   |  5   |  6   |  7   |
---------------------------------------------------------------------------------------
| Cycles | 7.0    0.0  | 14.0 | 0.0    0.0  | 0.0    0.0  | 0.0  | 14.0 | 2.0  | 0.0  |
---------------------------------------------------------------------------------------

N - port number or number of cycles resource conflict caused delay, DV - Divider pipe (on port 0)
D - Data fetch pipe (on ports 2 and 3), CP - on a critical path
F - Macro Fusion with the previous instruction occurred
* - instruction micro-ops not bound to a port
^ - Micro Fusion happened
# - ESP Tracking sync uop was issued
@ - SSE instruction followed an AVX256 instruction, dozens of cycles penalty is expected
! - instruction not supported, was not accounted in Analysis

| Num Of |                    Ports pressure in cycles                     |    |
|  Uops  |  0  - DV  |  1  |  2  -  D  |  3  -  D  |  4  |  5  |  6  |  7  |    |
---------------------------------------------------------------------------------
|   0*   |           |     |           |           |     |     |     |     |    | vxorps xmm0, xmm0, xmm0
|   2    |           | 1.0 |           |           |     | 1.0 |     |     | CP | vcvtsi2sd xmm0, xmm0, rcx
|   1    | 1.0       |     |           |           |     |     |     |     |    | vmulsd xmm0, xmm7, xmm0
|   1    |           | 1.0 |           |           |     |     |     |     | CP | vaddsd xmm8, xmm8, xmm0
|   1    |           |     |           |           |     | 1.0 |     |     | CP | lea rdx, ptr [rcx+0x1]
|   0*   |           |     |           |           |     |     |     |     |    | vxorps xmm0, xmm0, xmm0
|   2    |           | 1.0 |           |           |     | 1.0 |     |     | CP | vcvtsi2sd xmm0, xmm0, rdx
|   1    | 1.0       |     |           |           |     |     |     |     |    | vmulsd xmm0, xmm7, xmm0
|   1    |           | 1.0 |           |           |     |     |     |     | CP | vaddsd xmm1, xmm1, xmm0
|   1    |           |     |           |           |     | 1.0 |     |     | CP | lea rdx, ptr [rcx+0x2]
|   0*   |           |     |           |           |     |     |     |     |    | vxorps xmm0, xmm0, xmm0
|   2    |           | 1.0 |           |           |     | 1.0 |     |     | CP | vcvtsi2sd xmm0, xmm0, rdx
|   1    | 1.0       |     |           |           |     |     |     |     |    | vmulsd xmm0, xmm7, xmm0
|   1    |           | 1.0 |           |           |     |     |     |     | CP | vaddsd xmm2, xmm2, xmm0
|   1    |           |     |           |           |     | 1.0 |     |     | CP | lea rdx, ptr [rcx+0x3]
|   0*   |           |     |           |           |     |     |     |     |    | vxorps xmm0, xmm0, xmm0
|   2    |           | 1.0 |           |           |     | 1.0 |     |     | CP | vcvtsi2sd xmm0, xmm0, rdx
|   1    | 1.0       |     |           |           |     |     |     |     |    | vmulsd xmm0, xmm7, xmm0
|   1    |           | 1.0 |           |           |     |     |     |     | CP | vaddsd xmm3, xmm3, xmm0
|   1    |           |     |           |           |     | 1.0 |     |     | CP | lea rdx, ptr [rcx+0x4]
|   0*   |           |     |           |           |     |     |     |     |    | vxorps xmm0, xmm0, xmm0
|   2    |           | 1.0 |           |           |     | 1.0 |     |     | CP | vcvtsi2sd xmm0, xmm0, rdx
|   1    | 1.0       |     |           |           |     |     |     |     |    | vmulsd xmm0, xmm7, xmm0
|   1    |           | 1.0 |           |           |     |     |     |     | CP | vaddsd xmm4, xmm4, xmm0
|   1    |           |     |           |           |     | 1.0 |     |     | CP | lea rdx, ptr [rcx+0x5]
|   0*   |           |     |           |           |     |     |     |     |    | vxorps xmm0, xmm0, xmm0
|   2    |           | 1.0 |           |           |     | 1.0 |     |     | CP | vcvtsi2sd xmm0, xmm0, rdx
|   1    | 1.0       |     |           |           |     |     |     |     |    | vmulsd xmm0, xmm7, xmm0
|   1    |           | 1.0 |           |           |     |     |     |     | CP | vaddsd xmm5, xmm5, xmm0
|   1    |           |     |           |           |     | 1.0 |     |     | CP | lea rdx, ptr [rcx+0x6]
|   0*   |           |     |           |           |     |     |     |     |    | vxorps xmm0, xmm0, xmm0
|   2    |           | 1.0 |           |           |     | 1.0 |     |     | CP | vcvtsi2sd xmm0, xmm0, rdx
|   1    | 1.0       |     |           |           |     |     |     |     |    | vmulsd xmm0, xmm7, xmm0
|   1    |           | 1.0 |           |           |     |     |     |     | CP | vaddsd xmm6, xmm6, xmm0
|   1    |           |     |           |           |     |     | 1.0 |     |    | cmp rcx, rax
|   1    |           |     |           |           |     | 1.0 |     |     | CP | lea rcx, ptr [rcx+0x7]
|   1    |           |     |           |           |     |     | 1.0 |     |    | jnz 0xffffffffffffff60
Total Num Of Uops: 37
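
One way to compare the two reports is to normalize the block throughput by the amount of work per loop iteration. The sketch below assumes that "Block Throughput" is reported in cycles per iteration of the marked loop, and counts one accumulator update per iteration for `f` and seven for `g`, as in their source above. The variable names are illustrative only.

```julia
# Block throughput (cycles per loop iteration) copied from the two reports above.
cycles_f, updates_f = 12.00, 1   # f: one accumulator update per iteration
cycles_g, updates_g = 14.00, 7   # g: seven accumulator updates per iteration

# Cycles spent per accumulator update in each version.
per_update_f = cycles_f / updates_f
per_update_g = cycles_g / updates_g

println("f: ", per_update_f, " cycles per update")
println("g: ", per_update_g, " cycles per update")
println("ratio f/g: ", per_update_f / per_update_g)
```

Under these assumptions `f` costs 12 cycles per update and `g` costs 2. This matches the reported bottlenecks: `f` is limited by the inter-iteration dependency on `x`, while `g`, with seven independent accumulators, is limited by ports 1 and 5 instead.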
