Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot bit shift vec of bool #53

Closed
sjkelly opened this issue Jul 17, 2019 · 4 comments
Closed

Cannot bit shift vec of bool #53

sjkelly opened this issue Jul 17, 2019 · 4 comments

Comments

@sjkelly
Copy link
Contributor

sjkelly commented Jul 17, 2019

Julia scalar:

julia> typeof(true)
Bool

julia> true << 0x03
8

SIMD.jl:

julia> using SIMD

julia> Vec((true,true,false,true)) << Vec((0x01,0x02,0x03,0x04))
ERROR: InexactError: Bool(8)
Stacktrace:
 [1] Type at ./float.jl:73 [inlined]
 [2] llvmconst(::Int64, ::Type{Bool}, ::Int64) at /home/steve/.julia/dev/SIMD/src/SIMD.jl:277
 [3] #s17#41 at /home/steve/.julia/dev/SIMD/src/SIMD.jl:859 [inlined]
 [4] #s17#41(::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any) at ./none:0
 [5] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any,N} where N) at ./boot.jl:524
 [6] <<(::Vec{4,Bool}, ::Vec{4,UInt8}) at /home/steve/.julia/dev/SIMD/src/SIMD.jl:1030
 [7] top-level scope at REPL[20]:1

I have tried this with all Int types and it does not seem to work. It does however work when converting, but this generates less than ideal code.

I am just starting to look at the code here and I do not know if this is related:

llvmtype(::Type{Bool}) = "i8" # Julia represents Tuple{Bool} as [1 x i8]

It seems that the shift operator calls this llvmconst where it is initialized as i1 rather than i8 where julia uses i8 for Bool.

SIMD.jl/src/SIMD.jl

Lines 266 to 270 in 1245797

function llvmconst(::Type{Bool}, val)
Bool(val) === false && return "zeroinitializer"
typ = "i1"
"$typ $(Int(val))"
end

@eschnett
Copy link
Owner

Booleans do not support bitshift operations. When you call true << 0x03, then true is implicitly converted to Int first. Since conversions can be expensive for SIMD values, there is no automatic conversion (promotion) implemented in SIMD.jl.

You can convert the vector yourself, choosing the target integer size:

f8(x::Vec{4, Bool}) = convert(Vec{4, Int8}, x) << Vec((0x01,0x02,0x03,0x04))
f16(x::Vec{4, Bool}) = convert(Vec{4, Int16}, x) << Vec((0x01,0x02,0x03,0x04))

The first (using Int8) is efficient, as the conversion requires only a single machine instruction (and i8 1):

define { <4 x i8> } @julia_f8_16841({ <4 x i8> } addrspace(11)* nocapture nonnull readonly dereferenceable(4)) {
top:
          %1 = getelementptr inbounds { <4 x i8> }, { <4 x i8> } addrspace(11)* %0, i64 0, i32 0
          %2 = load <4 x i8>, <4 x i8> addrspace(11)* %1, align 1
          %3 = and <4 x i8> %2, <i8 1, i8 1, i8 1, i8 1>
     %tmp.i = shl <4 x i8> %3, <i8 1, i8 2, i8 3, i8 4>
  %.fca.0.insert = insertvalue { <4 x i8> } undef, <4 x i8> %tmp.i, 0
  ret { <4 x i8> } %.fca.0.insert
}

The second (using Int16) is much more expensive:

define { <4 x i16> } @julia_f16_16842({ <4 x i8> } addrspace(11)* nocapture nonnull readonly dereferenceable(4)) {
        %1 = getelementptr inbounds { <4 x i8> }, { <4 x i8> } addrspace(11)* %0, i64 0, i32 0, i64 0
          %2 = load i8, i8 addrspace(11)* %1, align 1
          %3 = zext i8 %2 to i16
        %4 = getelementptr inbounds { <4 x i8> }, { <4 x i8> } addrspace(11)* %0, i64 0, i32 0, i64 1
          %5 = load i8, i8 addrspace(11)* %4, align 1
          %6 = zext i8 %5 to i16
        %7 = getelementptr inbounds { <4 x i8> }, { <4 x i8> } addrspace(11)* %0, i64 0, i32 0, i64 2
          %8 = load i8, i8 addrspace(11)* %7, align 1
          %9 = zext i8 %8 to i16
        %10 = getelementptr inbounds { <4 x i8> }, { <4 x i8> } addrspace(11)* %0, i64 0, i32 0, i64 3
          %11 = load i8, i8 addrspace(11)* %10, align 1
          %12 = zext i8 %11 to i16
     %13 = insertelement <4 x i16> undef, i16 %3, i32 0
     %14 = insertelement <4 x i16> %13, i16 %6, i32 1
     %15 = insertelement <4 x i16> %14, i16 %9, i32 2
     %16 = insertelement <4 x i16> %15, i16 %12, i32 3
     %tmp.i = shl <4 x i16> %16, <i16 1, i16 2, i16 3, i16 4>
  %.fca.0.insert = insertvalue { <4 x i16> } undef, <4 x i16> %tmp.i, 0
  ret { <4 x i16> } %.fca.0.insert
}

One could argue that SIMD should auto-convert Bool to Int8 here, but I think that would only add confusion and would not be very useful in practice. A manual conversion seems the right approach here. (And a proper error message for trying to shift booleans.)

@sjkelly
Copy link
Contributor Author

sjkelly commented Jul 17, 2019

I understand why explicit is better than implicit here. Fortunately I am using UInt8 and in my updated example the codegen looks very good. Thank you for the explanation.

@eschnett
Copy link
Owner

You should be able to shave off the and instruction if you use Base.bitcast to convert Bool to UInt8. I don't know why Julia inserts the and i8 1 – this looks as if the internal representation of Bool could leave the unused bits undefined, instead of ensuring they are zero at all times.

@KristofferC
Copy link
Collaborator

Yeah, Bool is not really a computational type and Base just immediately promotes it in arithmetic. So just convert it is the right thing to do.

I tired the f8, f16 examples in #63 and it seems the codegen is good now (no and operation and normal code for i16).

julia> code_llvm(f8, Tuple{Vec{4, Bool}}; debuginfo=:none)

define { <4 x i8> } @julia_f8_17139({ <4 x i8> } addrspace(11)* nocapture nonnull readonly dereferenceable(4)) {
top:
  %1 = getelementptr inbounds { <4 x i8> }, { <4 x i8> } addrspace(11)* %0, i64 0, i32 0
  %2 = load <4 x i8>, <4 x i8> addrspace(11)* %1, align 4
  %3 = shl <4 x i8> %2, <i8 1, i8 2, i8 3, i8 4>
  %.fca.0.insert = insertvalue { <4 x i8> } undef, <4 x i8> %3, 0
  ret { <4 x i8> } %.fca.0.insert
}

julia> code_llvm(f16, Tuple{Vec{4, Bool}}; debuginfo=:none)

define { <4 x i16> } @julia_f16_17156({ <4 x i8> } addrspace(11)* nocapture nonnull readonly dereferenceable(4)) {
top:
  %1 = getelementptr inbounds { <4 x i8> }, { <4 x i8> } addrspace(11)* %0, i64 0, i32 0
  %2 = load <4 x i8>, <4 x i8> addrspace(11)* %1, align 4
  %3 = sext <4 x i8> %2 to <4 x i16>
  %4 = shl <4 x i16> %3, <i16 1, i16 2, i16 3, i16 4>
  %.fca.0.insert = insertvalue { <4 x i16> } undef, <4 x i16> %4, 0
  ret { <4 x i16> } %.fca.0.insert
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants