function is mysteriously slow unless a second (irrelevant) method is defined #52020

matthias314 · 2023-11-03T22:36:35Z

This seems to be a weird bug: A function with a single method that should be fast is instead slow and allocates. However, once a second method is defined (but not used), the first method suddenly becomes fast. Unfortunately, I haven't managed to create a MWE that doesn't need additional packages. Here is the example I have:

using SmallTypes, JuliaBug, BenchmarkTools

sc = SimplicialComplex([[1,2]])
x = SRMonom(sc, [1,0])

function g(xs::SRMonom...)
    sc = xs[1].sc
    all(x -> x.sc === sc, xs)    # A
    w = sum(x -> x.weight, xs)   # B1
    support(w) in sc             # B2
end

@btime g($x, $x)

g(x::Int, y::Int) = nothing

@btime g($x, $x)

The output is

  196.348 ns (6 allocations: 416 bytes)
  5.030 ns (0 allocations: 0 bytes)

(for master). I see no reason why the timings for the two calls to g should be different. The second one is as expected; the first one is mysteriously slow. I get essentially the same timings for Julia 1.9.0 and 1.10.0-beta3. If the line labeled A above is removed, both calls are fast. The same is true if instead B1 and B2 are removed.

The example uses the package SmallTypes.jl (which is not registered) and the module JuliaBug.jl.

Code for JuliaBug.jl

module JuliaBug

export SimplicialComplex, SRMonom

import Base: in, @propagate_inbounds

using SmallTypes

const max_vertices = 64

#
# SimplicialComplex
#

struct SimplicialComplex
    vertices::Int
    maximal::Vector{SmallSet}   # this has to be normalized
    hash::UInt
end

n_vertices(sc::SimplicialComplex) = sc.vertices

function hassupset(M::AbstractVector{SmallSet}, s::SmallSet)
    any(t -> issubset(s, t), M)
end

function _lt(s::SmallSet, t::SmallSet)
  ls = length(s)
  lt = length(t)
  ls > lt || (ls == lt && bitmask(s) < bitmask(t))
end

function SimplicialComplex(iter, n_vertices::Union{Int,Missing} = missing)
    isempty(iter) && error("at least one simplex must be given")
    # L = sort!(map(SmallSet, iter), by = length, rev = true)
    L = sort!(map(SmallSet, iter), lt = _lt)
    M = empty(L)  # SmallSet[]
    n = 0
    for s in L
        if !isempty(s)
            n = max(n, last(s))
    end
        hassupset(M, s) || push!(M, s)
    end
    if n_vertices !== missing
        if n_vertices >= n
        n = n_vertices
    else
            error("given number of vertices is too small")
    end
    end
    # TODO: check earlier
    n > max_vertices && error("not more than $max_vertices vertices allowed")
    SimplicialComplex(n, M, hash((M, n)))
end

in(s::SmallSet, sc::SimplicialComplex) = hassupset(sc.maximal, s)

#
# SRMonom
#

struct SRMonom
    sc::SimplicialComplex
    weight::SmallVec
    @propagate_inbounds function SRMonom(sc::SimplicialComplex, weight::AbstractVector{<:Integer})
        @boundscheck begin
            n_vertices(sc) == length(weight) || error("length of weight does not match number of vertices")
    end
        weight = SmallVec(weight)
        @boundscheck begin
            minimum(weight) >= 0 ||
            error("weights must be non-negative")
            hassupset(sc.maximal, support(weight)) ||
            error("simplicial complex does not support the given weight")
    end
        new(sc, weight)
    end
end

end  # module

I've had some other weird effects with functions similar to g. However, I couldn't make them reproducible.

Julia Version 1.11.0-DEV.823
Commit 5eaad532f9 (2023-11-02 21:35 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz
  WORD_SIZE: 64
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
  Threads: 1 on 4 virtual cores

The text was updated successfully, but these errors were encountered:

nsajko · 2023-11-04T06:36:55Z

Maybe related to the nondeterminism in type inference #35800 or #50735?

Unfortunately, I haven't managed to create a MWE that doesn't need additional packages.

You could start by copying all code required for the MWE into a single file (without package imports), then removing irrelevant code while checking that the issue still reproduces.

matthias314 · 2023-11-04T13:17:16Z

Maybe related to the nondeterminism in type inference #35800 or #50735?

Not sure. In those issues, @code_warntype and @inferred report problems that later disappear. In my case, the output of @code_warntype g(x, x) seems fine already before g(x, x) is called the first time:

MethodInstance for g(::SRMonom, ::SRMonom)
  from g(xs::SRMonom...) @ Main In[3]:1
Arguments
  #self#::Core.Const(g)
  xs::Tuple{SRMonom, SRMonom}
Locals
  #6::var"#6#8"
  #5::var"#5#7"{SimplicialComplex}
  w::SmallVec
  sc::SimplicialComplex
Body::Bool
1 ─ %1  = Base.getindex(xs, 1)::SRMonom
│         (sc = Base.getproperty(%1, :sc))
│   %3  = Main.:(var"#5#7")::Core.Const(var"#5#7")
│   %4  = Core.typeof(sc)::Core.Const(SimplicialComplex)
│   %5  = Core.apply_type(%3, %4)::Core.Const(var"#5#7"{SimplicialComplex})
│         (#5 = %new(%5, sc))
│   %7  = #5::var"#5#7"{SimplicialComplex}
│         Main.all(%7, xs)
│         (#6 = %new(Main.:(var"#6#8")))
│   %10 = #6::Core.Const(var"#6#8"())
│         (w = Main.sum(%10, xs))
│   %12 = Main.support(w)::SmallSet
│   %13 = (%12 in sc)::Bool
└──       return %13

Also, @inferred g(x, x) doesn't give an error.

I've created a single file (no modules besides BenchmarkTools and Test) that reproduces the error, in case that helps.

single file

# module SmallTypes

import Base: show, ==, hash, isempty, in, first, last, iterate,
    length, size, eltype, IndexStyle, getindex,
    zero, +, -, *, sum, prod, vcat,
    hasfastin, issubset, maximum, minimum

using Base: @propagate_inbounds

#
# small sets
#

const TS = UInt64
const NS = 8*sizeof(TS)

struct SmallSet <: AbstractSet{Int}
    mask::TS
    SmallSet(::Nothing, mask) = new(mask)
end

_SmallSet(mask) = SmallSet(nothing, mask)

bitmask(s::SmallSet) = s.mask

SmallSet(s::SmallSet) = s

@propagate_inbounds function _push(mask::TS, iter)
    for n in iter
        @boundscheck (n isa Integer && n > 0 && n <= NS) || error("SmallSet can only contain integers between 1 and $NS")
        mask |= one(TS) << (n-1)
    end
    _SmallSet(mask)
end

@propagate_inbounds SmallSet(iter) = _push(zero(TS), iter)

SmallSet(ns::Integer...) = SmallSet(ns)

function SmallSet(r::AbstractUnitRange{<:Integer})
    r0, r1 = first(r), last(r)
    (r0 > 0 && r1 <= NS) || error("SmallSet can only contain integers between 1 and $NS")
    if r1 < r0
        SmallSet()
    else
        m = one(TS) << (r1-r0+1)
        _SmallSet((m-1) << (r0-1))
    end
end

isempty(s::SmallSet) = iszero(bitmask(s))

eltype(::SmallSet) = Int

length(s::SmallSet) = count_ones(s.mask)

# hasfastin(::Type{SmallSet}) = true
# this is the default for AbstractSet

function in(n, s::SmallSet)
    if n isa Integer
        !iszero(s.mask & one(TS) << (n-1))
    else
        false
    end
end

function iterate(s::SmallSet, state = (s.mask, 0))
    mask, n = state
    if iszero(mask)
        return nothing
    else
        t = trailing_zeros(mask)+1
        n += t
        return (n, (mask >> t, n))
    end
end

function show(io::IO, s::SmallSet)
    print(io, "SmallSet(")
    join(io, s, ',')
    print(io, ')')
end

==(s::SmallSet, t::SmallSet) = s.mask == t.mask

# TODO: this does not give the same result as for Set
hash(s::SmallSet, h0::UInt) = hash(s.mask, h0)

#
# small vectors
#

const T = Int8
const N = UInt(8*sizeof(T))
const TT = UInt128
const L = div(sizeof(TT), sizeof(T))  # should this be UInt?
const M = one(TT) << N - one(TT)

conv_in(n) = (T(n) % UInt8) % TT
conv_out(n) = n % T

struct SmallVec <: AbstractVector{T}
    x::TT
    SmallVec(::Nothing, x) = new(x)
end

_SmallVec(x::TT) = SmallVec(nothing, x)

SmallVec(v::SmallVec) = v

@propagate_inbounds function SmallVec(iter)
    x = zero(TT)
    l = zero(UInt)
    for n in iter
        l += N
        x |= conv_in(n) << l
    end
    k = div(l, N)
    @boundscheck k < L || error("illegal size for SmallVec")
    x |= k
    _SmallVec(x)
end

SmallVec(xs::Integer...) = SmallVec(xs)

==(v::SmallVec, w::SmallVec) = v.x == w.x

# TODO: doesn't gives the same result as for Vector
hash(v::SmallVec, h0::UInt) = hash(v.x, h0)

eltype(::SmallVec) = T

length(v::SmallVec) =  (v.x & M) % Int

size(v::SmallVec) = (length(v),)

IndexStyle(::Type{SmallVec}) = IndexLinear()

@generated function _getindex(x::UInt128, k::Int64)
    quote
        $(Expr(:meta, :inline));
        Base.llvmcall("""
    %vx = bitcast i128 %0 to <16 x i8>
    %r = extractelement <16 x i8> %vx, i64 %1
    ret i8 %r
            """, Int8, Tuple{UInt128,Int64}, x, k)
    end
end

@propagate_inbounds function getindex(v::SmallVec, k::Integer)
    @boundscheck checkbounds(v, k)
    conv_out(_getindex(v.x, k % Int64))
end

@generated function _setindex(x::UInt128, n::Int8, k::Int64)
    quote
        $(Expr(:meta, :inline));
        Base.llvmcall("""
    %vx = bitcast i128 %0 to <16 x i8>
    %vr = insertelement <16 x i8> %vx, i8 %1, i64 %2
    %r = bitcast <16 x i8> %vr to i128
    ret i128 %r
            """, UInt128, Tuple{UInt128,Int8,Int64}, x, n, k)
    end
end

zero(v::SmallVec) = _SmallVec(length(v) % TT)

function show(io::IO, v::SmallVec)
    print(io, "Int8[")
    join(io, v, ',')
    print(io, ']')
end

@generated function _add(x::UInt128, y::UInt128)
    quote
        $(Expr(:meta, :inline));
        Base.llvmcall("""
    %vx = bitcast i128 %0 to <16 x i8>
    %vy = bitcast i128 %1 to <16 x i8>
    %vz = add <16 x i8> %vx, %vy
    %z = bitcast <16 x i8> %vz to i128
    %len = extractelement <16 x i8> %vx, i8 0
    %len0 = zext i8 %len to i128
    %r = sub i128 %z, %len0
    ret i128 %r
            """, UInt128, Tuple{UInt128,UInt128}, x, y)
    end
end

+(v::SmallVec) = v

@propagate_inbounds function +(v::SmallVec, w::SmallVec)
    @boundscheck length(v) == length(w) || error("dimensions don't match")
    _SmallVec(_add(v.x, w.x))
end

@generated function _support(x::UInt128)
    quote
        $(Expr(:meta, :inline));
        Base.llvmcall("""
    %vx = bitcast i128 %0 to <16 x i8>
    %vs = icmp ne <16 x i8> %vx, zeroinitializer
    %s = bitcast <16 x i1> %vs to i16
    %s1 = lshr i16 %s, 1
    %r = zext i16 %s1 to i64
    ret i64 %r
            """, UInt64, Tuple{UInt128}, x)
    end
end

support(v::SmallVec) = _SmallSet(_support(v.x))

# end # module

# module JuliaBug

import Base: in

#
# SimplicialComplex
#

struct SimplicialComplex
    maximal::Vector{SmallSet}
end

hassupset(M::AbstractVector{SmallSet}, s::SmallSet) = any(t -> issubset(s, t), M)

Base.in(s::SmallSet, sc::SimplicialComplex) = hassupset(sc.maximal, s)

#
# SRMonom
#

struct SRMonom
    sc::SimplicialComplex
    weight::SmallVec
end

# end # module

using BenchmarkTools, Test

sc = SimplicialComplex([SmallSet(1,2)])
x = SRMonom(sc, SmallVec(1,0))

function g(xs::SRMonom...)
    sc = xs[1].sc
    all(x -> x.sc === sc, xs)    # A
    w = sum(x -> x.weight, xs)   # B1
    support(w) in sc             # B2
end

println(VERSION)

@inferred g(x, x)

@btime g($x, $x)

g(x::Int, y::Int) = nothing

@btime g($x, $x)

nsajko · 2023-11-04T13:42:15Z

Changing the line function g(xs::SRMonom...) to function g(xs::Vararg{SRMonom,2}) fixes the performance issue, so this is a classic case of Julia not specializing for Vararg by default. This is covered in the performance tips, but it still is a footgun, I guess.

It's curious that defining the other method somehow results in the first method being specialized afterwards, though.

matthias314 · 2023-11-04T15:27:29Z

Thanks!

I've run the file posted in my previous comment with Julia 1.10.0-beta3 and --track-allocation=user. Since then, the allocations are reproducibly gone for 1.10 (without using any --track-allocation).

matthias314 closed this as completed Nov 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

function is mysteriously slow unless a second (irrelevant) method is defined #52020

function is mysteriously slow unless a second (irrelevant) method is defined #52020

matthias314 commented Nov 3, 2023

nsajko commented Nov 4, 2023 •

edited

Loading

matthias314 commented Nov 4, 2023 •

edited

Loading

nsajko commented Nov 4, 2023

matthias314 commented Nov 4, 2023 •

edited

Loading

function is mysteriously slow unless a second (irrelevant) method is defined #52020

function is mysteriously slow unless a second (irrelevant) method is defined #52020

Comments

matthias314 commented Nov 3, 2023

nsajko commented Nov 4, 2023 • edited Loading

matthias314 commented Nov 4, 2023 • edited Loading

nsajko commented Nov 4, 2023

matthias314 commented Nov 4, 2023 • edited Loading

nsajko commented Nov 4, 2023 •

edited

Loading

matthias314 commented Nov 4, 2023 •

edited

Loading

matthias314 commented Nov 4, 2023 •

edited

Loading