Major regression in in #16809

GunnarFarneback · 2016-06-07T09:19:44Z

I found a large regression between 0.4 and 0.5 in some code of mine, which profiling localized to containment testing. This simplified test case shows a clear picture:

function specialized_in(x::Int, y::Vector{Int})
    for k = 1:length(y)
        if x == y[k]
            return true
        end
    end
    return false
end

function benchmark1(n)
    x = rand(Int, 20)
    s = 0
    for k = 1:n
        if rand(Int) in x
            s += 1
        end
    end
    return s
end

function benchmark2(n)
    x = rand(Int, 20)
    s = 0
    for k = 1:n
        if specialized_in(rand(Int), x)
            s += 1
        end
    end
    return s
end

@time benchmark1(1)
@time benchmark2(1)
@time benchmark1(100_000_000)
@time benchmark2(100_000_000)

   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.5 (2016-03-18 00:58 UTC)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |  x86_64-linux-gnu

julia> include("/tmp/test.jl")
  0.105317 seconds (70.71 k allocations: 3.181 MB, 5.77% gc time)
  0.013302 seconds (4.11 k allocations: 209.523 KB)
  2.547684 seconds (10 allocations: 560 bytes)
  2.442474 seconds (10 allocations: 560 bytes)

              _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0-dev+4561 (2016-06-06 21:03 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 23ad581* (0 days old master)
|__/                   |  x86_64-linux-gnu

julia> include("/tmp/test.jl")
  0.070273 seconds (53.54 k allocations: 2.304 MB)
  0.010838 seconds (3.92 k allocations: 180.852 KB)
 49.842859 seconds (7 allocations: 560 bytes)
  2.507180 seconds (7 allocations: 560 bytes)

vtjnash · 2016-06-07T16:13:04Z

same issue as seen in #16185

we could fix this by doing something like:

diff --git a/base/reduce.jl b/base/reduce.jl
index f83c86b..585876d 100644
--- a/base/reduce.jl
+++ b/base/reduce.jl
@@ -192,14 +192,14 @@ sc_finish(::typeof(|))  = false

 ## short-circuiting (sc) mapreduce definitions

-function mapreduce_sc_impl(f, op, itr::AbstractArray)
+function mapreduce_sc_impl(f, op::ShortCircuiting, itr::AbstractArray)
     for x in itr
         shortcircuits(op, f(x)) && return shorted(op)
     end
     return sc_finish(op)
 end

-function mapreduce_sc_impl(f, op, itr)
+function mapreduce_sc_impl(f, op::ShortCircuiting, itr)
     for x in itr
         shortcircuits(op, f(x)) && return shorted(op)
     end

but I'm inclined just to go with simplifying the code instead:

diff --git a/base/reduce.jl b/base/reduce.jl
index f83c86b..90ba6d1e 100644
--- a/base/reduce.jl
+++ b/base/reduce.jl
@@ -181,29 +181,19 @@ end

 const ShortCircuiting = Union{typeof(&), typeof(|)}

-shortcircuits(::typeof(&), x::Bool) = !x
-shortcircuits(::typeof(|),  x::Bool) =  x
-
-shorted(::typeof(&)) = false
-shorted(::typeof(|))  = true
-
-sc_finish(::typeof(&)) = true
-sc_finish(::typeof(|))  = false
-
 ## short-circuiting (sc) mapreduce definitions
-
-function mapreduce_sc_impl(f, op, itr::AbstractArray)
+function mapreduce_sc_impl(f, op::typeof(&), itr)
     for x in itr
-        shortcircuits(op, f(x)) && return shorted(op)
+        f(x) || return false
     end
-    return sc_finish(op)
+    return true
 end

-function mapreduce_sc_impl(f, op, itr)
+function mapreduce_sc_impl(f, op::typeof(|), itr)
     for x in itr
-        shortcircuits(op, f(x)) && return shorted(op)
+        f(x) && return true
     end
-    return sc_finish(op)
+    return false
 end

 # mapreduce_sc tests if short-circuiting is safe;

JeffBezanson · 2016-06-07T16:50:55Z

Ha--- I have that exact change (the second one) in my working tree.

timholy · 2016-06-07T20:05:05Z

There's an argument to avoid short-circuiting here, at least for containers that don't contain duplicates:

When the item is in the container, on average half of the elements are checked before finding it. So in the best-case scenario short-circuiting buys you a factor of 2, but at the cost of introducing a branch in the inner loop.
When the item is not in the container, each element of the container has to be checked anyway.

If you don't short-circuit, there is no branch, and you can use @simd. As soon as you can vectorize more than two elements simultaneously, it's a clear win to skip short-circuiting.

Containers with lots of duplicates mess with that, however.

Keno · 2016-06-07T20:21:01Z

Seems like even with the short circuit the code should still be vectorizable.

nalimilan · 2016-06-07T21:05:34Z

Containers with lots of duplicates mess with that, however.

Sounds like an argument for a specialized function to use when no duplicates are present.

timholy · 2016-06-07T21:15:43Z

Seems like even with the short circuit the code should still be vectorizable.

I guess with the new hand-vectorization, that might be true (I haven't played with it yet).

Sounds like an argument for a specialized function to use when no duplicates are present.

Unless one introduces a special type, you can't check this without visiting all the elements anyway. From a codegen perspective, a special type would almost surely be a bad idea if it's used widely---of course, a narrow application where it really matters might be different.

nalimilan · 2016-06-07T21:18:57Z

I just meant a separate function that people would call when they know there are likely very few or no duplicates.

timholy · 2016-06-07T21:21:24Z

I just meant a separate function that people would call when they know there are likely very few or no duplicates.

That's essentially searchsorted, which of course imposes an additional constraint but which introduces massive gains (logN vs N). That's clearly worth a separate method. For constant factors, I am less convinced it would be worth the added complexity.

Keno · 2016-06-07T21:21:31Z

I guess with the new hand-vectorization, that might be true (I haven't played with it yet).

Why shouldn't llvm be able to figure this out?

timholy · 2016-06-07T21:35:26Z

Would be lovely. Currently LLVM can't even vectorize

function anyneg_nobranch(v)
    hasneg = false
    @simd for i in eachindex(v)
        @inbounds val = v[i]
        hasneg |= val < 0
    end
    hasneg
end

although it does handle

function anyneg_nobranch(v)
    nneg = 0
    @simd for i in eachindex(v)
        @inbounds val = v[i]
        nneg += val < 0
    end
    nneg > 0
end

It's fairly clear why the first is harder than the second, but it does suggest that the vectorization is not yet Sufficiently Smart.

fix #16809, perf regression in `in`

nalimilan · 2016-06-08T07:13:01Z

@timholy Maybe that's the same issue as #16739?

ivarne added performance Must go faster regression Regression in behavior compared to a previous version labels Jun 7, 2016

JeffBezanson self-assigned this Jun 7, 2016

JeffBezanson closed this as completed in e20b029 Jun 7, 2016

JeffBezanson added a commit that referenced this issue Jun 7, 2016

Merge pull request #16816 from JuliaLang/jb/fix16809

0d1b4b6

fix #16809, perf regression in `in`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Major regression in in #16809

Major regression in in #16809

GunnarFarneback commented Jun 7, 2016

vtjnash commented Jun 7, 2016

JeffBezanson commented Jun 7, 2016

timholy commented Jun 7, 2016

Keno commented Jun 7, 2016

nalimilan commented Jun 7, 2016

timholy commented Jun 7, 2016

nalimilan commented Jun 7, 2016

timholy commented Jun 7, 2016

Keno commented Jun 7, 2016

timholy commented Jun 7, 2016

nalimilan commented Jun 8, 2016

Major regression in in #16809

Major regression in in #16809

Comments

GunnarFarneback commented Jun 7, 2016

vtjnash commented Jun 7, 2016

JeffBezanson commented Jun 7, 2016

timholy commented Jun 7, 2016

Keno commented Jun 7, 2016

nalimilan commented Jun 7, 2016

timholy commented Jun 7, 2016

nalimilan commented Jun 7, 2016

timholy commented Jun 7, 2016

Keno commented Jun 7, 2016

timholy commented Jun 7, 2016

nalimilan commented Jun 8, 2016