Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major regression in in #16809

Closed
GunnarFarneback opened this issue Jun 7, 2016 · 11 comments
Closed

Major regression in in #16809

GunnarFarneback opened this issue Jun 7, 2016 · 11 comments
Assignees
Labels
performance Must go faster regression Regression in behavior compared to a previous version

Comments

@GunnarFarneback
Copy link
Contributor

I found a large regression between 0.4 and 0.5 in some code of mine, which profiling localized to containment testing. This simplified test case shows a clear picture:

function specialized_in(x::Int, y::Vector{Int})
    for k = 1:length(y)
        if x == y[k]
            return true
        end
    end
    return false
end

function benchmark1(n)
    x = rand(Int, 20)
    s = 0
    for k = 1:n
        if rand(Int) in x
            s += 1
        end
    end
    return s
end

function benchmark2(n)
    x = rand(Int, 20)
    s = 0
    for k = 1:n
        if specialized_in(rand(Int), x)
            s += 1
        end
    end
    return s
end

@time benchmark1(1)
@time benchmark2(1)
@time benchmark1(100_000_000)
@time benchmark2(100_000_000)
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.5 (2016-03-18 00:58 UTC)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |  x86_64-linux-gnu

julia> include("/tmp/test.jl")
  0.105317 seconds (70.71 k allocations: 3.181 MB, 5.77% gc time)
  0.013302 seconds (4.11 k allocations: 209.523 KB)
  2.547684 seconds (10 allocations: 560 bytes)
  2.442474 seconds (10 allocations: 560 bytes)
              _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0-dev+4561 (2016-06-06 21:03 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 23ad581* (0 days old master)
|__/                   |  x86_64-linux-gnu

julia> include("/tmp/test.jl")
  0.070273 seconds (53.54 k allocations: 2.304 MB)
  0.010838 seconds (3.92 k allocations: 180.852 KB)
 49.842859 seconds (7 allocations: 560 bytes)
  2.507180 seconds (7 allocations: 560 bytes)
@ivarne ivarne added performance Must go faster regression Regression in behavior compared to a previous version labels Jun 7, 2016
@JeffBezanson JeffBezanson self-assigned this Jun 7, 2016
@vtjnash
Copy link
Member

vtjnash commented Jun 7, 2016

same issue as seen in #16185

we could fix this by doing something like:

diff --git a/base/reduce.jl b/base/reduce.jl
index f83c86b..585876d 100644
--- a/base/reduce.jl
+++ b/base/reduce.jl
@@ -192,14 +192,14 @@ sc_finish(::typeof(|))  = false

 ## short-circuiting (sc) mapreduce definitions

-function mapreduce_sc_impl(f, op, itr::AbstractArray)
+function mapreduce_sc_impl(f, op::ShortCircuiting, itr::AbstractArray)
     for x in itr
         shortcircuits(op, f(x)) && return shorted(op)
     end
     return sc_finish(op)
 end

-function mapreduce_sc_impl(f, op, itr)
+function mapreduce_sc_impl(f, op::ShortCircuiting, itr)
     for x in itr
         shortcircuits(op, f(x)) && return shorted(op)
     end

but I'm inclined just to go with simplifying the code instead:

diff --git a/base/reduce.jl b/base/reduce.jl
index f83c86b..90ba6d1e 100644
--- a/base/reduce.jl
+++ b/base/reduce.jl
@@ -181,29 +181,19 @@ end

 const ShortCircuiting = Union{typeof(&), typeof(|)}

-shortcircuits(::typeof(&), x::Bool) = !x
-shortcircuits(::typeof(|),  x::Bool) =  x
-
-shorted(::typeof(&)) = false
-shorted(::typeof(|))  = true
-
-sc_finish(::typeof(&)) = true
-sc_finish(::typeof(|))  = false
-
 ## short-circuiting (sc) mapreduce definitions
-
-function mapreduce_sc_impl(f, op, itr::AbstractArray)
+function mapreduce_sc_impl(f, op::typeof(&), itr)
     for x in itr
-        shortcircuits(op, f(x)) && return shorted(op)
+        f(x) || return false
     end
-    return sc_finish(op)
+    return true
 end

-function mapreduce_sc_impl(f, op, itr)
+function mapreduce_sc_impl(f, op::typeof(|), itr)
     for x in itr
-        shortcircuits(op, f(x)) && return shorted(op)
+        f(x) && return true
     end
-    return sc_finish(op)
+    return false
 end

 # mapreduce_sc tests if short-circuiting is safe;

@JeffBezanson
Copy link
Member

Ha--- I have that exact change (the second one) in my working tree.

@timholy
Copy link
Member

timholy commented Jun 7, 2016

There's an argument to avoid short-circuiting here, at least for containers that don't contain duplicates:

  • When the item is in the container, on average half of the elements are checked before finding it. So in the best-case scenario short-circuiting buys you a factor of 2, but at the cost of introducing a branch in the inner loop.
  • When the item is not in the container, each element of the container has to be checked anyway.

If you don't short-circuit, there is no branch, and you can use @simd. As soon as you can vectorize more than two elements simultaneously, it's a clear win to skip short-circuiting.

Containers with lots of duplicates mess with that, however.

@Keno
Copy link
Member

Keno commented Jun 7, 2016

Seems like even with the short circuit the code should still be vectorizable.

@nalimilan
Copy link
Member

Containers with lots of duplicates mess with that, however.

Sounds like an argument for a specialized function to use when no duplicates are present.

@timholy
Copy link
Member

timholy commented Jun 7, 2016

Seems like even with the short circuit the code should still be vectorizable.

I guess with the new hand-vectorization, that might be true (I haven't played with it yet).

Sounds like an argument for a specialized function to use when no duplicates are present.

Unless one introduces a special type, you can't check this without visiting all the elements anyway. From a codegen perspective, a special type would almost surely be a bad idea if it's used widely---of course, a narrow application where it really matters might be different.

@nalimilan
Copy link
Member

I just meant a separate function that people would call when they know there are likely very few or no duplicates.

@timholy
Copy link
Member

timholy commented Jun 7, 2016

I just meant a separate function that people would call when they know there are likely very few or no duplicates.

That's essentially searchsorted, which of course imposes an additional constraint but which introduces massive gains (logN vs N). That's clearly worth a separate method. For constant factors, I am less convinced it would be worth the added complexity.

@Keno
Copy link
Member

Keno commented Jun 7, 2016

I guess with the new hand-vectorization, that might be true (I haven't played with it yet).

Why shouldn't llvm be able to figure this out?

@timholy
Copy link
Member

timholy commented Jun 7, 2016

Would be lovely. Currently LLVM can't even vectorize

function anyneg_nobranch(v)
    hasneg = false
    @simd for i in eachindex(v)
        @inbounds val = v[i]
        hasneg |= val < 0
    end
    hasneg
end

although it does handle

function anyneg_nobranch(v)
    nneg = 0
    @simd for i in eachindex(v)
        @inbounds val = v[i]
        nneg += val < 0
    end
    nneg > 0
end

It's fairly clear why the first is harder than the second, but it does suggest that the vectorization is not yet Sufficiently Smart.

JeffBezanson added a commit that referenced this issue Jun 7, 2016
@nalimilan
Copy link
Member

@timholy Maybe that's the same issue as #16739?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster regression Regression in behavior compared to a previous version
Projects
None yet
Development

No branches or pull requests

7 participants