Improve inferability of unique #20317

pabloferz · 2017-01-29T22:42:20Z

Fixes #20105 and supersedes #20106

martinholters · 2017-01-30T08:41:53Z

Is it worth having a helper function that takes the iterator state, `seen` and `out` as arguments and calls itself recursively whenever the element type of `seen` and `out` has to change, so that the loop itself stays type-stable?
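A minimal sketch of the helper described above. It assumes the modern `iterate` protocol (Julia 1.x) rather than the `start`/`next`/`done` protocol used in this PR, and it uses `typejoin` as the widening rule, which is our assumption and not necessarily what Base does. The names `unique_widening` and `_unique_widen!` are hypothetical. Within one call `T` is fixed, so the loop body is type-stable. An element that does not fit triggers a copy of both containers into a wider type, followed by a recursive call.

```julia
# Hypothetical helper (not Base's implementation): within one call, `out` and
# `seen` share a fixed element type `T`, so the loop is type-stable.
function _unique_widen!(itr, out::Vector{T}, seen::Set{T}, state) where {T}
    while true
        y = iterate(itr, state)
        y === nothing && return out
        x, state = y
        if !(x isa T)
            # Widen both containers to a common supertype and continue in a
            # new call, which is compiled for the wider element type.
            R = typejoin(T, typeof(x))
            out_r = Vector{R}(out)
            seen_r = Set{R}(seen)
            if !(x in seen_r)
                push!(seen_r, x)
                push!(out_r, x)
            end
            return _unique_widen!(itr, out_r, seen_r, state)
        end
        if !(x in seen)
            push!(seen, x)
            push!(out, x)
        end
    end
end

# Hypothetical entry point: seed the containers from the first element's
# concrete type instead of the (possibly abstract or unknown) eltype.
function unique_widening(itr)
    y = iterate(itr)
    y === nothing && return eltype(itr)[]
    x, state = y
    T = typeof(x)
    return _unique_widen!(itr, T[x], Set{T}((x,)), state)
end
```

For example, `unique_widening(x for x in Any[1, 1.0, 2, "a", 2])` starts with a `Vector{Int}`, widens to `Vector{Real}` at `1.0` (which is `isequal` to `1` and therefore skipped), widens again to `Vector{Any}` at `"a"`, and returns `Any[1, 2, "a"]`.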

pabloferz · 2017-01-30T16:30:34Z

I compared the performance of this PR against such an approach (`unique_recursive`) on the following input:

```julia
N = 1000
A = [ones(Int,N);ones(N);im*ones(Int,N);ones(BigInt,N);trues(N);ones(BigFloat,N);ones(String,N)]
```

and I see

```
@benchmark $unique(x for x in $A)
BenchmarkTools.Trial:
  memory estimate:  237.44 kb
  allocs estimate:  10036
  --------------
  minimum time:     1.091 ms (0.00% GC)
  median time:      1.134 ms (0.00% GC)
  mean time:        1.229 ms (6.93% GC)
  maximum time:     9.750 ms (85.19% GC)
  --------------
  samples:          4064
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
```

```
@benchmark $unique_recursive(x for x in $A)
BenchmarkTools.Trial:
  memory estimate:  237.50 kb
  allocs estimate:  10040
  --------------
  minimum time:     1.032 ms (0.00% GC)
  median time:      1.076 ms (0.00% GC)
  mean time:        1.163 ms (7.08% GC)
  maximum time:     9.652 ms (87.88% GC)
  --------------
  samples:          4293
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
```

I'm not too convinced that the difference is going to be that significant in general, but if anyone has a strong opinion in favor of the "split" approach, I can consider it.

martinholters · 2017-01-31T07:47:05Z

If that's all there is to gain, it's probably not worth the extra complication. Probably the code generated for `out::Array{T,1} where T` and `seen::Set{T} where T` is about as good as if `T` were fixed to an abstract type. After all, in both cases the involved operations either do not depend on the element type or would have to do dynamic dispatch based on the actual type of an individual element anyway.

BTW, I think you should have done `@benchmark unique($(x for x in A))` so as not to benchmark the generator instantiation. I doubt it makes any significant difference, though.

Sacha0 · 2017-01-31T18:52:27Z

Might be worth benchmarking for small N as well? Best!

Sacha0 · 2017-01-31T18:55:36Z

test/sets.jl

```diff
@@ -212,6 +212,9 @@ u = unique([1,1,2])
 @test length(u) == 2
 @test unique(iseven, [5,1,8,9,3,4,10,7,2,6]) == [5,8]
 @test unique(n->n % 3, [5,1,8,9,3,4,10,7,2,6]) == [5,1,9]
+# issue 20105
+@test @inferred(unique(x for x in 1:1)) == [1]
+@test unique(x for x in Any[1,1.0])::Vector{Real} == [1]
```


Perhaps also worth testing the expected output types? Best!

Do you mean to also type assert the first one?

More so the second. Did the `::Vector{Real}` appear in the original? If so, apologies, I missed it. Best!

Yes, it's been there all along, but don't sweat it! ;)

pabloferz · 2017-01-31T22:09:38Z

OK, here are more timings (minimum times):

| `N` | `unique` | `unique_recursive` |
|---:|---:|---:|
| 1 | 9.180 μs | 8.192 μs |
| 10 | 21.219 μs | 19.047 μs |
| 100 | 118.783 μs | 108.410 μs |
| 1000 | 1.091 ms | 1.032 ms |

I still think that this is not much worse and is not worth the extra code complexity.
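To put "not much worse" in numbers, the snippet below only redoes arithmetic on the minimum times posted in the table (converted to μs); it does not rerun any benchmark.

```julia
# Minimum times from the table, in microseconds (1 ms = 1000 μs).
Ns          = [1, 10, 100, 1000]
t_unique    = [9.180, 21.219, 118.783, 1091.0]
t_recursive = [8.192, 19.047, 108.410, 1032.0]

# A ratio > 1 means this PR's `unique` was slower than `unique_recursive`.
ratios = t_unique ./ t_recursive
for (n, r) in zip(Ns, ratios)
    println("N = ", n, ": ", round((r - 1) * 100; digits = 1), "% slower")
end
```

This prints about 12.1%, 11.4%, 9.6% and 5.7%, so the relative gap narrows as `N` grows.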

martinholters · 2017-02-01T09:32:29Z

The best case for the recursive implementation is probably A=ones(Real, N), i.e. if the eltype is abstract but all elements are actually of the same type, so that out and seen have an inferable concrete eltype in the loop.
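A small global-scope illustration (not a benchmark) of why `ones(Real, N)` is the favourable case: the container's eltype is abstract, yet every element has the same concrete type, so containers seeded from the first element never need widening. It uses `isconcretetype`, roughly the modern counterpart of the `isleaftype` check that appears later in this PR.

```julia
A = ones(Real, 1000)
@assert eltype(A) === Real             # the container's eltype is abstract

T = typeof(first(A))                   # concrete type of the first element
@assert isconcretetype(T)
@assert all(x -> x isa T, A)           # ...and every element shares it

# Seeded from that concrete type, `out` and `seen` never need widening,
# so they keep a concrete element type for the whole pass.
out = T[]
seen = Set{T}()
for x in A
    if !(x in seen)
        push!(seen, x)
        push!(out, x)
    end
end
@assert out == [1] && eltype(out) === T
```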

pabloferz · 2017-02-01T20:38:54Z

OK, I found a slightly more compact way than in the gist above to move the loop out into a separate function, and made that change here accordingly.

Sacha0 · 2017-02-02T01:45:16Z

base/set.jl

```diff
+    if !done(itr, i)
+        x, i = next(itr, i)
+    end
+    return unique(itr, out, seen, x, i)
```


Are you attempting to inline the body of _unique specialized for T here?

tkelman · 2017-02-02T12:15:38Z

base/set.jl

```diff
+    return unique(itr, out, seen, x, i)
+end
+
+@inline unique(itr, out, seen, x, i) = _unique(itr, out, seen, x, i)
```


We probably don't want to export this signature under the same public name?

Right. Fixed.

Sacha0 · 2017-02-03T04:05:08Z

base/set.jl

```diff
+    if !done(itr, i)
+        x, i = next(itr, i)
+    end
+    return _unique(itr, out, seen, x, i)
```


IIRC from earlier comments, you wish to inline the body of unique_from here. But IIUC, this construction will only inline the body of _unique here?

That was my thinking, but somehow I got the idea from a question I asked @yuyichao that the body of unique_from would also be inlined. Maybe I assumed it would be the case and it is not.

There was a typo. What I meant was that you should make unique_from always inline and use a wrapper to pick the specific signature that will actually inline the wrapper.

Yeah, I thought of that, but believed it wasn't a typo what you wrote. Thanks for clarifying. I'll fix this.
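The structure being discussed, an always-inlined loop body plus a thin, non-inlined wrapper that serves as the recursion point after the types widen, can be sketched on a simpler problem: a widening `collect`. The names `collect_widening`, `_collect_body` and `_collect_barrier` are hypothetical, `typejoin` is an assumed widening rule, and `@inline` is only a hint to the compiler; whether inlining actually happens is not checked here.

```julia
# Always-inline loop body: at the entry call site it is inlined for the
# initial element type T.
@inline function _collect_body(itr, out::Vector{T}, state) where {T}
    while true
        y = iterate(itr, state)
        y === nothing && return out
        x, state = y
        if !(x isa T)
            # Widen, then continue through the wrapper so that the widened
            # loop runs in a separately compiled specialization.
            R = typejoin(T, typeof(x))
            out_r = push!(Vector{R}(out), x)
            return _collect_barrier(itr, out_r, state)
        end
        push!(out, x)
    end
end

# Plain (non-@inline) wrapper: each call dispatches to a method specialized on
# the concrete argument types, acting as a function barrier for the new T.
_collect_barrier(itr, out, state) = _collect_body(itr, out, state)

function collect_widening(itr)
    y = iterate(itr)
    y === nothing && return eltype(itr)[]
    x, state = y
    return _collect_body(itr, typeof(x)[x], state)
end
```

For instance, `collect_widening(x for x in Any[1, 2.5, 3])` starts as a `Vector{Int}`, switches to a `Vector{Real}` at `2.5`, and finishes the remaining elements inside `_collect_barrier`.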

pabloferz · 2017-02-07T17:02:38Z

If no one objects, will merge once CI passes.

Sacha0 · 2017-02-07T18:36:57Z

base/set.jl

```diff
+    x, i = next(itr, i)
+    if !isleaftype(T)
+        S = typeof(x)
+        return _unique_from(itr, S[x], push!(Set{S}(), x), i)
```


Can push!(Set{S}(), x) simplify to Set{S}(x)?

Not really. The Set constructor is defined for iterables, so that won't work if x is not iterable. But Set{S}((x,)) should work.
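A quick check of the point above. `Set{T}(itr)` iterates its argument, so for an iterable element such as a `String` it silently builds the wrong set, and for a non-iterable element such as a `Symbol` it throws; wrapping the element in a 1-tuple behaves like `push!(Set{S}(), x)`. The concrete values `"ab"` and `:a` are just illustrative choices.

```julia
x = "ab"

# Set{T}(itr) iterates its argument: the String is split into its Chars.
s_iter = Set{Any}(x)
@assert length(s_iter) == 2 && 'a' in s_iter

# A 1-tuple gives a set holding exactly `x`, same as push!(Set{Any}(), x).
s_tuple = Set{Any}((x,))
@assert s_tuple == push!(Set{Any}(), x)
@assert length(s_tuple) == 1 && x in s_tuple

# For a non-iterable element (a Symbol) the iterating form throws instead.
threw = try
    Set{Symbol}(:a)
    false
catch
    true
end
@assert threw
```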

Sacha0

Looks great! Thanks Pablo! :)

#20317 improved inference of unique, but problematic cases still arise for containers with known but abstract eltypes. Here, we short-circuit the `typejoin` when the return type is determined by the element type of the input container. For `unique(f, itr)`, this commit also allows the caller to supply `seen::Set` to circumvent the inference challenges.
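A user-level check of the behaviour this commit message refers to, as we understand it for current Julia releases: when the input container has a known (even abstract) element type, the result's element type follows the container rather than being narrowed from the elements actually seen. This only observes return types; it does not measure inference. The last line reuses a case from `test/sets.jl` for `unique(f, itr)`.

```julia
A = Number[1, 2, 1]                  # abstract eltype, concrete Int elements
u = unique(A)
@assert u == [1, 2]
@assert eltype(u) === Number         # follows eltype(A), not typeof(first(A))

# unique(f, itr) keeps the first element for each distinct value of f.
@assert unique(iseven, [5, 1, 8, 9, 3, 4, 10, 7, 2, 6]) == [5, 8]
```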

pabloferz mentioned this pull request Jan 29, 2017

unique now infers element type for Generator #20106

Closed

kshyatt added the compiler:inference Type inference label Jan 29, 2017

pabloferz force-pushed the pz/unique branch from 5771009 to e23fd7e Compare January 31, 2017 04:55

fredrikekre mentioned this pull request Jan 31, 2017

expanding functionality of sum_kbn function to iterable collections #20323 #20336

Merged

Sacha0 reviewed Jan 31, 2017

pabloferz force-pushed the pz/unique branch from e23fd7e to 06eac24 Compare February 1, 2017 20:33

Sacha0 reviewed Feb 2, 2017

tkelman reviewed Feb 2, 2017

pabloferz force-pushed the pz/unique branch from 06eac24 to c34c55a Compare February 2, 2017 16:58

Sacha0 reviewed Feb 3, 2017

pabloferz force-pushed the pz/unique branch 4 times, most recently from b73129f to c3dc349 Compare February 7, 2017 17:01

Sacha0 reviewed Feb 7, 2017

Sacha0 approved these changes Feb 7, 2017

Improve inferability of unique

b8d81c7

pabloferz force-pushed the pz/unique branch from c3dc349 to 3d80f59 Compare February 7, 2017 20:11

Put unique loop in another function

432478d

pabloferz force-pushed the pz/unique branch from 3d80f59 to 432478d Compare February 7, 2017 20:18

pabloferz merged commit e2cceb6 into JuliaLang:master Feb 8, 2017

pabloferz deleted the pz/unique branch February 8, 2017 04:42

timholy mentioned this pull request Jun 14, 2020

Improve inference for unique with abstract eltypes #36280

Merged