Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve inferability of unique #20317

Merged
merged 2 commits into from
Feb 8, 2017
Merged

Conversation

pabloferz
Copy link
Contributor

Fixes #20105 and supersedes #20106

@martinholters
Copy link
Member

Is it worth having a helper function that takes the iterator state, seen and out as arguments and calls itself recursively whenever the type of seen and out has to change to be type-stable within the loop?

@pabloferz
Copy link
Contributor Author

pabloferz commented Jan 30, 2017

I compared the performance of this PR vs. such approach (unique_recursive) in the following case

N = 1000
A = [ones(Int,N);ones(N);im*ones(Int,N);ones(BigInt,N);trues(N);ones(BigFloat,N);ones(String,N)]

and I see

@benchmark $unique(x for x in $A)
BenchmarkTools.Trial: 
  memory estimate:  237.44 kb
  allocs estimate:  10036
  --------------
  minimum time:     1.091 ms (0.00% GC)
  median time:      1.134 ms (0.00% GC)
  mean time:        1.229 ms (6.93% GC)
  maximum time:     9.750 ms (85.19% GC)
  --------------
  samples:          4064
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

@benchmark $unique_recursive(x for x in $A)
BenchmarkTools.Trial: 
  memory estimate:  237.50 kb
  allocs estimate:  10040
  --------------
  minimum time:     1.032 ms (0.00% GC)
  median time:      1.076 ms (0.00% GC)
  mean time:        1.163 ms (7.08% GC)
  maximum time:     9.652 ms (87.88% GC)
  --------------
  samples:          4293
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

I'm not too convinced that the difference is going to be that significant in general, but if anyone has an strong opinion in favor of the "split" approach I can consider it.

@martinholters
Copy link
Member

If that's all there is to gain it's probably not worth the extra complication. Probably the code generated for out::Array{T,1} where T and seen::Set{T} where T is about as good as if T was fixed to an abstract type. After all, in both cases the involved operations either do not depend on the element type or would have to do dynamic dispatch based on the actual type of an individual element, anyway.

BTW, I think you should have done @benchmark unique($(x for x in A)) so as not to benchmark the generator instantiation. I doubt it makes any significant difference, though.

@Sacha0
Copy link
Member

Sacha0 commented Jan 31, 2017

Might be worth benchmarking for small N as well? Best!

@@ -212,6 +212,9 @@ u = unique([1,1,2])
@test length(u) == 2
@test unique(iseven, [5,1,8,9,3,4,10,7,2,6]) == [5,8]
@test unique(n->n % 3, [5,1,8,9,3,4,10,7,2,6]) == [5,1,9]
# issue 20105
@test @inferred(unique(x for x in 1:1)) == [1]
@test unique(x for x in Any[1,1.0])::Vector{Real} == [1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps also worth testing the expected output types? Best!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to also type assert the first one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moreso the second. Did the ::Vector{Real} appear in the original? If so apologies, I missed it. Best!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's been there all along, but don't sweat it! ;)

@pabloferz
Copy link
Contributor Author

pabloferz commented Jan 31, 2017

OK, here are more timings

N unique unique_recursive
1 9.180 μs 8.192 μs
10 21.219 μs 19.047 μs
100 118.783 μs 108.410 μs
1000 1.091 ms 1.032 ms

I still think that this is not much worse and is not worth the extra code complexity.

@martinholters
Copy link
Member

The best case for the recursive implementation is probably A=ones(Real, N), i.e. if the eltype is abstract but all elements are actually of the same type, so that out and seen have an inferable concrete eltype in the loop.

@pabloferz
Copy link
Contributor Author

OK, found a slightly more compact way to take the loop out than in the gist above and made that change here accordingly.

base/set.jl Outdated
if !done(itr, i)
x, i = next(itr, i)
end
return unique(itr, out, seen, x, i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you attempting to inline the body of _unique specialized for T here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah

base/set.jl Outdated
return unique(itr, out, seen, x, i)
end

@inline unique(itr, out, seen, x, i) = _unique(itr, out, seen, x, i)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably don't want to export this signature under the same public name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Fixed.

base/set.jl Outdated
if !done(itr, i)
x, i = next(itr, i)
end
return _unique(itr, out, seen, x, i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC from earlier comments, you wish to inline the body of unique_from here. But IIUC, this construction will only inline the body of _unique here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my thinking, but somehow I got the idea from a question I asked @yuyichao that the body of unique_from would also be inlined. Maybe I assumed it would be the case and it is not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a typo. What I meant was that you should make unique_from always inline and use a wrapper to pick the specific signature that will actually inline the wrapper.

Copy link
Contributor Author

@pabloferz pabloferz Feb 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I thought of that, but believed it wasn't a typo what you wrote. Thanks for clarifying. I'll fix this.

@pabloferz pabloferz force-pushed the pz/unique branch 4 times, most recently from b73129f to c3dc349 Compare February 7, 2017 17:01
@pabloferz
Copy link
Contributor Author

If no one objects, will merge once CI passes.

base/set.jl Outdated
x, i = next(itr, i)
if !isleaftype(T)
S = typeof(x)
return _unique_from(itr, S[x], push!(Set{S}(), x), i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can push!(Set{S}(), x) simplify to Set{S}(x)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. The Set constructor is defined for iterables, so that won't work if x is not iterable. But Set{S}((x,)) should work.

Copy link
Member

@Sacha0 Sacha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks Pablo! :)

@pabloferz pabloferz merged commit e2cceb6 into JuliaLang:master Feb 8, 2017
@pabloferz pabloferz deleted the pz/unique branch February 8, 2017 04:42
timholy added a commit that referenced this pull request Jun 14, 2020
#20317 improved inference of unique, but problematic cases still arise
for containers with known but abstract eltypes. Here, we short-circuit
the `typejoin` when the return type is determined by the element type
of the input container.

For `unique(f, itr)`, this commit also allows the caller to supply
`seen::Set` to circumvent the inference challenges.
timholy added a commit that referenced this pull request Jun 14, 2020
#20317 improved inference of unique, but problematic cases still arise
for containers with known but abstract eltypes. Here, we short-circuit
the `typejoin` when the return type is determined by the element type
of the input container.

For `unique(f, itr)`, this commit also allows the caller to supply
`seen::Set` to circumvent the inference challenges.
timholy added a commit that referenced this pull request Jun 15, 2020
#20317 improved inference of unique, but problematic cases still arise
for containers with known but abstract eltypes. Here, we short-circuit
the `typejoin` when the return type is determined by the element type
of the input container.

For `unique(f, itr)`, this commit also allows the caller to supply
`seen::Set` to circumvent the inference challenges.
JeffBezanson pushed a commit that referenced this pull request Jun 16, 2020
#20317 improved inference of unique, but problematic cases still arise
for containers with known but abstract eltypes. Here, we short-circuit
the `typejoin` when the return type is determined by the element type
of the input container.

For `unique(f, itr)`, this commit also allows the caller to supply
`seen::Set` to circumvent the inference challenges.
simeonschaub pushed a commit to simeonschaub/julia that referenced this pull request Aug 11, 2020
JuliaLang#20317 improved inference of unique, but problematic cases still arise
for containers with known but abstract eltypes. Here, we short-circuit
the `typejoin` when the return type is determined by the element type
of the input container.

For `unique(f, itr)`, this commit also allows the caller to supply
`seen::Set` to circumvent the inference challenges.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:inference Type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants