Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added generic function for abstractvectors #188

Closed
wants to merge 20 commits into from
Closed

Added generic function for abstractvectors #188

wants to merge 20 commits into from

Conversation

theogf
Copy link
Contributor

@theogf theogf commented Oct 26, 2020

Add supports for vector of vectors by implementing a generic function.
Fixes #152, closes #165, could be extended to #187 (not sure how to dispatch on Iterators in general, but the current implementation would support it)

@theogf
Copy link
Contributor Author

theogf commented Nov 3, 2020

Bump

@theogf
Copy link
Contributor Author

theogf commented Nov 16, 2020

Bump :)

Copy link
Member

@dkarrasch dkarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, except for some minor issues. I should say that this is very hard to tell if it will cause issues with what people may have in their packages, but at least I don't see any.

src/generic.jl Show resolved Hide resolved
src/generic.jl Show resolved Hide resolved
src/generic.jl Outdated Show resolved Hide resolved
test/test_dists.jl Show resolved Hide resolved
@theogf
Copy link
Contributor Author

theogf commented Dec 3, 2020

Thanks for the review, I included the changes you suggested. I am not sure what you meant by switching the loop orders. I tried to be consistent with the existing implementation, but it should not matter in the end

@dkarrasch
Copy link
Member

I meant to organize the loops in such a way that you write into r in memory-order, so have the 1:j-1 loop before the j+1:end loop, but otherwise keep what is done within the loops.

It seems the org ran out of Travis credits, so CI doesn't work anymore. We need to move to GithubActions...

@theogf
Copy link
Contributor Author

theogf commented Dec 3, 2020

Ok then I made the right change.

It seems the org ran out of Travis credits, so CI doesn't work anymore. We need to move to GithubActions...

I can make a PR for that too :)

@dkarrasch
Copy link
Member

I can make a PR for that too :)

I think I have something running on my fork. When tests pass there, I'll kick off the PR.

src/generic.jl Outdated Show resolved Hide resolved
@dkarrasch
Copy link
Member

I think we can do the same loop swap in the matrix version. It's formally the exact same amount of work, but may benefit from better memory access pattern. To the very least, it shouldn't be worse.

@theogf
Copy link
Contributor Author

theogf commented Dec 3, 2020

Tests are failing because of the tests on the AbstractVector{<:Real} as you said. I will just remove these tests for now as I guess it deserves a different PR.

@codecov-io
Copy link

codecov-io commented Dec 3, 2020

Codecov Report

Merging #188 (c1b1748) into master (e50899d) will decrease coverage by 0.01%.
The diff coverage is 95.23%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #188      +/-   ##
==========================================
- Coverage   95.05%   95.04%   -0.01%     
==========================================
  Files           8        8              
  Lines         688      727      +39     
==========================================
+ Hits          654      691      +37     
- Misses         34       36       +2     
Impacted Files Coverage Δ
src/Distances.jl 50.00% <0.00%> (-50.00%) ⬇️
src/metrics.jl 96.50% <50.00%> (-0.25%) ⬇️
src/generic.jl 98.49% <100.00%> (+0.57%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e50899d...c1b1748. Read the comment docs.

@dkarrasch
Copy link
Member

Awesome! Now let's apply the same trick to the good old matrix-version, and I think this is ready to go. For the Vector{<:Number} tests, I think we need to restrict to a subset of the metrics, and in particular exclude the weighted metrics (or adjust their parameter vector). Maybe by checking something like parameters(dist) === nothing? And the [Sq]Mahalanobis stuff failed.

Copy link
Member

@dkarrasch dkarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added scalar tests and the required dims keyword to pairwise. I suggest to replace the three occurrences of transpose in generic.jl by permutedims, since the first acts recursively, but the dims=1 vs dims=2 distinction is only about the orientation of the outermost array.

src/generic.jl Show resolved Hide resolved
src/generic.jl Show resolved Hide resolved
src/generic.jl Outdated Show resolved Hide resolved
test/test_dists.jl Show resolved Hide resolved
src/Distances.jl Outdated Show resolved Hide resolved
src/generic.jl Outdated Show resolved Hide resolved
src/generic.jl Show resolved Hide resolved
src/generic.jl Show resolved Hide resolved
src/generic.jl Outdated Show resolved Hide resolved
src/generic.jl Outdated Show resolved Hide resolved
src/generic.jl Show resolved Hide resolved
@theogf
Copy link
Contributor Author

theogf commented Dec 10, 2020

@dkarrasch @nalimilan Ok I think all issues are solved now.

Copy link
Member

@dkarrasch dkarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small things, and then I think this is ready.

@nalimilan I wonder if it helps FreqTables.jl if we made these pairwise methods fully generic for iterators, and removed the type constraints on a and b. Then the Distances.jl pairwise method would act metric::PreMetric.

Comment on lines +146 to +147
for i = (j + 1):n
r[i, j] = metric(a[i], a[j])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler may do that on its own, but let's be sure:

Suggested change
for i = (j + 1):n
r[i, j] = metric(a[i], a[j])
aj = a[j]
for i = (j + 1):n
r[i, j] = metric(a[i], aj)

@@ -489,6 +489,8 @@ js_divergence(a::AbstractArray, b::AbstractArray) = JSDivergence()(a, b)

result_type(dist::SpanNormDist, a::AbstractArray, b::AbstractArray) =
typeof(eval_op(dist, oneunit(eltype(a)), oneunit(eltype(b))))
result_type(dist::SpanNormDist, a::AbstractArray{<:AbstractArray}, b::AbstractArray{<:AbstractArray}) =
typeof(eval_op(dist, oneunit(eltype(first(a))), oneunit(eltype(first(b)))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
typeof(eval_op(dist, oneunit(eltype(first(a))), oneunit(eltype(first(b)))))
typeof(eval_op(dist, oneunit(eltype(eltype(a))), oneunit(eltype(eltype(b)))))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should rather override result_type(dist::SpanNormDist, a::Type, b::Type) and have global generic methods that call this? It sounds suboptimal to redefine methods for different inputs types.

@@ -34,7 +34,7 @@ data `a` and `b`.
"""
result_type(::PreMetric, ::Type, ::Type) = Float64 # fallback
result_type(dist::PreMetric, a::AbstractArray, b::AbstractArray) = result_type(dist, eltype(a), eltype(b))

result_type(dist::PreMetric, a::AbstractArray{<:AbstractArray}, b::AbstractArray{<:AbstractArray}) = result_type(dist, eltype(eltype(a)), eltype(eltype(b)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
result_type(dist::PreMetric, a::AbstractArray{<:AbstractArray}, b::AbstractArray{<:AbstractArray}) = result_type(dist, eltype(eltype(a)), eltype(eltype(b)))
result_type(dist::PreMetric, a::AbstractArray{<:AbstractArray}, b::AbstractArray{<:AbstractArray}) =
result_type(dist, eltype(eltype(a)), eltype(eltype(b)))

Comment on lines +168 to +169
pairwise!(r::AbstractMatrix, metric::PreMetric,
a::AbstractVector, b::AbstractVector=a)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring should explain what happens for vectors. It should probably even be a separate docstring as there's almost nothing in common in the description.

@@ -489,6 +489,8 @@ js_divergence(a::AbstractArray, b::AbstractArray) = JSDivergence()(a, b)

result_type(dist::SpanNormDist, a::AbstractArray, b::AbstractArray) =
typeof(eval_op(dist, oneunit(eltype(a)), oneunit(eltype(b))))
result_type(dist::SpanNormDist, a::AbstractArray{<:AbstractArray}, b::AbstractArray{<:AbstractArray}) =
typeof(eval_op(dist, oneunit(eltype(first(a))), oneunit(eltype(first(b)))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should rather override result_type(dist::SpanNormDist, a::Type, b::Type) and have global generic methods that call this? It sounds suboptimal to redefine methods for different inputs types.

@dkarrasch
Copy link
Member

The more I think about it, the more I believe this really is not about AbstractVector methods, but about iterable objects that contain data points between which we want to compute distances. I included iterator-based pairwise and colwise methods now in #194. My result_type implementation generically handles arrays of arrays (or arrays), but avoids things like #188 (comment). I guess it kind of supersedes this PR (I also included that scalar pairwise tests).

@theogf theogf deleted the vector_support branch December 18, 2020 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pairwise(::PreMetric, ::AbstractVector{<:Real}) pairwise for arrays of arrays
4 participants