Cost Matrix function #104

davibarreira · 2021-06-12T00:48:22Z

This is the separate PR for the cost matrix function. It is a helper intended to make it easier to write functions such as sinkhorndivergence(c, mu, nu), where c is a function.

…imalTransport.jl into sinkhorndivergence

- Created the struct FiniteDiscreteMeasure
- Implemented two versions of sinkhorn_divergence
- Disabled the use of regularization on sinkhorn_divergence
- Fixed docstring with suggestions

coveralls · 2021-06-12T00:57:06Z

Pull Request Test Coverage Report for Build 931149182

11 of 12 (91.67%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage decreased (-0.07%) to 94.6%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/utils.jl	11	12	91.67%

Totals
Change from base Build 928044156:	-0.07%
Covered Lines:	473
Relevant Lines:	500

💛 - Coveralls

devmotion · 2021-06-12T11:36:33Z

src/OptimalTransport.jl

 export quadreg
 export ot_cost, ot_plan, wasserstein, squared2wasserstein
+export cost_matrix


I am not sure if this should be exposed to users.

Yeah, I remember you said this. Can you expand on why you think so? As a user, I'd like to have access to this function, since if I wanted to create the cost matrix, it would save me some time (I always misuse pairwise at first). I don't see the downside in making this available.

Also, do you think the cost_matrix function is useful, even if only for internal use? I remember you were not sure about it.

The problem is that the best implementation would be to just forward everything to pairwise since the type of the support and the cost function should know best what to do. E.g., if you use ColVecs or RowVecs then there is no need to concatenate vectors, you can just call pairwise with the underlying matrix. This is also how it is implemented in KernelFunctions and it is much much more efficient than extracting and combining all columns or rows.

Yeah, but if I construct a FiniteDiscreteMeasure without KernelFunctions, the support has no matrix attribute, which is why I wrote it that way. Sorry, I don't understand your point that the best implementation would be to just forward everything to pairwise. The reason for cost_matrix is that I wouldn't have to deal with the variations. For example, if my cost function is a custom function, then pairwise(sqeuclidean, mu.support, nu.support) behaves differently from pairwise(SqEuclidean(), mu.support, nu.support), which would require adding dims=1 and converting the support to a matrix.
What I'd like to do is write a function that handles all of these cases.

If instead the user calls sinkhorndivergence(SqEuclidean(), mu, nu), then I'd have to write C = pairwise(c, mu.support.X, nu.support.X, dims=1). But since I cannot guarantee that the user used KernelFunctions when creating the finite measures, I have to use a method that works for both cases, hence C = pairwise(c, reduce(hcat, mu.support), reduce(hcat, nu.support), dims=1).

That's exactly my point: you don't know what type of vectors is used (e.g., whether it is a ColVecs or a RowVecs), so reduce(hcat, mu.support) can often be a very inefficient and suboptimal choice. If instead you just used pairwise(c, mu.support, nu.support), you could make use of the optimizations in packages such as KernelFunctions. So my main point is that this is probably being handled at the wrong level here. The packages that define e.g. ColVecs and RowVecs should define how pairwise is handled and make sure that it is efficient, since we can't handle all possible types here.
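The idea of letting the support type decide how pairwise costs are computed can be sketched in plain Julia. Everything below is a hypothetical stand-in written for illustration: ColumnsSketch only mimics the role of a wrapper such as ColVecs, SqEuclideanSketch mimics a distance type, and pairwise_sketch is not the pairwise of Distances or KernelFunctions. The sketch assumes real-valued points and shows a generic fallback next to a specialized method that works on the underlying matrices without extracting columns.

```julia
# Hypothetical wrapper (a stand-in for something like ColVecs): presents the
# columns of a matrix as a vector of vectors.
struct ColumnsSketch{T} <: AbstractVector{Vector{T}}
    X::Matrix{T}
end
Base.size(v::ColumnsSketch) = (size(v.X, 2),)
Base.getindex(v::ColumnsSketch, i::Int) = v.X[:, i]

# Hypothetical cost type (a stand-in for a squared Euclidean distance object).
struct SqEuclideanSketch end
(::SqEuclideanSketch)(x, y) = sum(abs2, x .- y)

# Generic fallback: any callable cost, any vector-like supports.
pairwise_sketch(c, xs::AbstractVector, ys::AbstractVector) =
    [c(x, y) for x in xs, y in ys]

# Specialization that the package owning the wrapper type could provide:
# uses ||x||^2 + ||y||^2 - 2 x'y with a single matrix product.
function pairwise_sketch(::SqEuclideanSketch, xs::ColumnsSketch, ys::ColumnsSketch)
    X, Y = xs.X, ys.X
    return vec(sum(abs2, X; dims=1)) .+ sum(abs2, Y; dims=1) .- 2 .* (X' * Y)
end
```

With this layout, calling code only ever writes pairwise_sketch(c, mu_support, nu_support); which method runs is decided by the argument types rather than by the caller.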

In general though I am a bit surprised about the problems you mention with SqEuclidean. All the desired cases work automatically due to https://github.com/JuliaStats/Distances.jl/blob/b52f0a10017553b311a9c9eed6f96e34a5629c2f/src/generic.jl#L333-L351 (even though it is not optimized for ColVecs but IMO that's an issue of KernelFunctions or the separate package where they should be moved):

julia> pairwise(SqEuclidean(), rand(5), rand(5))
5x5 Matrix{Float64}:
...

julia> pairwise(SqEuclidean(), [rand(5), rand(5)], [rand(5), rand(5)])
2x2 Matrix{Float64}:
...

julia> pairwise(SqEuclidean(), ColVecs(rand(5, 2)), ColVecs(rand(5, 2)))
2x2 Matrix{Float64}:
...

(sqeuclidean works as well but uses the fallback in StatsBase - IMO this should be changed in Distances)

Interesting. I was getting an error when using mu.support without splatting, but I was also passing the dims argument, so perhaps that was the issue. If so, then I agree with you that the cost_matrix function is not necessary.

I'll close the PR. Thanks for the inputs.

BTW I made a PR to Distances that would fix the SqEuclidean/sqeuclidean discrepancy: JuliaStats/Distances.jl#224

devmotion · 2021-06-12T11:42:22Z

src/utils.jl

+function cost_matrix(
+    c,
+    μ::Union{FiniteDiscreteMeasure,DiscreteNonParametric},
+    ν::Union{FiniteDiscreteMeasure,DiscreteNonParametric},
+)
+    if typeof(c) <: PreMetric && length(μ.support[1]) == 1
+        return pairwise(c, vcat(μ.support...), vcat(ν.support...))
+    elseif typeof(c) <: PreMetric && length(μ.support[1]) > 1
+        return pairwise(c, vcat(μ.support'...), vcat(ν.support'...); dims=1)
+    else
+        return pairwise(c, μ.support, ν.support)
+    end
+end


I think we would want to restrict this to pairs of marginals whose supports have the same type. Or at least the dimensions should match in the case of arrays and scalars. So we would want a more fine-grained function signature.

In general, it would be better to avoid the type checks in the function definition, since they make it more difficult to extend and specialize the method. I think it would be better to just:

- define a separate fallback for arbitrary c (the last branch)
- define a separate method for c::PreMetric

Also, one should avoid splatting the support: it will lead to massive compile times and inference problems for larger arrays.

The same comments apply to the implementation below.
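The suggestion to replace the typeof checks with separate methods might look roughly like the sketch below. It is not the code from this PR: PreMetricSketch is a hypothetical abstract type standing in for PreMetric, AbsDiffSketch is a made-up cost, and cost_matrix_sketch takes plain vectors rather than measures. The specialized method also assumes real-valued costs and simply fills a matrix where a real implementation would presumably call an optimized pairwise.

```julia
# Hypothetical stand-ins for a distance type hierarchy.
abstract type PreMetricSketch end
struct AbsDiffSketch <: PreMetricSketch end
(::AbsDiffSketch)(x, y) = abs(x - y)

# Fallback for an arbitrary callable c (the role of the last branch).
cost_matrix_sketch(c, xs::AbstractVector, ys::AbstractVector) =
    [c(x, y) for x in xs, y in ys]

# Separate method chosen by dispatch on the cost type. Requiring the same
# element type T for both supports rejects mismatched marginals at the
# signature level instead of inside the function body.
function cost_matrix_sketch(
    c::PreMetricSketch, xs::AbstractVector{T}, ys::AbstractVector{T}
) where {T}
    C = Matrix{Float64}(undef, length(xs), length(ys))  # assumes real-valued costs
    for (j, y) in enumerate(ys), (i, x) in enumerate(xs)
        C[i, j] = c(x, y)
    end
    return C
end
```

Because each case is its own method, another package can add a method for its own cost or support type without editing this function.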

Hmm. So splatting is inefficient. Now, how should one efficiently construct the matrix C using Distances.pairwise? StatsBase.pairwise takes the vector of vectors and returns exactly what one wants, but Distances.pairwise would require a matrix version, so how do I make a matrix from the vector of vectors?

As mentioned above, if you work with ColVecs or RowVecs you actually don't want to construct a matrix at all. But if you deal with an actual vector of vectors, then usually you would use e.g. reduce(hcat, vectors_of_vectors) to avoid splatting.
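As a small illustration of the reduce(hcat, ...) idiom mentioned above, the sketch below builds a matrix from a vector of vectors using only Base Julia. The variable names and the sample points are made up for the example.

```julia
# Three points in R^2, stored as a vector of vectors.
support = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]

# Concatenate the points as columns without splatting.
X = reduce(hcat, support)

# Splatting gives the same matrix, but passes one argument per point,
# which is what the review comment advises against for large supports.
X_splat = hcat(support...)
```

Here X has one column per support point; permutedims(X) would give the layout with one row per point.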

I agree. But since we didn't require KernelFunctions as a dependency, I could not guarantee that mu.support.X would work. I'll change to your suggestion with reduce.

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>

davibarreira · 2021-06-13T00:02:00Z

It seems that this helper function was not necessary, so I'm closing this PR.

coveralls · 2024-10-01T16:22:36Z

Pull Request Test Coverage Report for Build 931101555

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

11 of 12 (91.67%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage decreased (-0.07%) to 94.6%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/utils.jl	11	12	91.67%

Totals
Change from base Build 928044156:	-0.07%
Covered Lines:	473
Relevant Lines:	500

💛 - Coveralls

davibarreira added 21 commits June 1, 2021 21:23

Initianning sikhorn divergence

30de49c

Merge branch 'master' of https://github.com/JuliaOptimalTransport/Opt…

1a03325

…imalTransport.jl into sinkhorndivergence

Sinkhorn divergence implemented

4a1f380

Added PyCall to test dependencies

bdc1b5b

Added tests for sinkhorn divergence

416dcb4

Added Sinkhorn Divergence to docs

f593377

Creating FiniteDiscreteMeasure struct

21d38a8

Modifications:

e17bba5

- Created the struct FiniteDiscreteMeasure, - Implemented two versions of sinkhorn_divergence, - Disabled the use of regularization on sinkhorn_divergence, - Fixed docstring with suggestions.

FixedDiscreteMeasure normalizes the weights to sum 1

10e8849

FixedDiscreteMeasure checks if probabilities are positive

52b3c7a

Created tests for FiniteDiscreteMeasure

7d2924d

Added tests for sinkhorn divergence and finite discrete measure

7cf44a6

Fixed the code for creating cost matrices in the sinkhorn_divergence

4764b00

Added costmatrix.jl to tests

98784c5

Fixed docstring for costmatrix

1fb0fc1

Fixed errors in the tests

808d6ac

Minor fixes in the tests

3415386

Created auxiliary cost matrix function

d373d52

Formatted code

933c106

costmatrix implementation from sinkhorndiverngce PR

63af17a

Formatted code

f952be8

Added costmatrix.jl to docs

6d812e3

devmotion reviewed Jun 12, 2021


davibarreira and others added 2 commits June 12, 2021 08:48

Update Project.toml

97f5d77

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>

Update src/OptimalTransport.jl

fd54e9f

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>

davibarreira closed this Jun 13, 2021
