Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dot(x::Adjoint, y::AbstractVector) gives inconsistent answer #641

Open
marius311 opened this issue Jun 22, 2019 · 9 comments
Open

dot(x::Adjoint, y::AbstractVector) gives inconsistent answer #641

marius311 opened this issue Jun 22, 2019 · 9 comments

Comments

@marius311
Copy link
Contributor

marius311 commented Jun 22, 2019

I wanted to get some feedback before maybe submitting a simple PR. I was quite surprised that dot of an Adjoint and a Vector gives the same answer as if I had never done the adjoint:

julia> using LinearAlgebra

julia> v = [1,2]
2-element Array{Int64,1}:
 1
 2

julia> dot(v,v)
5

julia> dot(adjoint(v),v)
5

I haven't been able to find any discussion explicitly about this, having gone through the many threads about adjoints. Mathematically I think it doesn't make sense and breaks some of the nice mathematical consistency of the existing system. My guess is that its just an unintended consequence of how generic this function is:

https://github.com/JuliaLang/julia/blob/f6049d6d25d720bc90d82aadae6be98bc948ef24/stdlib/LinearAlgebra/src/generic.jl#L785

I would propose special-casing the dot product with the following definition:

LinearAlgebra.dot(x::Adjoint{<:Any,<:AbstractVector}, y::AbstractVector) = parent(x)*y

which means that dot(adjoint(v),v) would be equivalent to v*v, which for Base vectors is undefined (although users would be free to define this operations for their own custom AbstractVectors, and I believe everything remains consistent in that case).

Let me know if there's any comments on this and if I should go ahead and make the PR.

@nickrobinson251
Copy link
Contributor

If this is the way to go, then it should be the same for transpose?

LinearAlgebra.dot(x::Transpose{<:Any,<:AbstractVector}, y::AbstractVector) = parent(x)*y

@nickrobinson251
Copy link
Contributor

But also I'm not sure that this is the way to go...

I think we currently have (roughly) dot(x, y) = sum(dot(x[i], y[i]) for i in 1:length(x)), which i think is what one expects for a dot/scalar product?

Although it does mean all inputs are treated as vectors e.g.

julia> A = [1 2; 3 4];

julia> b = [1 2 3 4];

julia> dot(A, b)
29

@nickrobinson251
Copy link
Contributor

Rather than special casing adjoint/transpose we could just require AbstractVector, i.e. change the function signature to dot(x::AbstractVector, y::AbstractVector) = ..., then dot(adjoint(v),v) would be a MethodError as you propose.

@c42f
Copy link
Member

c42f commented Jun 24, 2019

dot() is defined for arbitrary iterables so in that sense the current implementation seems reasonable. Note that dot is conjugate linear in the first argument, so unwrapping with parent doesn't really make sense.

The current design seems to date from: JuliaLang/julia#27401, and there was a later comment about adjoint there which seems like it might be the same thing JuliaLang/julia#27401 (comment).

@marius311
Copy link
Contributor Author

marius311 commented Jun 24, 2019

conjugate linear in the first argument, so unwrapping with parent doesn't really make sense.

I don't follow this sorry, parent when acting on an Adjoint does indeed take an implicit conjugate (that's why its used e.g. in the implementation of adjoint(::Adjoint)?

In any case, I guess its true, as the docs for dot are written, this is exactly intended, as is dot([1 2; 3 4], [1 2 3 4]) example above. A difference between that and what I'm saying I guess is that my impression was that Adjoint objects were really supposed to track the abstract mathematical meaning, in which case the current behavior for dot(x::Adjoint, y::AbstractVector) does seem incorrect to me. Of course, others have thought about this much more than I have so I could definitely be wrong.

@nickrobinson251
Copy link
Contributor

Adjoint objects were really supposed to track the abstract mathematical meaning

Ah, interesting point! I hadn't considered that the LinearAlgebra wrapper types could/should act differently than an array with the same dimensionality (e.g. an Adjoint isn't the same as a row vector).

@fredrikekre
Copy link
Member

See #549 and JuliaLang/julia#28666

@c42f
Copy link
Member

c42f commented Jun 24, 2019

parent when acting on an Adjoint does indeed take an implicit conjugate

Sure, move on, nothing to see there ;-) (I just misread your definition.)

@marius311
Copy link
Contributor Author

marius311 commented Jun 25, 2019

Thanks @fredrikekre, yea, looks like if JuliaLang/julia#28666 is merged that would cause dot(x::Adjoint, y::AbstractVector) to error, which I would be for. My proposal here would additionally make it give parent(x)*y (which is still an error for base Arrays). But I can't say I feel so strongly that that needs to be the case once we have JuliaLang/julia#28666 (and maybe its just a bandaid on the bigger issue of better handling recursivity in adjoints and cases where "scalars" are not Number, which I have seen discussed elsewhere). So I will probably just suggest JuliaLang/julia#28666 be marked as fixing this Issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants