Define dot product between vectors of arrays/numbers #22220

Closed
Conversation

@dalum (Contributor) commented Jun 5, 2017

This is useful in quantum physics and potentially other fields, where equations containing sums of operators (matrices) can often be simplified by writing them as vector dot products instead. By allowing dot to work for vectors of matrices and numbers, it is possible to achieve a more direct correspondence between mathematical notation and the corresponding program. The application which motivated this PR was the Pauli vector, which is often dotted with a magnetic field (B, a "normal" vector) to produce a sum of matrices:

σ = [[0 1; 1 0], [0 -im; im 0], [1 0; 0 -1]]
B = [1, 1, 1]
σ⋅B == σ[1] * B[1] + σ[2] * B[2] + σ[3] * B[3]

The reason for adding dot(::AbstractArray, ::AbstractArray) with conjugate transposition of the first argument is to preserve the correspondence between complex conjugation and conjugate transposition of numbers and matrices, where hermitian matrices are analogous to real numbers and antihermitian are analogous to imaginary numbers.
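A minimal sketch of the proposed fallback (the name `mydot` is illustrative; the PR defines this behaviour for `dot` itself):

```julia
# Sum of element products, with the first argument's elements adjointed.
# Works when the elements are matrices, numbers, or a mix of both.
mydot(x::AbstractVector, y::AbstractVector) =
    sum(x[i]' * y[i] for i in eachindex(x))

σ = [[0 1; 1 0], [0 -im; im 0], [1 0; 0 -1]]   # Pauli matrices
B = [1, 1, 1]

# Because the Pauli matrices are Hermitian, the adjoint in mydot is a no-op
# here and the result equals the plain sum of products:
mydot(σ, B) == σ[1]*B[1] + σ[2]*B[2] + σ[3]*B[3]   # true
```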

@dalum (Contributor, Author) commented Jun 6, 2017

@TotalVerb But isn't it the case that the direct product of a 3 by 1 matrix with a 2 by 2 matrix is a 6 by 2 matrix? Multiplying this with a 2 by 6 matrix from the left should give a 2 by 2 matrix, not a number.

I am not arguing that this should be the behaviour of dot now, but of *.

@TotalVerb (Contributor):

I don't understand what you mean about *. Could you give an example?

@dalum (Contributor, Author) commented Jun 6, 2017

julia> A = [rand(2, 2) for _ in 1:3]
3-element Array{Array{Float64,2},1}:
 [0.684755 0.372397; 0.165985 0.0307143]
 [0.60091 0.894754; 0.0742568 0.532859] 
 [0.76396 0.898669; 0.159446 0.626533]  

julia> A' * A
3.896401569691669

julia> A' * A == sum(dot(Ai, Ai) for Ai in A) == dot(A, A)
true

julia> A' * A == sum(Ai' * Ai for Ai in A)
false

@TotalVerb (Contributor):

I have to agree with you that A' * A returning a scalar is strange because it does recursively call dot, despite there being no indication of it from the syntax. Hmm. Maybe it should be changed to return a matrix.

@stevengj (Member) commented Jun 6, 2017

x'*x returns a scalar if x is a vector (this was changed after long discussion in JuliaLang/LinearAlgebra.jl#42) since x' is treated as an element of the dual space. If x is a matrix, in contrast, x' is treated as the adjoint of a linear operator and x'*x gives a matrix. If x is a vector of matrices, it makes sense to me that it be treated as a vector, since it can't act as a linear operator.

In any case, that seems like a somewhat separate issue?
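The distinction being described can be seen directly at the REPL (standard Julia behaviour, nothing new):

```julia
# For a plain vector, x' is a dual vector, so x'*x pairs down to a scalar:
x = rand(3)
x' * x            # a scalar (the inner product)

# For a matrix, A' is the adjoint of a linear operator, so A'*A composes
# two operators and yields a matrix:
A = rand(3, 3)
A' * A            # a 3×3 matrix
```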

@dalum (Contributor, Author) commented Jun 6, 2017

I agree that x' * x should return a number, when x is a vector of numbers. But not when x is a vector of operators.

The notion of a vector of linear operators is, I would argue, a well-established concept in physics, and probably in applied mathematics too. The divergence is exactly an operator with the structure of a row vector, and the dot notation is used for it in mathematical contexts precisely because of this similarity. The fact that it returns a real number when acting on a real vector field is coincidental: it would in general return a complex number for a complex vector field, a quaternion for a quaternion vector field, or a matrix for a matrix vector field.

Now, as pointed out by @andreasnoack in #22240, the fact that dot(::Matrix, ::Matrix) works at all right now is a bug, and thus so is the fact that x' * x returns a scalar when x is a Vector{Matrix{Number}}.

@stevengj (Member) commented Jun 6, 2017

@eveydee, the problem with interpreting a vector of matrices as a linear operator is that this is inconsistent with how Julia's * works:

julia> A = [rand(2,2) for i = 1:4]
4-element Array{Array{Float64,2},1}:
 [0.366133 0.289976; 0.123663 0.41303] 
 [0.737389 0.398943; 0.102958 0.326483]
 [0.737347 0.742327; 0.868316 0.846149]
 [0.410302 0.429769; 0.474719 0.750022]

julia> A * [3,4]
ERROR: DimensionMismatch("Cannot multiply two vectors")
Stacktrace:
 [1] *(::Array{Array{Float64,2},1}, ::Array{Int64,1}) at ./linalg/rowvector.jl:184

If A were interpreted as a vector of linear operators (and I totally agree that this notation is possible), then A*x would give [a*x for a in A] here.

How the adjoint works really has to be consistent with how * and dot work. The key (defining!) property is that we must have dot(x,A*y) == dot(A'*x, y) for any cases where A' is treated as the adjoint of a linear operator.
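For ordinary numeric arrays the defining property can be checked numerically (an illustrative sketch; `≈` rather than `==` because of floating-point roundoff):

```julia
# Verify dot(x, A*y) == dot(A'*x, y) for a random real matrix and vectors.
A = rand(3, 3)
x = rand(3)
y = rand(3)

dot(x, A * y) ≈ dot(A' * x, y)   # holds up to roundoff
```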

@andreasnoack (Member):

@eveydee, the problem with interpreting a vector of matrices as a linear operator is that this is inconsistent with how Julia's * works

That is because we have been sloppy when implementing * and it should most likely be considered a bug.

This is Julia 0.3, by the way

julia> x = [randn(2,2) for i in 1:3]
3-element Array{Array{Float64,2},1}:
 2x2 Array{Float64,2}:
 0.491661    -0.697108
 0.00846169  -0.195587
 2x2 Array{Float64,2}:
  0.242481   0.648218
 -0.567836  -0.224404  
 2x2 Array{Float64,2}:
 0.729065  -0.90256
 0.799737   2.11797      

julia> dot(x,x)
2x2 Array{Float64,2}:
 -0.263727  -2.7643
  2.26912    3.4786

and it got changed accidentally as mentioned in #22240. As dot works now, it is effectively vecdot. I'm proposing that dot and vecdot do different things and then you can pick the one that works for your problem. I don't really see what is non-generic about vecdot.

It is not that returning a matrix doesn't work. I already mentioned the limitation in #22220 (comment), but in most cases it just works. E.g., here is Gram–Schmidt:

julia> Base.:(*)(x::Vector{<:Array}, α::Array) = [xx*α for xx in x]

julia> mydot(x,y) = sum(xx'yy for (xx,yy) in zip(x,y))
mydot (generic function with 1 method)

julia> function myGS(V::Vector{Vector{Matrix{Float64}}})
                U = similar(V)
                U[1] = V[1]
                for j in 2:length(V)
                  U[j] = V[j] - sum(U[i]*inv(mydot(U[i], U[i]))*mydot(U[i], V[j]) for i in 1:j-1)
                end
                return U
              end
myGS (generic function with 1 method)

julia> V = [[randn(2,2) for i in 1:3] for i in 1:3];

julia> U = myGS(V);

julia> mydot(U[1], U[2])
2×2 Array{Float64,2}:
 3.60822e-16   0.0
 1.11022e-15  -1.77636e-15

julia> mydot(U[1], U[3])
2×2 Array{Float64,2}:
 -2.22045e-16  4.71845e-16
  8.88178e-16  7.77156e-16

julia> mydot(U[2], U[3])
2×2 Array{Float64,2}:
  1.11022e-15  -2.15106e-16
 -8.88178e-16  -7.63278e-17

@dalum (Contributor, Author) commented Jun 6, 2017

@stevengj I understand what you are getting at, but your example highlights a completely different use-case, which I don't think has a good solution. This is not at all what was intended to be achieved, and I wouldn't expect people to assume that to work in any instance.

This PR/issue is about dotting vectors in spaces of equal dimensions, i.e., achieving compact notation by writing sums of (matrix) products as dot products of vectors. And it looks like things worked that way in Julia 0.3, based on @andreasnoack's comment.

@stevengj (Member) commented Jun 6, 2017

The basic question is whether one sees ' as the purely mechanical operation of "swapping rows and columns and conjugating" or as an adjoint operation reflecting a deeper notion of algebra. With RowVector I thought we were moving in the latter direction, which requires *, dot, and ' to have certain relationships.

@andreasnoack, the problem with myGS is that inv(mydot(U[i], U[i])) may fail because this is not an inner product and may not return something invertible for nonzero U[i].
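A concrete instance of this failure mode, using the mydot defined earlier in the thread: the vector U below is nonzero, yet its Gram block is singular.

```julia
mydot(x, y) = sum(xx'yy for (xx, yy) in zip(x, y))

# A nonzero vector of matrices whose Gram block is singular: both elements
# annihilate the second basis vector, so their A'A contributions do too.
U = [[1.0 0.0; 0.0 0.0], [1.0 0.0; 0.0 0.0]]
G = mydot(U, U)    # [2.0 0.0; 0.0 0.0] — rank deficient

# inv(G) throws SingularException, so the Gram–Schmidt step breaks down
# even though U is not the zero vector.
```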

@TotalVerb (Contributor) commented Jun 6, 2017

@stevengj To me, at least one of these behaviors ought to be wrong:

julia> A = [rand(2, 2) for _ in 1:3]
3-element Array{Array{Float64,2},1}:
 [0.843197 0.790402; 0.293537 0.717]   
 [0.842407 0.207064; 0.685534 0.616951]
 [0.679417 0.632756; 0.462531 0.390315]

julia> B = [rand(2, 2) for _ = 1:3, __ = 1:3]
3×3 Array{Array{Float64,2},2}:
 [0.77686 0.873093; 0.518896 0.314283]     [0.997188 0.951079; 0.00939076 0.37915]
 [0.418401 0.300341; 0.892219 0.776044]     [0.64111 0.536536; 0.0546541 0.607846] 
 [0.352997 0.667995; 0.242118 0.931147]     [0.352873 0.416895; 0.448481 0.515657] 

julia> A' * B
1×3 RowVector{Array{Float64,2},ConjArray{Array{Float64,2},1,Array{Array{Float64,2},1}}}:
 [2.12329 2.49799; 1.94104 2.24253]    [1.86831 2.30368; 1.35971 1.97476]

julia> A' * B[:,1]
4.365816195613023

If we want A' to be the adjoint in this case, then we need to fix matrix multiplication using it. Furthermore, collect(A') returns something that is not the adjoint of this vector, but instead a 1×3 matrix of matrices, which cannot be the adjoint. From these it looks like both covector-matrix multiplication and collect are interpreting it as a vector of operators, which is inconsistent with dot and covector-vector multiplication.

@stevengj (Member) commented Jun 6, 2017

@TotalVerb, neither of those behaviors seem wrong if we interpret A as simply an element of a vector space (not a linear operator since A*vector isn't defined), in which case A' is an element of the dual space (not an adjoint operator).

This is no different from a = rand(3); B = rand(3,3) where a'*B gives a row vector and a'*B[1,:] gives a scalar.

@dalum (Contributor, Author) commented Jun 6, 2017

Why must the scalar be a number? If you have a matrix field, M and construct a vector in M^3, then your scalar is a matrix, not a number. Looking at Wikipedia, this seems to be referred to as a module, but I'm not familiar with the terminology here: https://en.wikipedia.org/wiki/Scalar_(mathematics)#Scalars_in_modules

@TotalVerb (Contributor) commented Jun 6, 2017

@stevengj Note that the result of A' * B is a RowVector of matrices. If A' is an element of the dual space, then it should be a RowVector of scalars.

I suppose that what may be wrong here is again the printing of A' * B; it and collect are certainly printed in a misleading way, because the behavior of this adjoint is far from a "row vector of matrices" and should not be printed or collected that way.

@TotalVerb (Contributor) commented Jun 6, 2017

In other words, perhaps more clearly, to me there is a semantics difference between

A :: Matrix{Float64}
B :: Matrix{Float64}

[A, B]' == [A', B'] <as RowVector>

and

A :: Matrix{Float64}
B :: Matrix{Float64}

[A, B]' == [Dual(A), Dual(B)] <as RowVector>

The Dual operation is recursive (by that I mean, upon collection the elements of the original vector are themselves dual-ed), but the thing that recurses is not the adjoint operator x', because the behavior of that operator is incorrect on matrices (it does not return the dual as if x were simply a vector).
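A hypothetical sketch of that distinction (the names `Dual` and `pair` are illustrative only, not Julia Base API): a wrapper marking an element of the dual space, whose pairing with an element of the original space is the full elementwise inner product rather than the matrix adjoint.

```julia
# Marks x as an element of the dual space, without transposing it.
struct Dual{T<:AbstractMatrix}
    x::T
end

# Pairing ⟨Dual(A), B⟩: iterate both matrices elementwise and reduce to a
# scalar, i.e. the Frobenius inner product — unlike A' * B, which composes
# A and B as operators and returns a matrix.
pair(d::Dual, B::AbstractMatrix) = sum(conj(a) * b for (a, b) in zip(d.x, B))
```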

@stevengj (Member) commented Jun 6, 2017

@eveydee, scalars don't have to be numbers, but arbitrary square matrices don't form a field (or skew field) because nonzero matrices aren't necessarily invertible. And we can't tell from the type Matrix{Number} whether the matrices are even square, much less invertible. So, given the type information alone, you cannot infer that Vector{Matrix{Number}} is a vector space over anything other than Number.

@stevengj (Member) commented Jun 6, 2017

@TotalVerb, the point is that x' is ambiguous. It can mean an adjoint operator or a dual vector. This is a fact of life in mathematics. We are left with how to resolve it. Saying that it is the adjoint operator for x::Matrix and the dual vector for x::Vector, regardless of the element type, is mostly what we do now, and is what I have been advocating based on how we define *.

@StefanKarpinski (Member):

Square matrices do form a ring, however, and matrices of matrices can be viewed as forming a module (c.f. vector space) with a "scalar" – i.e. ring – elements that are matrices themselves.

@TotalVerb (Contributor):

@stevengj I understand that. But when ' is resolved to be a dual, the elements of that dual vector (as from collect) should also be dual elements, not adjoints as they currently are, no?

@stevengj (Member) commented Jun 6, 2017

@StefanKarpinski, but you can't infer from the type that the matrices are square, or of the same size, so why would you assume this interpretation if you saw a Vector{Matrix{Number}}?

@StefanKarpinski (Member):

I wouldn't – it would have to be implied by an operation on them, like a dot product.

@andreasnoack (Member) commented Jun 6, 2017

@andreasnoack, the problem with myGS is that inv(mydot(U[i], U[i])) may fail because this is not an inner product and may not return something invertible for nonzero U[i].

Hey, I pointed out that first. I'm just saying that it's not a big deal. I don't think the consumers of this feature would have a problem with the potential singularity here. (Matrix consumers are used to dealing with potential singularity and most matrices are full rank anyway).

The main point here is that defining dot(x::Vector,y::Vector) = sum(xx'yy for (xx,yy) in zip(x,y)) wouldn't restrict you, or at least you haven't told us in what way, since you can use vecdot. If you enforce that dot is like vecdot then we don't have a choice between two different useful behaviors.

@dalum (Contributor, Author) commented Jun 7, 2017

@stevengj If the matrices aren't of compatible sizes for a matrix product, you shouldn't be able to vecdot them, when the matrix inner product is defined as trace(X'Y) for X, Y matrices. In other words, when taking the dot products of Vector{Matrix{Number}}, the only case that the matrix inner product covers, which a matrix product does not, is when the vectors contain compatible matrices but of different sizes in each entry. This is because summing matrices of different sizes does not make sense, but summing the traces does:

julia> A = [rand(2, 2), rand(3, 2), rand(2, 3)]
julia> B = [rand(2, 2), rand(2, 3), rand(3, 2)]
julia> C = [rand(2, 2), rand(4, 1), rand(2, 3)]

julia> A'A
3.199407765200927
julia> sum(trace(a'a) for a in A) == A'A
true
julia> sum(a'a for a in A)  # The error here is the sum, not the matrix product
ERROR: DimensionMismatch("dimensions must match")

julia> A'B
5.432131012504026
julia> sum(trace(a'b) for (a,b) in zip(A,B)) == A'B
ERROR: DimensionMismatch("A has dimensions (3,2) but B has dimensions (3,2)")

julia> A'C
ERROR: DimensionMismatch("dot product arguments have lengths 6 and 4")

The fact that A'B even works here, I think is a bug. vecdot(A[2], B[2]) should be an error since A[2] and B[2] do not have compatible dimensions to do trace(A[2]'B[2]) (A[2]'B[2] is an error). It does work, however, and it returns a different result if we take the conjugate transpose of the first argument, which I don't think is the desired behaviour. One of these should fail:

julia> vecdot(A[2], B[2])
2.5458122251380035
julia> vecdot(A[2]', B[2])
2.358516933607439

A whole different issue is in the not so unlikely case that you want to take the kronecker product of the matrices. In quantum physics, this is the case for the spin orbit coupling, where you have a dot product between vectors of operators in angular momentum and spin spaces respectively. In this particular case the matrices are of different sizes, but for spin-spin couplings in general, the matrices may have identical sizes, so mere size-incompatibility cannot be used to infer which is the desired behaviour. I guess the correct way to treat this is to introduce a new krondot (kronecker dot) for this particular use case?
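A possible sketch for that use case (the name `krondot` is hypothetical, not an existing Julia function):

```julia
# Sum of Kronecker products instead of matrix products, as needed for
# couplings like L⋅S, where the factors act on different spaces.
krondot(x::AbstractVector, y::AbstractVector) =
    sum(kron(x[i], y[i]) for i in eachindex(x))

# Works even when the element sizes differ between the two vectors,
# e.g. 2×2 spin operators kroneckered with 3×3 orbital operators:
krondot([rand(2, 2) for _ in 1:3], [rand(3, 3) for _ in 1:3])  # a 6×6 matrix
```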

@andreasnoack (Member) commented Jan 16, 2018

JuliaLang/LinearAlgebra.jl#496 made me revisit this issue and I realized that I recently read a formalized argument for my view on dot(Vector{Matrix},Vector{Matrix}) in #22220 (comment).

For the, admittedly, few people interested: the (semi-)inner product in https://arxiv.org/pdf/0905.3509.pdf is like our dot, since it maps into the C*-algebra and not into ℂ. The corresponding norm used in the paper would then be

norm(x::Vector{<:AbstractMatrix}) = norm(dot(x,x))

So the version of Cauchy-Schwarz in the paper can be expressed as

julia> norm(x::Vector{<:Matrix}) = norm(dot(x,x))
norm (generic function with 18 methods)

julia> ≼(A::Matrix, B::Matrix) = all(eigvals(B - A) .>= 0)
≼ (generic function with 1 method)

julia> x = [rand(2,2) for i in 1:2]
2-element Array{Array{Float64,2},1}:
 [0.158424 0.747836; 0.541273 0.677574]
 [0.136556 0.956337; 0.468317 0.302034]

julia> y = [rand(2,2) for i in 1:2]
2-element Array{Array{Float64,2},1}:
 [0.454078 0.173277; 0.628935 0.317456]
 [0.270871 0.828004; 0.0367388 0.96777]

julia> (x⋅y)*(y⋅x) ≼ norm(y)^2*(x⋅x)
true

Labels: linear algebra