
Implement more matrix functions #5840

Closed · 5 of 17 tasks · jiahao opened this issue Feb 17, 2014 · 54 comments
Labels: help wanted · linear algebra · stdlib

Comments

@jiahao (Member) commented Feb 17, 2014

Higham and Deadman have published a list of software implementing algorithms for matrix functions, showing that Julia's library for these functions could be greatly improved relative to the state of the art.

The purpose of this issue is to discuss and track the implementation of matrix functions.

From Higham and Deadman's list:

General function evaluations

  • General matrix function with derivatives of the underlying scalar function available: Schur–Parlett algorithm (Davies and Higham, 2003)
  • Function of a symmetric or Hermitian matrix by diagonalization (see the sketch after this list)
  • Condition number estimate for general functions: cond(f,A)
  • Matrix function–vector product evaluation: f(A) * b
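
For the diagonalization item above, the whole approach fits in a few lines. A minimal sketch for the Hermitian case in current Julia (hermfun is a hypothetical name, not a proposed API):

using LinearAlgebra

# Sketch: f(A) = V f(Λ) V' for Hermitian A, via the eigendecomposition.
function hermfun(f, A::Hermitian)
    λ, V = eigen(A)                  # real eigenvalues, unitary eigenvectors
    return V * Diagonal(f.(λ)) * V'
end

hermfun(exp, Hermitian([2.0 1.0; 1.0 2.0]))  # agrees with the matrix exponential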

Specific functions

  • expm: scaling and squaring algorithm (Al-Mohy and Higham, 2009)
  • logm: inverse scaling and squaring algorithm (Al-Mohy, Higham, and Relton, 2012, 2013)
  • sqrtm:
    • Schur algorithm (Björck and Hammarling, 1983)
    • real version for real matrices (Higham, 1987)
    • blocking algorithm for performance improvements (Deadman, Higham, and Ralha, 2013)
  • A^t for real matrix powers: Schur–Padé algorithm (Higham and Lin, 2013) (#21184: Fixed the algorithm for powers of a matrix)
  • Matrix unwinding function

Fréchet derivatives

  • Exponential: scaling and squaring algorithm (Al-Mohy and Higham, 2009)
  • Logarithm: inverse scaling and squaring algorithm (Al-Mohy, Higham, and Relton, 2013)
  • General matrix function: complex step algorithm (Al-Mohy and Higham, 2010) or using the block 2 × 2 matrix formula.
  • A^t for real matrix powers: Schur–Padé algorithm (Higham and Lin, 2013)

Miscellaneous functions

@StefanKarpinski (Member)

It would be amazing to include all of these algorithms. Thanks for putting this issue together, @jiahao.

@jiahao (Member, Author) commented Feb 18, 2014

It's not clear to me what interface should be implemented for the very generic things like "function of a symmetric or Hermitian matrix".

We could define methods for elementary functions until the cows come home, but that is both tedious and namespace-polluting (we'd need sinm, cosm, atanhm, etc. to disambiguate from the ordinary forms, which broadcast over the elements), and furthermore it wouldn't scale well for linear combinations of functions.

Maybe we want something like applym, so that a call like applym(A->exp(A)+sin(2A)-cosh(4A), A) would evaluate the matrix expm(A)+sinm(2A)-coshm(4A)?

@StefanKarpinski (Member)

This is one of the reasons why having all these functions be vectorized is kind of shitty. I suspect that @JeffBezanson will vehemently agree. There could be a MatrixFunctions module that you have to import from to get the matrix variants with all the same names.

@jiahao (Member, Author) commented Feb 18, 2014

Sure, but that won't help with the fact that a complicated matrix function like A->expm(A)+sinm(2A)-coshm(4A) can be computed more quickly all at once than computing (expm(A))+(sinm(2A))-(coshm(4A)) by working out the individual elementary functions separately, since the underlying Schur-Parlett algorithm is essentially the same.

@StefanKarpinski (Member)

Yes, that's a very good point.

@stevengj (Member)

Trefethen's contour-integration method is also neat for mostly analytic functions (with known singularities), though I don't know how it compares to the state of the art (in performance and generality).

@jiahao (Member, Author) commented Feb 18, 2014

@stevengj it looks like that paper is Ref. 32 of the Higham and Deadman report, and isn't currently available in any major package (although it has Matlab code).

@simonbyrne (Contributor)

This seems like a good candidate for a GSOC project.

As far as an interface goes, I would suggest a macro

B = @matfun exp(A)+sin(2A)-cosh(4A)

A naive approach could simply replace the functions (e.g. exp -> expm); a more advanced approach could try to identify the common decompositions. A sketch of the naive rewrite follows.
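
A minimal sketch of that naive replacement, assuming hypothetical matrix variants sinm/cosm/coshm alongside the existing expm:

# Hypothetical sketch: recursively swap scalar function names for matrix
# variants. The sinm/cosm/coshm names are assumed, not existing functions.
const MATFUN_MAP = Dict(:exp => :expm, :sin => :sinm, :cos => :cosm, :cosh => :coshm)

matrewrite(x) = x
function matrewrite(ex::Expr)
    args = map(matrewrite, ex.args)
    if ex.head == :call && haskey(MATFUN_MAP, args[1])
        args[1] = MATFUN_MAP[args[1]]
    end
    return Expr(ex.head, args...)
end

macro matfun(ex)
    esc(matrewrite(ex))
end

# B = @matfun exp(A) + sin(2A) - cosh(4A)
# expands to expm(A) + sinm(2A) - coshm(4A)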

@GunnarFarneback (Contributor)

The item "[sqrtm] real version for real matrices (Higham, 1987)" is checked. I can't see that algorithm in base. I made an attempt on it about a year ago but concluded that the complexity just wasn't worth it. I've put up the code as a gist if someone wants to have a look.

@jiahao (Member, Author) commented Feb 18, 2014

Thanks for catching that. I must have ticked it by accident when writing up this issue.

@dlfivefifty (Contributor)

Just tried to call logm, but it wasn't implemented. Had to go to Mathematica.

@fabianlischka (Contributor)

Whoever tackles this issue might find this paper helpful for testing:
Higham, Deadman: Testing Matrix Function Algorithms Using Identities, March 2014

@StefanKarpinski (Member)

Great reference – thanks, @fabianlischka.

@mfasi (Contributor) commented Jul 26, 2015

The logm function has been added by pull request #12217.

@ViralBShah (Member)

Also cc: @higham @alanedelman

@ararslan (Member) commented May 5, 2016

Would sinm and cosm (matrix sine and cosine, respectively, see here) be useful? They're easily derived from expm. I can put together a PR if that sounds desirable.
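
For reference, the derivation is just Euler's formula. A minimal sketch (written with today's matrix exp; at the time this was expm):

using LinearAlgebra

# For real A, exp(im*A) = cos(A) + im*sin(A), so one exponential gives both.
function sincosm(A::AbstractMatrix{<:Real})
    E = exp(im * A)
    return imag(E), real(E)          # (sin(A), cos(A))
end

# The general complex case needs two exponentials:
sinm(A) = (exp(im * A) - exp(-im * A)) / 2im
cosm(A) = (exp(im * A) + exp(-im * A)) / 2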

@dlfivefifty (Contributor)

I think dropping the m is more consistent with the Taylor series definition of exp. For example,

exp(dual(1.,2.)) 

returns the dual corresponding to the Taylor series definition (which in this case is equivalent to expm([1. 2.; 0. 1.])).

That's ignoring the Matlab and Mathematica precedents.
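
A quick illustration of that correspondence (assuming DualNumbers.jl, and today's matrix exp):

using DualNumbers, LinearAlgebra

d = exp(Dual(1.0, 2.0))          # e + 2e*ε
M = exp([1.0 2.0; 0.0 1.0])      # e * [1 2; 0 1]
value(d) ≈ M[1, 1]               # true
epsilon(d) ≈ M[1, 2]             # true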

@dlfivefifty (Contributor)

Whether or not exp(A) replaces expm(A), I think down the line exp.(A) should replace exp(A), as the current exp(A) is confusing for non-MATLAB users

@tkelman (Contributor) commented Aug 12, 2016

@felixrehren (Contributor) commented Jan 11, 2017

Version 2 of the Deadman–Higham list has been out for a year, and Julia's coverage is the same as it was the first time. Let's get better before version 3 comes out. I'm going to look at Schur–Padé; that, together with the top four "General function evaluations" tasks in the OP, would cover most of this. Anyone else want to join?

P.S. We already beat R in the DH list, with built-in Matlab and SciPy looking vulnerable as well!

@simonbyrne (Contributor) commented Jan 11, 2017

Is the "Deadman-Higham ranking" our Weissman Score?

@dlfivefifty (Contributor)

Could there be a function matrixbroadcast(f, A) that calculates the matrix function f(A) for general f? This would be easy to implement for diagonalisable A; not sure about otherwise.
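
A minimal sketch for the diagonalisable case (the next comment explains why this is fragile when A is nearly defective):

using LinearAlgebra

# Sketch: f(A) = V f(Λ) V⁻¹ via the eigendecomposition. Accuracy degrades
# badly when A is close to defective (ill-conditioned V), and the result may
# be complex for a real A with complex eigenvalues.
function matrixbroadcast(f, A::AbstractMatrix)
    λ, V = eigen(A)
    return V * Diagonal(f.(λ)) / V
end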

@stevengj (Member) commented Jan 12, 2017

Even for diagonalizable A, the possibility of nearly defective A would be an issue. Of course, it could be restricted to Hermitian A, but that case is so easy that we would hardly add much value by implementing a function for it in Base.

@alanedelman (Contributor)

There is a pretty good chance that matrix derivatives would work with ForwardDiff.jl, at least if there is a corresponding generic linear algebra function. I've been pretty impressed that one can get derivatives of powers and SVDs and inverses, etc.

@ChrisRackauckas (Member)

Should this all be moved out to a MatrixFunctions.jl?

@felixrehren (Contributor)

Seems like a good idea -- I think everything on the list other than expm, logm, and sqrtm is a clear candidate for moving out (e.g. I have a rootm function that computes arbitrary matrix pth roots -- it probably doesn't fit in base, but could be nice somewhere). Whether expm, logm, and sqrtm are worth keeping in base or can move out as well, I don't know.

@ararslan You did the great work moving SpecialFunctions.jl. What do you think about MatrixFunctions.jl? Other people that were involved there IIRC but are not in this thread: @musm, @andreasnoack

@ViralBShah (Member)

Should we close this, and perhaps there ought to be a better place for this in a new repo?

@ChrisRackauckas (Member)

If it's extending sqrt and log like exp in #23233, then it should be in Base since otherwise it's type piracy.

@dlfivefifty (Contributor)

we could do something like this:

for f in (:sqrt, ...)
    @eval $f(A::AbstractArray) = matrixfunction($f, A)
end
matrixfunction(f, A) = error("Install MatrixFunctions.jl")

Then MatrixFunctions.jl can safely override Base.matrixfunction(::typeof(sqrt), A)

@robzan8 commented May 1, 2018

I have implemented a couple of algorithms for computing matrix functions.
For dense matrix functions: Schur-Parlett, with automatic differentiation done with TaylorSeries.jl.
For sparse matrix functions: the rational Krylov/Arnoldi method, with poles found automatically using the AAA algorithm for rational approximation.

You can check them out here: https://github.com/robzan8/MatFun.jl

@ararslan added the stdlib label May 1, 2018
@sethaxen (Contributor)

I was planning to add the Fréchet derivatives for matrix exponential and logarithm to ChainRules to support pushforwards and pullbacks of exp and log. However, they duplicate a lot of code from LinearAlgebra's implementations. Given this issue, would it be better to contribute them directly to LinearAlgebra?

@andreasnoack (Member)

Possibly, but isn't it possible to define these in terms of existing matrix functions?

@antoine-levitt (Contributor)

Unless I'm missing something, not really: differentials of matrix functions are tricky due to non-commutativity. @sethaxen it would be strange to have this live in LinearAlgebra, can you maybe refactor LinearAlgebra to expose the functions that you need and use them in ChainRules?

@dlfivefifty (Contributor)

I think @antoine-levitt is right. Looks like differentiating f(A(t)) using Taylor series is a mess. Cauchy's integral formula is a bit easier: if we differentiate

    f(A(t)) = (1/(2πi)) ∮ f(ζ) inv(ζ*I - A(t)) dζ

then we get

    d/dt f(A(t)) = (1/(2πi)) ∮ f(ζ) inv(ζ*I - A(t)) * (dA/dt) * inv(ζ*I - A(t)) dζ

which I don't believe can be reduced to matrix factorisations.

Really interested to know if anyone has a reference for differentiating matrix functions. Don't have a copy of Higham handy but nothing pops out of the table of contents.

@sethaxen (Contributor) commented Jan 4, 2021

> Possibly, but isn't it possible to define these in terms of existing matrix functions?

Sort of. For any matrix function f we can compute its Fréchet derivative by calling f on a block matrix with a particular structure:

using LinearAlgebra: checksquare

function frechet(f, A, ΔA)
    n = checksquare(A)
    Ablock = [A ΔA; zero(A) A]    # f([A ΔA; 0 A]) = [f(A) L_f(A,ΔA); 0 f(A)]
    Yblock = f(Ablock)
    Y = Yblock[1:n, 1:n]
    ΔY = Yblock[1:n, n+1:2n]
    return Y, ΔY
end

But this is ~8x the cost of f, whereas the algorithms by Higham in the OP are more efficient (I think ~3x or less).

> @sethaxen it would be strange to have this live in LinearAlgebra

I agree; I only ask because, given this issue, it seemed there was (old) interest in them being in base, and the Fréchet derivative of exp is included in SciPy.

> can you maybe refactor LinearAlgebra to expose the functions that you need and use them in ChainRules?

Perhaps. While the most efficient implementation would compute the derivative completely in parallel with the function, we can get close by reusing all the matrix products. So a nice refactor might be to move the work of exp! into a function that also returns all intermediate matrix products, which I could then reuse. But it would also be nice to reduce the number of allocations in exp!, in which case its Fréchet derivative would need to be computed in parallel and duplicate a lot of code.

@antoine-levitt (Contributor)

> Really interested to know if anyone has a reference for differentiating matrix functions. Don't have a copy of Higham handy but nothing pops out of the table of contents.

Basically, continue what you were doing with the integral formula: expand in an eigenbasis and integrate the poles explicitly, and you get divided differences (f(λ_n) - f(λ_m)) / (λ_n - λ_m), with the convention (f(x) - f(y))/(x - y) = f'(x) when x == y. It's probably discussed in Higham somewhere; I think it's called Daleckii-Krein there (I know it from quantum mechanics, where it's known under the generic name of "perturbation theory"). It's a bit tricky to implement in the case of close eigenvalues for generic functions f because of the divided differences (what I've been doing in https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/Smearing.jl#L44 is to use ForwardDiff to use f' and f'' for x and y close together, although that still doesn't get you to O(eps)). For exp/log you can use expm1 and log1p. All this is for normal matrices; @sethaxen, I imagine the method you're looking at is more robust for non-normal?
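
A minimal sketch of that Daleckii-Krein divided-difference formula for the Hermitian case, with naive handling of close eigenvalues (the ForwardDiff trick above is one way to do better):

using LinearAlgebra

# d/dt f(A + t*dA) at t = 0 for Hermitian A: transform dA to the eigenbasis,
# scale entrywise by the divided differences, and transform back.
function daleckii_krein(f, f′, A::Hermitian, dA)
    λ, V = eigen(A)
    n = length(λ)
    K = [λ[i] == λ[j] ? f′(λ[i]) : (f(λ[i]) - f(λ[j])) / (λ[i] - λ[j])
         for i in 1:n, j in 1:n]
    return V * (K .* (V' * dA * V)) * V'
end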

It was suggested above to split matrix functions (more than exp/log/sqrt) into a separate MatrixFunctions.jl package; I think that would be a very natural place to put derivatives.

@dlfivefifty (Contributor)

@sethaxen's method is probably a bad idea for normal matrices as Ablock will not be normal.

@sethaxen (Contributor) commented Jan 4, 2021

> Really interested to know if anyone has a reference for differentiating matrix functions. Don't have a copy of Higham handy but nothing pops out of the table of contents.

Section 3.2 of Higham's Functions of Matrices gives several approaches for computing the Fréchet derivative:

  • the block method given above (Eq. 3.16)
  • Daleckii-Krein, using the eigendecomposition, for normal matrices (Corollary 3.12)
  • using the Schur decomposition, if you know the Fréchet derivative of f(T) for triangular T (Problem 3.2)
  • computing the Fréchet derivatives of the primitive functions of the implementation of f and composing with the chain rule, though this needs to be checked for accuracy.

> @sethaxen I imagine the method you're looking at is more robust for non-normal?

For automatic differentiation, we indeed only want to define a rule if it works for all matrices, and the latter approach should be fine. In fact, Higham's Fréchet derivative of the matrix exponential is precisely this, just checked for accuracy. I don't think we can autodiff exp with any AD (Zygote can't handle in-place modification, and ForwardDiff and ReverseDiff require types other than BlasFloat), so a rule is needed. The reverse-mode derivative (pullback) is the Fréchet derivative evaluated at the adjoint of the input.
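
For concreteness, a sketch of such a rule built on the block frechet helper above (ChainRulesCore API; this pays the ~8x block-method cost, so it is an illustration rather than the efficient algorithm):

using ChainRulesCore, LinearAlgebra

function ChainRulesCore.rrule(::typeof(exp), A::StridedMatrix{<:Real})
    Y = exp(A)
    function exp_pullback(Ȳ)
        # Pullback: the Fréchet derivative of exp, evaluated at A', applied
        # to the cotangent Ȳ (via the block formula defined earlier).
        _, Ā = frechet(exp, Matrix(A'), unthunk(Ȳ))
        return NoTangent(), Ā
    end
    return Y, exp_pullback
end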

> It was suggested above to split matrix functions (more than exp/log/sqrt) into a separate MatrixFunctions.jl package; I think that would be a very natural place to put derivatives.

The issue with that, raised above, is type piracy. I guess that could be worked around by re-appending the m suffix or using an interface like matfun(sin, A), but sin(A) is cleaner. For Hermitian, almost all of the matrix functions share a common form (using Daleckii-Krein), so it would be odd to split a subset of them into a separate package. The advantage though is such a package could provide multiple algorithms for computing each f and also bundle the ChainRules rules side-by-side, which is always preferable to putting them in the ChainRules package. There are also cases like ExponentialUtilities.exp! where the code for LinearAlgebra.exp! was almost exactly duplicated just to use a cache.

@antoine-levitt (Contributor)

> computing the Fréchet derivatives of the primitive functions of the implementation of f and composing with the chain rule, though this needs to be checked for accuracy.

I see. In the applications I'm interested in, the function f is often the Heaviside function, so that's not an option.

Regarding a separate package, clearly exp/sin/log/sqrt are special because they're in base, but there are other functions one might want to use, so a general matfun(f, A) is needed, and it would be natural to put the rules there. But indeed the question of where to put the rules for exp/sin/log/sqrt remains... I would say they belong to ChainRules, and code duplication is necessary in any case (as code reuse for the function itself and its derivatives looks hard from what you say).

@ViralBShah (Member)

This is best done in external packages. Moving to a discussion so that we have the list.

@JuliaLang locked and limited conversation to collaborators Jan 30, 2022
@ViralBShah converted this issue into discussion #43982 Jan 30, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
