derivatives of more matrix functions #764

Open · stevengj opened this issue Dec 21, 2023 · 3 comments

stevengj (Contributor) commented Dec 21, 2023

I notice that we currently implement matfun_frechet (the Fréchet derivative of matrix functions) only for exp(A), added in #331 by @sethaxen.

However, I came across a beautifully simple result, based on proposition 2 of Magnus et al. (2021), that looks like it should allow us to define matfun_frechet for (at least) any function with a Taylor series.

In particular, it is easy to show by induction that the Fréchet derivative d(Aⁿ) = (Aⁿ)′[dA] of f(A) = Aⁿ is given by:

```math
\begin{pmatrix} A & E \\ 0 & A \end{pmatrix}^{\!n}
= \begin{pmatrix} A^n & (A^n)'[E] \\ 0 & A^n \end{pmatrix},
\qquad
(A^n)'[E] = \sum_{k=0}^{n-1} A^k E \, A^{n-1-k} .
```
It then follows, for any analytic matrix function f(A) (i.e. with a Taylor series), that

```math
f\!\left( \begin{pmatrix} A & E \\ 0 & A \end{pmatrix} \right)
= \begin{pmatrix} f(A) & f'(A)[E] \\ 0 & f(A) \end{pmatrix} .
```
That is, you augment the matrix A with E and compute f on the augmented matrix — the diagonal blocks are now f(A) and the upper-right block is the Fréchet derivative f'(A) applied to E.
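
Here is a minimal Julia sketch of that recipe (the helper name frechet_block is hypothetical, not an existing API):

```julia
using LinearAlgebra

# Block-triangular trick: evaluate f on the augmented matrix [A E; 0 A]
# and read f'(A)[E] off the upper-right n×n block.
function frechet_block(f, A::AbstractMatrix, E::AbstractMatrix)
    n = LinearAlgebra.checksquare(A)
    F = f([A E; zero(A) A])   # f of the 2n×2n augmented matrix
    return F[1:n, n+1:2n]     # upper-right block = f'(A)[E]
end
```

For example, frechet_block(exp, A, E) should agree with our specialized exp Fréchet derivative up to roundoff.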

It's not as efficient as an algorithm specialized for a particular function f(A) (it's probably slower than our current exp implementation, for example), because doubling the matrix size increases the cost of a typical O(n³) dense algorithm by a factor of 8 (or maybe less, if the algorithm can exploit the block-triangular structure). But it's better than nothing.

sethaxen (Member) commented

Thanks for the issue. Indeed, in Section 3 of https://eprints.maths.manchester.ac.uk/id/eprint/2754, Higham recommends, for the Fréchet derivative of an arbitrary matrix function, either an approximate algorithm (the complex-step method) or precisely the approach you give, which is elegant if not especially efficient. The reference given is section 7.3 of https://doi.org/10.1017/S0962492910000036, where the only qualification on f is that it is 2n-1 times continuously differentiable.
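
For reference, the complex-step variant is essentially a one-liner. A sketch, assuming real A and E and an f with real Taylor coefficients (the name and the step size h are my choices, not from any existing API):

```julia
using LinearAlgebra

# Complex-step approximation of the Fréchet derivative:
# f(A + im*h*E) ≈ f(A) + im*h*f'(A)[E], so the scaled imaginary part
# recovers f'(A)[E] without subtractive cancellation in h.
frechet_complexstep(f, A::AbstractMatrix{<:Real}, E::AbstractMatrix{<:Real}; h = 1e-20) =
    imag(f(complex.(A, h .* E))) / h
```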

In terms of derivatives of matrix functions, the most important one we are missing is the matrix logarithm algorithm of https://doi.org/10.1137/120885991: with that and our exp implementation, one can write rules for all of the matrix functions defined in LinearAlgebra (though maybe for matrix powers there is a better approach).
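
To make the chaining concrete: since sqrt(A) = exp(½ log A) (when log A is defined), the chain rule for Fréchet derivatives gives, in my notation and assuming Fréchet derivatives L_exp and L_log are available,

```math
L_{\mathrm{sqrt}}(A, E) = L_{\exp}\!\left( \tfrac{1}{2}\log A,\ \tfrac{1}{2} L_{\log}(A, E) \right).
```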

Here are some potential downsides to the block-triangular approach:

  • While the pushforward would cost ~8× the primal, the pullback would require first evaluating the primal and then calling the block-triangular version, so ~9×.
  • If A is structurally sparse (e.g. diagonal or triangular), f(A) might be much more efficient than f(Matrix(A)), but that structural sparsity would likely be lost in the block-triangular matrix. We would need to use ProjectTo(A) to get it back (see the sketch after this list).
  • Worse, it is possible that only f(::MyStructurallySparseType) is implemented and there is no f(::Matrix) method, in which case the approach simply errors.
  • How likely is it that users will need matrix functions besides the ones in Base? This seems like something implementers of matrix functions would want, in which case maybe it makes sense for this to live in its own package?
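
On the structure-loss point, here is a hedged sketch of how a pullback might densify and then project back (frechet_block as in the sketch above; for f with real Taylor coefficients, the adjoint of E ↦ L_f(A, E) under the trace inner product is ΔΩ ↦ L_f(A′, ΔΩ)):

```julia
using ChainRulesCore, LinearAlgebra

# Hypothetical pullback sketch: densify a structured A for the block
# trick, apply the adjoint rule Ā = L_f(A', ΔΩ) (valid for f with real
# Taylor coefficients), then project the result back onto A's structure
# (Diagonal, UpperTriangular, ...).
function frechet_pullback_sketch(f, A, ΔΩ)
    Ā = frechet_block(f, Matrix(A)', Matrix(ΔΩ))
    return ProjectTo(A)(Ā)
end
```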

stevengj (Contributor, Author) commented Dec 21, 2023

Higham (2010), section 7.3, in turn cites Higham (2008):

[screenshot of the cited passage from Higham (2010), §7.3]

Higham (2008), section 3.2, writes:

[screenshots of the relevant passage from Higham (2008), §3.2]

stevengj (Contributor, Author) commented

The Mathias reference is Roy Mathias (1996), which seems to be the earliest cited source for this result. Mathias presents it as an original result:

[screenshot of the theorem statement from Mathias (1996)]
