derivatives of more matrix functions #764

Open · stevengj opened this issue Dec 21, 2023 · 3 comments

stevengj (Contributor) commented Dec 21, 2023

I notice that we currently implement matfun_frechet (the Fréchet derivative of matrix functions) only for exp(A), added in #331 by @sethaxen.

However, I came across a beautifully simple result, based on proposition 2 of Magnus et al. (2021), that looks like it should allow us to define matfun_frechet for (at least) any function with a Taylor series.

In particular, it is easy to show by induction that the Fréchet derivative d(Aⁿ) = (Aⁿ)′[dA] of f(A) = Aⁿ is given by:

```math
\begin{pmatrix} A & E \\ 0 & A \end{pmatrix}^{\!n}
= \begin{pmatrix} A^n & (A^n)'[E] \\ 0 & A^n \end{pmatrix},
\qquad
(A^n)'[E] = \sum_{k=0}^{n-1} A^k E \, A^{n-1-k} .
```
It then follows, for any analytic matrix function f(A) (i.e. with a Taylor series), that

```math
f\!\left( \begin{pmatrix} A & E \\ 0 & A \end{pmatrix} \right)
= \begin{pmatrix} f(A) & f'(A)[E] \\ 0 & f(A) \end{pmatrix} .
```
That is, you augment the matrix A with E and compute f on the augmented matrix — the diagonal blocks are now f(A) and the upper-right block is the Fréchet derivative f'(A) applied to E.
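
Here is a minimal Julia sketch of that recipe (the helper name frechet_block is hypothetical, not an existing API):

```julia
using LinearAlgebra

# Block-triangular trick: evaluate f on the augmented matrix [A E; 0 A]
# and read f'(A)[E] off the upper-right n×n block.
function frechet_block(f, A::AbstractMatrix, E::AbstractMatrix)
    n = LinearAlgebra.checksquare(A)
    F = f([A E; zero(A) A])   # f of the 2n×2n augmented matrix
    return F[1:n, n+1:2n]     # upper-right block = f'(A)[E]
end
```

For example, frechet_block(exp, A, E) should agree with our specialized exp Fréchet derivative up to roundoff.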

It's not as efficient as an algorithm specialized for a particular function f(A) (it's probably slower than our current exp implementation, for example), because doubling the matrix size increases the cost of a typical O(n³) dense algorithm by a factor of 8 (or maybe less, if the algorithm can exploit the block-triangular structure). But it's better than nothing.

sethaxen (Member) commented

Thanks for the issue. Indeed, in Section 3 of https://eprints.maths.manchester.ac.uk/id/eprint/2754, Higham recommends, for the Fréchet derivative of an arbitrary matrix function, either an approximate algorithm (the complex-step method) or precisely the approach you give, which is elegant if not especially efficient. The reference given is section 7.3 of https://doi.org/10.1017/S0962492910000036, where the only qualification on f is that it is 2n-1 times continuously differentiable.
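
For reference, the complex-step variant is essentially a one-liner. A sketch, assuming real A and E and an f with real Taylor coefficients (the name and the step size h are my choices, not from any existing API):

```julia
using LinearAlgebra

# Complex-step approximation of the Fréchet derivative:
# f(A + im*h*E) ≈ f(A) + im*h*f'(A)[E], so the scaled imaginary part
# recovers f'(A)[E] without subtractive cancellation in h.
frechet_complexstep(f, A::AbstractMatrix{<:Real}, E::AbstractMatrix{<:Real}; h = 1e-20) =
    imag(f(complex.(A, h .* E))) / h
```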

In terms of derivatives of matrix functions, the most important one we are missing is the matrix logarithm algorithm of https://doi.org/10.1137/120885991: with that and our exp implementation, one can write rules for all of the matrix functions defined in LinearAlgebra (though maybe for matrix powers there is a better approach).
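
To make the chaining concrete: since sqrt(A) = exp(½ log A) (when log A is defined), the chain rule for Fréchet derivatives gives, in my notation and assuming Fréchet derivatives L_exp and L_log are available,

```math
L_{\mathrm{sqrt}}(A, E) = L_{\exp}\!\left( \tfrac{1}{2}\log A,\ \tfrac{1}{2} L_{\log}(A, E) \right).
```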

Here are some potential downsides to the block-triangular approach:

  • While the pushforward would cost ~8× the primal, the pullback would require first evaluating the primal and then calling the block-triangular version, so ~9×.
  • If A is structurally sparse (e.g. diagonal or triangular), f(A) might be much more efficient than f(Matrix(A)), but that structural sparsity would likely be lost in the block-triangular matrix. We would need to use ProjectTo(A) to get it back (see the sketch after this list).
  • Worse, it is possible that only f(::MyStructurallySparseType) is implemented and there is no f(::Matrix) method, in which case the approach simply errors.
  • How likely is it that users will need matrix functions besides the ones in Base? This seems like something implementers of matrix functions would want, in which case maybe it makes sense for this to live in its own package?
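
On the structure-loss point, here is a hedged sketch of how a pullback might densify and then project back (frechet_block as in the sketch above; for f with real Taylor coefficients, the adjoint of E ↦ L_f(A, E) under the trace inner product is ΔΩ ↦ L_f(A′, ΔΩ)):

```julia
using ChainRulesCore, LinearAlgebra

# Hypothetical pullback sketch: densify a structured A for the block
# trick, apply the adjoint rule Ā = L_f(A', ΔΩ) (valid for f with real
# Taylor coefficients), then project the result back onto A's structure
# (Diagonal, UpperTriangular, ...).
function frechet_pullback_sketch(f, A, ΔΩ)
    Ā = frechet_block(f, Matrix(A)', Matrix(ΔΩ))
    return ProjectTo(A)(Ā)
end
```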

stevengj (Contributor, Author) commented Dec 21, 2023

Higham (2010), section 7.3, in turn cites Higham (2008):

[screenshot of the cited passage from Higham (2010), §7.3]

Higham (2008), section 3.2, writes:

[screenshots of the relevant passage from Higham (2008), §3.2]

stevengj (Contributor, Author) commented

The Mathias reference is Roy Mathias (1996), which seems to be the earliest cited source for this result. Mathias presents it as an original result:

[screenshot of the theorem statement from Mathias (1996)]
