
Add a GPU implementation of lax.linalg.eig. #24663

Open

copybara-service[bot] wants to merge 1 commit into main from test_691072237
Conversation

@copybara-service copybara-service bot commented Nov 1, 2024

Add a GPU implementation of lax.linalg.eig.

This feature has been in the queue for a long time (see #1259), and some folks have found that they can use `pure_callback` to call the CPU version as a workaround. It has recently come up that there can be issues when using `pure_callback` with JAX calls in the body (#24255; this should be investigated separately).
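For context, the workaround looks roughly like the sketch below. This is a minimal, illustrative version only (the `eig_via_callback` helper name and the dtype handling are mine, not part of this change):

```python
import numpy as np
import jax
import jax.numpy as jnp

def eig_via_callback(x):
    # Eigenvalues of a real matrix can be complex, so fix the output dtype up
    # front; pure_callback needs static result shapes/dtypes.
    dtype = jnp.promote_types(x.dtype, jnp.complex64)
    out_type = (
        jax.ShapeDtypeStruct(x.shape[:-1], dtype),  # eigenvalues w
        jax.ShapeDtypeStruct(x.shape, dtype),       # right eigenvectors v
    )

    def host_eig(a):
        # Runs on the host via NumPy/LAPACK, even when x lives on a GPU.
        w, v = np.linalg.eig(a)
        return w.astype(dtype), v.astype(dtype)

    return jax.pure_callback(host_eig, out_type, x)
```

With a helper like that, `w, v = jax.jit(eig_via_callback)(x)` computes the decomposition on the host even when `x` is a GPU-backed array.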

This change adds a native solution for computing `lax.linalg.eig` on GPU. By default, this is implemented by calling LAPACK on host directly because this has good performance for small to moderately sized problems (less than about 2048^2). For larger matrices, a GPU-backed implementation based on [MAGMA](https://icl.utk.edu/magma/) can have significantly better performance. (I should note that I haven't done a huge amount of benchmarking yet, but this was the breakeven point used by PyTorch, and I find roughly similar behavior so far.)
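To illustrate the user-facing behavior, a minimal usage sketch (the return convention below is the existing `lax.linalg.eig` API, which returns left and right eigenvectors by default; the GPU path just makes it work on device):

```python
import jax
import jax.numpy as jnp
from jax import lax

x = jax.random.normal(jax.random.key(0), (512, 512), dtype=jnp.float32)

# Default: eigenvalues plus left and right eigenvectors.
w, vl, vr = lax.linalg.eig(x)

# Eigenvalues and right eigenvectors only.
w, vr = lax.linalg.eig(x, compute_left_eigenvectors=False)
```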

We don't want to add MAGMA as a required dependency, but if a user has installed it, JAX can use it when the `jax_gpu_eig_magma` configuration variable is set to `"on"`. By default, we try to dlopen `libmagma.so`, but the path to a non-standard installation location can be specified using the `JAX_GPU_MAGMA_PATH` environment variable.
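As a concrete sketch of the opt-in, assuming the option is exposed through the usual `jax.config` mechanism (the install path below is a hypothetical placeholder):

```python
import os

# Only needed if libmagma.so is not on the default loader search path;
# "/opt/magma/lib/libmagma.so" is just an example location.
os.environ["JAX_GPU_MAGMA_PATH"] = "/opt/magma/lib/libmagma.so"

import jax
jax.config.update("jax_gpu_eig_magma", "on")  # opt in to the MAGMA backend
```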

For reasons that I don't yet totally understand, the MAGMA implementation segfaults deep in the MAGMA internals for complex128 inputs, so I've disabled that configuration for now.

@PhilipVinc (Contributor)

Oh....

This is...

Amazing!

Thanks enormously for this, really. It's been on my secret wishlist for so long...

@copybara-service copybara-service bot force-pushed the test_691072237 branch 3 times, most recently from b653418 to d24721b on November 6, 2024 at 20:39