One common scenario that could benefit from einsum-like notation, but seemingly has not been implemented, is the computation of pairwise distances.
This draft covers how this functionality might look in einops.
Example
```python
distances_bthw = eindistance(x_btc, x_bhwc, 'b t c, b h w c -> b t h w', distance='sq_euclid')
```
In this example, the distance is computed as a norm over the reduced axis c.
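For illustration, here is a naive sketch of what this hypothetical eindistance call would compute, written with einops and pytorch (eindistance itself does not exist; the input shapes below are illustrative assumptions):

```python
import torch
from einops import rearrange, reduce

# illustrative inputs matching the pattern 'b t c, b h w c'
x_btc = torch.randn(2, 5, 16)
x_bhwc = torch.randn(2, 4, 4, 16)

# broadcast both tensors to 'b t h w c', subtract, reduce over c
diff = rearrange(x_btc, 'b t c -> b t 1 1 c') - rearrange(x_bhwc, 'b h w c -> b 1 h w c')
distances_bthw = reduce(diff ** 2, 'b t h w c -> b t h w', 'sum')  # squared euclidean
```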
The function resembles einsum, but there are several differences:
- no trivially reduced axes (i.e. axes present in only one of the inputs)
- always exactly two inputs
- a choice of distance
For simplicity, and to cover all practical cases, we can assume that only one axis is reduced.
Backend support
cdist: scipy provides a cdist function (with replicas in cupy/jax), but it does not cover batching, which is extremely common in DL code. pytorch has cdist with batching support (though with a different interface).
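For reference, a minimal sketch of the batched pytorch interface (the shapes are illustrative assumptions):

```python
import torch

# batched pairwise euclidean distances with torch.cdist:
# x1 is (batch, n, c), x2 is (batch, m, c) -> result is (batch, n, m)
x1 = torch.randn(2, 5, 16)
x2 = torch.randn(2, 7, 16)
distances = torch.cdist(x1, x2, p=2.0)  # p=2.0 gives euclidean distance
print(distances.shape)  # torch.Size([2, 5, 7])
```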
Implementation issues
A trivial implementation (computing the difference, then taking the norm over the reduced dimension) is simple, but suffers from inefficiency and high memory consumption.
More efficient approaches are available, but they are highly specific to the commonly used norms (euclidean, cosine); see the sketch below.
However, both have precision issues (e.g. the fast sq_euclid can produce negative values, and the same holds for cosine).
These issues may be exacerbated by the use of low-precision arithmetic (float16 / bfloat16 / etc.).
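As a concrete example of the precision issue, here is a minimal sketch of the fast squared-euclidean trick, i.e. the expansion ||x - y||^2 = ||x||^2 + ||y||^2 - 2<x, y>, including the clamp that guards against negative outputs (the function name and shapes are assumptions, not part of this proposal):

```python
import torch

def sq_euclid_fast(x_btc: torch.Tensor, x_bhwc: torch.Tensor) -> torch.Tensor:
    # expand ||x - y||^2 = ||x||^2 + ||y||^2 - 2<x, y> so that the full
    # (b, t, h, w, c) difference tensor is never materialized
    x_sq = (x_btc ** 2).sum(dim=-1)                      # shape: b t
    y_sq = (x_bhwc ** 2).sum(dim=-1)                     # shape: b h w
    cross = torch.einsum('btc,bhwc->bthw', x_btc, x_bhwc)
    d = x_sq[:, :, None, None] + y_sq[:, None, :, :] - 2 * cross
    # floating-point cancellation can make d slightly negative; clamp to zero
    return d.clamp_min(0)
```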
No ETA.