Faulty broadcasting when encountering successive singleton dimensions #382
Explicitly broadcasting indeed leads to an overhead. So do you have any quick fixes?
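(For reference, a minimal sketch of how that overhead could be measured — hypothetical code, not taken from the thread. The `.contiguous()` call is an assumption: `expand` itself creates a free view, so any cost would come from materializing a contiguous copy for KeOps.)

```python
# Hypothetical benchmark (not from the thread): time the KeOps reduction
# with the original singleton C versus an explicitly broadcast, copied C.
import time
import torch
from pykeops.torch import LazyTensor

device = "cuda"
M, nh, n, c, d = 64, 32, 1, 256, 2
X = torch.randn(M, nh, n, d, device=device)
C = torch.randn(M, 1, c, d, device=device)

def run_singleton():
    X_i = LazyTensor(X.unsqueeze(3))             # (M, nh, n, 1, d)
    C_j = LazyTensor(C.unsqueeze(2))             # (M, 1, 1, c, d)
    return ((X_i - C_j) ** 2).sum(4).argmin(dim=3)

def run_expanded():
    C_full = C.expand(M, nh, c, d).contiguous()  # the copy happens here
    X_i = LazyTensor(X.unsqueeze(3))             # (M, nh, n, 1, d)
    C_j = LazyTensor(C_full.unsqueeze(2))        # (M, nh, 1, c, d)
    return ((X_i - C_j) ** 2).sum(4).argmin(dim=3)

for label, fn in [("singleton C", run_singleton), ("expanded C ", run_expanded)]:
    fn()  # warm-up, including any KeOps compilation
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    fn()
    torch.cuda.synchronize()
    print(label, f"{time.perf_counter() - t0:.4f} s")
```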
Hello @Xp-speit2018, here is a workaround: define `X_i` as a symbolic LazyTensor instead of wrapping the tensor directly.

```python
import torch
import pykeops.torch

device = "cuda"
M, nh = 64, 32  # batch dimensions
n, c = 1, 256   # number dimensions
d = 2           # spatial dimension
X = torch.randn(M, nh, n, d, device=device)
C = torch.randn(M, 1, c, d, device=device)
print("X.shape", X.shape)
print("C.shape", C.shape)

print("cdist shape", torch.cdist(X, C, p=2).shape)
codes_torch = torch.cdist(X, C, p=2).argmin(dim=3)
print("codes_torch.shape", codes_torch.shape)
print()

# define X_i as a symbolic LazyTensor with ind=0, dim=2 and cat=0 (i.e. i-indexed):
X_i = pykeops.torch.LazyTensor((0, 2, 0))
C_j = pykeops.torch.LazyTensor(C.unsqueeze(2))
D_ij = ((X_i - C_j) ** 2).sum(4)
# now we input X for the actual computation
# (since D_ij.argmin(dim=3) is symbolic, it outputs a function instead of a tensor,
# and we call it with the actual tensor X as input):
codes_keops = D_ij.argmin(dim=3)(X).squeeze(3)
print("codes_keops.shape", codes_keops.shape)
print()

print("Equal?", torch.allclose(codes_torch, codes_keops))
print("Match percentage", (codes_torch == codes_keops).float().mean().item())
```
Hi @joanglaunes, I'm truly grateful for your time and effort in addressing this. Hope this issue can be resolved in the future!
Appreciate your excellent work! It accelerated my code by 7x.
That said, I've found that the broadcasting is a little buggy. Here's a minimal demo to reproduce the bug:
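(The demo script itself was lost in the page extraction. The following is a plausible reconstruction, inferred from the shapes and LazyTensor calls quoted in @joanglaunes's reply above — treat it as a sketch, not the original code.)

```python
import torch
from pykeops.torch import LazyTensor

device = "cuda"
M, nh = 64, 32  # batch dimensions
n, c = 1, 256   # number dimensions
d = 2           # spatial dimension
X = torch.randn(M, nh, n, d, device=device)
C = torch.randn(M, 1, c, d, device=device)

# reference result with plain PyTorch (broadcasting handles C's singleton dim)
codes_torch = torch.cdist(X, C, p=2).argmin(dim=3)  # (M, nh, n)

# KeOps version: unsqueeze to the usual (batch..., i, j, feature) layout
X_i = LazyTensor(X.unsqueeze(3))  # (M, nh, n, 1, d)
C_j = LazyTensor(C.unsqueeze(2))  # (M, 1, 1, c, d) <- two successive singleton dims
D_ij = ((X_i - C_j) ** 2).sum(4)
codes_keops = D_ij.argmin(dim=3).squeeze(3)  # (M, nh, n)

print("Equal?", torch.allclose(codes_torch, codes_keops))
print("Match percentage", (codes_torch == codes_keops).float().mean().item())
```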
Note that I'm creating `C` with its second batch dimension set to 1, and `C` is later unsqueezed, so the subtraction involves two successive implicit broadcasts. The above code prints a match percentage below 1: the KeOps result disagrees with the PyTorch reference.
I've tested the above code with both the CUDA and CPU backends, and the bug appears to be independent of the computing device.
Several workarounds can bring the match percentage back to 1:

- Explicitly broadcasting `C`: setting `C = C.expand(M, nh, c, d)` resolves the issue. Since expansion in PyTorch does not lead to a direct memory copy, this approach has zero overhead (see the sketch after this list).
- Avoiding the singleton dimension in `X`: setting `n > 1` also avoids two successive singleton dimensions. However, this is not practical in my case, as I'm producing exactly one vector of `X` during each iteration.
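For concreteness, here is how the first workaround slots into the reconstructed demo above (same assumptions; depending on the KeOps version, a `.contiguous()` copy of the expanded view may additionally be required, which would no longer be free):

```python
# Workaround 1: materialize C's second batch dimension before wrapping it,
# so KeOps never sees two successive singleton dimensions.
C_full = C.expand(M, nh, c, d)           # a view: expand() copies no memory
X_i = LazyTensor(X.unsqueeze(3))         # (M, nh, n, 1, d)
C_j = LazyTensor(C_full.unsqueeze(2))    # (M, nh, 1, c, d): single singleton dim
D_ij = ((X_i - C_j) ** 2).sum(4)
codes_keops = D_ij.argmin(dim=3).squeeze(3)  # now agrees with codes_torch
```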