
Faulty broadcasting when encountering successive singleton dimensions #382

Open
Xp-speit2018 opened this issue Sep 30, 2024 · 3 comments

Appreciate your excellent work! It accelerated my code by 7x.
That said, I've found that broadcasting is a little buggy. Here's a minimal demo to reproduce the bug:

import torch
import pykeops.torch  # import the torch bindings so that pykeops.torch.LazyTensor is available

device = "cuda"
M, nh = 64, 32 # batch dimensions
n, c  = 1, 256 # number dimensions
d = 2          # spatial dimension

X = torch.randn(M, nh, n, d, device=device)
C = torch.randn(M, 1, c, d, device=device)

print("X.shape", X.shape)
print("C.shape", C.shape)
print("cdist shape", torch.cdist(X, C, p=2).shape)
codes_torch = torch.cdist(X, C, p=2).argmin(dim=3)
print("codes_torch.shape", codes_torch.shape)
print()

X_i = pykeops.torch.LazyTensor(X.unsqueeze(3))
C_j = pykeops.torch.LazyTensor(C.unsqueeze(2))
D_ij = ((X_i - C_j) ** 2).sum(4)
codes_keops = D_ij.argmin(dim=3).squeeze(3)
print("codes_keops.shape", codes_keops.shape)
print()

print("Equal?", torch.allclose(codes_torch, codes_keops))
print("Match percentage", (codes_torch == codes_keops).float().mean().item())

Note that I'm creating C with its second batch dimension equal to 1, and C is later unsqueezed, so the subtraction involves two successive implicitly broadcast singleton dimensions.
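For reference (a small addition to the demo: just printing the intermediate shapes), the operands that actually enter the subtraction are:

print(X.unsqueeze(3).shape)  # torch.Size([64, 32, 1, 1, 2]) -> dims 2 and 3 are both singletons
print(C.unsqueeze(2).shape)  # torch.Size([64, 1, 1, 256, 2]) -> dims 1 and 2 are both singletons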

The above code will output:

X.shape torch.Size([64, 32, 1, 2])
C.shape torch.Size([64, 1, 256, 2])
cdist shape torch.Size([64, 32, 1, 256])
codes_torch.shape torch.Size([64, 32, 1])

codes_keops.shape torch.Size([64, 32, 1])

Equal? False
Match percentage 0.00439453125

I've tested the code above with both the cuda and cpu backends, and the bug appears to be independent of the computing device.

Several workarounds bring the match percentage back to 1:

  • Explicit broadcasting: C = C.expand(M, nh, c, d) resolves the issue. Since expand in PyTorch does not copy memory, this approach has zero overhead (a sketch is given right after this list).
  • Increasing the number of points per batch in X: setting n > 1 also avoids two successive singleton dimensions. However, this is not practical in my case, since I produce exactly one X vector per iteration.
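A minimal sketch of the first workaround, applied to the demo above (C_exp is just a local name for the expanded tensor, the other variables come from the demo, and the contiguity warning discussed in the next comment applies here as well):

# Workaround 1: broadcast C explicitly before building the LazyTensors,
# so that no batch dimension is left to be broadcast implicitly by KeOps.
C_exp = C.expand(M, nh, c, d)                       # a view; no memory copy at this point
X_i = pykeops.torch.LazyTensor(X.unsqueeze(3))      # shape (M, nh, n, 1, d)
C_j = pykeops.torch.LazyTensor(C_exp.unsqueeze(2))  # shape (M, nh, 1, c, d)
D_ij = ((X_i - C_j) ** 2).sum(4)
codes_keops = D_ij.argmin(dim=3).squeeze(3)         # now matches codes_torch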
Xp-speit2018 (Author) commented Sep 30, 2024

Explicit broadcasting does in fact lead to an overhead:

[pyKeOps] Warning : at least one of the input tensors is not contiguous. Consider using contiguous data arrays to avoid unnecessary copies.
codes_keops.shape torch.Size([64, 32, 1])

So do you have any quick fixes?

joanglaunes (Contributor) commented:

Hello @Xp-speit2018,
Sorry for this very late reply. We do indeed still have a bug with batch dimensions for parameter variables, as noticed previously in issue #366. In your example, setting n=1 causes the X_i variable to be understood as a parameter rather than an "i-indexed" variable, in KeOps terminology, and that triggers the bug. We really need to solve this urgently, but there is a workaround: force KeOps to treat X_i as "i-indexed" by first defining it as a symbolic LazyTensor, and then passing the actual X tensor at the final call of the reduction. Here is the modified code for your case:

import torch
import pykeops.torch  # import the torch bindings so that pykeops.torch.LazyTensor is available

device = "cuda"
M, nh = 64, 32 # batch dimensions
n, c  = 1, 256 # number dimensions
d = 2          # spatial dimension

X = torch.randn(M, nh, n, d, device=device)
C = torch.randn(M, 1, c, d, device=device)

print("X.shape", X.shape)
print("C.shape", C.shape)
print("cdist shape", torch.cdist(X, C, p=2).shape)
codes_torch = torch.cdist(X, C, p=2).argmin(dim=3)
print("codes_torch.shape", codes_torch.shape)
print()

# define X_i as symbolic LazyTensor with ind=0, dim=2 and cat=0 (i.e. i-indexed) :
X_i = pykeops.torch.LazyTensor((0,2,0))
C_j = pykeops.torch.LazyTensor(C.unsqueeze(2))
D_ij = ((X_i - C_j) ** 2).sum(4)
# now we input X for the actual computation:
codes_keops = D_ij.argmin(dim=3)(X).squeeze(3)
# (explanation: since D_ij.argmin(dim=3) is symbolic, it outputs a function instead of a tensor,
# and we call it with the actual tensor X as input)

print("codes_keops.shape", codes_keops.shape)
print()

print("Equal?", torch.allclose(codes_torch, codes_keops))
print("Match percentage", (codes_torch == codes_keops).float().mean().item())

Xp-speit2018 (Author) commented:

Hi @joanglaunes,
Thanks for the workaround—it works perfectly! The explanation was very enlightening and helped me better understand how KeOps handles batch dimensions and symbolic LazyTensors.

I’m truly grateful for your time and effort in addressing this. Hope this issue can be resolved in the future!
