
FD-based gradient calculation seems incorrect for Burgers (with code to verify) #20

kuangdai opened this issue May 2, 2023 · 2 comments

kuangdai commented May 2, 2023

The issue

I am trying to understand how gradients are computed for the Burgers equation, as implemented by FDM_Burgers() in train_utils/losses.py and pasted below:

def FDM_Burgers(u, v, D=1):
    batchsize = u.size(0)
    nt = u.size(1)
    nx = u.size(2)

    u = u.reshape(batchsize, nt, nx)
    dt = D / (nt-1)
    dx = D / (nx)

    u_h = torch.fft.fft(u, dim=2)
    # Wavenumbers in x-direction
    k_max = nx//2
    k_x = torch.cat((torch.arange(start=0, end=k_max, step=1, device=u.device),
                     torch.arange(start=-k_max, end=0, step=1, device=u.device)), 0).reshape(1,1,nx)
    ux_h = 2j * np.pi * k_x * u_h
    uxx_h = 2j * np.pi * k_x * ux_h
    ux = torch.fft.irfft(ux_h[:, :, :k_max+1], dim=2, n=nx)
    uxx = torch.fft.irfft(uxx_h[:, :, :k_max+1], dim=2, n=nx)
    ut = (u[:, 2:, :] - u[:, :-2, :]) / (2 * dt)
    Du = ut + (ux*u - v*uxx)[:,1:-1,:]
    return Du, ut, ux, uxx

It is clear that you use finite differences (FD) in time to compute $u_t$: ut = (u[:, 2:, :] - u[:, :-2, :]) / (2 * dt). For $u_x$, you compute it in Fourier space, as described in the paper. However, it seems to me that what is done here (a single round of FFT, multiplication by the wavenumbers, and IFFT) is insufficient; for example, the pointwise activation functions of the network are not accounted for at all. I do not understand why $u_x$ and $u_{xx}$ can be computed in such a simple way.
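
For concreteness, this is the recipe in question applied to a toy periodic signal. This is a standalone, purely illustrative sketch (not repository code; all names are made up): one FFT, multiplication by $2\pi i k$, and one inverse FFT, compared against the analytic derivative.

import numpy as np
import torch

nx = 128
x = torch.linspace(0, 1, nx + 1)[:-1]                  # periodic grid on [0, 1), dx = 1 / nx
f = torch.sin(2 * np.pi * x)                           # toy periodic signal
f_h = torch.fft.fft(f)
k_max = nx // 2
k = torch.cat((torch.arange(0, k_max), torch.arange(-k_max, 0)))
fx_h = 2j * np.pi * k * f_h                            # multiply by i * 2 * pi * k
fx = torch.fft.irfft(fx_h[:k_max + 1], n=nx)           # back to physical space
# for a band-limited periodic signal this matches the analytic derivative up to round-off
print((fx - 2 * np.pi * torch.cos(2 * np.pi * x)).abs().max())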

Benchmark with autograd

To investigate this, I checked the results against autograd. These are the steps I took:

  1. extend the return values of FDM_Burgers() and PINO_loss() in train_utils/losses.py to expose the computed gradients;
  2. compare the FD and autograd results inside the training method train_2d_burger() in train_utils/train_2d.py;
  3. apply a minor fix in train_burgers.py so that the debug run works on CPU.

For quick reference, this is my code for the FD vs autograd comparison in step 2:

for x, y in train_loader:
    # make x require grad
    x.requires_grad = True

    x, y = x.to(rank), y.to(rank)
    out = model(x).reshape(y.shape)
    data_loss = myloss(out, y)

    ####################
    # BENCHMARK ut, ux #
    ####################
    # results from FDM
    loss_u, loss_f, Du, ut, ux, uxx = PINO_loss(out, x[:, 0, :, 0], v)

    from torch.autograd import grad
    g_AD = grad(out.sum(), x, create_graph=True)[0]

    # from datasets.py
    # Xs = torch.stack([Xs, gridx.repeat([n_sample, self.T, 1]),
    #                   gridt.repeat([n_sample, 1, self.s])], dim=3)
    ux_AD = g_AD[:, :, :, 1]  # x coordinates -> second dim
    ut_AD = g_AD[:, :, :, 2]  # t coordinates -> third dim

    # FDM ut uses central differences in time, so it drops the first and last steps;
    # slice the autograd result accordingly before comparing
    print('Difference for ut')
    print(ut_AD[0, 1:-1] - ut[0])

    print('\n\nDifference for ux')
    print(ux_AD[0] - ux[0])
    assert False, 'Stop for debug'
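
To double-check the channel indices used above, here is a tiny standalone example (illustrative only, with made-up sizes) confirming that for an input whose last dimension stacks [u0, x, t], channel 1 of the autograd gradient is the derivative with respect to x and channel 2 the derivative with respect to t:

import torch

nt, nx = 4, 5                                          # made-up sizes for illustration
gridx = torch.linspace(0, 1, nx)
gridt = torch.linspace(0, 1, nt)
X = torch.stack([torch.zeros(nt, nx),                  # placeholder for u0
                 gridx.repeat(nt, 1),                  # x coordinates -> channel 1
                 gridt.reshape(nt, 1).repeat(1, nx)],  # t coordinates -> channel 2
                dim=2).requires_grad_(True)
out = (X[..., 1] ** 2 * X[..., 2]).sum()               # toy scalar: sum of x^2 * t
g = torch.autograd.grad(out, X)[0]
print(torch.allclose(g[..., 1], 2 * X[..., 1] * X[..., 2]))  # derivative w.r.t. x -> True
print(torch.allclose(g[..., 2], X[..., 1] ** 2))             # derivative w.r.t. t -> True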

If you replace the original source files with the attached three files and run

python3 train_burgers.py --config_path configs/pretrain/burgers-pretrain.yaml --mode train

you should be able to get output similar to the following:

Difference for ut
tensor([[ 1.1273e-05,  7.0436e-06,  4.2944e-06,  ...,  4.5442e-05,
          3.7894e-05,  3.1451e-05],
        [-1.2425e-05, -1.6239e-05, -1.7828e-05,  ...,  9.3258e-06,
          3.6445e-06, -5.4443e-07],
        [-2.9822e-05, -3.1391e-05, -3.2599e-05,  ..., -2.1410e-05,
         -2.2994e-05, -2.5690e-05],
        ...,
        [-3.7912e-05, -3.9732e-05, -4.0075e-05,  ..., -2.7271e-05,
         -3.0192e-05, -3.2881e-05],
        [-3.5721e-05, -3.9391e-05, -4.0469e-05,  ..., -2.0454e-06,
         -7.5690e-06, -1.2858e-05],
        [-2.4116e-05, -2.8111e-05, -3.0634e-05,  ...,  3.3159e-05,
          2.4668e-05,  1.7543e-05]], grad_fn=<SubBackward0>)

Difference for ux
tensor([[ 0.0319, -0.0136,  0.0091,  ...,  0.0092, -0.0135,  0.0320],
        [ 0.0319, -0.0136,  0.0091,  ...,  0.0092, -0.0135,  0.0320],
        [ 0.0319, -0.0136,  0.0091,  ...,  0.0092, -0.0135,  0.0320],
        ...,
        [ 0.0334, -0.0144,  0.0095,  ...,  0.0097, -0.0142,  0.0336],
        [ 0.0334, -0.0144,  0.0095,  ...,  0.0097, -0.0142,  0.0336],
        [ 0.0334, -0.0144,  0.0095,  ...,  0.0096, -0.0142,  0.0335]],
       grad_fn=<SubBackward0>)

As we can see, the differences between FD and autograd for $u_t$ are quite small, as expected, which also implies that I am using autograd correctly in train_2d_burger(). However, the differences for $u_x$ are much larger, which seems to support my doubt that FDM_Burgers() is insufficient for $u_x$.

kuangdai commented May 2, 2023

source.zip

Please use these three source files to reproduce the benchmark results.

kuangdai commented May 3, 2023

source (1).zip
I further compared $u_x$ computed by three methods: finite difference, autograd, and the FFT-based approach; the last is the one used in your original code.

My modified code is attached.
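
For reference, the finite-difference $u_x$ amounts to a periodic central difference along x; a minimal sketch of that kind of stencil is below (illustrative only, the exact version I ran is in the attached zip and may differ in details):

import torch

def central_diff_x(u, dx):
    # u: (batch, nt, nx) on a periodic grid with spacing dx = D / nx;
    # torch.roll wraps around, matching the periodicity assumed by the FFT route
    return (torch.roll(u, shifts=-1, dims=2) - torch.roll(u, shifts=1, dims=2)) / (2 * dx)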

Here is the output from my run:

Difference autograd vs FFT
tensor([[-0.0079,  0.0036, -0.0022,  ..., -0.0022,  0.0035, -0.0080],
        [-0.0079,  0.0036, -0.0022,  ..., -0.0022,  0.0035, -0.0080],
        [-0.0079,  0.0036, -0.0022,  ..., -0.0022,  0.0035, -0.0080],
        ...,
        [-0.0061,  0.0027, -0.0017,  ..., -0.0017,  0.0027, -0.0062],
        [-0.0061,  0.0027, -0.0017,  ..., -0.0017,  0.0027, -0.0062],
        [-0.0061,  0.0027, -0.0017,  ..., -0.0017,  0.0027, -0.0062]],
       grad_fn=<SubBackward0>)


Difference finite-diff vs FFT
tensor([[ 0.0035, -0.0022,  0.0016,  ...,  0.0016, -0.0022,  0.0035],
        [ 0.0035, -0.0022,  0.0016,  ...,  0.0016, -0.0022,  0.0035],
        [ 0.0035, -0.0022,  0.0016,  ...,  0.0016, -0.0022,  0.0035],
        ...,
        [ 0.0027, -0.0017,  0.0012,  ...,  0.0012, -0.0017,  0.0027],
        [ 0.0027, -0.0017,  0.0012,  ...,  0.0012, -0.0017,  0.0027],
        [ 0.0027, -0.0017,  0.0012,  ...,  0.0012, -0.0017,  0.0027]],
       grad_fn=<SubBackward0>)


Difference autograd vs finite-diff
tensor([[-4.8905e-05, -5.7416e-05, -6.7184e-05,  ..., -1.7096e-05,
         -1.1985e-05, -4.2657e-06],
        [-4.8192e-05, -5.6226e-05, -6.6709e-05,  ..., -1.6668e-05,
         -1.1796e-05, -3.8399e-06],
        [-4.7259e-05, -5.5532e-05, -6.5778e-05,  ..., -1.6510e-05,
         -1.1401e-05, -3.2073e-06],
        ...,
        [ 7.5682e-06,  1.5970e-06, -6.3481e-06,  ...,  1.7191e-05,
          2.5546e-05,  3.8179e-05],
        [ 8.3740e-06,  3.1168e-06, -5.0674e-06,  ...,  1.7569e-05,
          2.5923e-05,  3.7840e-05],
        [ 8.7544e-06,  4.2115e-06, -3.4967e-06,  ...,  1.7523e-05,
          2.5875e-05,  3.8267e-05]], grad_fn=<SubBackward0>)

As you can see, autograd and finite difference agree well with each other, but the FFT-IFFT approach differs substantially from both.
