Failed to reproduce the results #2

Open
zhuyy0810 opened this issue Dec 4, 2024 · 0 comments
zhuyy0810 commented Dec 4, 2024

I attempted to reproduce the experiment in Section 5.2.1 of the paper, where DeepONet is used to solve the advection equation. I used the same model architecture and training parameters as described in the paper: a trunk net of size 4×512, a branch net of size 2×512, and 250,000 training iterations. However, the training time and memory usage differ significantly from the results in the paper. With mixed-precision training, the training time was 2680.338209 seconds and the memory usage was 736 MB; with fp32 training, the training time was 3127.268575 seconds and the memory usage was also 736 MB. My environment is TensorFlow 2.13.1 and DeepXDE 1.10.1, and I trained on an NVIDIA GeForce RTX 3090 GPU.

When I ran advec_mixed_prec.py, I encountered the error 'The global policy can only be set in TensorFlow 2 or if V2 dtype behavior has been set. To enable V2 dtype behavior, call "tf.compat.v1.keras.layers.enable_v2_dtype_behavior()".' I therefore added tf.compat.v1.keras.layers.enable_v2_dtype_behavior() before policy = mixed_precision.Policy('mixed_float16'). Apart from this change and the training-parameter settings in the main function, advec_mixed_prec.py and Advection.py are identical to the versions on GitHub. Below are the mixed-precision policy setup (with the workaround) and the main function of the code I used to train DeepONet with mixed precision and fp32.
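
For reference, this is how the policy is set after adding the workaround. It is only a minimal sketch: the Policy line and the enable_v2_dtype_behavior() call are exactly what I described above, while the mixed_precision import and the set_global_policy call are my assumption of what advec_mixed_prec.py uses.

import tensorflow as tf
from tensorflow.keras import mixed_precision  # assumed import, matching the Policy call below

# Workaround for the "global policy can only be set in TensorFlow 2 ..." error:
tf.compat.v1.keras.layers.enable_v2_dtype_behavior()

# Enable mixed precision globally: float16 compute dtype, float32 variable dtype.
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_global_policy(policy)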

# Resolution of the output grid (nt time steps x nx spatial points).
nt = 40
nx = 40

# get_data and the module imports are unchanged from the repository script.
x_train, y_train = get_data("/home/zhuyiyan/mixed-precision-sciml-main/Dataset/DeepONEt/Advection_equation_dataset/train_IC2.npz")
x_test, y_test = get_data("/home/zhuyiyan/mixed-precision-sciml-main/Dataset/DeepONEt/Advection_equation_dataset/test_IC2.npz")
data = dde.data.TripleCartesianProd(x_train, y_train, x_test, y_test)

# Branch net [nx, 512, 512] and trunk net [2, 512, 512, 512, 512], i.e. the 2x512 / 4x512 sizes above.
net = dde.maps.DeepONetCartesianProd(
    [nx, 512, 512], [2, 512, 512, 512, 512], "relu", "Glorot normal"
)

model = dde.Model(data, net)
# model.callbacks.append(time_callback(verbose=1))
model.compile(
    "adam",
    lr=1e-3,
    decay=("inverse time", 1, 1e-4),
    metrics=["mean l2 relative error"],
)

# IC1
# losshistory, train_state = model.train(epochs=100000, batch_size=None)
# IC2
# time_callback = TimeCallback()
losshistory, train_state = model.train(epochs=250000, batch_size=None)

# Save prediction, reference, and pointwise error for the first test sample.
y_pred = model.predict(data.test_x)
np.savetxt("y_pred_deeponet.dat", y_pred[0].reshape(nt, nx))
np.savetxt("y_true_deeponet.dat", data.test_y[0].reshape(nt, nx))
np.savetxt("y_error_deeponet.dat", (y_pred[0] - data.test_y[0]).reshape(nt, nx))