Failed to reproduce the results #2

Open
zhuyy0810 opened this issue Dec 4, 2024 · 0 comments
zhuyy0810 commented Dec 4, 2024

I attempted to reproduce the experiment in Section 5.2.1 of the paper, where DeepONet is used to solve the advection equation. I used the same model architecture and training parameters as described in the paper: a trunk net of size 4×512, a branch net of size 2×512, and 250,000 training iterations. However, the training time and memory usage differ significantly from the results in the paper. With mixed-precision training, the training time was 2680.338209 seconds and the memory usage was 736 MB; with fp32 training, the training time was 3127.268575 seconds and the memory usage was also 736 MB. My environment is TensorFlow 2.13.1 and DeepXDE 1.10.1, and I trained on an NVIDIA GeForce RTX 3090 GPU.

When I ran advec_mixed_prec.py, I encountered the error 'The global policy can only be set in TensorFlow 2 or if V2 dtype behavior has been set. To enable V2 dtype behavior, call "tf.compat.v1.keras.layers.enable_v2_dtype_behavior()".' I therefore added tf.compat.v1.keras.layers.enable_v2_dtype_behavior() before policy = mixed_precision.Policy('mixed_float16'). Apart from this change and the training-parameter settings in the main function, advec_mixed_prec.py and Advection.py are identical to the versions on GitHub. Below are the mixed-precision policy setup (with the workaround) and the main function of the code I used to train DeepONet with mixed precision and fp32.
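
For reference, this is how the policy is set after adding the workaround. It is only a minimal sketch: the Policy line and the enable_v2_dtype_behavior() call are exactly what I described above, while the mixed_precision import and the set_global_policy call are my assumption of what advec_mixed_prec.py uses.

import tensorflow as tf
from tensorflow.keras import mixed_precision  # assumed import, matching the Policy call below

# Workaround for the "global policy can only be set in TensorFlow 2 ..." error:
tf.compat.v1.keras.layers.enable_v2_dtype_behavior()

# Enable mixed precision globally: float16 compute dtype, float32 variable dtype.
policy = mixed_precision.Policy('mixed_float16')
mixed_precision.set_global_policy(policy)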

# Resolution of the output grid (nt time steps x nx spatial points).
nt = 40
nx = 40

# get_data and the module imports are unchanged from the repository script.
x_train, y_train = get_data("/home/zhuyiyan/mixed-precision-sciml-main/Dataset/DeepONEt/Advection_equation_dataset/train_IC2.npz")
x_test, y_test = get_data("/home/zhuyiyan/mixed-precision-sciml-main/Dataset/DeepONEt/Advection_equation_dataset/test_IC2.npz")
data = dde.data.TripleCartesianProd(x_train, y_train, x_test, y_test)

# Branch net [nx, 512, 512] and trunk net [2, 512, 512, 512, 512], i.e. the 2x512 / 4x512 sizes above.
net = dde.maps.DeepONetCartesianProd(
    [nx, 512, 512], [2, 512, 512, 512, 512], "relu", "Glorot normal"
)

model = dde.Model(data, net)
# model.callbacks.append(time_callback(verbose=1))
model.compile(
    "adam",
    lr=1e-3,
    decay=("inverse time", 1, 1e-4),
    metrics=["mean l2 relative error"],
)

# IC1
# losshistory, train_state = model.train(epochs=100000, batch_size=None)
# IC2
# time_callback = TimeCallback()
losshistory, train_state = model.train(epochs=250000, batch_size=None)

# Save prediction, reference, and pointwise error for the first test sample.
y_pred = model.predict(data.test_x)
np.savetxt("y_pred_deeponet.dat", y_pred[0].reshape(nt, nx))
np.savetxt("y_true_deeponet.dat", data.test_y[0].reshape(nt, nx))
np.savetxt("y_error_deeponet.dat", (y_pred[0] - data.test_y[0]).reshape(nt, nx))