PyTorch backend LSTM 10x+ slower (maybe batch size with the PyTorch backend has different semantics than the traditional Keras semantics?) #20902

mw66 opened this issue Feb 14, 2025 · 2 comments

mw66 commented Feb 14, 2025

https://stackoverflow.com/questions/78717341/keras-training-speed-with-pytorch-backend-is-a-lot-slower-than-with-tensorflow

"""
I am on native Windows and previously used old Keras with TensorFlow 2.10 (GPU accelerated). I wanted to try Keras 3 with the PyTorch backend. Can someone please help me understand why this model trains 10x slower with Keras 3.4.1 and the PyTorch 2.3.1 backend? With my GPU, a single epoch takes a little more than 2 minutes with TF, and over 20 minutes with PyTorch.

import os
os.environ["KERAS_BACKEND"] = "torch"
import torch
torch.cuda.is_available()  # <-- returns True

import numpy as np  # needed for the np.float32 casts below
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras import optimizers
from keras.regularizers import l2

# x_train, y_train, x_val, y_val are loaded elsewhere (omitted in the question)
x_train, y_train = np.float32(x_train), np.float32(y_train)
x_val, y_val = np.float32(x_val), np.float32(y_val)

model = Sequential()
reg = 0.00001
model.add(LSTM(80, return_sequences=True, dropout=0.0, kernel_regularizer=l2(reg), recurrent_regularizer=l2(reg), input_shape=(x_train.shape[1], x_train.shape[2])))
model.add(LSTM(80, return_sequences=False, dropout=0.0, kernel_regularizer=l2(reg), recurrent_regularizer=l2(reg)))
model.add(Dense(40))
model.add(Dense(40))
model.add(Dense(1))
opt = optimizers.Adam(learning_rate=lrate)  # lrate is defined elsewhere (omitted in the question)
model.compile(optimizer=opt, loss='mean_squared_error')

from keras.callbacks import ModelCheckpoint
from keras.callbacks import BackupAndRestore
# basefolder, modelfile, and batchsize are likewise defined elsewhere
savecallback = ModelCheckpoint(basefolder + "/" + modelfile, save_best_only=False, monitor='val_loss', mode='min', verbose=1)
backupcallback = BackupAndRestore(basefolder + "/tmp/backup_" + modelfile)

hist = model.fit(x_train, y_train, validation_data=(x_val, y_val), batch_size=batchsize, epochs=20, callbacks=[savecallback, backupcallback])

I verified GPU acceleration with both backends.
"""

mw66 commented Feb 14, 2025

https://stackoverflow.com/a/79438138/873275

"""
Experiencing the same problem.

I noticed that with the PyTorch backend the GPU memory usage is ~10x smaller, so I increased the batch size 16x, and the training speed became ~16x faster. It is now comparable to the TensorFlow backend (however, GPU utilization is still low: ~3% vs ~30% with TF).

NOTE: increasing the batch size may affect training quality, which is yet to be compared.

I suspect the batch size with the PyTorch backend has different semantics than the traditional Keras semantics. See here: https://discuss.pytorch.org/t/solved-pytorch-lstm-50x-slower-than-keras-tf-cudnnlstm/10043/8
"""

mw66 commented Feb 14, 2025

Related issues:

[Feature Request] Add cuDNN-accelerated LSTM and GRU to PyTorch #19177

"the LSTM and GRU are considerably (several times) slower"

#19177 (comment)
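
To illustrate the cuDNN point from #19177, a hedged micro-benchmark sketch (shapes, sizes, and iteration count are assumptions) comparing the Keras LSTM layer under the torch backend against torch.nn.LSTM, which dispatches to cuDNN on GPU:

import os
os.environ["KERAS_BACKEND"] = "torch"

import time
import torch
import keras
from keras import layers

use_cuda = torch.cuda.is_available()
if use_cuda:
    torch.set_default_device("cuda")  # place both Keras-backend and torch weights on the GPU

x = torch.rand(256, 60, 8)  # (batch, timesteps, features)

keras_lstm = layers.LSTM(80)
torch_lstm = torch.nn.LSTM(input_size=8, hidden_size=80, batch_first=True)

def time_forward(fn, label, iters=50):
    fn()  # warm-up: builds the Keras layer on first call
    if use_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if use_cuda:
        torch.cuda.synchronize()
    print(f"{label}: {(time.perf_counter() - start) / iters * 1e3:.2f} ms/forward")

time_forward(lambda: keras_lstm(x), "keras.layers.LSTM (torch backend)")
time_forward(lambda: torch_lstm(x), "torch.nn.LSTM (cuDNN path)")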

@mw66 changed the title from "pytorch backend lstm slow (maybe batch size with pytorch backend has different semantics than the traditional Keras semantics?)" to "pytorch backend lstm very +10x slow (maybe batch size with pytorch backend has different semantics than the traditional Keras semantics?)" on Feb 14, 2025