Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence warning when iterating over a dataset #62963
@sushreebarsa TF 2.15 is not affected, but 2.16 and 2.17 are.
@sushreebarsa The reason you couldn't reproduce the error in Colab is that the warnings are suppressed by default. Could you please check this Colab: https://colab.research.google.com/drive/1JuQriKXe-aJBAbValQK-8BFGtktzw4IW?usp=sharing ?
@p-s-p-s TF v2.15 is the latest stable version, so the error does not appear there.
@sushreebarsa I reported this issue so it could be fixed before the 2.16 release. Moreover, TF 2.15 with 04fb826 applied is also internally affected by this issue.
I am not sure what causes the problem, but as a symptomatic workaround it is possible to suppress some of the warnings.
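The commenter's original snippet was not preserved in this copy. As a sketch of one common approach (an assumption, not necessarily the commenter's exact code), the `TF_CPP_MIN_LOG_LEVEL` environment variable silences C++-level TensorFlow log lines such as this one, provided it is set before TensorFlow is imported:

```python
import os

# Hide TensorFlow C++ log messages below ERROR level:
# "0" = all messages, "1" = hide INFO, "2" = hide INFO and WARNING, "3" = hide everything.
# This must be set BEFORE `import tensorflow` to take effect.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
```

Note this hides all C++ WARNING-level messages, not just the OUT_OF_RANGE one, so it treats the symptom rather than the cause.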
@sachinprasadhs I was able to replicate the issue reported here; please have a look. Thank you!
Can confirm this issue with tf-nightly.
Hello, I'd like to look into this issue and try to fix it, if that is possible. |
Issue running a default TensorFlow training job after a Docker rebuild, only on an RTX A4500.
My understanding is that the error does not affect execution, but the iterator is not usable after the error? See https://stackoverflow.com/questions/53930242/how-to-fix-a-outofrangeerror-end-of-sequence-error-when-training-a-cnn-with-t
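For intuition about "the iterator was not usable after the error": plain Python iterators behave the same way, and a `tf.data` iterator follows the same contract. Once the sequence ends, the same iterator object yields nothing further and must be recreated. A minimal sketch in ordinary Python, standing in for a `tf.data` iterator:

```python
it = iter([1, 2, 3])

first_pass = list(it)   # consumes the iterator: [1, 2, 3]
second_pass = list(it)  # already exhausted: []

# To iterate again, build a fresh iterator
# (for tf.data, call iter(dataset) again or just loop over the dataset object).
it = iter([1, 2, 3])
```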
Another example reproducing this issue with Python 3.12.2 and TensorFlow 2.16.1 is the fourth installment of the introductory video series "TensorFlow ML Zero to Hero", which uses this notebook. During training, every second epoch fails with the issue reported here. Switching to TensorFlow 2.15.1 also forces me to downgrade to Python 3.11.8, which I'd like to avoid; ideally TensorFlow 2.15.1 should be made available for the most recent stable Python release at least until a newer stable TensorFlow is out. The combination of Python 3.11.8 and TensorFlow 2.15.1 works for the given notebook. The notebook runs fine online but not locally with Python 3.12.2 and TensorFlow 2.16.1; it is also linked from the description of the video at https://www.youtube.com/watch?v=u2TjZzNuly8 I hope another way to reproduce the problem helps with resolving this issue. Keep up the good work!
@google-admin Please reconsider how these issues are triaged. Reports are pasted into Colab without being read carefully, reproduction attempts routinely miss the point, and reporters are then told they are wrong. This wastes both the reporters' time and your money.
Hi all, this didn't make it our (tf.data team's) way until just now, when an internal user flagged it. This should be fixed with 4924ec6. |
The error even appears on the official TF website, so hopefully it will be fixed soon.
Similar error. Fixed it by removing the steps_per_epoch argument from model.fit() and model.evaluate():

```python
import sys

physical_devices = tf.config.list_physical_devices('GPU')
# Invalid device or cannot modify virtual devices once initialized.
pass

# define cnn model
def define_model():
    ...
    # compile model
    opt = SGD(learning_rate=0.001, momentum=0.9)
    ...

# create data generator
datagen = ImageDataGenerator(rescale=1.0/255.0)
# prepare iterators
train_it = datagen.flow_from_directory('/workspace/workspace/cats_and_dogs_data/dogs-vs-cats/train/', ...)
# fit model
history = model.fit(train_it, validation_data=test_it, epochs=20, verbose=1)
# evaluate model
_, acc = model.evaluate(test_it, verbose=1)
```
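Removing `steps_per_epoch` helps because Keras then infers the epoch length from the generator itself. If you do pass it, it must not exceed the number of batches one pass over the data can supply; otherwise the iterator runs out mid-epoch. A quick check of that arithmetic, using hypothetical sample and batch counts (with `flow_from_directory`, the real sample count is the number of images found):

```python
import math

# Hypothetical counts for illustration.
train_samples = 25000
batch_size = 64

# One pass over the data yields at most this many batches; a larger
# steps_per_epoch exhausts the iterator mid-epoch (OUT_OF_RANGE / "ran out of data").
max_steps_per_epoch = math.ceil(train_samples / batch_size)
print(max_steps_per_epoch)  # 391
```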
I can reproduce the warning on Python 3.12 and TF 2.16. In addition, when my (custom) dataset has this 'issue' then I also get messages when calling
PS: I DO have enough data. |
Why would
In my case I am not using
Same here. Using repeat(), I still see this error.
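For reference, the pattern the comments above are discussing: with `.repeat()` the dataset restarts instead of ending, so a fixed `steps_per_epoch` can be served indefinitely. A minimal sketch (assumes TensorFlow 2.x is installed):

```python
import tensorflow as tf

ds = tf.data.Dataset.range(4)

# One finite pass: [0, 1, 2, 3].
finite = list(ds.as_numpy_iterator())

# .repeat() restarts the dataset, so take() can span "epoch" boundaries:
# [0, 1, 2, 3, 0, 1].
repeated = list(ds.repeat().take(6).as_numpy_iterator())
```

Even with `.repeat()`, the reports above suggest TF 2.16 still logs the OUT_OF_RANGE message when an underlying sequence ends, which is the warning this issue tracks.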
@rytis-paskauskas That warning is not related to TensorFlow but to Keras. During training and evaluation your code is wrapped in a … In particular,
@miticollo I don't think that's the case. Please take a look at the same
@arianmaghsoudnia In that comment I was referring only to this warning:
and not to
The first warning comes from Keras and, as I explained above, can be ignored, while the latter comes from TensorFlow.
@miticollo You're correct about distinguishing between the logs from TensorFlow and Keras. However, the Keras warning appears because there's an underlying issue on the TensorFlow side. The core problem is that the
I am getting this issue on CPU with Python 3.11 and TensorFlow 2.16.1:
Training crashes during epoch 1. Edit:
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
binary
TensorFlow version
tf 2.16
Custom code
Yes
OS platform and distribution
Linux Ubuntu 22.04
Mobile device
No response
Python version
3.10
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
No response
GPU model and memory
No response
Current behavior?
There is a warning which appears after the last iteration over a dataset:
W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
This warning was introduced by commit 04fb826.
I believe that simple iteration over a dataset shouldn't cause such behavior.
Standalone code to reproduce the issue
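The reporter's reproduction snippet was not preserved in this copy. As a sketch of the kind of code described in "Current behavior?" (an assumption, assumes TF 2.16), simply iterating a `tf.data.Dataset` to its end triggers the warning:

```python
import tensorflow as tf

ds = tf.data.Dataset.range(5)

# On TF 2.16, when iteration reaches the end of the dataset,
# the C++ runtime logs:
#   Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence
total = sum(int(x) for x in ds)
print(total)  # 10
```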
Relevant log output