Resolves coredump caused by tf.data.experimental.save
with prefetch
#49383
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Repeat and prefetch in combination cause the snapshot reader Initialize function to be invoked multiple times.
However, there is nothing to prefetch on the very last iteration. This results in Prefetch issuing a CancelThreads call while the snapshot thread is trying to initialize. See
tensorflow/tensorflow/core/kernels/data/prefetch_dataset_op.cc
Line 151 in 6446dda
Currently the dataset reference counting is done asymmetrically. The reference increment happens at the end of initialization, where as the reference decrement
happens in a destructor. When prefetch cancels the snapshot thread, it errors out of the initialization function. And stops calling the reference increment. However, the reference decrement happens regardless, as it is in the destructor which always is invoked during cleanup. This results in an attempt to decrement the null dataset pointer, and therefore a segmentation fault.
This is different from all other dataset ops, where the dataset reference increment happens in the constructor and the decrement happens in the destructor, which are symmetric.
The solution to this is to ensure that the dataset reference is always initialized to nullptr, and to check for null when decrementing the dataset reference.