Description
When I was building up my data pipeline, the Tensorflow docs were very insistent that generators are unsafe for multiprocessing, and that the best way to build up a multiprocessing streaming pipeline is to extend tensorflow.keras.utils.Sequence into your own custom class. This is written here: https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence
So I did that, but now Tensorflow is telling me that Sequence extensions are ALSO not ideal for multiprocessing through the warning message multiprocessing can interact badly with TensorFlow, causing nondeterministic deadlocks. For high performance data pipelines tf.data is recommended.
. So now the recommendation is to use tf.Data. And, as it were, I keep running into deadlocks 4~ epochs into training now.
Is there no converter between an existing sequence class and a tf.Data pipeline? It seems bizarre that the EXACT thing the Sequence extension class is recommended for seems to no longer work, and now only a brand new type of data pipeline will do the multiprocessing job. At the very least, this should be updated in the Sequence docs.