Is there a way to convert a custom keras.utils.Sequence class to a tf.data pipeline? #39523
Comments
Can you please explain more about your custom class? To understand the difference between keras.utils.Sequence and a tf.data pipeline you can take a look at the following question.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.
Closing this issue as it has been inactive for 2 weeks. Please add additional comments for us to open this issue again. Thanks!
I was facing exactly the same question, in particular when upgrading certain routines such as data generation from TF1 to TF2. The mentioned statement in the documentation seems confusing. I am also facing deadlocks, irregularly and in a non-reproducible manner, when using generators derived from Sequence. Is there a recommended way to switch over to tf.data? I too think the documentation is very confusing at that point...
Hey @MaxSchambach, glad to hear that I'm not alone in this issue! Unfortunately, there doesn't seem to be a good way to move over to tf.data without significant rewriting. I still do think this should be updated in the docs, so I'd like to reopen the issue.

However, if it's any consolation, there ARE ways to get around deadlocking using the current Sequence framework. It's very hacky and customized to our problem, but it works. I didn't write that particular fix, but the general overview is that you need a pretty good understanding of the multiprocessing library. You need to use the multiprocessing Lock() functionality to continually .acquire() or .release() whatever your data is (Spark files in our case) to ensure that the underlying TensorFlow threads don't try to grab multiple files at the same time, all while calling the garbage collector to immediately collect any stray data and prevent memory leaks. I don't think I'd really be able to share a code snippet of how it works, purely due to how specific to our problem it is.

Personally, I'd just move to PyTorch if your data loading setup is complex enough to have to use stuff like Sequences or tf.data. TensorFlow doesn't seem well designed for unusual training schemes.
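Since that particular fix was never shared, the following is only a minimal, hypothetical sketch of the general pattern being described, assuming fork-based multiprocessing on Linux: a Sequence whose __getitem__ guards file access with a shared multiprocessing.Lock and calls the garbage collector after each batch. FileBackedSequence, load_shard, and the .npz shard format are all invented for illustration.

```python
import gc
from multiprocessing import Lock

import numpy as np
from tensorflow.keras.utils import Sequence

# Module-level lock, inherited by the forked worker processes, so that no two
# workers read the same shard at the same time.
_file_lock = Lock()


def load_shard(path):
    # Stand-in loader: the original fix dealt with Spark files; here we
    # pretend each shard is an .npz file holding "x" and "y" arrays.
    with np.load(path) as data:
        return data["x"], data["y"]


class FileBackedSequence(Sequence):
    """Hypothetical Sequence that reads one file-backed shard per batch."""

    def __init__(self, shard_paths):
        self.shard_paths = shard_paths

    def __len__(self):
        return len(self.shard_paths)

    def __getitem__(self, idx):
        _file_lock.acquire()
        try:
            x, y = load_shard(self.shard_paths[idx])
        finally:
            _file_lock.release()
        # Collect stray references right away to keep memory from creeping up.
        gc.collect()
        return x, y
```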
Hi @Abhishaike, @MaxSchambach. Thank you for your points and suggestions regarding the issue with the Sequence data loader. I came across this issue trying to overcome the same situation. With multiprocessing enabled, training just stopped at the very first epoch. It's really disappointing to get such an issue, and rewriting everything in tf.data is a significant effort.
I am currently facing the same issue. I read somewhere that...
It seems that TF Keras is sensitive to Sequence implementations not being thread-safe or process-safe. I've been having horrible problems migrating my data pipelines using generators/sequences to TF 2, but there are some observations and possible bugs in TF Keras, assuming we use a Sequence with multiprocessing (and TF 2.3).

Even if we lazily initialize some thread-unsafe state within the Sequence instance, it gets initialized in the main process and then copied to the subprocess! Also, modifying the state in the on_epoch_end hook modifies only the instance in the main process, not in the subprocess. I tried to use the basic Keras MNIST convnet example, wrapped the arrays with a Sequence, and traced the process ids: https://gist.github.com/bzamecnik/dcc1d1a39f3e4fa7ac5733d80b79fa2d (code + logs). In general, Keras supports Sequences and multiprocessing (at least in TF 2.3), but if there's anything thread-unsafe in the Sequence, it fails.
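The linked gist contains the actual code and logs; as a rough illustration of the tracing idea only (class name and log lines here are mine, not the gist's), a Sequence wrapping in-memory arrays can simply print os.getpid() in each method:

```python
import os

import numpy as np
from tensorflow.keras.utils import Sequence


class TracingSequence(Sequence):
    """Wraps (x, y) arrays and logs which process handles each call."""

    def __init__(self, x, y, batch_size=128):
        print(f"__init__ in pid {os.getpid()}")  # runs in the main process
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        # With workers>0 and use_multiprocessing=True this prints worker pids,
        # i.e. the instance built in the main process was copied into workers.
        print(f"__getitem__({idx}) in pid {os.getpid()}")
        batch = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[batch], self.y[batch]

    def on_epoch_end(self):
        # This hook runs back in the main process, so state changes made here
        # are not visible to the copies living inside the worker processes.
        print(f"on_epoch_end in pid {os.getpid()}")
```

Feeding this to model.fit(..., workers=4, use_multiprocessing=True) and comparing the printed pids makes the copy-to-subprocess behaviour visible.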
I have a hack. For example, say you have a class inherited from keras.utils.Sequence, plus code that uses it.
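The code blocks of this comment were not preserved in this copy of the thread, so the snippet below is only a reconstruction of the general shape being described, not the author's original code; MySequence, model, x_train and y_train are placeholder names.

```python
import numpy as np
from tensorflow import keras


class MySequence(keras.utils.Sequence):
    """Placeholder Sequence standing in for the author's original class."""

    def __init__(self, x, y, batch_size=32):
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        batch = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        return self.x[batch], self.y[batch]


# ...and code that uses it: a plain fit call on the Sequence.
seq = MySequence(x_train, y_train, batch_size=32)
model.fit(seq, epochs=10, workers=4, use_multiprocessing=True)
```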
---------------A hack---------------
And now the most interesting part: the Sequence can be wrapped into a tf.data pipeline. It would look like this:
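Again, the original snippet is missing from this copy, so this is only one plausible way to write such a wrapper, using tf.data.Dataset.from_generator (output_signature needs TF ≥ 2.4); the TensorSpec shapes are placeholders to adapt to your own batches, and MySequence, model, x_train and y_train are the placeholder names from the previous sketch.

```python
import tensorflow as tf


def sequence_to_dataset(make_sequence, output_signature):
    """Wrap a factory that builds a keras.utils.Sequence into a tf.data.Dataset."""

    def gen():
        seq = make_sequence()  # built lazily, each time an iterator is created
        for i in range(len(seq)):
            yield seq[i]

    return (
        tf.data.Dataset.from_generator(gen, output_signature=output_signature)
        .prefetch(tf.data.AUTOTUNE)
    )


# Placeholder signature for image batches with one-hot labels.
signature = (
    tf.TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32),
    tf.TensorSpec(shape=(None, 10), dtype=tf.float32),
)
dataset = sequence_to_dataset(
    lambda: MySequence(x_train, y_train, batch_size=32), signature
)
model.fit(dataset, epochs=10)
```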
And one more advantage: it's easy to return to the previous implementation.
Hi!
You are right, thank you, I forgot to copy the indexing! I've also edited my first answer.
For those that couldn't get the above implementation to work: I was getting an error that was resolved by initializing the sequence generator within the generator (see the sketch below).

Was really hoping to see the GPU utilization increase, but mine still fluctuates from 0 to 100% during training, likely due to the usage of Python methods as mentioned in this SO post, or just pending further tweaking of the tf.data options... regardless, this is one elegant implementation!
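What "initializing the sequence generator within the generator" can mean in practice is sketched below; this is my reading of the comment, not the commenter's actual code, and it reuses the placeholder names from the earlier sketches: build the Sequence inside the generator function rather than closing over an instance created outside it.

```python
import tensorflow as tf

signature = (
    tf.TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32),
    tf.TensorSpec(shape=(None, 10), dtype=tf.float32),
)


# Pattern that reportedly caused trouble: the generator closes over a Sequence
# that was already constructed outside of it.
outer_seq = MySequence(x_train, y_train, batch_size=32)

def gen_closing_over_instance():
    for i in range(len(outer_seq)):
        yield outer_seq[i]


# Pattern described as the fix: the Sequence is constructed inside the
# generator, so every fresh iterator gets its own instance.
def gen_building_its_own_instance():
    seq = MySequence(x_train, y_train, batch_size=32)
    for i in range(len(seq)):
        yield seq[i]


dataset = tf.data.Dataset.from_generator(
    gen_building_its_own_instance, output_signature=signature
)
```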
My current implementation of __getitem__ returns a batch of elements; unfortunately this raises an error.
@everyone If this is still an issue, can you please create it in keras-team/keras, as the development of Keras has been moved to a different repo and the Keras team is inactive here. I'm sorry for the premature closure, but if someone creates an issue there, I'll be happy to look into it. Thanks!!
What a great post, and very inspiring to me. Thanks a lot for all the suggestions made above.
When I was building up my data pipeline, the TensorFlow docs were very insistent that generators are unsafe for multiprocessing, and that the best way to build a multiprocessing streaming pipeline is to extend tensorflow.keras.utils.Sequence into your own custom class. This is written here: https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence

So I did that, but now TensorFlow is telling me that Sequence extensions are ALSO not ideal for multiprocessing, through the warning message:

"multiprocessing can interact badly with TensorFlow, causing nondeterministic deadlocks. For high performance data pipelines tf.data is recommended."

So now the recommendation is to use tf.data. And, as it were, I keep running into deadlocks ~4 epochs into training now. Is there no converter between an existing Sequence class and a tf.data pipeline? It seems bizarre that the EXACT thing the Sequence extension class is recommended for seems to no longer work, and now only a brand new type of data pipeline will do the multiprocessing job. At the very least, this should be updated in the Sequence docs.
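For context, the issue reports that this warning appears when a custom Sequence is combined with Keras' built-in multiprocessing, i.e. roughly a setup like this hypothetical one (MySequence again stands in for the custom class the issue describes, and model, x_train, y_train are placeholders):

```python
# Hypothetical minimal setup matching the issue description: a custom Sequence
# (as recommended by the Sequence docs) trained with Keras' multiprocessing.
seq = MySequence(x_train, y_train, batch_size=32)

model.fit(
    seq,
    epochs=20,
    workers=8,                 # worker processes preparing batches
    use_multiprocessing=True,  # the combination the quoted warning refers to
)
```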