BLOCKED: Convert pipeline to be TPU compatible #12
After research and experiments, our model architecture is likely compatible with Colab TPUs; however, our data generator is not.
The current setup in this PR is able to attach to Colab TPUs, compile the model, and start an epoch. However, training then fails with:
```
Unavailable: {{function_node __inference_train_function_14623}} failed to connect to all addresses
```
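For context, a rough sketch of the standard TF 2.x Colab TPU bootstrap that this kind of attachment typically uses; `build_model()` and the optimizer/loss are placeholders, not necessarily what this PR does:

```python
import tensorflow as tf

# Standard Colab TPU bootstrap (TF 2.x); the exact calls in this PR may differ.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()  # auto-detects the Colab TPU
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    model = build_model()                        # hypothetical model factory
    model.compile(optimizer="adam", loss="mse")  # placeholder optimizer/loss
```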
As per this, the error is likely caused by the data generator, since "`tf.keras.utils.Sequence` on TPUs is not supported as it uses `py_function` underlyingly". This means the generator must be converted to `tf.data`.
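For reference, a minimal sketch of what an equivalent `tf.data` input pipeline could look like; `image_paths`, `keypoints`, the input size, and the batch size below are placeholders, not our actual generator logic.

```python
import tensorflow as tf

def parse_example(image_path, keypoints):
    # Hypothetical parse step: uses only TensorFlow ops so it stays TPU-compatible.
    image = tf.io.decode_jpeg(tf.io.read_file(image_path), channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, [256, 256])  # fixed input size assumed for static shapes
    return image, keypoints

# image_paths / keypoints would come from our annotation files (placeholders here).
dataset = (
    tf.data.Dataset.from_tensor_slices((image_paths, keypoints))
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(32, drop_remainder=True)  # TPUs need a static batch dimension
    .prefetch(tf.data.AUTOTUNE)
)
```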
As per this, `tf.data` preprocessing must use TensorFlow operations exclusively. We use the `PIL` library for cropping, so converting the data generator to `tf.data` would require refactoring at least that functionality. For now, the swap, while possible, appears to be too much work.
The *Stacked Hourglass Networks for Human Pose Estimation* paper that we based our model on was able to train in 3 days on a 12 GB NVIDIA TitanX GPU.