BLOCKED: Convert pipeline to be TPU compatible #12

Open
wants to merge 2 commits into
base: main

julianrocha
Collaborator

After research and experiments, our model architecture appears to be compatible with TPUs on Colab; our data generator, however, is not.

The current setup in this PR can attach to Colab TPUs, compile the model, and start an epoch. However, it then fails with Unavailable: {{function_node __inference_train_function_14623}} failed to connect to all addresses... As per this, the error is most likely caused by the data generator: "tf.keras.utils.Sequence on TPUs is not supported as it uses py_function underlyingly". The generator would therefore have to be converted to tf.data.

As per this, tf.data preprocessing must use TensorFlow operations exclusively. We use the PIL library for cropping, so converting the data generator to tf.data would require refactoring at least that functionality. For now, the switch, while possible, appears to be too much work. For reference, the Stacked Hourglass Networks for Human Pose Estimation paper that we based our model on trained in 3 days on a 12GB NVIDIA TitanX GPU.
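For anyone picking this up later, here is a minimal sketch of what replacing the PIL cropping with TensorFlow-only ops inside a tf.data pipeline could look like. It is not the generator from this repo: the function names, `CROP_SIZE`, the `[top, left, height, width]` box layout, and the JPEG input format are all assumptions for illustration, and label/heatmap generation is left out entirely.

```python
import tensorflow as tf

CROP_SIZE = 256  # hypothetical output resolution, not taken from this PR


def crop_and_resize(image, bbox):
    """Crop and resize one image using TensorFlow ops only (no PIL).

    bbox is assumed to be [top, left, height, width] in pixels.
    """
    image = tf.image.crop_to_bounding_box(image, bbox[0], bbox[1], bbox[2], bbox[3])
    # A fixed output size keeps tensor shapes static.
    image = tf.image.resize(image, [CROP_SIZE, CROP_SIZE])
    return image / 255.0  # float32 in [0, 1]


def load_example(path, bbox):
    """Read and decode a JPEG file, then crop it (JPEG input is an assumption)."""
    image = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    return crop_and_resize(image, bbox)


def make_dataset(paths, bboxes, batch_size):
    """Build a batched tf.data pipeline from file paths and crop boxes."""
    ds = tf.data.Dataset.from_tensor_slices((paths, bboxes))
    ds = ds.map(load_example, num_parallel_calls=tf.data.AUTOTUNE)
    # drop_remainder=True gives every batch the same static shape.
    return ds.batch(batch_size, drop_remainder=True).prefetch(tf.data.AUTOTUNE)
```

Because every step in `load_example` is a TensorFlow op, `Dataset.map` can trace it into a graph without falling back to `py_function`, which is the limitation quoted above for `tf.keras.utils.Sequence`. Any other PIL-based augmentation in the generator would need the same treatment.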

@julianrocha added the wontfix ("This will not be worked on") label on Mar 20, 2021
@robertklee
Owner

Adding a few resources here:

tensorflow/tensorflow#39523
