[AIR] Allow user to pass model to `TensorflowCheckpoint.get_model` #31203

bveeramani · 2022-12-19T23:38:10Z

Signed-off-by: Balaji Veeramani <balaji@anyscale.com>

Why are these changes needed?

When you resume training a TensorFlow model, you may need to wrap an already-constructed model in an unnecessary lambda just to load its weights. This is because TensorflowCheckpoint.get_model expects a model definition (a callable that returns a model), even though you may have already constructed the model. This PR improves the UX by letting users pass a model directly to get_model.

Before

def train_loop_per_worker():
    ...

    model = tf.keras.applications.MobileNetV3Small(input_shape=(3, 260, 260))
    if checkpoint is not None:
        model = checkpoint.get_model(lambda: model)

After

def train_loop_per_worker():
    ...

    model = tf.keras.applications.MobileNetV3Small(input_shape=(3, 260, 260))
    if checkpoint is not None:
        model = checkpoint.get_model(model)
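
To make the intended behavior concrete, here is a minimal sketch of a `get_model` that accepts either a constructed model or a zero-argument factory. It is not the code from this PR: `ToyModel` and `ToyCheckpoint` are hypothetical stand-ins for `tf.keras.Model` and `TensorflowCheckpoint`, used so the example runs without TensorFlow. One detail it illustrates is that Keras models are themselves callable, so a `callable(...)` check cannot tell a model from a model definition; an `isinstance` check is assumed here instead.

```python
from typing import Callable, List, Union


class ToyModel:
    """Hypothetical stand-in for tf.keras.Model; holds a flat list of weights."""

    def __init__(self, weights: List[float]):
        self._weights = list(weights)

    def get_weights(self) -> List[float]:
        return list(self._weights)

    def set_weights(self, weights: List[float]) -> None:
        self._weights = list(weights)

    def __call__(self, x: float) -> float:
        # Like Keras models, instances are callable, so `callable(model)`
        # cannot distinguish a model from a model-building function.
        return sum(w * x for w in self._weights)


class ToyCheckpoint:
    """Hypothetical stand-in for TensorflowCheckpoint; stores only weights."""

    def __init__(self, weights: List[float]):
        self._weights = list(weights)

    def get_model(
        self, model: Union[ToyModel, Callable[[], ToyModel]]
    ) -> ToyModel:
        # Accept either a constructed model or a zero-argument factory.
        if not isinstance(model, ToyModel):
            model = model()
        # Load the checkpointed weights into the model and return it.
        model.set_weights(self._weights)
        return model


checkpoint = ToyCheckpoint([1.0, 2.0])

# New style: pass the already-constructed model directly.
existing = ToyModel([0.0, 0.0])
restored_from_instance = checkpoint.get_model(existing)

# Old style: pass a model definition (factory) instead.
restored_from_factory = checkpoint.get_model(lambda: ToyModel([0.0, 0.0]))
```

In this sketch the instance passed in is mutated and returned, so reassigning `model = checkpoint.get_model(model)` as in the "After" snippet is harmless either way. Whether the real method returns the same object is not stated in this PR description.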

Related issue number

Closes #31290

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

doc/source/conf.py

amogkam

Nice! left some small comments.

python/ray/train/tensorflow/tensorflow_checkpoint.py

…ay-project#31203) When you resume training a TensorFlow model, you may need to create an unnecessary lambda to load model weights. This is because TensorflowCheckpoint.get_model expects a model definition, but you may have already constructed your model. This PR improves the UX by letting users directly pass a model to get_model. Signed-off-by: Balaji Veeramani <balaji@anyscale.com> Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

Initial commit

29a1e0d

Signed-off-by: Balaji Veeramani <balaji@anyscale.com>

bveeramani assigned xwjiang2010 Dec 19, 2022

Add test

3c6445f

Signed-off-by: Balaji Veeramani <balaji@anyscale.com>

bveeramani requested a review from a team as a code owner December 19, 2022 23:57

bveeramani assigned amogkam Dec 19, 2022

amogkam reviewed Dec 20, 2022

doc/source/conf.py (outdated)

Update conf.py

e4b77c0

Signed-off-by: Balaji Veeramani <balaji@anyscale.com>

amogkam reviewed Dec 20, 2022

bveeramani added 3 commits December 21, 2022 19:21

Replace model_definition with model

85f4dc3

Signed-off-by: Balaji Veeramani <balaji@anyscale.com>

Update tensorflow_checkpoint.py

799d736

Signed-off-by: Balaji Veeramani <balaji@anyscale.com>

Merge remote-tracking branch 'upstream/master' into get-model

d315436

Signed-off-by: Balaji Veeramani <balaji@anyscale.com>

amogkam approved these changes Jan 3, 2023

amogkam merged commit c83a8c7 into ray-project:master Jan 4, 2023

bveeramani deleted the get-model branch January 4, 2023 00:23
