
Use tf.shape for graph mode support #6

Merged: 2 commits into NVIDIA-Merlin:main on Feb 23, 2023

Conversation

edknv (Contributor) commented on Feb 14, 2023

Why

With custom layers/models in graph mode, TensorFlow seems to have trouble inferring shapes because .shape is used in call() instead of tf.shape(), and I'm running into errors such as:

[1,0]<stdout>:E                               TypeError: Exception encountered when calling layer "distributed_embedding" (type DistributedEmbedding).
[1,0]<stdout>:E
[1,0]<stdout>:E                               in user code:
[1,0]<stdout>:E
[1,0]<stdout>:E                                   File "/usr/local/lib/python3.8/dist-packages/distributed_embeddings/python/layers/dist_model_parallel.py", line 503, in call  *
[1,0]<stdout>:E                                       outputs = self._call_base(inputs)
[1,0]<stdout>:E                                   File "/usr/local/lib/python3.8/dist-packages/distributed_embeddings/python/layers/dist_model_parallel.py", line 279, in _call_base  *
[1,0]<stdout>:E                                       global_splits.append(sum(local_splits[-1]))
[1,0]<stdout>:E
[1,0]<stdout>:E                                   TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
[1,0]<stdout>:E
[1,0]<stdout>:E
[1,0]<stdout>:E                               Call arguments received by layer "distributed_embedding" (type DistributedEmbedding):
[1,0]<stdout>:E                                 • inputs=['tf.Tensor(shape=(None, None), dtype=int64)', 'tf.Tensor(shape=(None, None), dtype=int64)']

We've run into this error while trying to integrate distributed-embeddings into Merlin Models (NVIDIA-Merlin/models#974).
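For illustration, here is a minimal sketch that reproduces the same failure mode outside the library (the function and shapes are hypothetical, not taken from dist_model_parallel.py):

import tensorflow as tf

# Under tf.function with unknown dimensions, x.shape is the static shape
# (None, None), so Python arithmetic over its entries fails at trace time.
@tf.function(input_signature=[tf.TensorSpec(shape=(None, None), dtype=tf.int64)])
def total_size(x):
  return sum(x.shape)  # 0 + None -> TypeError: 'int' and 'NoneType'

total_size(tf.zeros((2, 3), dtype=tf.int64))  # raises TypeError during tracing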

From the TensorFlow docs:

tf.shape and Tensor.shape should be identical in eager mode. Within tf.function or within a compat.v1 context, not all dimensions may be known until execution time. Hence, when defining custom layers and models for graph mode, prefer the dynamic tf.shape(x) over the static x.shape.

This PR proposes replacing .shape with tf.shape(). According to the docs quoted above, tf.shape() should behave the same as .shape in eager mode while also working correctly in graph mode. With the changes in this PR, I've been able to run the custom model without errors.
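As a sketch of the fixed pattern (same hypothetical function as above): tf.shape(x) returns a dynamic shape tensor that is resolved at execution time, so the reduction stays inside the graph instead of running on unknown static dimensions.

import tensorflow as tf

@tf.function(input_signature=[tf.TensorSpec(shape=(None, None), dtype=tf.int64)])
def total_size(x):
  # tf.shape(x) is a rank-1 int32 tensor computed at execution time,
  # so unknown static dimensions are no longer a problem during tracing.
  return tf.reduce_sum(tf.shape(x))

print(total_size(tf.zeros((2, 3), dtype=tf.int64)))  # tf.Tensor(5, shape=(), dtype=int32)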

Testing

Manually ran

python distributed_embeddings/python/layers/dist_model_parallel_test.py
python tests/dist_model_parallel_test.py

This PR also includes a variable name change that makes the tests in tests/dist_model_parallel_test.py pass.

edknv (Contributor, Author) commented on Feb 14, 2023

@FDecaYed @skyw It looks like I don't have access to assign reviewers. Can you please review this?

FDecaYed (Collaborator) commented:
@edknv Thanks for the contribution. We've seen this before and I agree it needs to be fixed.
There is one issue: from our testing, tf.shape() is not always strictly better than tensor.shape, at least in the current TF version, because it can introduce a blocking copy between host and device.
One way to handle the change is something like

# to get shape[0]
dim0 = tf.shape(tensor)[0] if tensor.shape[0] is None else tensor.shape[0]
# to get shape
tshape = tf.shape(tensor) if None in tensor.shape else tensor.shape
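One way to package that pattern is a small helper that prefers static dimensions and falls back to tf.shape() only for the unknown ones (the helper name is illustrative, not part of this PR, and it assumes the tensor's rank is known):

import tensorflow as tf

def mixed_shape(tensor):
  # Per-dimension shape: static ints where known, tf.shape() entries otherwise.
  # Using static dimensions where available avoids the blocking host/device
  # copy that tf.shape() can introduce.
  static = tensor.shape.as_list()  # requires a known rank
  dynamic = tf.shape(tensor)
  return [dynamic[i] if dim is None else dim for i, dim in enumerate(static)]

# e.g. mixed_shape(tensor)[0] gives dim0 from the snippet above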

FDecaYed merged commit 6c401c3 into NVIDIA-Merlin:main on Feb 23, 2023