
[BUG] Model schemas and serving signatures don't align with each other #924

Closed · karlhigley opened this issue Dec 15, 2022 · 6 comments
Labels: bug (Something isn't working), status/needs-triage

@karlhigley (Contributor)

Bug description

The saved input and output schema files for Tensorflow models contain information that doesn't match the saved model's serving signature. The serving signature also seems to record dimensions inconsistently.

Steps/Code to reproduce bug

import pytest
import tensorflow as tf

import merlin.models.tf as mm
from merlin.datasets.synthetic import generate_data
from merlin.io import Dataset
from merlin.schema import Tags
from merlin.schema.io.tensorflow_metadata import TensorflowMetadata

def test_model_schemas_match_serving_signature(tmpdir):

    train = generate_data("sequence-testing", num_rows=100)

    seq_schema = train.schema.select_by_tag(Tags.SEQUENCE).select_by_tag(Tags.CATEGORICAL)

    target = train.schema.select_by_tag(Tags.ITEM_ID).column_names[0]
    predict_last = mm.SequencePredictLast(schema=seq_schema, target=target)

    input_schema = seq_schema
    output_schema = seq_schema.select_by_name(target)

    train = Dataset(train.to_ddf(columns=input_schema.column_names).compute())
    loader = mm.Loader(train, batch_size=16, shuffle=False)

    d_model = 48
    query_encoder = mm.Encoder(
        mm.InputBlockV2(
            input_schema,
            embeddings=mm.Embeddings(
                input_schema.select_by_tag(Tags.CATEGORICAL), sequence_combiner=None
            ),
        ),
        mm.MLPBlock([d_model]),
        mm.GPT2Block(d_model=d_model, n_head=2, n_layer=2),
        tf.keras.layers.Lambda(lambda x: tf.reduce_mean(x, axis=1)),
    )

    model = mm.RetrievalModelV2(
        query=query_encoder,
        output=mm.ContrastiveOutput(output_schema, negative_samplers="in-batch"),
    )

    model.compile(metrics={})
    model.fit(loader, epochs=1, pre=predict_last)

    query_encoder.save(
        f"{tmpdir}/query_encoder"
    )

    input_schema = TensorflowMetadata.from_json_file(f"{tmpdir}/query_encoder/.merlin/input_schema.json").to_merlin_schema()
    output_schema = TensorflowMetadata.from_json_file(f"{tmpdir}/query_encoder/.merlin/output_schema.json").to_merlin_schema()

    breakpoint()

Then compare the input and output schema (and especially the value counts) to the serving signature:

(Pdb) input_schema
[
    {'name': 'item_id_seq', 'tags': {<Tags.ITEM: 'item'>, <Tags.CATEGORICAL: 'categorical'>, <Tags.ITEM_ID: 'item_id'>, <Tags.ID: 'id'>, <Tags.SEQUENCE: 'sequence'>}, 'properties': {'domain': {'min': 1, 'max': 51996, 'name': 'item_id_seq'}, 'value_count': {'min': 4}}, 'dtype': dtype('int64'), 'is_list': True, 'is_ragged': True},
    {'name': 'categories', 'tags': {<Tags.SEQUENCE: 'sequence'>, <Tags.ITEM: 'item'>, <Tags.LIST: 'list'>, <Tags.CATEGORICAL: 'categorical'>}, 'properties': {'domain': {'min': 1, 'max': 331, 'name': 'categories'}, 'value_count': {'min': 4}}, 'dtype': dtype('int64'), 'is_list': True, 'is_ragged': True}
]

(Pdb) output_schema
[
    {'name': 'output_1', 'tags': set(), 'properties': {}, 'dtype': dtype('float32'), 'is_list': True, 'is_ragged': False}
]
root@b3178cb48bcd:/# saved_model_cli show --tag_set serve --signature_def serving_default --dir /tmp/pytest-of-root/pytest-19/test_model_schemas_match_serv0/query_encoder/

The given SavedModel SignatureDef contains the following input(s):
  inputs['categories_1'] tensor_info:
      dtype: DT_INT64
      shape: (-1, 1)
      name: serving_default_categories_1:0
  inputs['categories_2'] tensor_info:
      dtype: DT_INT32
      shape: (-1, 1)
      name: serving_default_categories_2:0
  inputs['item_id_seq_1'] tensor_info:
      dtype: DT_INT64
      shape: (-1, 1)
      name: serving_default_item_id_seq_1:0
  inputs['item_id_seq_2'] tensor_info:
      dtype: DT_INT32
      shape: (-1, 1)
      name: serving_default_item_id_seq_2:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['output_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 48)
      name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict

Two (potential?) issues:

  • The input value counts indicate that at least four values should be supplied, but the serving signature shape (-1, 1) is only variable along the batch dimension and seems to expect one value per example instead of four.
  • The output shape (-1, 48) is correctly tracked in the serving signature, but the output schema doesn't contain corresponding value counts. We know it's a list from the is_list flag and that it has a fixed shape from the is_ragged flag, but the schema doesn't tell us what that fixed shape is.

Expected behavior

  • Inputs with variable lengths have shape (-1,-1) in the serving signature
  • Inputs with fixed lengths have shape (-1,fixed_length) in the serving signature
  • Output schemas for fixed length outputs have value counts indicating the shape ({min=48, max=48})
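The expected mapping from schema attributes to serving-signature shapes can be sketched in plain Python. This is a hypothetical helper for illustration only, not Merlin's actual API; the function name and the dict-based value_count are made up:

```python
# Hypothetical sketch: derive the serving-signature shape a column should
# advertise, from Merlin-style schema attributes (is_list, is_ragged,
# value_count). Names and structure are illustrative, not Merlin's API.

def expected_signature_shape(is_list, is_ragged, value_count=None):
    """Return the (batch, length) shape a serving signature should advertise."""
    if not is_list:
        return (-1,)  # scalar feature: only the batch dimension is variable
    if is_ragged:
        return (-1, -1)  # variable-length list: both dimensions are unknown
    # Fixed-length list: the length should be recoverable from the value counts.
    if value_count and value_count.get("min") == value_count.get("max"):
        return (-1, value_count["min"])
    raise ValueError("fixed-length list without matching value counts")

# A ragged input like item_id_seq should map to (-1, -1):
print(expected_signature_shape(is_list=True, is_ragged=True, value_count={"min": 4}))
# A fixed-length output like output_1 needs {min: 48, max: 48} in its schema:
print(expected_signature_shape(is_list=True, is_ragged=False,
                               value_count={"min": 48, "max": 48}))
```

Under this mapping, the missing output value counts are exactly what prevents reconstructing the (-1, 48) shape from the schema alone.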

Environment details

  • Merlin version: main branch of all Merlin repos
  • Platform: Docker (nvcr.io/nvstaging/merlin/merlin-ci-runner:latest)
  • Python version: 3.8.10
  • Tensorflow version (GPU?): tensorflow-gpu 2.9.2

Additional context

We may not yet have a good way to transfer information about fixed shapes and variable shapes in the Merlin schemas/file format. If that's the crux of the issue, then it may be time to design our own Protobuf schema that matches the way we think about e.g. shapes instead of trying to force it. If trying to write a bug fix turns into a significant headache, that's probably a sign that there's a deeper issue for us to resolve, and it's totally fine at that point to close this bug as "won't fix" and wait for changes in Core to make it easier.

@oliverholworthy (Member)

Opened a PR to handle the output schema case (adding value counts for fixed-length outputs) here: #926

For the ragged list inputs (categories and item_id_seq), I think the value_count.min = 4 is not strictly a limitation of the model. It's a side effect of feature attributes from the dataset schema flowing through to the model input schema. The dataset happens to have a minimum sequence length of 4, so the model was only given sequences of at least 4 during training. That doesn't necessarily mean the model can't produce a response for shorter sequences, though I suppose it could in theory be constructed that way.

So maybe dropping the value_count.min from the input schema would resolve this issue?

@oliverholworthy (Member)

Or does this issue also include the question of how we should represent the information about how a ragged feature like categories gets split up (e.g. into categories_1, a 2-d array of scalars holding the values, and categories_2, a 2-d array of scalars holding the row lengths, which together represent the ragged sequence of values)?
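The values-plus-row-lengths split in question can be illustrated with a minimal pure-Python sketch (no Merlin/TF dependency; the function names are made up for illustration):

```python
# Minimal sketch of the ragged-feature split discussed above: a variable-length
# feature like `categories` becomes two flat arrays, one holding the
# concatenated values and one holding the per-row lengths.
# Function names are illustrative, not Merlin's API.

def split_ragged(rows):
    """Decompose a list of variable-length rows into (values, row_lengths)."""
    values = [v for row in rows for v in row]
    lengths = [len(row) for row in rows]
    return values, lengths

def join_ragged(values, lengths):
    """Reassemble the original rows from (values, row_lengths)."""
    rows, start = [], 0
    for n in lengths:
        rows.append(values[start:start + n])
        start += n
    return rows

rows = [[2, 4, 5], [23, 4], [1, 34, 54, 7]]
values, lengths = split_ragged(rows)
print(values)   # [2, 4, 5, 23, 4, 1, 34, 54, 7]
print(lengths)  # [3, 2, 4]
assert join_ragged(values, lengths) == rows
```

The two flat arrays correspond to the `_1` (values) and `_2` (lengths) inputs seen in the serving signature above.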

@karlhigley (Contributor, Author)

If the models are completely flexible about sequence length, I suppose the minimum would be 0? Honestly, the way we represent shapes is a bit confusing to me. This is one of the things I'd like to straighten out by proposing a standard to be adopted (or revised until it can be).

@rnyak (Contributor)

rnyak commented Jan 18, 2023

@sararb and @marcromeyn: what does the TF model expect for an input list of length 4? Does it expect [[1], [2], [3], [4]] or [1, 2, 3, 4]?

@sararb (Contributor)

sararb commented Jan 25, 2023

For general context: in Merlin frameworks, the same list feature can be represented in three different formats:

1. TF dense tensor: all rows have the same fixed length:

"item_id_seq": <tf.Tensor: shape=(3, 4), dtype=int32, numpy=
array([[ 2,  4,  5,  0],
       [23,  4,  0,  0],
       [ 1, 34, 54,  7]], dtype=int32)>

2. TF RaggedTensor: a TF structure for variable-length lists:

"item_id_seq": <tf.RaggedTensor [[2, 4, 5], [23, 4], [1, 34, 54, 7]]> || shape=(3, None)

3. Tuple of values and lengths: the variable-length list feature is represented by two tensors, one of values and one of lengths:

"item_id_seq": (
    <tf.Tensor: shape=(9, 1), dtype=int32, numpy=array([ 2,  4,  5, 23,  4,  1, 34, 54,  7])>,
    <tf.Tensor: shape=(3, 1), dtype=int32, numpy=array([3, 2, 4])>,
)

==> Merlin Models supports two input formats: case 1 and case 3.

In case 1, padding is applied in the data loader and dense tensors are passed to the model. The SavedModel signature is as follows:
The given SavedModel SignatureDef contains the following input(s):
  inputs['categories'] tensor_info:
      dtype: DT_INT64
      shape: (-1, 4)
      name: serving_default_categories:0
  inputs['item_id_seq'] tensor_info:
      dtype: DT_INT64
      shape: (-1, 4)
      name: serving_default_item_id_seq:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['output_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 48)
      name: StatefulPartitionedCall:0
In case 3 (related to the code shared in this bug ticket), the data loader passes the list inputs as a tuple of two tensors. The SavedModel signature is as follows:
The given SavedModel SignatureDef contains the following input(s):
  inputs['categories_1'] tensor_info:
      dtype: DT_INT64
      shape: (-1, 1)
      name: serving_default_categories_1:0
  inputs['categories_2'] tensor_info:
      dtype: DT_INT32
      shape: (-1, 1)
      name: serving_default_categories_2:0
  inputs['item_id_seq_1'] tensor_info:
      dtype: DT_INT64
      shape: (-1, 1)
      name: serving_default_item_id_seq_1:0
  inputs['item_id_seq_2'] tensor_info:
      dtype: DT_INT32
      shape: (-1, 1)
      name: serving_default_item_id_seq_2:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['output_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 48)
      name: StatefulPartitionedCall:0

You can see that in case 3, each variable-length list feature is represented with two inputs, item_id_seq_1 and item_id_seq_2, to account for the tensors of values and lengths. So the expected signature for variable lengths is not one variable with shape (-1, -1) but rather two variables, each with shape (-1, 1).

  • Representing variable-length list inputs with one tensor corresponds to case 2 (tf.RaggedTensor), but the error we hit suggests that TF does not generate a SavedModel signature when the model's inputs are tf.RaggedTensor(s).
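The dense-tensor format in case 1 can be reproduced with a small padding sketch. This is pure Python with an illustrative function name; Merlin's data loader does the equivalent internally with TF tensors:

```python
# Sketch of the right-padding that produces the case 1 dense format:
# variable-length rows are padded with zeros to a common length, yielding
# the fixed (-1, max_len) serving-signature shape. Illustrative name only.

def pad_rows(rows, max_len=None, pad_value=0):
    """Right-pad variable-length rows to a common length."""
    if max_len is None:
        max_len = max(len(row) for row in rows)
    return [row + [pad_value] * (max_len - len(row)) for row in rows]

rows = [[2, 4, 5], [23, 4], [1, 34, 54, 7]]
print(pad_rows(rows))
# [[2, 4, 5, 0], [23, 4, 0, 0], [1, 34, 54, 7]]
```

Padding trades the flexibility of case 3 for a signature with a concrete second dimension, which is why case 1 yields shape (-1, 4) above.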
