Preserve type of shape dims as ints when re-loading schema from disk #281
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Goal
Preserve type of shape dims ints re-loading schema from disk.
Motivation: This change was motivated by errors in Transformers4Rec when using the Merlin Core Schema instead of the legacy merlin_standard_lib version. A workaround for this could be applied to coerce dims to int in Transfrormers4Rec, however, it seems worth making sure that shapes always contain int values in the min/max attributes of each dimension (if specified).
Implementation details
Currently when you have a bounded ragged dimension, for example
(1, 5)
or(1, None)
these will be reloaded with a float value when saving and reloading using theTensorflowMetadata
schema serialization. This is because we use json to save the shape currently and this format doesn't distinguish between int and float types.This PR adds a condition in the part of the tensorflow metadata deserialization that coerces float dims to int values.
And adds some additional validation in the
Dimension
to raise a ValueError if a shape is constructed with values in the dims.Shape((None, 2.0))
currently raises a ValueError, whileShape((None, (1.0, 2.0)))
is currently valid. After this PR,Shape((None, (1.0, 2.0)))
will also raise aValueError