Preserve type of shape dims as ints when re-loading schema from disk #281

Merged

@oliverholworthy oliverholworthy commented Apr 13, 2023

Goal

Preserve the type of shape dims as ints when re-loading a schema from disk.

Motivation: This change was motivated by errors in Transformers4Rec when using the Merlin Core Schema instead of the legacy merlin_standard_lib version. A workaround could be applied in Transformers4Rec to coerce dims to int. However, it seems worth making sure that shapes always contain int values in the min/max attributes of each dimension (if specified).

Implementation details

Currently, a bounded ragged dimension, for example (1, 5) or (1, None), is reloaded with float values after saving and reloading with the TensorflowMetadata schema serialization. This is because we currently use JSON to save the shape, and this format doesn't distinguish between int and float types.

This PR adds a condition in the part of the tensorflow metadata deserialization that coerces float dims to int values.
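The coercion described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `coerce_dim` and `coerce_dims` are hypothetical, and the reload is simulated with a JSON string whose bounds have already come back as floats.

```python
import json


def coerce_dim(value):
    """Convert a whole-number float to int; leave None and ints untouched."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


def coerce_dims(dims):
    """Apply coerce_dim to every dimension, including (min, max) pairs."""
    result = []
    for dim in dims:
        if isinstance(dim, (list, tuple)):
            # A bounded dimension is represented here as a (min, max) pair.
            result.append(tuple(coerce_dim(v) for v in dim))
        else:
            result.append(coerce_dim(dim))
    return tuple(result)


# Simulated reload: the bounds arrive as floats rather than ints.
reloaded = json.loads("[null, [1.0, 5.0]]")
restored = coerce_dims(reloaded)
```

After the call, the bounds in `restored` are ints again, while `None` (an unbounded dimension) passes through unchanged.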

It also adds validation in the Dimension to raise a ValueError if a shape is constructed with float values in the dims. Shape((None, 2.0)) currently raises a ValueError, while Shape((None, (1.0, 2.0))) is currently valid. After this PR, Shape((None, (1.0, 2.0))) will also raise a ValueError.
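As a rough sketch of this kind of validation, assuming a simplified stand-in class rather than Merlin's actual Dimension (the name `BoundedDim` and its constructor are hypothetical), the check on min/max bounds might look like:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BoundedDim:
    """Hypothetical stand-in for a dimension with optional min/max bounds."""

    min: Optional[int] = None
    max: Optional[int] = None

    def __post_init__(self):
        for name in ("min", "max"):
            value = getattr(self, name)
            # bool is a subclass of int, so it is rejected explicitly.
            if value is not None and (
                isinstance(value, bool) or not isinstance(value, int)
            ):
                raise ValueError(f"{name} must be an int or None, got {value!r}")
```

With this sketch, `BoundedDim(1, 5)` and `BoundedDim(1, None)` are accepted, while `BoundedDim(1.0, 2.0)` raises a ValueError.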

@oliverholworthy oliverholworthy added the chore Maintenance for the repository label Apr 13, 2023
@oliverholworthy oliverholworthy self-assigned this Apr 13, 2023
@oliverholworthy oliverholworthy changed the title Preserve type of shape dims ints re-loading schema from disk Preserve type of shape dims as ints when re-loading schema from disk Apr 13, 2023
@oliverholworthy oliverholworthy marked this pull request as ready for review April 13, 2023 08:53
@oliverholworthy oliverholworthy added this to the Merlin 23.04 milestone Apr 13, 2023
@karlhigley karlhigley merged commit 0f59f8d into NVIDIA-Merlin:main Apr 13, 2023