Update Dataloader for ColumnSchema API changes from Merlin Core
Since `is_list` and `is_ragged` have become derived properties computed from the shape, it's no longer possible to directly set them from the constructor. They can be smuggled in through the properties, after which they'll be used to determine an appropriate shape that results in the same `is_list` and `is_ragged` values on the other side.

(This is a first step toward capturing and using more comprehensive shape information, with the goal of putting `Shape` in place while breaking as little as possible. There will be subsequent changes to directly capture more shape information, but this gets us part-way there.)

Depends on NVIDIA-Merlin/core#195
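
The round trip described above (flags go in through `properties`, a shape is chosen from them, and the same flags are derived back from that shape) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Merlin Core's implementation: `shape_from_properties`, `is_list`, `is_ragged`, and the `(min, max)` tuple representation of a dimension are hypothetical names and conventions invented for this example.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical representation: each dimension is (min, max); None means unknown.
Dim = Tuple[Optional[int], Optional[int]]
ShapeTuple = Tuple[Dim, ...]


def shape_from_properties(properties: Dict[str, Any]) -> ShapeTuple:
    """Pick a shape that reproduces the is_list / is_ragged flags in `properties`."""
    batch: Dim = (0, None)  # the batch dimension has no fixed size
    if not properties.get("is_list", False):
        return (batch,)

    counts = properties.get("value_count", {})
    lo, hi = counts.get("min"), counts.get("max")

    if properties.get("is_ragged", False):
        lo = 0 if lo is None else lo
        if lo == hi:
            raise ValueError("a ragged column needs min != max in value_count")
        return (batch, (lo, hi))

    # Fixed-length list: min and max must agree (both unknown is allowed).
    if lo != hi:
        raise ValueError("a fixed-length list needs min == max in value_count")
    return (batch, (lo, hi))


def is_list(shape: ShapeTuple) -> bool:
    """A column is a list if it has any dimension beyond the batch dimension."""
    return len(shape) > 1


def is_ragged(shape: ShapeTuple) -> bool:
    """A column is ragged if any non-batch dimension has differing bounds."""
    return any(lo != hi for lo, hi in shape[1:])


# Flags only, as in the first hunk of this commit.
fixed = shape_from_properties({"is_list": True, "is_ragged": False})

# Flags plus a known embedding width, as in the second hunk (64 is an example value).
embedding = shape_from_properties(
    {"is_list": True, "is_ragged": False, "value_count": {"min": 64, "max": 64}}
)
```

In this sketch, both `fixed` and `embedding` report `is_list` as true and `is_ragged` as false, which is the property the commit relies on: the flags survive the trip through the shape even though they are no longer constructor arguments.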
karlhigley committed Jan 20, 2023
1 parent 7b10d37 commit 0af91f6
Showing 1 changed file with 6 additions and 5 deletions.
11 changes: 6 additions & 5 deletions merlin/dataloader/ops/embeddings/embedding_op.py
@@ -108,8 +108,7 @@ def compute_output_schema(
 name=self.embedding_name,
 tags=[Tags.CONTINUOUS],
 dtype=self._get_dtype(self.embeddings),
-is_list=True,
-is_ragged=False,
+properties={"is_list": True, "is_ragged":False}
 )
 )

@@ -191,9 +190,11 @@ def compute_output_schema(
 name=self.embedding_name,
 tags=[Tags.CONTINUOUS],
 dtype=self.embeddings.dtype,
-is_list=True,
-is_ragged=False,
-properties={"value_count": {"min": embedding_dim, "max": embedding_dim}},
+properties={
+    "is_list": True,
+    "is_ragged": False,
+    "value_count": {"min": embedding_dim, "max": embedding_dim}
+},
 )
 )
