Fix Dataloader unit test, which broke due to the new DL structure #1782
Conversation
rerun tests
2 similar comments
rerun tests
rerun tests
Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB
    len(x) for x in data[mh_name][idx * batch_size : idx * batch_size + n_samples]
]
assert (nnzs == np.array(lens)).all()
array, offsets = X[f"{mh_name}__values"], X[f"{mh_name}__offsets"]
Using merlin.table.TensorTable here would simplify this and also make this code work with the previous version of the dataloader too.
mh_col = TensorTable(X)[f"{mh_name}"]
values, offsets = mh_col.values.numpy(), mh_col.offsets.numpy()
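For context, a minimal sketch of how the suggested TensorTable-based check could look inside this test. This is an assumption based on the reviewer's snippet (per-column .values/.offsets accessors with .numpy(), and offsets returned as a 1-D array of length n+1); data, mh_name, idx, batch_size and n_samples are the names already used by the surrounding test:

import numpy as np
from merlin.table import TensorTable

# TensorTable hides the f"{mh_name}__values" / f"{mh_name}__offsets" key convention.
mh_col = TensorTable(X)[mh_name]
values, offsets = mh_col.values.numpy(), mh_col.offsets.numpy()

# Row lengths recovered from the offsets should match the original list lengths.
nnzs = offsets[1:] - offsets[:-1]
lens = [
    len(x) for x in data[mh_name][idx * batch_size : idx * batch_size + n_samples]
]
assert (nnzs == np.array(lens)).all()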
-nested_data_col = tf.RaggedTensor.from_row_lengths(
-    batch[0]["data"][0][:, 0], tf.cast(batch[0]["data"][1][:, 0], tf.int32)
+nested_data_col = tf.RaggedTensor.from_row_splits(
+    batch[0]["data__values"], tf.cast(batch[0]["data__offsets"], tf.int32)
Using TensorTable here would remove the need to specify the particular naming convention we have adopted for the dictionary keys containing values/offsets.
X = TensorTable(batch[0])
tf.RaggedTensor.from_row_splits(X["data"].values, tf.cast(X["data"].offsets, tf.int32))
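As background on the API change in this hunk: from_row_lengths takes per-row lengths (the old dataloader's nnzs), while from_row_splits takes cumulative offsets (the new __offsets column). A tiny self-contained illustration with made-up values, not the test data:

import tensorflow as tf

values = tf.constant([1, 2, 3, 4, 5])
row_lengths = tf.constant([2, 0, 3])    # old representation: one length per row
row_splits = tf.constant([0, 2, 2, 5])  # new representation: cumulative offsets

a = tf.RaggedTensor.from_row_lengths(values, row_lengths)
b = tf.RaggedTensor.from_row_splits(values, row_splits)
# Both produce <tf.RaggedTensor [[1, 2], [], [3, 4, 5]]>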
Is it fine to leave the test as is? I was not familiar with TensorTable.
@@ -56,4 +56,9 @@ def test_example_03():

    """
)
for cell in tb.cells:
    cell.source.replace(
        "device_memory_limit=device_limit", "# device_memory_limit=device_limit"
Why do we need to disable these parameters? And is this related to the change to the dataloader?
No, this is not related to the dataloader changes.
I think we changed the CI tests. We define a CUDA cluster with multiple GPUs and use an RMM pool to reserve GPU memory for that cluster. It is more efficient (faster) if we reserve a fixed amount of GPU memory. The code tries to reserve X% of total GPU memory (not free memory). If we run multiple tests in parallel, other tests occupy GPU memory and we cannot reserve it.
I deactivated the parameters so that we use the default behavior (not using RMM).
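For illustration, a hedged sketch of the kind of cluster setup being described, assuming dask_cuda's LocalCUDACluster; device_limit and device_pool_size stand in for whatever sizes the notebook computes and are not taken from this PR:

from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Reserving a fixed RMM pool / device memory limit is faster, but the reservation
# fails when parallel CI tests already hold part of the GPU, so both parameters are
# commented out here and the default allocator behavior is used instead.
cluster = LocalCUDACluster(
    # device_memory_limit=device_limit,
    # rmm_pool_size=device_pool_size,
)
client = Client(cluster)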
Maybe it's worth adding something along these lines as a comment above those lines; that may help our future selves recover the context of why these parameters need to be changed.
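One possible shape for such a comment, shown next to the patching loop from the hunk above (a sketch only; the wording is mine, and note that str.replace returns a new string, so the result is assigned back to cell.source here):

# The notebook reserves a fixed device memory limit / RMM pool for the CUDA cluster.
# In CI, tests run in parallel and other tests already hold GPU memory, so the
# reservation can fail; comment the parameter out to fall back to the default behavior.
for cell in tb.cells:
    cell.source = cell.source.replace(
        "device_memory_limit=device_limit", "# device_memory_limit=device_limit"
    )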