Fix Dataloader unit test, which broke due to the new DL structure #1782
Conversation
rerun tests
2 similar comments
rerun tests
rerun tests
Check out this pull request on ReviewNB. See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB
    len(x) for x in data[mh_name][idx * batch_size : idx * batch_size + n_samples]
]
assert (nnzs == np.array(lens)).all()
array, offsets = X[f"{mh_name}__values"], X[f"{mh_name}__offsets"]
Using merlin.table.TensorTable here would simplify this and also make this code work with the previous version of the dataloader too.
mh_col = TensorTable(X)[f"{mh_name}"]
values, offsets = mh_col.values.numpy(), mh_col.offsets.numpy()
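For context, a minimal sketch of how the suggested TensorTable-based check could look inside this test. This is an assumption based on the reviewer's snippet (per-column .values/.offsets accessors with .numpy(), and offsets returned as a 1-D array of length n+1); data, mh_name, idx, batch_size and n_samples are the names already used by the surrounding test:

import numpy as np
from merlin.table import TensorTable

# TensorTable hides the f"{mh_name}__values" / f"{mh_name}__offsets" key convention.
mh_col = TensorTable(X)[mh_name]
values, offsets = mh_col.values.numpy(), mh_col.offsets.numpy()

# Row lengths recovered from the offsets should match the original list lengths.
nnzs = offsets[1:] - offsets[:-1]
lens = [
    len(x) for x in data[mh_name][idx * batch_size : idx * batch_size + n_samples]
]
assert (nnzs == np.array(lens)).all()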
-nested_data_col = tf.RaggedTensor.from_row_lengths(
-    batch[0]["data"][0][:, 0], tf.cast(batch[0]["data"][1][:, 0], tf.int32)
+nested_data_col = tf.RaggedTensor.from_row_splits(
+    batch[0]["data__values"], tf.cast(batch[0]["data__offsets"], tf.int32)
Using TensorTable here would remove the need to specify the particular naming convention we have adopted for the dictionary keys containing values/offsets.
X = TensorTable(batch[0])
tf.RaggedTensor.from_row_splits(X["data"].values, tf.cast(X["data"].offsets, tf.int32))
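As background on the API change in this hunk: from_row_lengths takes per-row lengths (the old dataloader's nnzs), while from_row_splits takes cumulative offsets (the new __offsets column). A tiny self-contained illustration with made-up values, not the test data:

import tensorflow as tf

values = tf.constant([1, 2, 3, 4, 5])
row_lengths = tf.constant([2, 0, 3])    # old representation: one length per row
row_splits = tf.constant([0, 2, 2, 5])  # new representation: cumulative offsets

a = tf.RaggedTensor.from_row_lengths(values, row_lengths)
b = tf.RaggedTensor.from_row_splits(values, row_splits)
# Both produce <tf.RaggedTensor [[1, 2], [], [3, 4, 5]]>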
Is it fine to leave the test as is? I was not familiar with TensorTable.
@@ -56,4 +56,9 @@ def test_example_03():

    """
)
for cell in tb.cells:
    cell.source.replace(
        "device_memory_limit=device_limit", "# device_memory_limit=device_limit"
Why do we need to disable these parameters? And is this related to the change to the dataloader?
No, this is not related to the dataloader changes.
I think we changed the CI tests. We define a CUDA cluster with multiple GPUs and use an RMM pool to reserve GPU memory for that cluster. It is more efficient (faster) if we reserve a fixed amount of GPU memory. The code tries to reserve X% of total GPU memory (not free memory). If we run multiple tests in parallel, other tests occupy GPU memory and we cannot reserve it.
I deactivated the parameters so that we use the default behavior (not using RMM).
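For illustration, a hedged sketch of the kind of cluster setup being described, assuming dask_cuda's LocalCUDACluster; device_limit and device_pool_size stand in for whatever sizes the notebook computes and are not taken from this PR:

from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Reserving a fixed RMM pool / device memory limit is faster, but the reservation
# fails when parallel CI tests already hold part of the GPU, so both parameters are
# commented out here and the default allocator behavior is used instead.
cluster = LocalCUDACluster(
    # device_memory_limit=device_limit,
    # rmm_pool_size=device_pool_size,
)
client = Client(cluster)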
Maybe it's worth adding something along these lines as a comment above those lines; that may help our future selves recover the context of why these parameters need to be changed.
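One possible shape for such a comment, shown next to the patching loop from the hunk above (a sketch only; the wording is mine, and note that str.replace returns a new string, so the result is assigned back to cell.source here):

# The notebook reserves a fixed device memory limit / RMM pool for the CUDA cluster.
# In CI, tests run in parallel and other tests already hold GPU memory, so the
# reservation can fail; comment the parameter out to fall back to the default behavior.
for cell in tb.cells:
    cell.source = cell.source.replace(
        "device_memory_limit=device_limit", "# device_memory_limit=device_limit"
    )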