At line 222 of the RoboCasa branch of robomimic/utils/train_utils.py, the dataset kwargs are deep-copied when each dataset is created. Since the language embedding model is one of the dataset kwargs, the model gets copied as well. This caused me to hit a CUDA out-of-memory error when training on a large number of dataset files: for example, with 90 LIBERO datasets there end up being 90 copies of the language embedding model in CUDA memory.
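To illustrate the mechanism in isolation (a minimal standalone sketch, not code from the repo; the small Linear module is just a hypothetical stand-in for the encoder, and it needs a CUDA device to run):

import copy

import torch
import torch.nn as nn

# Hypothetical stand-in for the language embedding model held in ds_kwargs.
lang_encoder = nn.Linear(4096, 4096).cuda()
ds_kwargs = {"lang_encoder": lang_encoder, "hdf5_path": ["a.hdf5", "b.hdf5"]}

before = torch.cuda.memory_allocated()
ds_kwargs_copy = copy.deepcopy(ds_kwargs)  # deep-copies the CUDA weights too
after = torch.cuda.memory_allocated()

# The copy is a distinct module, so its parameters occupy their own CUDA memory;
# doing this once per dataset file multiplies that footprint by the dataset count.
assert ds_kwargs_copy["lang_encoder"] is not lang_encoder
print(f"extra CUDA memory from one deepcopy: {after - before} bytes")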
I made a quick modification that fixed this problem:
for i in range(len(ds_weights)):
    ds_kwargs_copy = deepcopy(ds_kwargs)
    # Re-point lang_encoder at the original model so the deep-copied one is
    # freed immediately and we do not run out of CUDA memory.
    if "lang_encoder" in ds_kwargs:
        ds_kwargs_copy["lang_encoder"] = ds_kwargs["lang_encoder"]
    keys = ["hdf5_path", "filter_by_attribute"]
    for k in keys:
        ds_kwargs_copy[k] = ds_kwargs[k][i]
    ds_kwargs_copy["dataset_lang"] = ds_langs[i]
    ds_list.append(ds_class(**ds_kwargs_copy))
Should I make this a PR? It might be more efficient to pop lang_encoder from the kwargs before the deepcopy so it is never copied for each dataset (even though with the fix above the extra copy is freed immediately); a sketch of that variant is below.
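Roughly what that pop-based variant would look like, using the same names as the snippet above (a sketch only, I have not tested it against the branch):

from copy import deepcopy  # already imported in train_utils.py

# Take the shared encoder out of the kwargs once, so deepcopy never touches it.
lang_encoder = ds_kwargs.pop("lang_encoder", None)

for i in range(len(ds_weights)):
    ds_kwargs_copy = deepcopy(ds_kwargs)  # no CUDA-resident model left to duplicate
    if lang_encoder is not None:
        # Every dataset shares the same encoder instance.
        ds_kwargs_copy["lang_encoder"] = lang_encoder
    for k in ["hdf5_path", "filter_by_attribute"]:
        ds_kwargs_copy[k] = ds_kwargs[k][i]
    ds_kwargs_copy["dataset_lang"] = ds_langs[i]
    ds_list.append(ds_class(**ds_kwargs_copy))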