Hi! I'm trying to use multiple datasets when fine-tuning OpenCLIP, and I want to use --train-data-upsampling-factors while doing so. However, I ran into an issue when trying to get data["val"]:
  data["val"] = get_dataset_fn(args.val_data, args.dataset_type)(
  File "/home/ubuntu/research_nfs/humza/open_clip/src/training/data.py", line 357, in get_wds_dataset
    assert args.train_data_upsampling_factors is None,\
AssertionError: --train_data_upsampling_factors is only supported when sampling with replacement (with --dataset-resampled).
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 9027) of binary: /home/ubuntu/miniconda3/envs/rapids-22.06/bin/python3.9
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/rapids-22.06/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/ubuntu/miniconda3/envs/rapids-22.06/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
return f(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/rapids-22.06/lib/python3.9/site-packages/torch/distributed/run.py", line 724, in main
run(args)
File "/home/ubuntu/miniconda3/envs/rapids-22.06/lib/python3.9/site-packages/torch/distributed/run.py", line 715, in run
elastic_launch(
File "/home/ubuntu/miniconda3/envs/rapids-22.06/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/ubuntu/miniconda3/envs/rapids-22.06/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
raise ChildFailedError(
Specifically, if you look at the get_wds_dataset function, you'll see that resampled is automatically set to False when building the val split rather than the train split, and if resampled is False the function complains whenever args.train_data_upsampling_factors is passed in, even though the val path never uses it. I think a better way to capture this would be to move the assert to something like:
if is_train and args.train_data_upsampling_factors is not None:
assert resampled, "--train_data_upsampling_factors is only supported when sampling with replacement (with --dataset-resampled)."
This way the check only fires when we actually want to use train_data_upsampling_factors, and it verifies we're in the right case. Let me know if this makes sense or if I'm missing something. Assuming the former, I'm happy to put up a PR with the fix. Thanks!
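For illustration, here is a minimal, self-contained sketch of how the guarded assert above could sit inside a get_wds_dataset-style function. The surrounding function body and the SimpleNamespace args are stand-ins for the real open_clip code, not the actual source; only the two-line guard mirrors the proposal:

```python
from types import SimpleNamespace


def get_wds_dataset(args, is_train):
    # In open_clip, resampling with replacement only ever applies
    # to the training split; val always iterates deterministically.
    resampled = getattr(args, "dataset_resampled", False) and is_train

    # Proposed guard: validate the upsampling factors only when they
    # would actually be used, i.e. for the training split.
    if is_train and args.train_data_upsampling_factors is not None:
        assert resampled, (
            "--train_data_upsampling_factors is only supported when "
            "sampling with replacement (with --dataset-resampled)."
        )

    return "dataset"  # placeholder for the real dataset construction


# Val split with upsampling factors set: no AssertionError is raised,
# unlike the current behavior.
args = SimpleNamespace(dataset_resampled=False,
                       train_data_upsampling_factors="2::1")
get_wds_dataset(args, is_train=False)

# Train split without --dataset-resampled: the assert still fires.
try:
    get_wds_dataset(args, is_train=True)
except AssertionError as e:
    print(f"AssertionError: {e}")
```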