
TypeError with RandAugmentIterDataPipe class during training #45

Open
lkxv333 opened this issue Mar 22, 2024 · 7 comments
lkxv333 commented Mar 22, 2024

Hi,

I am trying to run training but am facing a TypeError when RandAugmentIterDataPipe calls super().__init__():

```python
class RandAugmentIterDataPipe(IterDataPipe):
    def __init__(self, source_dp: IterDataPipe, dataset_config: DictConfig):
        super().__init__()
        self.source_dp = source_dp
```

Below is the error message:

```
Original Traceback (most recent call last):
  File "C:\Users\lkxv3\miniconda3\envs\rvt\lib\site-packages\torch\utils\data\_utils\worker.py", line 252, in _worker_loop
    fetcher = _DatasetKind.create_fetcher(dataset_kind, dataset, auto_collation, collate_fn, drop_last)
  File "C:\Users\lkxv3\miniconda3\envs\rvt\lib\site-packages\torch\utils\data\dataloader.py", line 80, in create_fetcher
    return _utils.fetch._IterableDatasetFetcher(dataset, auto_collation, collate_fn, drop_last)
  File "C:\Users\lkxv3\miniconda3\envs\rvt\lib\site-packages\torch\utils\data\_utils\fetch.py", line 21, in __init__
    self.dataset_iter = iter(dataset)
  File "C:\Users\lkxv3\miniconda3\envs\rvt\lib\site-packages\torch\utils\data\datapipes\_hook_iterator.py", line 230, in wrap_iter
    iter_ret = func(*args, **kwargs)
  File "C:\Users\lkxv3\miniconda3\envs\rvt\lib\site-packages\torch\utils\data\datapipes\datapipe.py", line 364, in __iter__
    self._datapipe_iter = iter(self._datapipe)
  File "C:\Users\lkxv3\miniconda3\envs\rvt\lib\site-packages\torch\utils\data\datapipes\_hook_iterator.py", line 230, in wrap_iter
    iter_ret = func(*args, **kwargs)
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\data\utils\stream_concat_datapipe.py", line 103, in __iter__
    return iter(self._get_zipped_streams_with_worker_id())
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\data\utils\stream_concat_datapipe.py", line 97, in _get_zipped_streams_with_worker_id
    zipped_stream = self._get_zipped_streams(datapipe_list=self.datapipe_list, batch_size=self.batch_size)
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\data\utils\stream_concat_datapipe.py", line 70, in _get_zipped_streams
    streams = Zipper(
      (Concater(
        (self.augmentation_dp(x.to_iter_datapipe())
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\data\utils\stream_concat_datapipe.py", line 70, in <genexpr>
    streams = Zipper((Concater((self.augmentation_dp(x.to_iter_datapipe())
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\data\utils\stream_concat_datapipe.py", line 70, in <genexpr>
    streams = Zipper((Concater((self.augmentation_dp(x.to_iter_datapipe())
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\data\genx_utils\sequence_for_streaming.py", line 190, in __init__
    super().__init__()
TypeError: super(type, obj): obj must be an instance or subtype of type
```

I tried to resolve it by explicitly calling
super(RandAugmentIterDataPipe, self).__init__() instead, but it results in the same error.
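For reference, here is a minimal, self-contained sketch (illustrative, not taken from the RVT codebase) of one way this exact TypeError can arise: zero-argument super() captures the enclosing class object in a closure cell, so if the class is later re-created under the same name (e.g. by a class factory, or by a module being re-imported in a worker process), the old __init__ no longer matches the instance's MRO:

```python
# Hypothetical reproduction of "super(type, obj): obj must be an instance
# or subtype of type". Zero-argument super() refers to the class object
# that existed when __init__ was compiled; re-creating the class makes
# that reference stale.

class Base:
    def __init__(self):
        pass

class Child(Base):
    def __init__(self):
        super().__init__()  # closure refers to THIS Child class object

OriginalChild = Child

# Re-create the class under the same name, as a class factory or a
# re-imported module would:
class Child(Base):
    def __init__(self):
        super().__init__()

# Calling the OLD __init__ on an instance of the NEW class triggers it:
obj = Child.__new__(Child)
err = None
try:
    OriginalChild.__init__(obj)
except TypeError as e:
    err = str(e)
print(err)  # super(type, obj): obj must be an instance or subtype of type
```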

Could you help to identify what is wrong here?
Thank you.

@magehrig
Contributor

Which Python version are you using to execute the script? And do all package versions match the installation instructions?

@lkxv333
Author

lkxv333 commented Mar 22, 2024

I am using Python 3.9.18, and I have followed the installation instructions closely.
I confirmed that the torchdata version is 0.6.0, as instructed.

@lkxv333
Author

lkxv333 commented Mar 22, 2024

Also, for more context: I am importing dill because without it I faced another error.

```
Traceback (most recent call last):
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\train.py", line 140, in main
    trainer.fit(model=module, ckpt_path=ckpt_path, datamodule=data_module)
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 603, in fit
    call._call_and_handle_interrupt(
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\pytorch_lightning\trainer\call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 645, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1098, in _run
    results = self._run_stage()
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1177, in _run_stage
    self._run_train()
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1200, in _run_train
    self.fit_loop.run()
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\pytorch_lightning\loops\loop.py", line 194, in run
    self.on_run_start(*args, **kwargs)
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 161, in on_run_start
    _ = iter(data_fetcher)  # creates the iterator inside the fetcher
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\pytorch_lightning\utilities\fetching.py", line 179, in __iter__
    self._apply_patch()
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\pytorch_lightning\utilities\fetching.py", line 120, in _apply_patch
    apply_to_collections(self.loaders, self.loader_iters, (Iterator, DataLoader), _apply_patch_fn)
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\pytorch_lightning\utilities\fetching.py", line 156, in loader_iters
    return self.dataloader_iter.loader_iters
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 555, in loader_iters
    self._loader_iters = self.create_loader_iters(self.loaders)
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 595, in create_loader_iters
    return apply_to_collection(loaders, Iterable, iter, wrong_dtype=(Sequence, Mapping))
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\lightning_utilities\core\apply_func.py", line 52, in apply_to_collection
    return _apply_to_collection_slow(
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\lightning_utilities\core\apply_func.py", line 104, in _apply_to_collection_slow
    v = _apply_to_collection_slow(
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\lightning_utilities\core\apply_func.py", line 96, in _apply_to_collection_slow
    return function(data, *args, **kwargs)
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\pytorch_lightning\trainer\supporters.py", line 177, in __iter__
    self._loader_iter = iter(self.loader)
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\torch\utils\data\dataloader.py", line 442, in __iter__
    return self._get_iterator()
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\torch\utils\data\dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\torch\utils\data\dataloader.py", line 1043, in __init__
    w.start()
  File "C:\Python39\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Python39\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Python39\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Python39\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Python39\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\torch\utils\data\datapipes\datapipe.py", line 167, in __reduce_ex__
    return super().__reduce_ex__(*args, **kwargs)
  File "C:\Users\lkxv3\OneDrive\Desktop\CP5105\original\RVT\.venv\lib\site-packages\torch\utils\data\datapipes\datapipe.py", line 333, in __getstate__
    value = pickle.dumps(self._datapipe)
AttributeError: Can't pickle local object 'partialclass.<locals>.NewCls'
```

caused from

```python
class _DataPipeSerializationWrapper:
    def __init__(self, datapipe):
        self._datapipe = datapipe

    def __getstate__(self):
        use_dill = False
        try:
            value = pickle.dumps(self._datapipe)
        except Exception:
            if HAS_DILL:
                value = dill.dumps(self._datapipe)
                use_dill = True
            else:
                raise
        return (value, use_dill)
```

Is using dill just passing the problem along, or is this a class inheritance problem?
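To illustrate the AttributeError above, here is a hedged, self-contained sketch (the partialclass helper and class names below are illustrative, not copied from the RVT code): the standard pickle serializes a class by its importable name, so a class created inside a function body, whose qualified name contains `<locals>`, cannot be pickled — and pickling is exactly what spawn-based DataLoader workers on Windows require:

```python
import pickle

def partialclass(cls, **fixed_kwargs):
    # Illustrative class factory: bakes constructor arguments into a new
    # subclass. NewCls is defined inside a function, so its __qualname__
    # is "partialclass.<locals>.NewCls" and pickle cannot locate it by
    # name when a worker process tries to deserialize it.
    class NewCls(cls):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **{**fixed_kwargs, **kwargs})
    return NewCls

class Augment:
    def __init__(self, strength=1.0):
        self.strength = strength

StrongAugment = partialclass(Augment, strength=2.0)

err = None
try:
    # Pickling the instance forces pickle to reference its class by name:
    pickle.dumps(StrongAugment())
except AttributeError as e:
    err = str(e)
print(err)  # Can't pickle local object 'partialclass.<locals>.NewCls'
```

dill sidesteps this by serializing such classes by value rather than by importable name, so it defers the underlying issue (a datapipe graph that is not importable in the worker process) rather than fixing it.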

@magehrig
Contributor

magehrig commented Mar 22, 2024

Very strange. Two follow-up questions:
Which operating system are you using?
Are you using a Jupyter notebook to execute the script?

@lkxv333
Author

lkxv333 commented Mar 23, 2024

I am using Windows with a Python virtual environment set up as instructed, and I am not using a Jupyter notebook.

Below is the command I used to run the training. It is mostly the same as in the instructions, but I changed the batch sizes.

```
python train.py model=rnndet dataset=gen1 dataset.path='C:\Users\lkxv3\OneDrive\Desktop\CP5105\RVT\gen1_original' wandb.project_name=RVT wandb.group_name=gen1 +experiment/gen1="base.yaml" hardware.gpus=0 batch_size.train=2 batch_size.eval=1 hardware.num_workers.train=2 hardware.num_workers.eval=1
```
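One Windows-specific detail that may be relevant here (a general multiprocessing note, not specific to RVT): Windows supports only the "spawn" start method, so each DataLoader worker re-imports the main script and receives the datapipe graph via pickling, whereas Linux defaults to "fork" and inherits objects directly without pickling. That would explain why the pickling path is exercised on this machine but not on a Linux setup:

```python
# Sketch: inspect the default multiprocessing start method. On Windows
# this is always "spawn" (worker arguments must be picklable); on Linux
# the default is "fork" (workers inherit objects without pickling).
import multiprocessing as mp
import platform

method = mp.get_start_method()
print(platform.system(), method)

# Any script that starts worker processes under "spawn" must also guard
# its entry point, or the re-import spawns workers recursively:
if __name__ == "__main__":
    pass  # e.g. main() / trainer.fit(...) would be called here
```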

@magehrig
Contributor

I cannot reproduce this on my machine. Is it possible for you to create a minimal reproducible example? That would help a ton with debugging.

@magehrig
Contributor

magehrig commented Jul 8, 2024

The following fix should work: #54 (comment)
