You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Description
Providing a generator in an instantiation of IterableDataset.from_generator() fails with TypeError: cannot pickle 'generator' object when the generator argument is supplied with a generator.
Code example
def line_generator(files: List[Path]):
if isinstance(files, str):
files = [Path(files)]
for file in files:
if isinstance(file, str):
file = Path(file)
yield from open(file,'r').readlines()
...
model_training_files = ['file1.txt', 'file2.txt', 'file3.txt']
train_dataset = IterableDataset.from_generator(generator=line_generator(model_training_files))
Traceback
Traceback (most recent call last):
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/contextlib.py", line 135, in exit
self.gen.throw(type, value, traceback)
File "/Users/d3p692/code/clem_bert/venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 691, in _no_cache_fields
yield
File "/Users/d3p692/code/clem_bert/venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 701, in dumps
dump(obj, file)
File "/Users/d3p692/code/clem_bert/venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 676, in dump
Pickler(file, recurse=True).dump(obj)
File "/Users/d3p692/code/clem_bert/venv/lib/python3.9/site-packages/dill/_dill.py", line 394, in dump
StockPickler.dump(self, obj)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/pickle.py", line 487, in dump
self.save(obj)
File "/Users/d3p692/code/clem_bert/venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 666, in save
dill.Pickler.save(self, obj, save_persistent_id=save_persistent_id)
File "/Users/d3p692/code/clem_bert/venv/lib/python3.9/site-packages/dill/_dill.py", line 388, in save
StockPickler.save(self, obj, save_persistent_id)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/Users/d3p692/code/clem_bert/venv/lib/python3.9/site-packages/dill/_dill.py", line 1186, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/pickle.py", line 971, in save_dict
self._batch_setitems(obj.items())
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/pickle.py", line 997, in _batch_setitems
save(v)
File "/Users/d3p692/code/clem_bert/venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 666, in save
dill.Pickler.save(self, obj, save_persistent_id=save_persistent_id)
File "/Users/d3p692/code/clem_bert/venv/lib/python3.9/site-packages/dill/_dill.py", line 388, in save
StockPickler.save(self, obj, save_persistent_id)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/pickle.py", line 578, in save
rv = reduce(self.proto)
TypeError: cannot pickle 'generator' object
Steps to reproduce the bug
Create a set of text files to iterate over.
Create a generator that returns the lines in each file until all files are exhausted.
Instantiate the dataset over the generator by instantiating an IterableDataset.from_generator().
Wait for the explosion.
Expected behavior
I would expect that since the function claims to accept a generator that there would be no crash. Instead, I would expect the dataset to return all the lines in the files as queued up in the line_generator() function.
Environment info
datasets.version == '2.13.1'
Python 3.9.6
Platform: Darwin WE35261 22.5.0 Darwin Kernel Version 22.5.0: Thu Jun 8 22:22:22 PDT 2023; root:xnu-8796.121.3~7/RELEASE_X86_64 x86_64
The text was updated successfully, but these errors were encountered:
Describe the bug
Description
Providing a generator in an instantiation of IterableDataset.from_generator() fails with
TypeError: cannot pickle 'generator' object
when the generator argument is supplied with a generator.Code example
Traceback
Traceback (most recent call last):
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/contextlib.py", line 135, in exit
self.gen.throw(type, value, traceback)
File "/Users/d3p692/code/clem_bert/venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 691, in _no_cache_fields
yield
File "/Users/d3p692/code/clem_bert/venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 701, in dumps
dump(obj, file)
File "/Users/d3p692/code/clem_bert/venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 676, in dump
Pickler(file, recurse=True).dump(obj)
File "/Users/d3p692/code/clem_bert/venv/lib/python3.9/site-packages/dill/_dill.py", line 394, in dump
StockPickler.dump(self, obj)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/pickle.py", line 487, in dump
self.save(obj)
File "/Users/d3p692/code/clem_bert/venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 666, in save
dill.Pickler.save(self, obj, save_persistent_id=save_persistent_id)
File "/Users/d3p692/code/clem_bert/venv/lib/python3.9/site-packages/dill/_dill.py", line 388, in save
StockPickler.save(self, obj, save_persistent_id)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/pickle.py", line 560, in save
f(self, obj) # Call unbound method with explicit self
File "/Users/d3p692/code/clem_bert/venv/lib/python3.9/site-packages/dill/_dill.py", line 1186, in save_module_dict
StockPickler.save_dict(pickler, obj)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/pickle.py", line 971, in save_dict
self._batch_setitems(obj.items())
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/pickle.py", line 997, in _batch_setitems
save(v)
File "/Users/d3p692/code/clem_bert/venv/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 666, in save
dill.Pickler.save(self, obj, save_persistent_id=save_persistent_id)
File "/Users/d3p692/code/clem_bert/venv/lib/python3.9/site-packages/dill/_dill.py", line 388, in save
StockPickler.save(self, obj, save_persistent_id)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/pickle.py", line 578, in save
rv = reduce(self.proto)
TypeError: cannot pickle 'generator' object
Steps to reproduce the bug
Expected behavior
I would expect that since the function claims to accept a generator that there would be no crash. Instead, I would expect the dataset to return all the lines in the files as queued up in the
line_generator()
function.Environment info
datasets.version == '2.13.1'
Python 3.9.6
Platform: Darwin WE35261 22.5.0 Darwin Kernel Version 22.5.0: Thu Jun 8 22:22:22 PDT 2023; root:xnu-8796.121.3~7/RELEASE_X86_64 x86_64
The text was updated successfully, but these errors were encountered: