Using HashCheckerIterDataPipe to implement the SST2 dataset in torchtext causes test failures in unittest_linux_py3.6 and in all Python versions on the Windows platform.
Here is the CircleCI link for all the test failures: failures.
Here is the Dataset implementation where the HashCheckerIterDataPipe is used: code pointer
I believe changes to how io.seek() works between Python 3.6 and 3.7 could be causing the failures in unittest_linux_py3.6 and unittest_windows_py3.6. I'm not sure why the other Windows unit tests are failing; they fail with a different error, RuntimeError: Unspecified hash for file.
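One way to test the seek hypothesis in isolation is the sketch below. It assumes the SST-2 files are read as members of a zip archive. To our understanding, zip member streams (`zipfile.ZipExtFile`) only gained `seek()` support in Python 3.7, so on 3.6 the rewind should raise `io.UnsupportedOperation: seek`, matching the first traceback. The helper `zip_member_is_seekable`, the payload, and the member name `train.tsv` are hypothetical and for illustration only.

```python
import io
import sys
import zipfile


def zip_member_is_seekable(payload: bytes = b"0\tsample sentence\n") -> bool:
    """Return True if a zip member stream can be rewound after being fully read.

    Hypothetical diagnostic: the archive is built in memory and the member
    name "train.tsv" is illustrative only.
    """
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("train.tsv", payload)
    archive.seek(0)

    with zipfile.ZipFile(archive) as zf:
        with zf.open("train.tsv") as member:
            member.read()  # consume the stream, as the hash check does
            try:
                member.seek(0)  # rewind, as HashCheckerIterDataPipe does when rewind is set
            except io.UnsupportedOperation:
                return False  # the behavior seen in the Python 3.6 traceback
            return member.read() == payload


if __name__ == "__main__":
    print(sys.version_info[:2], zip_member_is_seekable())
```

Running it under both interpreters used in CI would show whether the seekability of the zip member stream is the only difference between the passing and failing jobs.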
Error for unittest_linux_py3.6 and unittest_windows_py3.6
```
self = <torchdata.datapipes.iter.util.hashchecker.HashCheckerIterDataPipe object at 0x7f937f867ba8>

    def __iter__(self):
        for file_name, stream in self.source_datapipe:
            if self.hash_type == "sha256":
                hash_func = hashlib.sha256()
            else:
                hash_func = hashlib.md5()
            while True:
                # Read by chunk to avoid filling memory
                chunk = stream.read(1024 ** 2)
                if not chunk:
                    break
                hash_func.update(chunk)
            # TODO(VitalyFedyunin): this will not work (or work crappy for non-seekable steams like http)
            if self.rewind:
>               stream.seek(0)
E               io.UnsupportedOperation: seek

env/lib/python3.6/site-packages/torchdata-0.1.0a0+7772406-py3.6.egg/torchdata/datapipes/iter/util/hashchecker.py:51: UnsupportedOperation
```
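A minimal sketch of a rewind-safe hash check, assuming the failure above is simply a stream that cannot seek. `check_hashes` is a hypothetical helper, not the torchdata API: it asks the stream whether it is seekable before rewinding and, when it is not (as with zip members on Python 3.6), copies the bytes into an `io.BytesIO` while hashing and yields that copy instead.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Iterable, Iterator, Tuple


def check_hashes(
    pairs: Iterable[Tuple[str, BinaryIO]],
    hash_dict: Dict[str, str],
    hash_type: str = "sha256",
    chunk_size: int = 1024 ** 2,
) -> Iterator[Tuple[str, BinaryIO]]:
    """Verify each stream's digest, then yield a stream positioned at offset 0.

    Hypothetical helper, not the torchdata API. Seekable streams are rewound
    in place; non-seekable ones are copied into memory while being hashed.
    """
    for file_name, stream in pairs:
        if file_name not in hash_dict:
            raise RuntimeError("Unspecified hash for file {}".format(file_name))
        hash_func = hashlib.sha256() if hash_type == "sha256" else hashlib.md5()
        # Objects without a seekable() method are treated as non-seekable.
        can_seek = getattr(stream, "seekable", lambda: False)()
        copy = None if can_seek else io.BytesIO()
        while True:
            chunk = stream.read(chunk_size)  # hash in chunks, as the original loop does
            if not chunk:
                break
            hash_func.update(chunk)
            if copy is not None:
                copy.write(chunk)  # keep the bytes so the caller can still read them
        if hash_func.hexdigest() != hash_dict[file_name]:
            raise RuntimeError("Hash mismatch for file {}".format(file_name))
        if can_seek:
            stream.seek(0)
            yield file_name, stream
        else:
            copy.seek(0)
            yield file_name, copy
```

The in-memory fallback holds the whole file, which is the price of not being able to rewind; for large or remote files, reopening the source after hashing may be the better trade-off.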
Error for all other unittest_windows_py*
```
self = <torchdata.datapipes.iter.util.hashchecker.HashCheckerIterDataPipe object at 0x000001929F2B5548>

    def __iter__(self):
        for file_name, stream in self.source_datapipe:
            if self.hash_type == "sha256":
                hash_func = hashlib.sha256()
            else:
                hash_func = hashlib.md5()
            while True:
                # Read by chunk to avoid filling memory
                chunk = stream.read(1024 ** 2)
                if not chunk:
                    break
                hash_func.update(chunk)
            # TODO(VitalyFedyunin): this will not work (or work crappy for non-seekable steams like http)
            if self.rewind:
                stream.seek(0)
            if file_name not in self.hash_dict:
>               raise RuntimeError("Unspecified hash for file {}".format(file_name))
E               RuntimeError: Unspecified hash for file C:\Users\circleci\.torchtext\cache\SST2\SST-2\train.tsv

env\lib\site-packages\torchdata-0.1.0a0+7772406-py3.7.egg\torchdata\datapipes\iter\util\hashchecker.py:54: RuntimeError
```
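This second failure happens after the rewind succeeds: the Windows path C:\Users\circleci\.torchtext\cache\SST2\SST-2\train.tsv is not a key in `hash_dict`. One unconfirmed possibility is that the keys and the yielded file names differ only in separator style (`/` versus `\`). If so, canonicalizing both sides before the lookup would avoid the mismatch. `normalize_key` and `normalize_hash_dict` are hypothetical helpers sketching that idea, and the key and digest in the usage block are made up.

```python
import posixpath
from typing import Dict


def normalize_key(path: str) -> str:
    """Canonicalize a path string for dictionary lookups.

    Assumption: hash_dict keys and yielded file names differ only in
    separator style or redundant segments, so both are mapped to
    forward slashes and normalized with posixpath.normpath.
    """
    return posixpath.normpath(path.replace("\\", "/"))


def normalize_hash_dict(hash_dict: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of hash_dict with every key passed through normalize_key."""
    return {normalize_key(key): digest for key, digest in hash_dict.items()}


if __name__ == "__main__":
    # Hypothetical key and digest, for illustration only.
    hashes = normalize_hash_dict({"C:/Users/circleci/.torchtext/cache/SST2/SST-2/train.tsv": "<digest>"})
    yielded = r"C:\Users\circleci\.torchtext\cache\SST2\SST-2\train.tsv"
    print(normalize_key(yielded) in hashes)  # True
```

Replacing backslashes is only appropriate for comparing keys as strings; on POSIX a backslash is a legal filename character, so the normalized value should not be used to open files.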
Thanks for notifying us. The behavior of seek differs across platforms and Python versions. IMO, this probably only happens with zipfile; tarfile should have more unified behavior.
I will comment on your original PR with a workaround.
To Reproduce
Steps to reproduce the behavior:
Error for unittest_linux_py3.6 and unittest_windows_py3.6: Link to Circle CI Error
Error for all other unittest_windows_py*: Link to Circle CI Error
Expected behavior
Expect all tests to pass.
Environment
Tests pass in the devserver environment but fail on CircleCI.