Downloading MNIST is broken #541

Closed
Maratyszcza opened this issue Jul 5, 2018 · 11 comments
Comments

@Maratyszcza
Contributor

1fb0ccf broke downloading MNIST: downloaded MNIST files are zero-size and throw on decoding. The parent commit (0bbb1aa) works.

Environment:

  • Ubuntu 16.04 LTS
  • Python 2.7

Repro:

from torchvision import datasets
mnist = datasets.MNIST("mnist", train=False, download=True)
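To confirm the zero-size symptom independently of the decoder, something like the following could be run after the failed download. This is only a sketch: `find_empty_files` is a hypothetical helper, not part of torchvision, and it assumes the raw files land in `mnist/raw`, as the download log in this thread shows.

```python
import os


def find_empty_files(directory):
    """Return the sorted names of zero-byte regular files directly inside `directory`."""
    empty = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Only regular files count; subdirectories are skipped.
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            empty.append(name)
    return empty


if __name__ == "__main__":
    # Assumed location: root "mnist" from the repro plus the "raw" subfolder.
    raw_dir = os.path.join("mnist", "raw")
    if os.path.isdir(raw_dir):
        print(find_empty_files(raw_dir))
    else:
        print("no such directory:", raw_dir)
```

A non-empty list here would mean the download step wrote nothing, so the later decoding error is a consequence rather than the root cause.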
@fmassa
Member

fmassa commented Jul 5, 2018

It works for me on both Python 2.7 and Python 3.6.

I did get a warning message on Python 2.7, but the files are there:

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to mnist/raw/train-images-idx3-ubyte.gz
 90%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊                          | 8.97M/9.91M [00:04<00:00, 4.86MB/s]
Exception KeyError: KeyError(<weakref at 0x7f8412dc4ba8; to 'tqdm' at 0x7f8412d61d90>,) in <bound method tqdm.__del__ of 9.92MB [00:04, 4.86MB/s]> ignored

cc @vishwakftw

@vishwakftw
Contributor

@Maratyszcza @fmassa what are your versions of tqdm? I think there is a problem with 4.23.4. I downgraded to 4.19.9, and it works fine.

@fmassa
Member

fmassa commented Jul 5, 2018

The version I used was '4.19.4', and it worked fine.

@vishwakftw
Contributor

So, were there no warnings in 2.7? I can send in a patch accordingly.

@fmassa
Member

fmassa commented Jul 5, 2018

Sorry for not being clear. On Python 2, I was using '4.23.4' and I got those warnings, but it still worked fine in the end.

@vishwakftw
Contributor

In Python 2.7, with '4.23.4' the warnings occur but the files are saved. I wanted to know if with '4.19', the warnings occurred at all, because they didn't occur for me.

@fmassa
Member

fmassa commented Jul 5, 2018

4.19.4 didn't produce any warnings on Python 3.6.

@fmassa
Member

fmassa commented Jul 5, 2018

But I'm not sure the warnings are the problem here. @Maratyszcza mentioned that the files didn't download properly.

@Maratyszcza
Contributor Author

No, there were no warnings. Complete output dump:

Python 2.7.12 (default, Dec  4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from torchvision import datasets
>>> mnist = datasets.MNIST("mnist", train=False, download=True)
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to mnist/raw/train-images-idx3-ubyte.gz
0.00B [00:00, ?B/s]
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to mnist/raw/train-labels-idx1-ubyte.gz
0.00B [00:00, ?B/s]
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to mnist/raw/t10k-images-idx3-ubyte.gz
0.00B [00:00, ?B/s]
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to mnist/raw/t10k-labels-idx1-ubyte.gz
0.00B [00:00, ?B/s]
Processing...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/marat/tools/pytorch/vision/torchvision/datasets/mnist.py", line 57, in __init__
    self.download()
  File "/home/marat/tools/pytorch/vision/torchvision/datasets/mnist.py", line 137, in download
    read_image_file(os.path.join(self.root, self.raw_folder, 'train-images-idx3-ubyte')),
  File "/home/marat/tools/pytorch/vision/torchvision/datasets/mnist.py", line 302, in read_image_file
    assert get_int(data[:4]) == 2051
  File "/home/marat/tools/pytorch/vision/torchvision/datasets/mnist.py", line 287, in get_int
    return int(codecs.encode(b, 'hex'), 16)
ValueError: invalid literal for int() with base 16: ''
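The final `ValueError` is what one would expect from an empty file: hex-encoding zero bytes gives an empty string, which `int(..., 16)` cannot parse. A minimal sketch of that step, reusing only the `get_int` expression shown in the traceback (the rest of `mnist.py` is not reproduced here, and the exact message text differs slightly between Python 2 and 3):

```python
import codecs


def get_int(b):
    # Same expression as the get_int line in the traceback.
    return int(codecs.encode(b, 'hex'), 16)


# The first four bytes of a valid IDX image file encode the magic number
# that read_image_file asserts on.
print(get_int(b"\x00\x00\x08\x03"))  # 2051

# A zero-size file yields empty data, so data[:4] is empty as well.
try:
    get_int(b"")
except ValueError as exc:
    print("empty input fails:", exc)
```

So the decoder is behaving consistently; the files were already empty by the time it ran.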

@Maratyszcza
Contributor Author

tqdm was at 4.8.4; the issue no longer reproduces after I updated to 4.23.4.
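For anyone hitting the same symptom, a quick check of the installed tqdm version might look like the sketch below. The threshold is an assumption: this thread only establishes that 4.8.4 failed and that 4.19.4 and 4.23.4 downloaded the files, not the exact release where the behavior changed. `version_tuple` and `is_at_least` are hypothetical helpers.

```python
def version_tuple(version):
    """Turn '4.23.4' into (4, 23, 4). Assumes purely numeric dotted components."""
    return tuple(int(part) for part in version.split("."))


def is_at_least(installed, minimum):
    # Compare numerically: as plain strings, "4.8.4" sorts after "4.23.4".
    return version_tuple(installed) >= version_tuple(minimum)


# Lowest version reported as working in this thread (an assumed threshold).
ASSUMED_MINIMUM = "4.19.4"

if __name__ == "__main__":
    try:
        import tqdm
    except ImportError:
        print("tqdm is not installed")
    else:
        try:
            ok = is_at_least(tqdm.__version__, ASSUMED_MINIMUM)
            print(tqdm.__version__, "ok" if ok else "older than " + ASSUMED_MINIMUM)
        except ValueError:
            # Version strings with non-numeric parts are outside this sketch.
            print("could not parse version:", tqdm.__version__)
```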

@vishwakftw
Contributor

@fmassa do you want me to send in a patch for a downgraded version of tqdm?
