Stanford cars download url is broken - HTTP 404 #7545

IamMohitM · 2023-04-29T13:01:25Z

🐛 Describe the bug

The Stanford Cars dataset is not available at the URL used in the source code:

https://pytorch.org/vision/main/_modules/torchvision/datasets/stanford_cars.html

Reproduce error:

import torch
import torchvision

train = torchvision.datasets.StanfordCars(root=".", download=True)

I get the following error

HTTPError                                 Traceback (most recent call last)
Cell In[18], line 4
      1 import torch
      2 import torchvision
----> 4 train = torchvision.datasets.StanfordCars(root=".", download=True)

File ~/Projects/diffusion/env/lib/python3.10/site-packages/torchvision/datasets/stanford_cars.py:60, in StanfordCars.__init__(self, root, split, transform, target_transform, download)
     57     self._images_base_path = self._base_folder / "cars_test"
     59 if download:
---> 60     self.download()
     62 if not self._check_exists():
     63     raise RuntimeError("Dataset not found. You can use download=True to download it")

File ~/Projects/diffusion/env/lib/python3.10/site-packages/torchvision/datasets/stanford_cars.py:94, in StanfordCars.download(self)
     91 if self._check_exists():
     92     return
---> 94 download_and_extract_archive(
     95     url="https://ai.stanford.edu/~jkrause/cars/car_devkit.tgz",
     96     download_root=str(self._base_folder),
     97     md5="c3b158d763b6e2245038c8ad08e45376",
     98 )
     99 if self._split == "train":
    100     download_and_extract_archive(
    101         url="https://ai.stanford.edu/~jkrause/car196/cars_train.tgz",
...
File /usr/local/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/urllib/request.py:643, in HTTPDefaultErrorHandler.http_error_default(self, req, fp, code, msg, hdrs)
    642 def http_error_default(self, req, fp, code, msg, hdrs):
--> 643     raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: HTTP Error 404: Not Found

Versions

PyTorch version: 2.0.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.2.1 (x86_64)
GCC version: Could not collect
Clang version: 14.0.0 (clang-1400.0.29.202)
CMake version: version 3.19.8
Libc version: N/A

Python version: 3.10.10 (main, Feb 16 2023, 02:58:25) [Clang 14.0.0 (clang-1400.0.29.202)] (64-bit runtime)
Python platform: macOS-13.2.1-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz

Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.3
[pip3] torch==2.0.0
[pip3] torchaudio==2.0.1
[pip3] torchvision==0.15.1
[conda] blas 1.0 mkl
[conda] mkl 2019.4 233
[conda] mkl-service 2.3.0 py38h9ed2024_0
[conda] mkl_fft 1.2.1 py38ha059aab_0
[conda] mkl_random 1.1.1 py38h959d312_0
[conda] numpy 1.20.1 pypi_0 pypi
[conda] numpy-base 1.19.2 py38hcfb5961_0
[conda] numpydoc 1.1.0 pyhd3eb1b0_1
[conda] pytorch3d 0.4.0 pypi_0 pypi
[conda] torch 1.9.0 pypi_0 pypi

cc @pmeier


pmeier · 2023-05-01T06:49:47Z

It's not just the download; the whole website seems to be down: https://ai.stanford.edu/~jkrause/cars/car_dataset.html. Googling for this dataset shows that this is still the "current" address, so maybe the outage is temporary and the site will come back up. We should monitor this and maybe reach out to the authors if it persists.

Wondering why our download tests didn't catch this 🤔
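Monitoring the URL could start with a simple reachability probe. A minimal sketch using only the standard library; `url_is_reachable` is a hypothetical helper (not part of torchvision or its download tests), and it assumes the server answers HEAD requests:

```python
import urllib.error
import urllib.request


def url_is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to `url` succeeds with a 2xx or 3xx status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except urllib.error.HTTPError:
        # The server answered with an error status, e.g. "HTTP Error 404: Not Found"
        return False
    except (OSError, ValueError):
        # DNS failure, refused connection, timeout, or an unknown/unsupported URL type
        return False
```

While the devkit URL responds with 404, `url_is_reachable("https://ai.stanford.edu/~jkrause/cars/car_devkit.tgz")` should return `False`, which a scheduled job could report.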

ysmintor · 2023-05-20T14:27:59Z

I ran into the same problem, and I haven't found a mirror for the download link. I hope they can fix it as soon as possible.

IamMohitM · 2023-05-22T17:11:49Z

For people looking for a quick solution:

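You can download the dataset from Kaggle and use the following class to create a PyTorch Dataset (it returns images only, without labels):

import os

import torch
from PIL import Image


class StanfordCars(torch.utils.data.Dataset):
    def __init__(self, root_path, transform=None):
        # Collect the .jpg files in the folder, sorted for a deterministic order
        self.images = sorted(
            os.path.join(root_path, file)
            for file in os.listdir(root_path)
            if file.lower().endswith(".jpg")
        )
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image_file = self.images[index]
        image = Image.open(image_file).convert("RGB")
        if self.transform:
            image = self.transform(image)
        # Return a single image; the DataLoader adds the batch dimension itself
        return image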

ricefryegg · 2023-06-04T05:57:35Z

Thank you @IamMohitM. With the following steps we can keep using data = torchvision.datasets.StanfordCars(root="./", download=True) without any changes to the code:

* Download the dataset bundle from [Kaggle](https://www.kaggle.com/datasets/jessicali9530/stanford-cars-dataset), extract it, and remove the recursive directory structure (e.g. `stanford_cars/cars_test/cars_test`)
* Download [car_devkit.tgz](https://github.com/pytorch/vision/files/11644847/car_devkit.tgz) and extract it in `stanford_cars`

Confirm that the dataset structure is as follows:

└── stanford_cars
    └── cars_train
        └── *.jpg
    └── cars_test
        └── *.jpg
    └── devkit
        ├── cars_meta.mat
        ├── cars_test_annos.mat
        ├── cars_train_annos.mat
        ├── eval_train.m
        ├── README.txt
        └── train_perfect_preds.txt
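The two steps above can be scripted with the standard library. A minimal sketch, assuming the Kaggle bundle has already been extracted into `stanford_cars/` with doubled `cars_train/cars_train` and `cars_test/cars_test` folders, and that `car_devkit.tgz` unpacks into a top-level `devkit/` folder; both helper names are hypothetical:

```python
import shutil
import tarfile
from pathlib import Path


def flatten_nested_dir(outer) -> None:
    """Move the contents of outer/<outer.name>/ up into outer/ and remove the inner folder.

    For example, stanford_cars/cars_test/cars_test/*.jpg becomes stanford_cars/cars_test/*.jpg.
    Does nothing if there is no nested folder with the same name.
    """
    outer = Path(outer)
    inner = outer / outer.name
    if not inner.is_dir():
        return
    # Take a snapshot of the listing before moving anything out of the folder
    for item in list(inner.iterdir()):
        shutil.move(str(item), str(outer / item.name))
    inner.rmdir()  # the inner folder is empty now


def extract_devkit(tgz_path, base_folder) -> None:
    """Extract car_devkit.tgz into base_folder (e.g. stanford_cars/)."""
    with tarfile.open(tgz_path, "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject unsafe paths in the archive
            archive.extractall(path=base_folder, filter="data")
        else:
            archive.extractall(path=base_folder)
```

Calling `flatten_nested_dir("stanford_cars/cars_train")`, `flatten_nested_dir("stanford_cars/cars_test")`, and then `extract_devkit("car_devkit.tgz", "stanford_cars")` should produce the tree above, provided the archive layout matches the assumption.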

pgsld23333 · 2023-06-07T12:39:54Z

> Thank you @IamMohitM, we can use data = torchvision.datasets.StanfordCars(root="./", download=True) and with the following steps and avoid changes to the code
>
> * Download dataset bundle from [kaggle](https://www.kaggle.com/datasets/jessicali9530/stanford-cars-dataset), extract, and remove recursive directory structure (eg `stanford_cars/cars_test/cars_test`)
>
> * Download [car_devkit.tgz](https://github.com/pytorch/vision/files/11644847/car_devkit.tgz), extract it in `stanford_cars`
>
> Confirm dataset structures are as follows:
>
> └── stanford_cars
>     └── cars_train
>         └── .jpg
>     └── cars_test
>         └── .jpg
>     └── devkit
>         ├── cars_meta.mat
>         ├── cars_test_annos.mat
>         ├── cars_train_annos.mat
>         ├── eval_train.m
>         ├── README.txt
>         └── train_perfect_preds.txt

But the annotation of the test set is still not available.

iamchenxin · 2023-06-24T08:53:27Z


A `cars_test_annos_withlabels.mat` file should be placed in the base folder (something like `stanford_cars\cars_test_annos_withlabels.mat`).
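A minimal sketch of this step, assuming the annotation file has already been downloaded somewhere locally, possibly under a slightly different name such as `cars_test_annos_withlabels (1).mat`. Here `root` is assumed to be the same directory later passed as `root=` to `torchvision.datasets.StanfordCars`, and `place_test_annotations` is a hypothetical helper:

```python
import shutil
from pathlib import Path

# Name the file needs to have inside the base folder
EXPECTED_NAME = "cars_test_annos_withlabels.mat"


def place_test_annotations(downloaded_file, root="./") -> Path:
    """Copy the labelled test annotations to <root>/stanford_cars/ under the expected name."""
    base_folder = Path(root) / "stanford_cars"
    base_folder.mkdir(parents=True, exist_ok=True)
    target = base_folder / EXPECTED_NAME
    # Copying to an explicit target name drops any suffix the browser added, e.g. " (1)"
    shutil.copy2(downloaded_file, target)
    return target
```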

thefirebanks · 2023-07-11T20:06:46Z

@iamchenxin Just to clarify what you mean:

The original file cars_test_annos.mat in the devkit/ folder you mentioned does NOT contain the annotated labels, so it's not enough to download the dataset from Kaggle and the devkit you sent. I found the cars_test_annos_withlabels.mat file in one of the examples from the Kaggle dataset:

https://www.kaggle.com/code/subhangaupadhaya/pytorch-stanfordcars-classification/input?select=cars_test_annos_withlabels+%281%29.mat

and I'm sure that other code examples also load this file as part of their input. So to summarize, we need:

* The original Kaggle dataset: https://www.kaggle.com/datasets/jessicali9530/stanford-cars-dataset?datasetId=30084&sortBy=dateCreated&select=cars_test
* The devkit: [car_devkit.tgz](https://github.com/pytorch/vision/files/11644847/car_devkit.tgz)
* The cars_test_annos_withlabels.mat file: https://www.kaggle.com/code/subhangaupadhaya/pytorch-stanfordcars-classification/input?select=cars_test_annos_withlabels+%281%29.mat

The directory structure you provided earlier works well once we add the missing file!

└── stanford_cars
    └── cars_test_annos_withlabels.mat
    └── cars_train
        └── *.jpg
    └── cars_test
        └── *.jpg
    └── devkit
        ├── cars_meta.mat
        ├── cars_test_annos.mat
        ├── cars_train_annos.mat
        ├── eval_train.m
        ├── README.txt
        └── train_perfect_preds.txt

If the script/notebook we're writing the code in is at the same directory level as the stanford_cars/ folder, we can write:

data = torchvision.datasets.StanfordCars(root="./", download=True)

Hope that this helps! @pgsld23333 @IamMohitM let me know if I missed something.
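Before constructing the dataset, a quick check against the tree above can make a half-finished setup easier to spot than the generic "Dataset not found" RuntimeError. A minimal sketch; `missing_stanford_cars_paths` is a hypothetical helper, and it only checks the image folders and the `.mat` files from the tree, which is an assumption about what matters rather than a copy of torchvision's own check:

```python
from pathlib import Path

# Expected layout under <root>/stanford_cars, taken from the directory tree above
IMAGE_DIRS = ("cars_train", "cars_test")
REQUIRED_FILES = (
    "cars_test_annos_withlabels.mat",
    "devkit/cars_meta.mat",
    "devkit/cars_test_annos.mat",
    "devkit/cars_train_annos.mat",
)


def missing_stanford_cars_paths(root="./"):
    """Return the expected paths (relative to <root>/stanford_cars) that are missing."""
    base = Path(root) / "stanford_cars"
    missing = []
    for image_dir in IMAGE_DIRS:
        folder = base / image_dir
        if not folder.is_dir():
            missing.append(f"{image_dir}/")
        elif not any(folder.glob("*.jpg")):
            # Often a sign that the nested cars_xxx/cars_xxx folder was not flattened
            missing.append(f"{image_dir}/*.jpg")
    if not (base / "devkit").is_dir():
        missing.append("devkit/")
    for name in REQUIRED_FILES:
        if not (base / name).is_file():
            missing.append(name)
    return missing
```

An empty list from `missing_stanford_cars_paths("./")` means everything in the tree is in place; anything else lists what still has to be added.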

jzhangCSER01 · 2023-10-11T04:35:39Z


Thanks a lot! It's really helpful.

Coderx7 · 2024-03-06T11:02:54Z

@pmeier It's been almost a year and this still hasn't been fixed. What's the plan going forward: removing it, or keeping it broken like this?

NicolasHug · 2024-03-12T16:43:10Z

Thanks all for the reports. Unfortunately the URL has been consistently broken for a while now, so we decided to disable the (broken) download functionality. Passing download=True will now result in an error that points users to @thefirebanks's #7545 (comment), suggesting they download the dataset manually from Kaggle. Thank you @thefirebanks for the very helpful instructions.

These changes will be effective from torchvision 0.18, aimed to be released in April 2024. I'll close this issue, but #7545 (comment) is still relevant until further notice.

rygx · 2024-08-11T08:31:50Z

It looks like the existing datasets are either missing some files or don't fully conform to the directory structure torchvision requires, so a lot of manual work is needed up front every time.

To tackle this issue, a new dataset compatible with the latest version of torchvision has been created on Kaggle: https://www.kaggle.com/datasets/rickyyyyyyy/torchvision-stanford-cars

The dataset can be set up with:

import kaggle
# you need to configure API key through https://www.kaggle.com/docs/api
kaggle.api.dataset_download_files('rickyyyyyyy/torchvision-stanford-cars', path=YOUR_DATA_PATH, unzip=True)

It can then be used by torchvision:

import torchvision
data = torchvision.datasets.StanfordCars(root=YOUR_DATA_PATH, download=False)

o-laurent · 2024-11-05T14:32:06Z

Hi, and thanks, @rygx, for your help!

I could put this version of the dataset on Zenodo (giving credits to the original owners) to make it accessible directly from a URL without authentication or additional dependency so as to fix this problem completely. Let me know if that would be helpful.

Have a great day.

BKJackson · 2024-12-23T18:50:15Z

Loading the dataset was not straightforward for me, but I eventually got something to work.

I downloaded this version with kagglehub and saved it to my Google Drive:

import kagglehub
path = kagglehub.dataset_download("emanuelriquelmem/stanford-cars-pytorch")
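The comment does not say where `cars_train` and `cars_test` end up inside the folder returned by `kagglehub.dataset_download`, so rather than hard-coding a path, one option is to search for them. A minimal sketch; `find_subdir` is a hypothetical helper, and it assumes the downloaded dataset contains folders with those names, as the Drive paths used below suggest:

```python
import os
from pathlib import Path
from typing import Optional


def find_subdir(root, name) -> Optional[Path]:
    """Return the first directory called `name` under `root`, or None if there is none."""
    # os.walk is top-down by default, so a match directly under root is returned
    # before any deeper one
    for dirpath, dirnames, _filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        if name in dirnames:
            return Path(dirpath) / name
    return None
```

For example, `find_subdir(path, "cars_train")` would return the training folder to pass as `root_dir`, or `None` if the download uses different folder names.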

This will plot the images:

import os
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

def show_images(dataset_path, num_samples=20, cols=4):
    """ Plots some samples from the dataset """

    # Get a list of all image files in the folder
    image_files = [f for f in os.listdir(dataset_path) if f.endswith('.jpg')]

    plt.figure(figsize=(15,15))

    for i in range(min(num_samples, len(image_files))):
        # Load the image using matplotlib.image
        img = mpimg.imread(os.path.join(dataset_path, image_files[i]))

        plt.subplot(int(num_samples/cols) + 1, cols, i + 1)
        plt.imshow(img)
        plt.axis('off')  # Turn off axis ticks and labels

    plt.show()

# Call the function to display the images
dataset_path = 'drive/MyDrive/stanford_cars/cars_test'
show_images(dataset_path)

This will load the train and test sets:

import os
import scipy.io as sio
import torch  # needed for torch.utils.data.ConcatDataset below
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

IMG_SIZE = 64
BATCH_SIZE = 128

class StanfordCarsDataset(Dataset):
    def __init__(self, root_dir, annotations_file=None, transform=None, test=False):
        self.root_dir = root_dir
        self.transform = transform
        self.image_paths = [os.path.join(root_dir, filename) for filename in os.listdir(root_dir) if filename.endswith('.jpg')]

        if annotations_file: 
            self.annotations = sio.loadmat(annotations_file)['annotations'][0]  # Load annotations
            if test:
                self.filename_to_label = {ann[4][0]: -1 for ann in self.annotations} #Assign -1 to all test images
            else:
                self.filename_to_label = {ann[5][0]: int(ann[4][0][0]) for ann in self.annotations}  # Create mapping
        else:
            self.filename_to_label = {}  # Empty dictionary if no annotations file is provided

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image_path = self.image_paths[idx]
        image = Image.open(image_path).convert('RGB')  # Ensure RGB format

        # Get label if available
        filename = os.path.basename(image_path)
        label = self.filename_to_label.get(filename, -1)  # Use -1 as default label if not in the annotations

        if self.transform:
            image = self.transform(image)

        # Always return a label even if -1    
        return image, label

def load_transformed_dataset():
    data_transforms = [
        transforms.Resize((IMG_SIZE, IMG_SIZE)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(), # Scales data into [0,1]
        transforms.Lambda(lambda t: (t * 2) - 1) # Scale between [-1, 1]
    ]
    data_transforms = transforms.Compose(data_transforms)



    train = StanfordCarsDataset(root_dir='drive/MyDrive/stanford_cars/cars_train', 
                               annotations_file='drive/MyDrive/stanford_cars/devkit/cars_train_annos.mat', 
                               transform=data_transforms, test=False)
    
    test = StanfordCarsDataset(root_dir='drive/MyDrive/stanford_cars/cars_test', 
                               annotations_file='drive/MyDrive/stanford_cars/devkit/cars_test_annos.mat', 
                               transform=data_transforms, test=True)

    return torch.utils.data.ConcatDataset([train, test])

data = load_transformed_dataset()
dataloader = DataLoader(data, batch_size=BATCH_SIZE, shuffle=True)

pmeier added module: datasets dependency issue labels May 1, 2023

pmeier self-assigned this May 1, 2023

This was referenced May 1, 2023

refactor download tests #7546

Merged

CircleCI to GitHub Actions tracker #7405

Closed

dhruvbird mentioned this issue May 29, 2023

Torchvision dataset mirrors #7637

Open

This was referenced Jun 16, 2023

Torchvision - Stanford Cars dataset no longer available #7670

Closed

HTTP Error 404: Not Found when downloading StanfordCars #7681

Closed

malfet mentioned this issue Jun 26, 2023

Standord Cars dataset pulls up 404 error when loading via torch vision. #7699

Closed

pmeier mentioned this issue Sep 4, 2023

Link to Stanford cars dataset is broken: 404 not found #7930

Closed

NicolasHug mentioned this issue Mar 12, 2024

Disable download for StanfordCars dataset #8309

Merged

NicolasHug closed this as completed Mar 13, 2024

efemeryds mentioned this issue Apr 27, 2024

Problems with Stanford Cars Dataset JiazuoYu/MoE-Adapters4CL#7

Closed

jhpohovey added a commit to jhpohovey/StanfordCars-Dataset that referenced this issue Apr 28, 2024

upload data with readme from pytorch/vision#7545

1da0868

enkeejunior1 mentioned this issue Jul 17, 2024

Cars dataset kyrie-23/linear_task_arithmetic#1

Closed

rygx mentioned this issue Sep 1, 2024

Update stanford-cars docs with more elaborate instructions for download=False #8620

Merged
