diff --git a/MANIFEST.in b/MANIFEST.in
new file mode 100644
index 00000000000..7682c1b60cd
--- /dev/null
+++ b/MANIFEST.in
@@ -0,0 +1,4 @@
+include README.rst
+
+recursive-exclude * __pycache__
+recursive-exclude * *.py[co]
diff --git a/README.md b/README.md
deleted file mode 100644
index 2cc2899ab2d..00000000000
--- a/README.md
+++ /dev/null
@@ -1,253 +0,0 @@
-# torch-vision
-
-This repository consists of:
-
-- [vision.datasets](#datasets) : Data loaders for popular vision datasets
-- [vision.models](#models) : Definitions for popular model architectures, such as AlexNet, VGG, and ResNet and pre-trained models.
-- [vision.transforms](#transforms) : Common image transformations such as random crop, rotations etc.
-- [vision.utils](#utils) : Useful stuff such as saving tensor (3 x H x W) as image to disk, given a mini-batch creating a grid of images, etc.
-
-# Installation
-
-Binaries:
-
-```bash
-conda install torchvision -c https://conda.anaconda.org/t/6N-MsQ4WZ7jo/soumith
-```
-
-From Source:
-
-```bash
-pip install -r requirements.txt
-pip install .
-```
-
-# Datasets
-
-The following dataset loaders are available:
-
-- [COCO (Captioning and Detection)](#coco)
-- [LSUN Classification](#lsun)
-- [ImageFolder](#imagefolder)
-- [Imagenet-12](#imagenet-12)
-- [CIFAR10 and CIFAR100](#cifar)
-
-Datasets have the API:
-- `__getitem__`
-- `__len__`
-They all subclass from `torch.utils.data.Dataset`
-Hence, they can all be multi-threaded (python multiprocessing) using standard torch.utils.data.DataLoader.
-
-For example:
-
-`torch.utils.data.DataLoader(coco_cap, batch_size=args.batchSize, shuffle=True, num_workers=args.nThreads)`
-
-In the constructor, each dataset has a slightly different API as needed, but they all take the keyword args:
-
-- `transform` - a function that takes in an image and returns a transformed version
-  - common stuff like `ToTensor`, `RandomCrop`, etc. These can be composed together with `transforms.Compose` (see transforms section below)
-- `target_transform` - a function that takes in the target and transforms it. For example, take in the caption string and return a tensor of word indices.
-
-### COCO
-
-This requires the [COCO API to be installed](https://github.com/pdollar/coco/tree/master/PythonAPI)
-
-#### Captions:
-
-`dset.CocoCaptions(root="dir where images are", annFile="json annotation file", [transform, target_transform])`
-
-Example:
-
-```python
-import torchvision.datasets as dset
-import torchvision.transforms as transforms
-cap = dset.CocoCaptions(root = 'dir where images are',
-                        annFile = 'json annotation file',
-                        transform=transforms.ToTensor())
-
-print('Number of samples: ', len(cap))
-img, target = cap[3] # load 4th sample
-
-print("Image Size: ", img.size())
-print(target)
-```
-
-Output:
-
-```
-Number of samples: 82783
-Image Size: (3L, 427L, 640L)
-[u'A plane emitting smoke stream flying over a mountain.',
-u'A plane darts across a bright blue sky behind a mountain covered in snow',
-u'A plane leaves a contrail above the snowy mountain top.',
-u'A mountain that has a plane flying overheard in the distance.',
-u'A mountain view with a plume of smoke in the background']
-```
-
-#### Detection:
-`dset.CocoDetection(root="dir where images are", annFile="json annotation file", [transform, target_transform])`
-
-### LSUN
-
-`dset.LSUN(db_path, classes='train', [transform, target_transform])`
-
-- db_path = root directory for the database files
-- classes =
-  - 'train' - all categories, training set
-  - 'val' - all categories, validation set
-  - 'test' - all categories, test set
-  - ['bedroom_train', 'church_train', ...] : a list of categories to load
-
-
-### CIFAR
-
-`dset.CIFAR10(root, train=True, transform=None, target_transform=None, download=False)`
-
-`dset.CIFAR100(root, train=True, transform=None, target_transform=None, download=False)`
-
-- `root` : root directory of dataset where there is folder `cifar-10-batches-py`
-- `train` : `True` = Training set, `False` = Test set
-- `download` : `True` = downloads the dataset from the internet and puts it in root directory. If dataset already downloaded, does not do anything.
-
-### ImageFolder
-
-A generic data loader where the images are arranged in this way:
-
-```
-root/dog/xxx.png
-root/dog/xxy.png
-root/dog/xxz.png
-
-root/cat/123.png
-root/cat/nsdf3.png
-root/cat/asd932_.png
-```
-
-`dset.ImageFolder(root="root folder path", [transform, target_transform])`
-
-It has the members:
-
-- `self.classes` - The class names as a list
-- `self.class_to_idx` - Corresponding class indices
-- `self.imgs` - The list of (image path, class-index) tuples
-
-
-### Imagenet-12
-
-This is simply implemented with an ImageFolder dataset.
-
-The data is preprocessed [as described here](https://github.com/facebook/fb.resnet.torch/blob/master/INSTALL.md#download-the-imagenet-dataset)
-
-[Here is an example](https://github.com/pytorch/examples/blob/27e2a46c1d1505324032b1d94fc6ce24d5b67e97/imagenet/main.py#L48-L62).
-
-# Models
-
-The models subpackage contains definitions for the following model architectures:
-
- - [AlexNet](https://arxiv.org/abs/1404.5997): AlexNet variant from the "One weird trick" paper.
- - [VGG](https://arxiv.org/abs/1409.1556): VGG-11, VGG-13, VGG-16, VGG-19 (with and without batch normalization)
- - [ResNet](https://arxiv.org/abs/1512.03385): ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152
-
-You can construct a model with random weights by calling its constructor:
-
-```python
-import torchvision.models as models
-resnet18 = models.resnet18()
-alexnet = models.alexnet()
-```
-
- We provide pre-trained models for the ResNet variants and AlexNet, using the
- PyTorch [model zoo](http://pytorch.org/docs/model_zoo.html). These can
- be constructed by passing `pretrained=True`:
-
- ```python
- import torchvision.models as models
- resnet18 = models.resnet18(pretrained=True)
- alexnet = models.alexnet(pretrained=True)
-```
-
-# Transforms
-
-Transforms are common image transforms.
-They can be chained together using `transforms.Compose`
-
-### `transforms.Compose`
-
-One can compose several transforms together.
-For example.
-
-```python
-transform = transforms.Compose([
-    transforms.RandomSizedCrop(224),
-    transforms.RandomHorizontalFlip(),
-    transforms.ToTensor(),
-    transforms.Normalize(mean = [ 0.485, 0.456, 0.406 ],
-                         std = [ 0.229, 0.224, 0.225 ]),
-])
-```
-
-## Transforms on PIL.Image
-
-### `Scale(size, interpolation=Image.BILINEAR)`
-Rescales the input PIL.Image to the given 'size'.
-'size' will be the size of the smaller edge.
-
-For example, if height > width, then image will be
-rescaled to (size * height / width, size)
-- size: size of the smaller edge
-- interpolation: Default: PIL.Image.BILINEAR
-
-### `CenterCrop(size)` - center-crops the image to the given size
-Crops the given PIL.Image at the center to have a region of
-the given size. size can be a tuple (target_height, target_width)
-or an integer, in which case the target will be of a square shape (size, size)
-
-### `RandomCrop(size, padding=0)`
-Crops the given PIL.Image at a random location to have a region of
-the given size. size can be a tuple (target_height, target_width)
-or an integer, in which case the target will be of a square shape (size, size)
-If `padding` is non-zero, then the image is first zero-padded on each side with `padding` pixels.
-
-### `RandomHorizontalFlip()`
-Randomly horizontally flips the given PIL.Image with a probability of 0.5
-
-### `RandomSizedCrop(size, interpolation=Image.BILINEAR)`
-Random crop the given PIL.Image to a random size of (0.08 to 1.0) of the original size
-and and a random aspect ratio of 3/4 to 4/3 of the original aspect ratio
-
-This is popularly used to train the Inception networks
-- size: size of the smaller edge
-- interpolation: Default: PIL.Image.BILINEAR
-
-### `Pad(padding, fill=0)`
-Pads the given image on each side with `padding` number of pixels, and the padding pixels are filled with
-pixel value `fill`.
-If a `5x5` image is padded with `padding=1` then it becomes `7x7`
-
-## Transforms on torch.*Tensor
-
-### `Normalize(mean, std)`
-Given mean: (R, G, B) and std: (R, G, B), will normalize each channel of the torch.*Tensor, i.e.
-channel = (channel - mean) / std
-
-## Conversion Transforms
-- `ToTensor()` - Converts a PIL.Image (RGB) or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
-- `ToPILImage()` - Converts a torch.*Tensor of range [0, 1] and shape C x H x W or numpy ndarray of dtype=uint8, range[0, 255] and shape H x W x C to a PIL.Image of range [0, 255]
-
-## Generic Transofrms
-### `Lambda(lambda)`
-Given a Python lambda, applies it to the input `img` and returns it.
-For example:
-
-```python
-transforms.Lambda(lambda x: x.add(10))
-```
-
-# Utils
-
-### make_grid(tensor, nrow=8, padding=2)
-Given a 4D mini-batch Tensor of shape (B x C x H x W), makes a grid of images
-
-### save_image(tensor, filename, nrow=8, padding=2)
-Saves a given Tensor into an image file.
-
-If given a mini-batch tensor, will save the tensor as a grid of images.
diff --git a/README.rst b/README.rst
new file mode 100644
index 00000000000..ee1957fe6c3
--- /dev/null
+++ b/README.rst
@@ -0,0 +1,316 @@
+torch-vision
+============
+
+This repository consists of:
+
+- `vision.datasets <#datasets>`__ : Data loaders for popular vision
+  datasets
+- `vision.models <#models>`__ : Definitions for popular model
+  architectures, such as AlexNet, VGG, and ResNet, and pre-trained
+  models
+- `vision.transforms <#transforms>`__ : Common image transformations
+  such as random crop, rotations, etc.
+- `vision.utils <#utils>`__ : Useful utilities such as saving a tensor
+  (3 x H x W) as an image to disk, or creating a grid of images from a
+  mini-batch, etc.
+
+Installation
+============
+
+Binaries:
+
+.. code:: bash
+
+    conda install torchvision -c https://conda.anaconda.org/t/6N-MsQ4WZ7jo/soumith
+
+From Source:
+
+.. code:: bash
+
+    pip install -r requirements.txt
+    pip install .
+
+Datasets
+========
+
+The following dataset loaders are available:
+
+- `COCO (Captioning and Detection) <#coco>`__
+- `LSUN Classification <#lsun>`__
+- `ImageFolder <#imagefolder>`__
+- `Imagenet-12 <#imagenet-12>`__
+- `CIFAR10 and CIFAR100 <#cifar>`__
+
+All datasets have the API:
+
+- ``__getitem__``
+- ``__len__``
+
+They are all subclasses of ``torch.utils.data.Dataset``, so they can
+all be loaded in parallel (via Python multiprocessing) using a
+standard ``torch.utils.data.DataLoader``.
+
+For example:
+
+``torch.utils.data.DataLoader(coco_cap, batch_size=args.batchSize, shuffle=True, num_workers=args.nThreads)``
+
+In the constructor, each dataset has a slightly different API as needed,
+but they all take the keyword args:
+
+- ``transform`` - a function that takes in an image and returns a
+  transformed version. Common transforms like ``ToTensor`` and
+  ``RandomCrop`` can be composed together with ``transforms.Compose``
+  (see the transforms section below).
+- ``target_transform`` - a function that takes in the target and
+  transforms it. For example, take in the caption string and return a
+  tensor of word indices.
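+
+For illustration, here is a minimal sketch that wires a dataset, a
+transform, and a ``DataLoader`` together. The root directory ``./data``
+and the loader settings are placeholders, not required values:
+
+.. code:: python
+
+    import torch
+    import torchvision.datasets as dset
+    import torchvision.transforms as transforms
+
+    # downloads CIFAR10 to ./data on first use (see the CIFAR section below)
+    cifar = dset.CIFAR10(root='./data', train=True, download=True,
+                         transform=transforms.ToTensor())
+
+    loader = torch.utils.data.DataLoader(cifar, batch_size=32,
+                                         shuffle=True, num_workers=2)
+
+    for images, labels in loader:
+        # each batch is a 32 x 3 x 32 x 32 image tensor plus a tensor of labels
+        print(images.size(), labels.size())
+        break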
+
+COCO
+~~~~
+
+This requires the `COCO API to be
+installed <https://github.com/pdollar/coco/tree/master/PythonAPI>`__
+
+Captions:
+^^^^^^^^^
+
+``dset.CocoCaptions(root="dir where images are", annFile="json annotation file", [transform, target_transform])``
+
+Example:
+
+.. code:: python
+
+    import torchvision.datasets as dset
+    import torchvision.transforms as transforms
+    cap = dset.CocoCaptions(root='dir where images are',
+                            annFile='json annotation file',
+                            transform=transforms.ToTensor())
+
+    print('Number of samples: ', len(cap))
+    img, target = cap[3]  # load 4th sample
+
+    print("Image Size: ", img.size())
+    print(target)
+
+Output:
+
+::
+
+    Number of samples: 82783
+    Image Size: (3L, 427L, 640L)
+    [u'A plane emitting smoke stream flying over a mountain.',
+    u'A plane darts across a bright blue sky behind a mountain covered in snow',
+    u'A plane leaves a contrail above the snowy mountain top.',
+    u'A mountain that has a plane flying overheard in the distance.',
+    u'A mountain view with a plume of smoke in the background']
+
+Detection:
+^^^^^^^^^^
+
+``dset.CocoDetection(root="dir where images are", annFile="json annotation file", [transform, target_transform])``
+
+LSUN
+~~~~
+
+``dset.LSUN(db_path, classes='train', [transform, target_transform])``
+
+- db_path = root directory for the database files
+- classes =
+
+  - 'train' - all categories, training set
+  - 'val' - all categories, validation set
+  - 'test' - all categories, test set
+  - ['bedroom_train', 'church_train', ...] : a list of categories to
+    load
+
+CIFAR
+~~~~~
+
+``dset.CIFAR10(root, train=True, transform=None, target_transform=None, download=False)``
+
+``dset.CIFAR100(root, train=True, transform=None, target_transform=None, download=False)``
+
+- ``root`` : root directory of the dataset, containing the folder
+  ``cifar-10-batches-py``
+- ``train`` : ``True`` = training set, ``False`` = test set
+- ``download`` : ``True`` = downloads the dataset from the internet
+  and puts it in the root directory. If the dataset is already
+  downloaded, does nothing.
+
+ImageFolder
+~~~~~~~~~~~
+
+A generic data loader where the images are arranged in this way:
+
+::
+
+    root/dog/xxx.png
+    root/dog/xxy.png
+    root/dog/xxz.png
+
+    root/cat/123.png
+    root/cat/nsdf3.png
+    root/cat/asd932_.png
+
+``dset.ImageFolder(root="root folder path", [transform, target_transform])``
+
+It has the members:
+
+- ``self.classes`` - the class names, as a list
+- ``self.class_to_idx`` - the corresponding class indices
+- ``self.imgs`` - the list of (image path, class index) tuples
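+
+For example, a minimal sketch, assuming the layout above lives under a
+hypothetical folder ``./images``:
+
+.. code:: python
+
+    import torchvision.datasets as dset
+
+    data = dset.ImageFolder(root='./images')
+
+    print(data.classes)       # e.g. ['cat', 'dog'], inferred from folder names
+    print(data.class_to_idx)  # e.g. {'cat': 0, 'dog': 1}
+    img, label = data[0]      # a (PIL.Image, class index) pair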
+
+Imagenet-12
+~~~~~~~~~~~
+
+This is simply implemented with an ImageFolder dataset.
+
+The data is preprocessed `as described
+here <https://github.com/facebook/fb.resnet.torch/blob/master/INSTALL.md#download-the-imagenet-dataset>`__
+
+`Here is an
+example <https://github.com/pytorch/examples/blob/27e2a46c1d1505324032b1d94fc6ce24d5b67e97/imagenet/main.py#L48-L62>`__.
+
+Models
+======
+
+The models subpackage contains definitions for the following model
+architectures:
+
+- `AlexNet <https://arxiv.org/abs/1404.5997>`__: the AlexNet variant
+  from the "One weird trick" paper
+- `VGG <https://arxiv.org/abs/1409.1556>`__: VGG-11, VGG-13, VGG-16,
+  VGG-19 (with and without batch normalization)
+- `ResNet <https://arxiv.org/abs/1512.03385>`__: ResNet-18, ResNet-34,
+  ResNet-50, ResNet-101, ResNet-152
+
+You can construct a model with random weights by calling its
+constructor:
+
+.. code:: python
+
+    import torchvision.models as models
+    resnet18 = models.resnet18()
+    alexnet = models.alexnet()
+
+We provide pre-trained models for the ResNet variants and AlexNet,
+using the PyTorch `model zoo <http://pytorch.org/docs/model_zoo.html>`__.
+These can be constructed by passing ``pretrained=True``:
+
+.. code:: python
+
+    import torchvision.models as models
+    resnet18 = models.resnet18(pretrained=True)
+    alexnet = models.alexnet(pretrained=True)
+
+Transforms
+==========
+
+Transforms are common image transforms. They can be chained together
+using ``transforms.Compose``.
+
+``transforms.Compose``
+~~~~~~~~~~~~~~~~~~~~~~
+
+One can compose several transforms together. For example:
+
+.. code:: python
+
+    transform = transforms.Compose([
+        transforms.RandomSizedCrop(224),
+        transforms.RandomHorizontalFlip(),
+        transforms.ToTensor(),
+        transforms.Normalize(mean=[0.485, 0.456, 0.406],
+                             std=[0.229, 0.224, 0.225]),
+    ])
+
+Transforms on PIL.Image
+~~~~~~~~~~~~~~~~~~~~~~~
+
+``Scale(size, interpolation=Image.BILINEAR)``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Rescales the input PIL.Image so that ``size`` is the size of the
+smaller edge. For example, if height > width, the image will be
+rescaled to (size \* height / width, size).
+
+- size: size of the smaller edge
+- interpolation: default: PIL.Image.BILINEAR
+
+``CenterCrop(size)`` - center-crops the image to the given size
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Crops the given PIL.Image at the center to have a region of the given
+size. ``size`` can be a tuple (target_height, target_width) or an
+integer, in which case the target is a square of shape (size, size).
+
+``RandomCrop(size, padding=0)``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Crops the given PIL.Image at a random location to have a region of the
+given size. ``size`` can be a tuple (target_height, target_width) or
+an integer, in which case the target is a square of shape (size, size).
+If ``padding`` is non-zero, the image is first zero-padded on each
+side with ``padding`` pixels.
+
+``RandomHorizontalFlip()``
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Randomly flips the given PIL.Image horizontally with a probability of
+0.5.
+
+``RandomSizedCrop(size, interpolation=Image.BILINEAR)``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Crops the given PIL.Image to a random size of (0.08 to 1.0) of the
+original size, with a random aspect ratio of 3/4 to 4/3 of the
+original aspect ratio. This is popularly used to train the Inception
+networks.
+
+- size: size of the smaller edge
+- interpolation: default: PIL.Image.BILINEAR
+
+``Pad(padding, fill=0)``
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Pads the given image on each side with ``padding`` pixels, filled with
+pixel value ``fill``. For example, a ``5x5`` image padded with
+``padding=1`` becomes ``7x7``.
+
+Transforms on torch.\*Tensor
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``Normalize(mean, std)``
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Given mean: (R, G, B) and std: (R, G, B), normalizes each channel of
+the torch.\*Tensor, i.e. channel = (channel - mean) / std
+
+Conversion Transforms
+~~~~~~~~~~~~~~~~~~~~~
+
+- ``ToTensor()`` - converts a PIL.Image (RGB) or numpy.ndarray
+  (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape
+  (C x H x W) in the range [0.0, 1.0]
+- ``ToPILImage()`` - converts a torch.\*Tensor of range [0, 1] and
+  shape C x H x W, or a numpy ndarray of dtype=uint8, range [0, 255]
+  and shape H x W x C, to a PIL.Image of range [0, 255]
+
+Generic Transforms
+~~~~~~~~~~~~~~~~~~
+
+``Lambda(lambda)``
+^^^^^^^^^^^^^^^^^^
+
+Given a Python lambda, applies it to the input ``img`` and returns it.
+For example:
+
+.. code:: python
+
+    transforms.Lambda(lambda x: x.add(10))
+
+Utils
+=====
+
+make_grid(tensor, nrow=8, padding=2)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Given a 4D mini-batch Tensor of shape (B x C x H x W), makes a grid of
+images.
+
+save_image(tensor, filename, nrow=8, padding=2)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Saves a given Tensor into an image file.
+
+If given a mini-batch tensor, saves the tensor as a grid of images.
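+
+For instance, a minimal sketch; the random mini-batch and the output
+filename are only placeholders:
+
+.. code:: python
+
+    import torch
+    import torchvision.utils as vutils
+
+    # a hypothetical mini-batch of 16 RGB images, 64 x 64 each
+    batch = torch.rand(16, 3, 64, 64)
+
+    grid = vutils.make_grid(batch, nrow=4)         # a single 3 x H x W tensor
+    vutils.save_image(batch, 'batch.png', nrow=4)  # writes the same grid to disk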
diff --git a/setup.cfg b/setup.cfg
new file mode 100644
index 00000000000..3c6e79cf31d
--- /dev/null
+++ b/setup.cfg
@@ -0,0 +1,2 @@
+[bdist_wheel]
+universal=1
diff --git a/setup.py b/setup.py
index 269a1638b33..bd8bf649e41 100644
--- a/setup.py
+++ b/setup.py
@@ -4,12 +4,12 @@
 import sys
 from setuptools import setup, find_packages
 
-VERSION = '0.1.6'
-long_description = '''torch-vision provides DataLoaders, Pre-trained models
-and common transforms for torch for images and videos'''
+readme = open('README.rst').read()
+
+VERSION = '0.1.6'
 
 
-setup_info = dict(
+setup(
     # Metadata
     name='torchvision',
     version=VERSION,
@@ -17,7 +17,7 @@
     author_email='soumith@pytorch.org',
     url='https://github.com/pytorch/vision',
     description='image and video datasets and models for torch deep learning',
-    long_description=long_description,
+    long_description=readme,
     license='BSD',
 
     # Package info
@@ -25,5 +25,3 @@
     zip_safe=True,
 )
-
-setup(**setup_info)