diff --git a/MANIFEST.in b/MANIFEST.in
new file mode 100644
index 00000000000..7682c1b60cd
--- /dev/null
+++ b/MANIFEST.in
@@ -0,0 +1,4 @@
+include README.rst
+
+recursive-exclude * __pycache__
+recursive-exclude * *.py[co]
diff --git a/README.md b/README.md
deleted file mode 100644
index 2cc2899ab2d..00000000000
--- a/README.md
+++ /dev/null
@@ -1,253 +0,0 @@
-# torch-vision
-
-This repository consists of:
-
-- [vision.datasets](#datasets) : Data loaders for popular vision datasets
-- [vision.models](#models) : Definitions for popular model architectures, such as AlexNet, VGG, and ResNet and pre-trained models.
-- [vision.transforms](#transforms) : Common image transformations such as random crop, rotations etc.
-- [vision.utils](#utils) : Useful stuff such as saving tensor (3 x H x W) as image to disk, given a mini-batch creating a grid of images, etc.
-
-# Installation
-
-Binaries:
-
-```bash
-conda install torchvision -c https://conda.anaconda.org/t/6N-MsQ4WZ7jo/soumith
-```
-
-From Source:
-
-```bash
-pip install -r requirements.txt
-pip install .
-```
-
-# Datasets
-
-The following dataset loaders are available:
-
-- [COCO (Captioning and Detection)](#coco)
-- [LSUN Classification](#lsun)
-- [ImageFolder](#imagefolder)
-- [Imagenet-12](#imagenet-12)
-- [CIFAR10 and CIFAR100](#cifar)
-
-Datasets have the API:
-- `__getitem__`
-- `__len__`
-They all subclass from `torch.utils.data.Dataset`
-Hence, they can all be multi-threaded (python multiprocessing) using standard torch.utils.data.DataLoader.
-
-For example:
-
-`torch.utils.data.DataLoader(coco_cap, batch_size=args.batchSize, shuffle=True, num_workers=args.nThreads)`
-
-In the constructor, each dataset has a slightly different API as needed, but they all take the keyword args:
-
-- `transform` - a function that takes in an image and returns a transformed version
-  - common stuff like `ToTensor`, `RandomCrop`, etc. These can be composed together with `transforms.Compose` (see transforms section below)
-- `target_transform` - a function that takes in the target and transforms it. For example, take in the caption string and return a tensor of word indices.
-
-### COCO
-
-This requires the [COCO API to be installed](https://github.com/pdollar/coco/tree/master/PythonAPI)
-
-#### Captions:
-
-`dset.CocoCaptions(root="dir where images are", annFile="json annotation file", [transform, target_transform])`
-
-Example:
-
-```python
-import torchvision.datasets as dset
-import torchvision.transforms as transforms
-cap = dset.CocoCaptions(root = 'dir where images are',
-                        annFile = 'json annotation file',
-                        transform=transforms.ToTensor())
-
-print('Number of samples: ', len(cap))
-img, target = cap[3] # load 4th sample
-
-print("Image Size: ", img.size())
-print(target)
-```
-
-Output:
-
-```
-Number of samples: 82783
-Image Size: (3L, 427L, 640L)
-[u'A plane emitting smoke stream flying over a mountain.',
-u'A plane darts across a bright blue sky behind a mountain covered in snow',
-u'A plane leaves a contrail above the snowy mountain top.',
-u'A mountain that has a plane flying overheard in the distance.',
-u'A mountain view with a plume of smoke in the background']
-```
-
-#### Detection:
-`dset.CocoDetection(root="dir where images are", annFile="json annotation file", [transform, target_transform])`
-
-### LSUN
-
-`dset.LSUN(db_path, classes='train', [transform, target_transform])`
-
-- db_path = root directory for the database files
-- classes =
-  - 'train' - all categories, training set
-  - 'val' - all categories, validation set
-  - 'test' - all categories, test set
-  - ['bedroom_train', 'church_train', ...] : a list of categories to load
-
-
-### CIFAR
-
-`dset.CIFAR10(root, train=True, transform=None, target_transform=None, download=False)`
-
-`dset.CIFAR100(root, train=True, transform=None, target_transform=None, download=False)`
-
-- `root` : root directory of dataset where there is folder `cifar-10-batches-py`
-- `train` : `True` = Training set, `False` = Test set
-- `download` : `True` = downloads the dataset from the internet and puts it in root directory. If dataset already downloaded, does not do anything.
-
-### ImageFolder
-
-A generic data loader where the images are arranged in this way:
-
-```
-root/dog/xxx.png
-root/dog/xxy.png
-root/dog/xxz.png
-
-root/cat/123.png
-root/cat/nsdf3.png
-root/cat/asd932_.png
-```
-
-`dset.ImageFolder(root="root folder path", [transform, target_transform])`
-
-It has the members:
-
-- `self.classes` - The class names as a list
-- `self.class_to_idx` - Corresponding class indices
-- `self.imgs` - The list of (image path, class-index) tuples
-
-
-### Imagenet-12
-
-This is simply implemented with an ImageFolder dataset.
-
-The data is preprocessed [as described here](https://github.com/facebook/fb.resnet.torch/blob/master/INSTALL.md#download-the-imagenet-dataset)
-
-[Here is an example](https://github.com/pytorch/examples/blob/27e2a46c1d1505324032b1d94fc6ce24d5b67e97/imagenet/main.py#L48-L62).
-
-# Models
-
-The models subpackage contains definitions for the following model architectures:
-
- - [AlexNet](https://arxiv.org/abs/1404.5997): AlexNet variant from the "One weird trick" paper.
- - [VGG](https://arxiv.org/abs/1409.1556): VGG-11, VGG-13, VGG-16, VGG-19 (with and without batch normalization)
- - [ResNet](https://arxiv.org/abs/1512.03385): ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152
-
-You can construct a model with random weights by calling its constructor:
-
-```python
-import torchvision.models as models
-resnet18 = models.resnet18()
-alexnet = models.alexnet()
-```
-
- We provide pre-trained models for the ResNet variants and AlexNet, using the
- PyTorch [model zoo](http://pytorch.org/docs/model_zoo.html). These can
- be constructed by passing `pretrained=True`:
-
- ```python
- import torchvision.models as models
- resnet18 = models.resnet18(pretrained=True)
- alexnet = models.alexnet(pretrained=True)
-```
-
-# Transforms
-
-Transforms are common image transforms.
-They can be chained together using `transforms.Compose`
-
-### `transforms.Compose`
-
-One can compose several transforms together.
-For example.
-
-```python
-transform = transforms.Compose([
-    transforms.RandomSizedCrop(224),
-    transforms.RandomHorizontalFlip(),
-    transforms.ToTensor(),
-    transforms.Normalize(mean = [ 0.485, 0.456, 0.406 ],
-                         std = [ 0.229, 0.224, 0.225 ]),
-])
-```
-
-## Transforms on PIL.Image
-
-### `Scale(size, interpolation=Image.BILINEAR)`
-Rescales the input PIL.Image to the given 'size'.
-'size' will be the size of the smaller edge.
-
-For example, if height > width, then image will be
-rescaled to (size * height / width, size)
-- size: size of the smaller edge
-- interpolation: Default: PIL.Image.BILINEAR
-
-### `CenterCrop(size)` - center-crops the image to the given size
-Crops the given PIL.Image at the center to have a region of
-the given size. size can be a tuple (target_height, target_width)
-or an integer, in which case the target will be of a square shape (size, size)
-
-### `RandomCrop(size, padding=0)`
-Crops the given PIL.Image at a random location to have a region of
-the given size. size can be a tuple (target_height, target_width)
-or an integer, in which case the target will be of a square shape (size, size)
-If `padding` is non-zero, then the image is first zero-padded on each side with `padding` pixels.
-
-### `RandomHorizontalFlip()`
-Randomly horizontally flips the given PIL.Image with a probability of 0.5
-
-### `RandomSizedCrop(size, interpolation=Image.BILINEAR)`
-Random crop the given PIL.Image to a random size of (0.08 to 1.0) of the original size
-and and a random aspect ratio of 3/4 to 4/3 of the original aspect ratio
-
-This is popularly used to train the Inception networks
-- size: size of the smaller edge
-- interpolation: Default: PIL.Image.BILINEAR
-
-### `Pad(padding, fill=0)`
-Pads the given image on each side with `padding` number of pixels, and the padding pixels are filled with
-pixel value `fill`.
-If a `5x5` image is padded with `padding=1` then it becomes `7x7`
-
-## Transforms on torch.*Tensor
-
-### `Normalize(mean, std)`
-Given mean: (R, G, B) and std: (R, G, B), will normalize each channel of the torch.*Tensor, i.e.
-channel = (channel - mean) / std
-
-## Conversion Transforms
-- `ToTensor()` - Converts a PIL.Image (RGB) or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
-- `ToPILImage()` - Converts a torch.*Tensor of range [0, 1] and shape C x H x W or numpy ndarray of dtype=uint8, range[0, 255] and shape H x W x C to a PIL.Image of range [0, 255]
-
-## Generic Transofrms
-### `Lambda(lambda)`
-Given a Python lambda, applies it to the input `img` and returns it.
-For example:
-
-```python
-transforms.Lambda(lambda x: x.add(10))
-```
-
-# Utils
-
-### make_grid(tensor, nrow=8, padding=2)
-Given a 4D mini-batch Tensor of shape (B x C x H x W), makes a grid of images
-
-### save_image(tensor, filename, nrow=8, padding=2)
-Saves a given Tensor into an image file.
-
-If given a mini-batch tensor, will save the tensor as a grid of images.
diff --git a/README.rst b/README.rst
new file mode 100644
index 00000000000..ee1957fe6c3
--- /dev/null
+++ b/README.rst
@@ -0,0 +1,316 @@
+torch-vision
+============
+
+This repository consists of:
+
+- `vision.datasets <#datasets>`__ : Data loaders for popular vision
+  datasets
+- `vision.models <#models>`__ : Definitions for popular model
+  architectures, such as AlexNet, VGG, and ResNet, and pre-trained
+  models
+- `vision.transforms <#transforms>`__ : Common image transformations
+  such as random crop, rotations, etc.
+- `vision.utils <#utils>`__ : Useful utilities such as saving a tensor
+  (3 x H x W) as an image to disk, or creating a grid of images from a
+  mini-batch, etc.
+
+Installation
+============
+
+Binaries:
+
+.. code:: bash
+
+    conda install torchvision -c https://conda.anaconda.org/t/6N-MsQ4WZ7jo/soumith
+
+From Source:
+
+.. code:: bash
+
+    pip install -r requirements.txt
+    pip install .
+
+Datasets
+========
+
+The following dataset loaders are available:
+
+- `COCO (Captioning and Detection) <#coco>`__
+- `LSUN Classification <#lsun>`__
+- `ImageFolder <#imagefolder>`__
+- `Imagenet-12 <#imagenet-12>`__
+- `CIFAR10 and CIFAR100 <#cifar>`__
+
+All datasets have the API:
+
+- ``__getitem__``
+- ``__len__``
+
+They are all subclasses of ``torch.utils.data.Dataset``, so they can
+all be loaded in parallel (via Python multiprocessing) using a
+standard ``torch.utils.data.DataLoader``.
+
+For example:
+
+``torch.utils.data.DataLoader(coco_cap, batch_size=args.batchSize, shuffle=True, num_workers=args.nThreads)``
+
+In the constructor, each dataset has a slightly different API as needed,
+but they all take the keyword args:
+
+- ``transform`` - a function that takes in an image and returns a
+  transformed version. Common transforms like ``ToTensor`` and
+  ``RandomCrop`` can be composed together with ``transforms.Compose``
+  (see the transforms section below).
+- ``target_transform`` - a function that takes in the target and
+  transforms it. For example, take in the caption string and return a
+  tensor of word indices.
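+
+For illustration, here is a minimal sketch that wires a dataset, a
+transform, and a ``DataLoader`` together. The root directory ``./data``
+and the loader settings are placeholders, not required values:
+
+.. code:: python
+
+    import torch
+    import torchvision.datasets as dset
+    import torchvision.transforms as transforms
+
+    # downloads CIFAR10 to ./data on first use (see the CIFAR section below)
+    cifar = dset.CIFAR10(root='./data', train=True, download=True,
+                         transform=transforms.ToTensor())
+
+    loader = torch.utils.data.DataLoader(cifar, batch_size=32,
+                                         shuffle=True, num_workers=2)
+
+    for images, labels in loader:
+        # each batch is a 32 x 3 x 32 x 32 image tensor plus a tensor of labels
+        print(images.size(), labels.size())
+        break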
+
+COCO
+~~~~
+
+This requires the `COCO API to be
+installed <https://github.com/pdollar/coco/tree/master/PythonAPI>`__
+
+Captions:
+^^^^^^^^^
+
+``dset.CocoCaptions(root="dir where images are", annFile="json annotation file", [transform, target_transform])``
+
+Example:
+
+.. code:: python
+
+    import torchvision.datasets as dset
+    import torchvision.transforms as transforms
+    cap = dset.CocoCaptions(root='dir where images are',
+                            annFile='json annotation file',
+                            transform=transforms.ToTensor())
+
+    print('Number of samples: ', len(cap))
+    img, target = cap[3]  # load 4th sample
+
+    print("Image Size: ", img.size())
+    print(target)
+
+Output:
+
+::
+
+    Number of samples: 82783
+    Image Size: (3L, 427L, 640L)
+    [u'A plane emitting smoke stream flying over a mountain.',
+    u'A plane darts across a bright blue sky behind a mountain covered in snow',
+    u'A plane leaves a contrail above the snowy mountain top.',
+    u'A mountain that has a plane flying overheard in the distance.',
+    u'A mountain view with a plume of smoke in the background']
+
+Detection:
+^^^^^^^^^^
+
+``dset.CocoDetection(root="dir where images are", annFile="json annotation file", [transform, target_transform])``
+
+LSUN
+~~~~
+
+``dset.LSUN(db_path, classes='train', [transform, target_transform])``
+
+- db_path = root directory for the database files
+- classes =
+
+  - 'train' - all categories, training set
+  - 'val' - all categories, validation set
+  - 'test' - all categories, test set
+  - ['bedroom_train', 'church_train', ...] : a list of categories to
+    load
+
+CIFAR
+~~~~~
+
+``dset.CIFAR10(root, train=True, transform=None, target_transform=None, download=False)``
+
+``dset.CIFAR100(root, train=True, transform=None, target_transform=None, download=False)``
+
+- ``root`` : root directory of the dataset, containing the folder
+  ``cifar-10-batches-py``
+- ``train`` : ``True`` = training set, ``False`` = test set
+- ``download`` : ``True`` = downloads the dataset from the internet
+  and puts it in the root directory. If the dataset is already
+  downloaded, does nothing.
+
+ImageFolder
+~~~~~~~~~~~
+
+A generic data loader where the images are arranged in this way:
+
+::
+
+    root/dog/xxx.png
+    root/dog/xxy.png
+    root/dog/xxz.png
+
+    root/cat/123.png
+    root/cat/nsdf3.png
+    root/cat/asd932_.png
+
+``dset.ImageFolder(root="root folder path", [transform, target_transform])``
+
+It has the members:
+
+- ``self.classes`` - the class names, as a list
+- ``self.class_to_idx`` - the corresponding class indices
+- ``self.imgs`` - the list of (image path, class index) tuples
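+
+For example, a minimal sketch, assuming the layout above lives under a
+hypothetical folder ``./images``:
+
+.. code:: python
+
+    import torchvision.datasets as dset
+
+    data = dset.ImageFolder(root='./images')
+
+    print(data.classes)       # e.g. ['cat', 'dog'], inferred from folder names
+    print(data.class_to_idx)  # e.g. {'cat': 0, 'dog': 1}
+    img, label = data[0]      # a (PIL.Image, class index) pair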
+
+Imagenet-12
+~~~~~~~~~~~
+
+This is simply implemented with an ImageFolder dataset.
+
+The data is preprocessed `as described
+here <https://github.com/facebook/fb.resnet.torch/blob/master/INSTALL.md#download-the-imagenet-dataset>`__
+
+`Here is an
+example <https://github.com/pytorch/examples/blob/27e2a46c1d1505324032b1d94fc6ce24d5b67e97/imagenet/main.py#L48-L62>`__.
+
+Models
+======
+
+The models subpackage contains definitions for the following model
+architectures:
+
+- `AlexNet <https://arxiv.org/abs/1404.5997>`__: the AlexNet variant
+  from the "One weird trick" paper
+- `VGG <https://arxiv.org/abs/1409.1556>`__: VGG-11, VGG-13, VGG-16,
+  VGG-19 (with and without batch normalization)
+- `ResNet <https://arxiv.org/abs/1512.03385>`__: ResNet-18, ResNet-34,
+  ResNet-50, ResNet-101, ResNet-152
+
+You can construct a model with random weights by calling its
+constructor:
+
+.. code:: python
+
+    import torchvision.models as models
+    resnet18 = models.resnet18()
+    alexnet = models.alexnet()
+
+We provide pre-trained models for the ResNet variants and AlexNet,
+using the PyTorch `model zoo <http://pytorch.org/docs/model_zoo.html>`__.
+These can be constructed by passing ``pretrained=True``:
+
+.. code:: python
+
+    import torchvision.models as models
+    resnet18 = models.resnet18(pretrained=True)
+    alexnet = models.alexnet(pretrained=True)
+
+Transforms
+==========
+
+Transforms are common image transforms. They can be chained together
+using ``transforms.Compose``.
+
+``transforms.Compose``
+~~~~~~~~~~~~~~~~~~~~~~
+
+One can compose several transforms together. For example:
+
+.. code:: python
+
+    transform = transforms.Compose([
+        transforms.RandomSizedCrop(224),
+        transforms.RandomHorizontalFlip(),
+        transforms.ToTensor(),
+        transforms.Normalize(mean=[0.485, 0.456, 0.406],
+                             std=[0.229, 0.224, 0.225]),
+    ])
+
+Transforms on PIL.Image
+~~~~~~~~~~~~~~~~~~~~~~~
+
+``Scale(size, interpolation=Image.BILINEAR)``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Rescales the input PIL.Image so that ``size`` is the size of the
+smaller edge. For example, if height > width, the image will be
+rescaled to (size \* height / width, size).
+
+- size: size of the smaller edge
+- interpolation: default: PIL.Image.BILINEAR
+
+``CenterCrop(size)`` - center-crops the image to the given size
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Crops the given PIL.Image at the center to have a region of the given
+size. ``size`` can be a tuple (target_height, target_width) or an
+integer, in which case the target is a square of shape (size, size).
+
+``RandomCrop(size, padding=0)``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Crops the given PIL.Image at a random location to have a region of the
+given size. ``size`` can be a tuple (target_height, target_width) or
+an integer, in which case the target is a square of shape (size, size).
+If ``padding`` is non-zero, the image is first zero-padded on each
+side with ``padding`` pixels.
+
+``RandomHorizontalFlip()``
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Randomly flips the given PIL.Image horizontally with a probability of
+0.5.
+
+``RandomSizedCrop(size, interpolation=Image.BILINEAR)``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Crops the given PIL.Image to a random size of (0.08 to 1.0) of the
+original size, with a random aspect ratio of 3/4 to 4/3 of the
+original aspect ratio. This is popularly used to train the Inception
+networks.
+
+- size: size of the smaller edge
+- interpolation: default: PIL.Image.BILINEAR
+
+``Pad(padding, fill=0)``
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Pads the given image on each side with ``padding`` pixels, filled with
+pixel value ``fill``. For example, a ``5x5`` image padded with
+``padding=1`` becomes ``7x7``.
+
+Transforms on torch.\*Tensor
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``Normalize(mean, std)``
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Given mean: (R, G, B) and std: (R, G, B), normalizes each channel of
+the torch.\*Tensor, i.e. channel = (channel - mean) / std
+
+Conversion Transforms
+~~~~~~~~~~~~~~~~~~~~~
+
+- ``ToTensor()`` - converts a PIL.Image (RGB) or numpy.ndarray
+  (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape
+  (C x H x W) in the range [0.0, 1.0]
+- ``ToPILImage()`` - converts a torch.\*Tensor of range [0, 1] and
+  shape C x H x W, or a numpy ndarray of dtype=uint8, range [0, 255]
+  and shape H x W x C, to a PIL.Image of range [0, 255]
+
+Generic Transforms
+~~~~~~~~~~~~~~~~~~
+
+``Lambda(lambda)``
+^^^^^^^^^^^^^^^^^^
+
+Given a Python lambda, applies it to the input ``img`` and returns it.
+For example:
+
+.. code:: python
+
+    transforms.Lambda(lambda x: x.add(10))
+
+Utils
+=====
+
+make_grid(tensor, nrow=8, padding=2)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Given a 4D mini-batch Tensor of shape (B x C x H x W), makes a grid of
+images.
+
+save_image(tensor, filename, nrow=8, padding=2)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Saves a given Tensor into an image file.
+
+If given a mini-batch tensor, saves the tensor as a grid of images.
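+
+For instance, a minimal sketch; the random mini-batch and the output
+filename are only placeholders:
+
+.. code:: python
+
+    import torch
+    import torchvision.utils as vutils
+
+    # a hypothetical mini-batch of 16 RGB images, 64 x 64 each
+    batch = torch.rand(16, 3, 64, 64)
+
+    grid = vutils.make_grid(batch, nrow=4)         # a single 3 x H x W tensor
+    vutils.save_image(batch, 'batch.png', nrow=4)  # writes the same grid to disk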
diff --git a/setup.cfg b/setup.cfg
new file mode 100644
index 00000000000..3c6e79cf31d
--- /dev/null
+++ b/setup.cfg
@@ -0,0 +1,2 @@
+[bdist_wheel]
+universal=1
diff --git a/setup.py b/setup.py
index 269a1638b33..bd8bf649e41 100644
--- a/setup.py
+++ b/setup.py
@@ -4,12 +4,12 @@
 import sys
 from setuptools import setup, find_packages
 
-VERSION = '0.1.6'
-long_description = '''torch-vision provides DataLoaders, Pre-trained models
-and common transforms for torch for images and videos'''
+readme = open('README.rst').read()
+
+VERSION = '0.1.6'
 
 
-setup_info = dict(
+setup(
     # Metadata
     name='torchvision',
     version=VERSION,
@@ -17,7 +17,7 @@
     author_email='soumith@pytorch.org',
     url='https://github.com/pytorch/vision',
     description='image and video datasets and models for torch deep learning',
-    long_description=long_description,
+    long_description=readme,
     license='BSD',
 
     # Package info
@@ -25,5 +25,3 @@
     zip_safe=True,
 )
-
-setup(**setup_info)