Merge pull request #4 from beniz/tf
update
kyrs committed Mar 20, 2016
2 parents 1c287f3 + d8116f0 commit f26c868
Showing 31 changed files with 1,661 additions and 627 deletions.
23 changes: 18 additions & 5 deletions README.md
@@ -1,15 +1,24 @@
## DeepDetect : Open Source API & Deep Learning Server
## DeepDetect : Open Source Deep Learning Server & API

[![Join the chat at https://gitter.im/beniz/deepdetect](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/beniz/deepdetect?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

DeepDetect (http://www.deepdetect.com/) is a machine learning API and server written in C++11. It makes state of the art machine learning easy to work with and integrate into existing applications.

DeepDetect relies on external machine learning libraries through a very generic and flexible API. At the moment it has support for the deep learning library [Caffe](https://github.com/BVLC/caffe).

#### Main functionalities:
#### Main functionalities

DeepDetect implements support for supervised deep learning of images and other data, with focus on simplicity and ease of use, test and connection into existing applications.

#### Supported Platforms

The reference platform with support is **Ubuntu 14.04 LTS**.

Supported images that come with pre-trained image classification deep (residual) neural nets:

- **docker images** for CPU and GPU machines are available at https://hub.docker.com/r/beniz/deepdetect_gpu/, see https://github.com/beniz/deepdetect/tree/master/docker/README.md for details on how to use them.
- For **Amazon AMI** see https://github.com/beniz/deepdetect/issues/5#issuecomment-188464262

#### Quickstart
Set up an image classifier API service in a few minutes:
http://www.deepdetect.com/tutorials/imagenet-classifier/
@@ -31,7 +40,7 @@ Current features include:
- range of built-in model assessment measures (e.g. F1, multiclass log loss, ...)
- no database dependency and sync, all information and model parameters organized and available from the filesystem
- flexible template output format to simplify connection to external applications
- templates for the most useful neural architectures (e.g. Googlenet, Alexnet, convnet, character-based convnet, mlp, logistic regression)
- templates for the most useful neural architectures (e.g. Googlenet, Alexnet, ResNet, convnet, character-based convnet, mlp, logistic regression)

##### Documentation

@@ -67,10 +76,14 @@ By default DeepDetect automatically relies on a modified version of Caffe, https

The code makes use of C++ policy design for modularity, performance and putting the maximum burden on the checks at compile time. The implementation uses many features from C++11.

##### Visual Demo
##### Demo

- Image classification Web interface:
HTML and javascript classification image demo in [demo/imgdetect](https://github.com/beniz/deepdetect/tree/master/demo/imgdetect)

- Image similarity search:
Python script for indexing and searching images is in [demo/imgsearch](https://github.com/beniz/deepdetect/tree/master/demo/imgsearch)

##### Examples

- List of examples, from MLP for data, text, multi-target regression to CNN and GoogleNet, finetuning, etc...:
@@ -85,7 +98,7 @@ DeepDetect is designed and implemented by Emmanuel Benazera <beniz@droidnik.fr>.

### Build

Below are instructions for Linux systems.
Below are instructions for Ubuntu 14.04 LTS. For other Linux and Unix systems, steps may differ, and CUDA, Caffe and other libraries may prove difficult to set up.

Beware of dependencies, typically on Debian/Ubuntu Linux, do:
```
53 changes: 26 additions & 27 deletions clients/python/dd_client.py
@@ -65,31 +65,30 @@ def __str__(self):
msg += "\n"
return msg

class DDDataError(Exception):
def __init__(self, url, http_method, headers, body, data=None):
self.msg = "DeepDetect Data Error"
self.http_method = http_method
self.req_headers = headers
self.req_body = body
self.url = url
self.data = data

def __str__(self):
msg = "%s %s\n"%(str(self.http_method),str(self.url))
if self.data is not None:
msg += str(self.data)[:100]
msg += "\n"
return msg
for h,v in self.req_headers.iteritems():
msg += "%s:%s\n"%(h,v)
msg += "\n"
if self.req_body is not None:
msg += str(self.req_body)
msg += "\n"
msg += "--\n"
msg += str(self.data)
msg += "\n"
return msg
class DDDataError(Exception):
def __init__(self, url, http_method, headers, body, data=None):
self.msg = "DeepDetect Data Error"
self.http_method = http_method
self.req_headers = headers
self.req_body = body
self.url = url
self.data = data

def __str__(self):
msg = "%s %s\n"%(str(self.http_method),str(self.url))
if self.data is not None:
msg += str(self.data)[:100]
msg += "\n"
for h,v in self.req_headers.iteritems():
msg += "%s:%s\n"%(h,v)
msg += "\n"
if self.req_body is not None:
msg += str(self.req_body)
msg += "\n"
msg += "--\n"
msg += str(self.data)
msg += "\n"
return msg

API_METHODS_URL = {
"0.1" : {
@@ -272,7 +271,7 @@ def info(self):
# - PUT services
# - GET services
# - DELETE services
def put_service(self,sname,model,description,mllib,parameters_input,parameters_mllib,parameters_output):
def put_service(self,sname,model,description,mllib,parameters_input,parameters_mllib,parameters_output,mltype='supervised'):
"""
Create a service
Parameters:
@@ -284,7 +283,7 @@ def put_service(self,sname,model,description,mllib,parameters_input,parameters_m
parameters_mllib -- dict ML library parameters
parameters_output -- dict of output parameters
"""
body={"description":description,"mllib":mllib,"type":"supervised",
body={"description":description,"mllib":mllib,"type":mltype,
"parameters":{"input":parameters_input,"mllib":parameters_mllib,"output":parameters_output},
"model":model}
return self.put(self.__urls["services"] + '/%s'%sname,json.dumps(body))
54 changes: 54 additions & 0 deletions demo/imgsearch/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
### Image similarity search demo

This is a small Python demo of an image similarity search application.

It does three things:

- uses a DeepDetect image classification service to generate a numerical or binary code for every image
- indexes images with [annoy](https://github.com/spotify/annoy), an approximate nearest neighbors C++/Python library
- searches by image, including new images that were not previously indexed, and returns the closest images

To run the code on your own collection of images:

- install Annoy:
```
pip install annoy
```
or go look at https://github.com/spotify/annoy

- create a model repository with the pre-trained image classification network of your choice. Here we are using a pre-trained GoogleNet, but you can also use a built-in ResNet or [other provided models](http://www.deepdetect.com/applications/model/):
```
mkdir model
cd model
wget http://www.deepdetect.com/models/ggnet/bvlc_googlenet.caffemodel
```

**make sure that the `model` repository is in the same directory as the script `imgsearch.py`**

- start a DeepDetect server:
```
./dede
```

- index your collection of images:
```
python imgsearch.py --index /path/to/your/images --index-batch-size 64
```
Here `index-batch-size` controls the number of images that are processed at once.
The index is then written to `index.ann` in the current directory, and `names.bin` maps index entries to filenames.

**Index and name files are erased upon every new indexing call**

- search for similar images:
```
python imgsearch.py --search /path/to/your/image.png --search-size 10
```
Here `search-size` controls the number of approximate neighbors returned.
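
The indexing and search steps above can be sketched without a running server or Annoy. This is a minimal Python 3 illustration only: the vectors in `fake_codes` are made up and stand in for the codes DeepDetect would return, and an exact brute-force scan stands in for Annoy's approximate search. `batch` follows the helper of the same name in `imgsearch.py`; `fake_codes`, `angular_distance` and `nearest` are hypothetical names.

```python
import math

def batch(items, n=1):
    """Yield successive chunks of at most n items (the role of --index-batch-size)."""
    for start in range(0, len(items), n):
        yield items[start:start + n]

def angular_distance(u, v):
    """Angular distance sqrt(2 * (1 - cos(u, v))) between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = dot / (norm_u * norm_v)
    return math.sqrt(max(0.0, 2.0 * (1.0 - cos)))

# ASSUMPTION: made-up codes; in the demo they come from a network layer.
fake_codes = {
    "lake1.jpg": [0.9, 0.1, 0.0],
    "lake2.jpg": [0.8, 0.2, 0.1],
    "bottle.jpg": [0.0, 0.1, 0.9],
    "bottle2.jpg": [0.1, 0.0, 0.8],
    "dog.jpg": [0.1, 0.9, 0.1],
}

# "Indexing": walk the file names in batches and store (name, code) pairs.
index = []
for chunk in batch(sorted(fake_codes), n=2):
    for name in chunk:
        index.append((name, fake_codes[name]))

def nearest(query, k):
    """Return the k closest (name, distance) pairs (the role of --search-size)."""
    scored = [(name, angular_distance(query, code)) for name, code in index]
    return sorted(scored, key=lambda pair: pair[1])[:k]

results = nearest([0.85, 0.15, 0.05], k=2)
print(results)
```

A brute-force scan like this compares the query against every indexed code, which is what an approximate index such as Annoy is meant to avoid on large collections.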

Notes:

- The search uses the output of a deep convolutional net layer as the code for every image. Using top layers (e.g. `loss3/classifier` with GoogleNet) relies on high-level features, so image similarity is based on high-level concepts such as whether the image contains a lakeshore, a bottle, etc. Using bottom or mid-range layers (e.g. `pool5/7x7_s1` with GoogleNet) bases image similarity on lower-level, potentially invariant, universal features such as lighting conditions, basic shapes, etc. Experiment and see what is best for your application.

- Annoy is a nice piece of code, but in experiments the index building step becomes very memory-inefficient and time-consuming at around a million images. If this is an issue, get in touch, as there are other, more complicated ways to index, search and scale.

- The code in `imgsearch.py` allows for more options such as whether to use `binarized` codes, `angular` or `euclidean` metric for similar image retrieval, and control of the accuracy of the search through `ntrees`.
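
To make the `angular`, `euclidean` and `binarized` options more concrete, here is a small standard-library Python 3 sketch. It is an illustration under assumptions, not the demo's code: the thresholding rule in `binarize` is a guess at what a binary code could look like (the rule actually applied by the server is not described here), and all function names are hypothetical.

```python
import math

def euclidean(u, v):
    """Straight-line distance; sensitive to vector magnitude."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def angular(u, v):
    """sqrt(2 * (1 - cos(u, v))); depends only on vector direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.sqrt(max(0.0, 2.0 * (1.0 - dot / norms)))

def binarize(v, threshold=0.0):
    # ASSUMPTION: plain thresholding, chosen only for illustration.
    return [1 if a > threshold else 0 for a in v]

def hamming(u, v):
    """Number of positions where two binary codes differ."""
    return sum(a != b for a, b in zip(u, v))

code = [0.5, -1.0, 2.0, 0.0]
scaled = [2 * a for a in code]  # same direction, twice the magnitude

print(euclidean(code, scaled))  # clearly non-zero: magnitude matters
print(angular(code, scaled))    # close to zero: direction is identical
print(hamming(binarize(code), binarize([0.4, 0.3, 1.5, -0.2])))
```

The first two prints show why the choice of metric matters: two codes that differ only in scale are far apart under the euclidean metric but essentially identical under the angular one.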
1 change: 1 addition & 0 deletions demo/imgsearch/dd_client.py
112 changes: 112 additions & 0 deletions demo/imgsearch/imgsearch.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
import os, sys, argparse
from os import listdir
from os.path import isfile, join
from os import walk
from dd_client import DD
from annoy import AnnoyIndex
import shelve
import cv2

parser = argparse.ArgumentParser()
parser.add_argument("--index",help="repository of images to be indexed")
parser.add_argument("--index-batch-size",type=int,help="size of image batch when indexing",default=1)
parser.add_argument("--search",help="image input file for similarity search")
parser.add_argument("--search-size",help="number of nearest neighbors",type=int,default=10)
args = parser.parse_args()

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

def image_resize(imgfile,width):
    imgquery = cv2.imread(imgfile)
    r = width / imgquery.shape[1]
    dim = (int(width), int(imgquery.shape[0] * r))
    small = cv2.resize(imgquery,dim)
    return small

host = 'localhost'
sname = 'imgserv'
description = 'image classification'
mllib = 'caffe'
mltype = 'unsupervised'
extract_layer = 'loss3/classifier'
#extract_layer = 'pool5/7x7_s1'
nclasses = 1000
layer_size = 1000 # default output code size
width = height = 224
binarized = False
dd = DD(host)
dd.set_return_format(dd.RETURN_PYTHON)
ntrees = 100
metric = 'angular' # or 'euclidean'

# creating ML service
model_repo = os.getcwd() + '/model'
model = {'repository':model_repo,'templates':'../templates/caffe/'}
parameters_input = {'connector':'image','width':width,'height':height}
parameters_mllib = {'nclasses':nclasses,'template':'googlenet'}
parameters_output = {}
dd.put_service(sname,model,description,mllib,
parameters_input,parameters_mllib,parameters_output,mltype)

# reset call params
parameters_input = {}
parameters_mllib = {'gpu':True,'extract_layer':extract_layer}
parameters_output = {'binarized':binarized}

if args.index:
    try:
        os.remove('names.bin')
    except:
        pass
    s = shelve.open('names.bin')

    # list files in image repository
    c = 0
    onlyfiles = []
    for (dirpath, dirnames, filenames) in walk(args.index):
        nfilenames = []
        for f in filenames:
            nfilenames.append(dirpath + '/' + f)
        onlyfiles.extend(nfilenames)
    for x in batch(onlyfiles,args.index_batch_size):
        sys.stdout.write('\r'+str(c)+'/'+str(len(onlyfiles)))
        sys.stdout.flush()
        classif = dd.post_predict(sname,x,parameters_input,parameters_mllib,parameters_output)
        for p in classif['body']['predictions']:
            if c == 0:
                layer_size = len(p['vals'])
                s['layer_size'] = layer_size
                t = AnnoyIndex(layer_size,metric) # prepare index
            t.add_item(c,p['vals'])
            s[str(c)] = p['uri']
            c = c + 1
        #if c >= 10000:
        #    break
    print 'building index...\n'
    print 'layer_size=',layer_size
    t.build(ntrees)
    t.save('index.ann')
    s.close()

if args.search:
    s = shelve.open('names.bin')
    u = AnnoyIndex(s['layer_size'],metric)
    u.load('index.ann')
    data = [args.search]
    classif = dd.post_predict(sname,data,parameters_input,parameters_mllib,parameters_output)
    near = u.get_nns_by_vector(classif['body']['predictions']['vals'],args.search_size,include_distances=True)
    print near
    near_names = []
    for n in near[0]:
        near_names.append(s[str(n)])
    print near_names
    cv2.imshow('query',image_resize(args.search,224.0))
    cv2.waitKey(0)
    for n in near_names:
        cv2.imshow('res',image_resize(n,224.0))
        cv2.waitKey(0)

dd.delete_service(sname,clear='')
84 changes: 84 additions & 0 deletions docker/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,84 @@
## DeepDetect Docker images

This repository contains the Dockerfiles for building the CPU and GPU images for deepdetect.

Also see https://hub.docker.com/u/beniz/starred/ for pre-built images

The docker images contain:
- a running `dede` server ready to be used, no install required
- `googlenet` and `resnet_50` pre-trained image classification models, in `/opt/models/`

This makes it possible to run the container and set up an image classification model based on deep (residual) nets in two short command-line calls.

### Getting and running official images

```
docker pull beniz/deepdetect_cpu
```
or
```
docker pull beniz/deepdetect_gpu
```

#### Running the CPU image

```
docker run -d -p 8080:8080 beniz/deepdetect_cpu
```

`dede` server is now listening on your port `8080`:

```
curl http://localhost:8080/info
{"status":{"code":200,"msg":"OK"},"head":{"method":"/info","version":"0.1","branch":"master","commit":"c8556f0b3e7d970bcd9861b910f9eae87cfd4b0c","services":[]}}
```

Here is how to create a simple image classification service and test a prediction:
- service creation
```
curl -X PUT "http://localhost:8080/services/imageserv" -d "{\"mllib\":\"caffe\",\"description\":\"image classification service\",\"type\":\"supervised\",\"parameters\":{\"input\":{\"connector\":\"image\"},\"mllib\":{\"nclasses\":1000,\"template\":\"googlenet\"}},\"model\":{\"templates\":\"../templates/caffe/\",\"repository\":\"/opt/models/ggnet/\"}}"
{"status":{"code":201,"msg":"Created"}}
```
- image classification
```
curl -X POST "http://localhost:8080/predict" -d "{\"service\":\"imageserv\",\"parameters\":{\"input\":{\"width\":224,\"height\":224},\"output\":{\"best\":3},\"mllib\":{\"gpu\":false}},\"data\":[\"http://i.ytimg.com/vi/0vxOhd4qlnA/maxresdefault.jpg\"]}"
{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","time":852.0,"service":"imageserv"},"body":{"predictions":{"uri":"http://i.ytimg.com/vi/0vxOhd4qlnA/maxresdefault.jpg","classes":[{"prob":0.2255125343799591,"cat":"n03868863 oxygen mask"},{"prob":0.20917612314224244,"cat":"n03127747 crash helmet"},{"last":true,"prob":0.07399296760559082,"cat":"n03379051 football helmet"}]}}}
```
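
The hand-escaped JSON in the two `curl` calls above can also be built programmatically. The following Python 3 sketch uses only the standard library and reuses the field values shown in the calls; the helper names `service_body`, `predict_body` and `send` are hypothetical, and `send` is defined but not called, since it assumes a `dede` container listening on `localhost:8080`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # port published by `docker run -p 8080:8080`

def service_body():
    """JSON body for PUT /services/imageserv, same fields as the curl call."""
    return {
        "mllib": "caffe",
        "description": "image classification service",
        "type": "supervised",
        "parameters": {
            "input": {"connector": "image"},
            "mllib": {"nclasses": 1000, "template": "googlenet"},
        },
        "model": {
            "templates": "../templates/caffe/",
            "repository": "/opt/models/ggnet/",
        },
    }

def predict_body(image_url, gpu=False, best=3):
    """JSON body for POST /predict, same fields as the curl call."""
    return {
        "service": "imageserv",
        "parameters": {
            "input": {"width": 224, "height": 224},
            "output": {"best": best},
            "mllib": {"gpu": gpu},
        },
        "data": [image_url],
    }

def send(method, path, body):
    """Send a JSON body and decode the JSON reply (needs a running server)."""
    request = urllib.request.Request(
        BASE_URL + path, data=json.dumps(body).encode("utf-8"), method=method)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

# Only build and print the payloads here; no request is sent.
print(json.dumps(service_body()))
print(json.dumps(predict_body("http://i.ytimg.com/vi/0vxOhd4qlnA/maxresdefault.jpg")))
```

With a container running, `send("PUT", "/services/imageserv", service_body())` followed by `send("POST", "/predict", predict_body(url))` would mirror the two calls above.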

#### Running the GPU image

This requires [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) in order for the local GPUs to be made accessible by the container.

The following steps are required:

- install `nvidia-docker`: https://github.com/NVIDIA/nvidia-docker
- run with
```
nvidia-docker run -d -p 8080:8080 beniz/deepdetect_gpu
```

Notes:
- `nvidia-docker` requires docker >= 1.9

To test image classification on GPU:
```
curl -X PUT "http://localhost:8080/services/imageserv" -d "{\"mllib\":\"caffe\",\"description\":\"image classification service\",\"type\":\"supervised\",\"parameters\":{\"input\":{\"connector\":\"image\"},\"mllib\":{\"nclasses\":1000,\"template\":\"googlenet\"}},\"model\":{\"templates\":\"../templates/caffe/\",\"repository\":\"/opt/models/ggnet/\"}}"
{"status":{"code":201,"msg":"Created"}}
```
and
```
curl -X POST "http://localhost:8080/predict" -d "{\"service\":\"imageserv\",\"parameters\":{\"input\":{\"width\":224,\"height\":224},\"output\":{\"best\":3},\"mllib\":{\"gpu\":true}},\"data\":[\"http://i.ytimg.com/vi/0vxOhd4qlnA/maxresdefault.jpg\"]}"
```

Try the `POST` call twice: the first call loads the net, so it takes slightly less than a second; the second call should then report a `time` of around 100ms in the output JSON.
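
To read the reported `time` and the predicted classes from a reply, the JSON can be parsed as in the Python 3 sketch below. It uses the sample CPU response shown earlier in this README as input rather than a live call. Wrapping a single prediction object into a list is an assumption made so that one-image and multi-image replies can be handled the same way.

```python
import json

# Sample reply copied from the CPU prediction example in this README.
sample = (
    '{"status":{"code":200,"msg":"OK"},'
    '"head":{"method":"/predict","time":852.0,"service":"imageserv"},'
    '"body":{"predictions":{"uri":"http://i.ytimg.com/vi/0vxOhd4qlnA/maxresdefault.jpg",'
    '"classes":[{"prob":0.2255125343799591,"cat":"n03868863 oxygen mask"},'
    '{"prob":0.20917612314224244,"cat":"n03127747 crash helmet"},'
    '{"last":true,"prob":0.07399296760559082,"cat":"n03379051 football helmet"}]}}}'
)

reply = json.loads(sample)
elapsed_ms = reply["head"]["time"]  # processing time reported by the server

predictions = reply["body"]["predictions"]
# ASSUMPTION: normalize a single prediction object to a one-element list.
if isinstance(predictions, dict):
    predictions = [predictions]

top = [(c["cat"], c["prob"]) for c in predictions[0]["classes"]]

print("time (ms):", elapsed_ms)
for category, probability in top:
    print("%.3f  %s" % (probability, category))
```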

#### Building an image

For example, to build the CPU image:
```
cd cpu
docker build -t beniz/deepdetect_cpu --no-cache .
```