Commit

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into acc_image_proc
qingqing01 committed Dec 6, 2016
2 parents ae06deb + f93af82 commit 9d2f49c
Showing 153 changed files with 10,706 additions and 4,544 deletions.
1 change: 1 addition & 0 deletions .dockerignore
1 change: 1 addition & 0 deletions .gitignore
@@ -8,3 +8,4 @@ build/
.cproject
.pydevproject
Makefile
.test_env/
2 changes: 1 addition & 1 deletion .travis.yml
@@ -50,7 +50,7 @@ before_install:
fi
- if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then sudo paddle/scripts/travis/before_install.linux.sh; fi
- if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then paddle/scripts/travis/before_install.osx.sh; fi
- pip install wheel protobuf sphinx breathe recommonmark virtualenv numpy
- pip install wheel protobuf sphinx breathe recommonmark virtualenv numpy sphinx_rtd_theme
script:
- paddle/scripts/travis/main.sh
notifications:
47 changes: 17 additions & 30 deletions CMakeLists.txt
@@ -1,17 +1,24 @@
cmake_minimum_required(VERSION 2.8)

project(paddle CXX C)
set(PADDLE_MAJOR_VERSION 0)
set(PADDLE_MINOR_VERSION 9)
set(PADDLE_PATCH_VERSION 0)
set(PADDLE_VERSION ${PADDLE_MAJOR_VERSION}.${PADDLE_MINOR_VERSION}.${PADDLE_PATCH_VERSION})

set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake")
set(PROJ_ROOT ${CMAKE_SOURCE_DIR})
include(package)
find_package(SWIG 2.0)
find_package(CUDA QUIET)
find_package(Protobuf REQUIRED)

# Check protobuf library version.
execute_process(COMMAND ${PROTOBUF_PROTOC_EXECUTABLE} --version
OUTPUT_VARIABLE PROTOBUF_VERSION)
string(REPLACE "libprotoc " "" PROTOBUF_VERSION ${PROTOBUF_VERSION})

set(PROTOBUF_3 OFF)
if (${PROTOBUF_VERSION} VERSION_GREATER "3.0.0" OR ${PROTOBUF_VERSION} VERSION_EQUAL "3.0.0")
set(PROTOBUF_3 ON)
endif()

find_package(PythonLibs 2.7 REQUIRED)
find_package(PythonInterp 2.7 REQUIRED)
find_package(ZLIB REQUIRED)
@@ -45,7 +52,7 @@ option(ON_COVERALLS "Generating code coverage data on coveralls or not." OFF)
option(COVERALLS_UPLOAD "Uploading the generated coveralls json." ON)

if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE "RelWithDebInfo" CACHE STRING
set(CMAKE_BUILD_TYPE "RelWithDebInfo" CACHE STRING
"Choose the type of build, options are: Debug Release RelWithDebInfo MinSizeRel"
FORCE)
endif()
@@ -64,31 +71,11 @@ include(check_packages)
include(swig)
include(coveralls)

# add PaddlePaddle version
if(DEFINED ENV{PADDLE_VERSION})
add_definitions(-DPADDLE_VERSION=\"$ENV{PADDLE_VERSION}\")
else()
if(EXISTS ${PROJ_ROOT}/.svn/)
find_package(Subversion REQUIRED)
if(SUBVERSION_FOUND)
Subversion_WC_INFO(${PROJ_ROOT} Project)
add_definitions(-DPADDLE_VERSION=${Project_WC_REVISION})
endif()
elseif(EXISTS ${PROJ_ROOT}/.git/)
find_package(Git REQUIRED)
execute_process(
COMMAND ${GIT_EXECUTABLE} log -1 --format=%H
WORKING_DIRECTORY ${PROJ_ROOT}
OUTPUT_VARIABLE GIT_SHA1
RESULT_VARIABLE GIT_RESULT
ERROR_QUIET OUTPUT_STRIP_TRAILING_WHITESPACE)
if(NOT ${GIT_RESULT})
add_definitions(-DPADDLE_VERSION=\"${GIT_SHA1}\")
else()
message(WARNING "Cannot add paddle version from git tag")
endif()
endif()
endif()
# Set PaddlePaddle version to Git tag name or Git commit ID.
find_package(Git REQUIRED)
# version.cmake will get the current PADDLE_VERSION
include(version)
add_definitions(-DPADDLE_VERSION=\"${PADDLE_VERSION}\")


if(NOT WITH_GPU)
9 changes: 9 additions & 0 deletions benchmark/.gitignore
@@ -0,0 +1,9 @@
paddle/image/logs
paddle/image/*.pyc
paddle/image/train.list
paddle/rnn/logs
paddle/rnn/*.pyc
paddle/rnn/imdb.pkl
caffe/image/logs
tensorflow/image/logs
tensorflow/rnn/logs
168 changes: 168 additions & 0 deletions benchmark/README.md
@@ -0,0 +1,168 @@
# Benchmark

Machine:

- CPU: 12-core Intel(R) Xeon(R) CPU E5-2620 v2 @2.10GHz
- GPU: Tesla K40m
- cuDNN: v5.1
- System: Docker 1.12.1; all platforms are tested in a Docker environment.

Platforms:

- PaddlePaddle: paddledev/paddle:gpu-devel-v0.9.0a0
- TensorFlow: gcr.io/tensorflow/tensorflow:0.11.0rc0-gpu
- Caffe: kaixhin/cuda-caffe

Several convolutional neural networks and recurrent neural networks are used in these tests.

## Image

### Benchmark Model

AlexNet, GoogleNet and a small network used in Caffe.

- [AlexNet](https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet): the same as the Caffe model, except that the group size is one.

- [GoogleNet](https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet): the same as the Caffe model, except that loss1 and loss2 are removed when benchmarking.

- [SmallNet](https://github.com/BVLC/caffe/blob/master/examples/cifar10/cifar10_quick_train_test.prototxt)


### Single-GPU

- AlexNet: input - 3 * 227 * 227, Time: ms/batch

| BatchSize | 64 | 128 | 256 | 512 |
|--------------|-----| -----| ------| -----|
| PaddlePaddle | 195 | 334 | 602 | 1629 |
| TensorFlow | 223 | 364 | 645 | 1235 |
| Caffe | 324 | 627 | 1232 | 2513 |

**Note**

All platforms use cuDNN v5.1. Caffe is slower in this experiment because the workspace limit of its cuDNN convolution interface is 8 * 1024 * 1024, which is smaller than the limits used by PaddlePaddle and TensorFlow. Caffe becomes faster if this workspace limit is increased.

- GoogleNet: input - 3 * 224 * 224, Time: ms/batch


| BatchSize | 64 | 128 | 256 |
|--------------|-------| -------| --------|
| PaddlePaddle | 613 | 1149 | 2348 |
| TensorFlow | 644 | 1176 | 2219 |
| Caffe | 694 | 1364 | out of memory |

- SmallNet: input - 3 * 32 * 32, Time: ms/batch

| BatchSize | 64 | 128 | 256 | 512 |
|--------------|--------| -------- | --------|---------|
| PaddlePaddle | 10.463 | 18.184 | 33.113 | 63.039 |
| TensorFlow | 9 | 15 | 28 | 59 |
| Caffe | 9.373 | 16.6606 | 31.4797 | 59.719 |

**Note**

All the single-GPU Caffe experiments use `caffe time` to measure elapsed time, which does not include the parameter-update time, whereas the PaddlePaddle and TensorFlow measurements do include it. Compared with the total time, this part is relatively small on a single machine, so we ignore it.

TensorFlow implements its own convolution-algorithm search instead of using the algorithm-search interface provided by cuDNN.

### Multi-GPU: 4 GPUs

- AlexNet, ms / batch

| total-BatchSize | 128 * 4 | 256 * 4 |
|------------------|----------| -----------|
| PaddlePaddle | 347 | 622 |
| TensorFlow | 377 | 675 |
| Caffe | 1229 | 2435 |

For example, if `total-BatchSize = 128 * 4`, the speedup ratio is calculated by

```
time_at_1gpu_batch_128 * 4 / time_at_4gpu_total_batch_512
= (334 * 4)/347
= 3.85
```
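
The same arithmetic for both total batch sizes, written out as a short Python sketch using the single-GPU and 4-GPU AlexNet numbers from the tables above:

```python
# Speedup of 4 GPUs over 1 GPU at the same per-GPU batch size (AlexNet).
# Times (ms/batch) are taken from the single-GPU and 4-GPU tables above.
single_gpu_ms = {128: 334, 256: 602}        # per-GPU batch size -> ms/batch on 1 GPU
four_gpu_ms = {128 * 4: 347, 256 * 4: 622}  # total batch size   -> ms/batch on 4 GPUs

for batch, t1 in single_gpu_ms.items():
    t4 = four_gpu_ms[batch * 4]
    print("total batch %d: speedup %.2f" % (batch * 4, t1 * 4 / t4))
# total batch 512:  speedup 3.85
# total batch 1024: speedup 3.87
```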

<img src="figs/alexnet-4gpu.png" width="420">


- GoogleNet, ms / batch

| total-BatchSize | 128 * 4 | 256 * 4 |
|-------------------|--------------| ----------- |
| PaddlePaddle | 1178 | 2367 |
| TensorFlow | 1210 | 2292 |
| Caffe | 2007 | out of memory |

<img src="figs/googlenet-4gpu.png" width="420">


## RNN
We use an LSTM network for text classification to benchmark RNN performance.

### Dataset
- [IMDB](http://www.iro.umontreal.ca/~lisa/deep/data/imdb.pkl)
- Sequence length is 100. PaddlePaddle supports training with variable-length sequences, but TensorFlow requires padding; for a fair comparison, sequences are also padded to length 100 in PaddlePaddle (a short padding sketch follows this list).
- Dictionary size=30000
- Peephole connections are used in `lstmemory` by default in PaddlePaddle, and they are also enabled in TensorFlow.
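
For illustration, a minimal Python sketch of the padding described above, plus how peepholes might be enabled on the TensorFlow side; the `tf.nn.rnn_cell.LSTMCell` call reflects the TensorFlow 0.11-era API and is an assumption, not the exact benchmark script:

```python
def pad_sequence(word_ids, max_len=100, pad_id=0):
    """Truncate or right-pad one word-id sequence to a fixed length.

    PaddlePaddle can consume the variable-length sequence directly; the
    fixed-length version is only needed so TensorFlow sees fixed-shape batches.
    """
    word_ids = word_ids[:max_len]
    return word_ids + [pad_id] * (max_len - len(word_ids))

# Assumed TensorFlow (0.11-era) LSTM cell with peephole connections enabled,
# to match PaddlePaddle's default `lstmemory` behaviour:
# import tensorflow as tf
# cell = tf.nn.rnn_cell.LSTMCell(num_units=256, use_peepholes=True)
```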

### Single-GPU

#### LSTM in Text Classification

We test a `2 lstm layer + fc` network with different hidden sizes and batch sizes, as sketched below.
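
A minimal sketch of such a configuration with PaddlePaddle's `trainer_config_helpers`; the layer sizes and optimizer settings here are illustrative assumptions, not the exact benchmark script:

```python
# Illustrative "2 lstm layer + fc" text classifier; not the exact benchmark config.
from paddle.trainer_config_helpers import *

dict_size = 30000   # dictionary size used in the benchmark
hidden_size = 256   # one of the benchmarked hidden sizes

settings(batch_size=64, learning_rate=1e-3)

word = data_layer(name="word", size=dict_size)
label = data_layer(name="label", size=2)

emb = embedding_layer(input=word, size=128)
lstm1 = simple_lstm(input=emb, size=hidden_size)      # first LSTM layer
lstm2 = simple_lstm(input=lstm1, size=hidden_size)    # second LSTM layer
pooled = pooling_layer(input=lstm2, pooling_type=MaxPooling())
prob = fc_layer(input=pooled, size=2, act=SoftmaxActivation())

outputs(classification_cost(input=prob, label=label))
```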

- Batch size = 64, ms / batch

| hidden_size | 256 | 512 | 1280 |
|--------------|-------| -------| --------|
| PaddlePaddle | 83 | 184 | 641 |
| TensorFlow | 175 | 280 | 818 |

- Batch size = 128, ms / batch

| hidden_size | 256 | 512 | 1280 |
|--------------|------- | -------| --------|
| PaddlePaddle | 110 | 261 | 1007 |
| TensorFlow | 181 | 361 | 1237 |


- Batch size = 256, ms / batch

| hidden_size | 256 | 512 | 1280 |
|--------------|-------| -------| --------|
| PaddlePaddle | 170 | 414 | 1655 |
| TensorFlow | 238 | 536 | 1905 |

<img src="figs/rnn_lstm_cls.png" width="600">

#### Seq2Seq

A benchmark of sequence-to-sequence networks will be added later.


### Multi-GPU: 4 GPUs

#### LSTM in Text Classification

- hidden_size = 256, ms / batch

| batch_size | 256 | 512 |
|--------------| -------| --------|
| PaddlePaddle | 90 | 118 |
| TensorFlow | 226 | 118 |


- hidden_size = 512, ms / batch

| batch_size | 256 | 512 |
|--------------| -------| --------|
| PaddlePaddle | 189 | 268 |
| TensorFlow | 297 | 383 |


<img src="figs/rnn_lstm_4gpus.png" width="420">

#### Seq2Seq

A benchmark of sequence-to-sequence networks will be added later.