Accelerating image processing for CNN #668

qingqing01 · 2016-11-30T07:34:04Z

Image processing is a key step for CNNs. Reading an image list directly with PyDataProvider is slow, which lowers the speedup on multiple GPUs, especially with 8 or 16 GPUs.

This PR adds two features:

A C++ acceleration library callable from Python: OpenCV + multithreading
A Python module using Python multiprocessing and Python-OpenCV

Either one can be used.

Performance on 4 GPUs (K40) with input size 224 * 224 * 3 is as follows.
Experimental setup:

4 Tesla K40m
Total batch size: 192
The times below are for 20 mini-batches

PIL reading the image list directly: 64.8s
Batching the data with Pickle, then reading: 46.8s
Batched data + Python multiprocessing + Python-OpenCV: 29.7s
Batched data + C++ multithreading + OpenCV: 26.7s
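
To make the comparison easier to read, the reported timings can be converted into speedups relative to the PIL baseline and into approximate throughput. This is only arithmetic on the numbers above; the dictionary keys are shorthand labels introduced here, not names from the PR.

```python
# Timings reported above: seconds for 20 mini-batches, total batch size 192.
timings = {
    "pil_image_list": 64.8,
    "pickle_batches": 46.8,
    "batches_py_multiproc_opencv": 29.7,
    "batches_cpp_threads_opencv": 26.7,
}
NUM_BATCHES = 20
BATCH_SIZE = 192

baseline = timings["pil_image_list"]


def summarize(seconds):
    """Return (speedup vs. the PIL baseline, images per second)."""
    speedup = baseline / seconds
    images_per_sec = NUM_BATCHES * BATCH_SIZE / seconds
    return speedup, images_per_sec


summary = {name: summarize(sec) for name, sec in timings.items()}

for name, (speedup, ips) in summary.items():
    print("%-30s %.2fx  %.1f images/s" % (name, speedup, ips))
```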

The next PR will add usage documentation.

#457


reyoung

There is one basic issue: first add cpplint to these files.

reyoung · 2016-11-30T10:45:49Z

plugin/opencv/DataTransformer.h

+#include "paddle/utils/Thread.h"
+
+using namespace cv;
+using namespace paddle;


Don't put using namespace in a .h file.

reyoung · 2016-11-30T10:48:00Z

plugin/opencv/PyDecodejpeg.cpp

+
+#include "DataTransformer.h"
+
+using namespace boost::python;


Please don't pull in Boost. That dependency is too heavy.

qingqing01 · 2016-11-30T12:04:45Z

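@reyoung The cpplint and using namespace issues have been fixed. As for Boost: I didn't use it at first, but later found Boost.Python very convenient, so I switched to it. Does it really need to be removed? If so, I'll change it.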

reyoung

The robustness of the Python file needs some improvement.

reyoung · 2016-12-01T04:18:15Z

python/paddle/utils/image_multiproc.py

@@ -0,0 +1,170 @@
+import os, psutil
+import cv2


try:
    import cv2
except ImportError:
    log.warning("OpenCV2 is not installed, using single process image process instead")
    cv2 = None

If cv2 is not available, fall back to the same behavior as without OpenCV.

reyoung · 2016-12-01T04:18:50Z

python/paddle/utils/image_multiproc.py

+from paddle.utils.image_util import *
+import multiprocessing
+import subprocess, signal, sys
+


Add an __all__ field and export whatever needs to be exported.

reyoung · 2016-12-01T04:19:36Z

python/paddle/utils/image_multiproc.py

+                 channel_swap=None,
+                 mean=None,
+                 is_train=True,
+                 is_color=True):


This MultiProcessImageTransfomer could take a parameter for the Transformer type: is it OpenCV-based or NumPy-based?
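
One way to read this suggestion, combined with the earlier request to degrade gracefully when cv2 is missing, is a small factory that takes the backend as a parameter. The sketch below is an illustration only: `make_transformer`, `NumpyTransformer`, `OpenCVTransformer`, and the `backend` and `opencv_available` parameters are hypothetical names, not part of the PR.

```python
import logging

try:
    import cv2  # optional dependency
except ImportError:
    cv2 = None

logger = logging.getLogger(__name__)


class NumpyTransformer(object):
    """Hypothetical stand-in for a PIL/NumPy-based transformer."""
    backend = "numpy"


class OpenCVTransformer(object):
    """Hypothetical stand-in for an OpenCV-based transformer."""
    backend = "opencv"


_BACKENDS = {"numpy": NumpyTransformer, "opencv": OpenCVTransformer}


def make_transformer(backend="opencv", opencv_available=None):
    """Build a transformer, falling back to NumPy when OpenCV is missing."""
    if backend not in _BACKENDS:
        raise ValueError("unknown transformer backend: %r" % (backend,))
    if opencv_available is None:
        opencv_available = cv2 is not None
    if backend == "opencv" and not opencv_available:
        logger.warning("OpenCV is not installed; using the NumPy transformer")
        backend = "numpy"
    return _BACKENDS[backend]()
```

The `opencv_available` override exists only so the fallback path can be exercised on a machine that does have OpenCV installed.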

reyoung · 2016-12-01T04:20:36Z

python/paddle/utils/image_multiproc.py

+        return self.transform(im)
+
+
+class MultiProcessImageTransfomer():


Inherit from object.

reyoung · 2016-12-01T04:20:59Z

python/paddle/utils/image_multiproc.py

+                 is_train=True,
+                 is_color=True):
+        self.procnum = procnum
+        self.capacity = capacity


self.capacity is unused.

reyoung · 2016-12-01T04:28:30Z

python/paddle/utils/image_multiproc.py

+        for i in xrange(self.procnum):
+            start = dlen * i / self.procnum
+            end = dlen * (i + 1) / self.procnum
+            proc = multiprocessing.Process(


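This could probably be made faster. Passing the data through args adds an extra serialization step; using shared memory directly might help, though it's hard to say for sure.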

reyoung · 2016-12-01T04:29:22Z

python/paddle/utils/image_multiproc.py

+        except Exception as e:
+            print str(e)
+
+    def reset(self, size):


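If this file is not called in the expected way, there may be hidden problems, such as processes that never stop.

For example, if I keep calling run_proc, it will probably crash.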
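
A common way to address this kind of lifecycle concern is to let one object own the pool and shut it down exactly once. The following is a minimal sketch under stated assumptions, not the PR's code: `ManagedPool` is a hypothetical wrapper, and `multiprocessing.pool.ThreadPool` is used as a stand-in because it exposes the same interface as `multiprocessing.Pool` while being simpler to run in a snippet.

```python
from multiprocessing.pool import ThreadPool  # same interface as multiprocessing.Pool


class ManagedPool(object):
    """Hypothetical wrapper that owns one pool and shuts it down once."""

    def __init__(self, procnum):
        self._pool = ThreadPool(procnum)
        self._closed = False

    def run(self, func, items):
        if self._closed:
            raise RuntimeError("pool already closed")
        # Reuse the single pool on every call instead of starting new workers.
        return self._pool.imap_unordered(func, items)

    def close(self):
        # Idempotent: calling close() twice is harmless.
        if not self._closed:
            self._closed = True
            self._pool.terminate()
            self._pool.join()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions


def square(x):
    return x * x


with ManagedPool(2) as pool:
    first = sorted(pool.run(square, range(5)))
    second = sorted(pool.run(square, range(5)))  # repeated calls are safe
```

Because the workers are created once in `__init__` and released in `close()`, calling `run` repeatedly does not accumulate processes, and misuse after shutdown fails loudly instead of hanging.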

reyoung · 2016-12-05T08:45:03Z

python/paddle/utils/image_multiproc.py

+    def run(self, data, label):
+        try:
+            fun = partial(warpper, self)
+            return self.pool.imap_unordered(fun, zip(data, label), chunksize=5)


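A few issues:

chunksize should be larger, and a multiple of procnum, for example procnum * 100.

A multiple of procnum keeps the load balanced across processes.

A larger value helps avoid jitter when a core is occasionally taken by other work.

Prefer itertools.izip over zip. zip builds the whole list first and then passes it to imap_unordered, whereas izip returns a lazy iterator that yields items one at a time. In theory this uses less memory, and since izip zips on demand it should also be slightly faster.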
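
The two suggestions above can be sketched together as follows. This is a Python 3 illustration, not the PR's Python 2 code: in Python 3 the built-in `zip` is already lazy, as `itertools.izip` is in Python 2. The `transform` function is a placeholder for the per-image work, and `ThreadPool` stands in for `multiprocessing.Pool`, which has the same `imap_unordered` signature.

```python
from multiprocessing.pool import ThreadPool  # stand-in for multiprocessing.Pool

PROCNUM = 4


def transform(pair):
    """Placeholder for per-image work: takes one (data, label) pair."""
    data, label = pair
    return data * 2, label


def run(pool, data, label, procnum=PROCNUM):
    # chunksize as a multiple of procnum, as suggested in the review.
    chunksize = procnum * 100
    # Python 3 zip() is lazy, like itertools.izip in Python 2.
    pairs = zip(data, label)
    return pool.imap_unordered(transform, pairs, chunksize=chunksize)


pool = ThreadPool(PROCNUM)
try:
    # Results arrive in arbitrary order, so sort before inspecting them.
    results = sorted(run(pool, range(1000), range(1000)))
finally:
    pool.terminate()
    pool.join()
```

Whether `procnum * 100` is the right value depends on the per-item cost and the dataset size, so it is best treated as a starting point to measure against.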

reyoung · 2016-12-05T08:45:20Z

python/paddle/utils/image_multiproc.py

+            return self.pool.imap_unordered(fun, zip(data, label), chunksize=5)
+        except KeyboardInterrupt:
+            self.pool.terminate()
+        except Exception, e:


This call is non-blocking, so the try...except here is most likely useless.
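
The point can be demonstrated with a small experiment: `imap_unordered` returns an iterator immediately, and a worker's exception is re-raised only when the results are consumed. The sketch uses `ThreadPool` as a stand-in for `multiprocessing.Pool` (same interface), and `transform` is a placeholder that fails on one item.

```python
from multiprocessing.pool import ThreadPool  # stand-in for multiprocessing.Pool


def transform(x):
    """Placeholder worker that fails on a single item."""
    if x == 3:
        raise ValueError("bad item: %d" % x)
    return x


pool = ThreadPool(2)
raised_at_call = False
raised_at_iteration = False
try:
    try:
        it = pool.imap_unordered(transform, range(5))  # returns immediately
    except ValueError:
        raised_at_call = True  # not reached: the call itself does not raise

    try:
        results = list(it)  # worker errors are re-raised while iterating
    except ValueError:
        raised_at_iteration = True
finally:
    pool.terminate()
    pool.join()
```

Error handling therefore belongs around the loop that consumes the iterator, not around the call that creates it.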

reyoung · 2016-12-05T08:47:10Z

python/paddle/utils/image_multiproc.py

+        else:
+            return self.transformer.transform_from_file(data), label
+
+    def __getstate__(self):


Don't add this function.

reyoung · 2016-12-05T08:47:17Z

python/paddle/utils/image_multiproc.py

+        del self_dict['pool']
+        return self_dict
+
+    def __setstate__(self, state):


Don't add this function.

reyoung · 2016-12-05T08:48:39Z

python/paddle/utils/image_multiproc.py

+            self.pool.terminate()
+        except Exception, e:
+            self.pool.terminate()
+


@staticmethod
def __job__(is_img_string, transformer, data, label):
    if is_img_string:
        return transformer.transform_from_string(data), label
    else:
        return transformer.transform_from_file(data), label

reyoung · 2016-12-05T08:49:58Z

python/paddle/utils/image_multiproc.py

+
+    def run(self, data, label):
+        try:
+            fun = partial(warpper, self)


func = partial(MultiProcessImageTransformer.__job__, self.is_img_string, self.transformer)

Also, it is better to write import functools and call functools.partial rather than from functools import partial.
That is slightly more readable.
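
A minimal sketch of the pattern suggested here: bind the fixed arguments with `functools.partial` and map the resulting callable over (data, label) pairs. `FakeTransformer` and the module-level `job` function are hypothetical stand-ins. Note one assumption that differs from the snippet in the review: a pool's `imap_unordered` passes a single item per call, so `job` takes the pair as one argument and unpacks it.

```python
import functools


class FakeTransformer(object):
    """Hypothetical stand-in for the real image transformer."""

    def transform_from_string(self, data):
        return "string:" + data

    def transform_from_file(self, data):
        return "file:" + data


def job(is_img_string, transformer, pair):
    # The pool passes one item per call, so the (data, label) pair is
    # unpacked here rather than taken as two separate arguments.
    data, label = pair
    if is_img_string:
        return transformer.transform_from_string(data), label
    return transformer.transform_from_file(data), label


# Bind the arguments that stay the same for every item.
func = functools.partial(job, False, FakeTransformer())
outputs = [func(pair) for pair in zip(["a.jpg", "b.jpg"], [0, 1])]
```

The list comprehension stands in for the pool call; with a real pool, `func` would be passed to `imap_unordered` in place of the `warpper` helper.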

reyoung · 2016-12-05T08:50:06Z

python/paddle/utils/image_multiproc.py

+        return self.transform(im)
+
+
+def warpper(cls, (dat, label)):


Delete this; it is unused.

reyoung · 2016-12-05T08:50:24Z

python/paddle/utils/image_multiproc.py

+
+from paddle.utils.image_util import *
+from paddle.trainer.config_parser import logger
+


Add __all__ to manage the exported symbols.


ghost · 2016-12-08T10:05:50Z

0@nva 1 no

ghost · 2016-12-08T10:06:06Z

0@nva 1 no

reyoung

Basically LGTM

reyoung · 2016-12-06T05:45:25Z

python/paddle/utils/image_multiproc.py

+            start_w = (col - self.crop_size) / 2
+        end_h, end_w = start_h + self.crop_size, start_w + self.crop_size
+        im = im.crop((start_h, start_w, end_h, end_w))
+        if (self.is_train) and (np.random.randint(2) == 0):


The parentheses are unnecessary here.

reyoung · 2016-12-06T05:51:15Z

python/paddle/utils/image_multiproc.py

+        return self.transform(im)
+
+    def load_image_from_file(self, file):
+        im = Image.open(file)


Better not to use file as a parameter name; file is a built-in name in Python.

ghost · 2016-12-08T11:11:07Z

Don't eat junk food, stay happy! At 2016-12-08 18:09:11, "Yu Yang" <notifications@github.com> wrote: @reyoung approved this pull request. Basically LGTM

Sampson1107 · 2017-10-11T02:05:42Z

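@qingqing01
Hello, is there any documentation or a demo for the benchmark cases below?
PIL reading the image list directly: 64.8s
Batching the data with Pickle, then reading: 46.8s
Batched data + Python multiprocessing + Python-OpenCV: 29.7s
Batched data + C++ multithreading + OpenCV: 26.7s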


qingqing01 added 3 commits November 30, 2016 12:53

Accelerating image processing for CNN

9d72cab

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

8c9a967

… acc_image_proc

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

4d99782

… acc_image_proc

qingqing01 assigned reyoung, gangliao and luotao1 Nov 30, 2016

reyoung requested changes Nov 30, 2016


Add style check and remove 'using namespace'

fe073d1

reyoung requested changes Dec 1, 2016


qingqing01 force-pushed the acc_image_proc branch from 480a638 to 0b5eb6e Compare December 2, 2016 07:11

Remove the C++ code and refine Python code.

ae06deb

qingqing01 force-pushed the acc_image_proc branch from 0b5eb6e to ae06deb Compare December 2, 2016 07:16

reyoung requested changes Dec 5, 2016


qingqing01 added 3 commits December 6, 2016 09:57

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

9d2f49c

… acc_image_proc

follow comments

84d47ac

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

978d6e8

… acc_image_proc

reyoung approved these changes Dec 8, 2016


reyoung merged commit 2039070 into PaddlePaddle:develop Dec 8, 2016

qingqing01 deleted the acc_image_proc branch April 20, 2017 10:34

Meiyim pushed a commit to Meiyim/Paddle that referenced this pull request May 21, 2021

update ernie-unimo info (PaddlePaddle#668)

90d1e79

wangxicoding pushed a commit to wangxicoding/Paddle that referenced this pull request Dec 9, 2021

make cmake friendly to user (PaddlePaddle#668)

0b4cc4c

lizexu123 pushed a commit to lizexu123/Paddle that referenced this pull request Feb 23, 2024

fix test_fsp_loss ut (PaddlePaddle#668)

c701d2b

Co-authored-by: ceci3 <ceci3@users.noreply.github.com>

WAYKEN-TSE pushed a commit to WAYKEN-TSE/Paddle that referenced this pull request Dec 6, 2024

support cogvlm and cogagent fp16, fix past_key_value dimension bug fo…

96d6f54

…r self-attn and cross-attn (PaddlePaddle#668)
