
Refine profiler and expose to Python. #7576

Merged

qingqing01 merged 8 commits into PaddlePaddle:develop on Jan 24, 2018

Conversation

@qingqing01 (Contributor) commented on Jan 16, 2018

Fix #7577

  • Add a profiler in the executor to record the time spent in each op.
  • Expose the profiler to Python.
  • The output in the unit test is as follows:
------------------------->     Profiling Report <-------------------------

Place: CPU
Time unit: ms
Sorted by total time in descending order in the same thread

Event                            Calls       Total       Min.        Max.        Ave.
thread0::mul_grad                24          3.9817      0.030622    2.19622     0.165904
thread0::mul                     24          2.43597     0.018009    0.257821    0.101499
thread0::momentum                48          1.65413     0.012138    0.128806    0.0344611
thread0::softmax                 8           1.04434     0.129704    0.131114    0.130543
thread0::elementwise_add_grad    24          0.427291    0.01508     0.023408    0.0178038
thread0::elementwise_add         24          0.35214     0.009682    0.020287    0.0146725
thread0::cast                    32          0.286981    0.006007    0.0149      0.00896816
thread0::relu_grad               16          0.268582    0.013828    0.02076     0.0167864
thread0::top_k                   8           0.251921    0.02915     0.036       0.0314901
thread0::softmax_grad            8           0.196288    0.024375    0.024798    0.024536
thread0::sum                     16          0.172279    0.009126    0.012375    0.0107674
thread0::feed                    16          0.1705      0.00402     0.017667    0.0106562
thread0::relu                    16          0.132929    0.007018    0.009637    0.00830806
thread0::accuracy                8           0.116111    0.01412     0.015031    0.0145139
thread0::cross_entropy_grad      8           0.106584    0.012811    0.014423    0.013323
thread0::cross_entropy           8           0.096511    0.011705    0.012282    0.0120639
thread0::elementwise_div         8           0.089527    0.010935    0.01219     0.0111909
thread0::mean_grad               8           0.079708    0.008923    0.015726    0.0099635
thread0::fetch                   24          0.067049    0.001737    0.003392    0.00279371
thread0::mean                    8           0.05091     0.006111    0.006513    0.00636375
thread0::fill_constant           8           0.038575    0.004709    0.004976    0.00482187

@qingqing01 (Contributor, Author) commented on Jan 17, 2018

  1. Use with profiler.profiler('CPU', 'total') as prof to wrap the code to be profiled.
  2. profiler.reset_profiler() can be called to clear the previous records.

A simple usage is as follows:

import numpy as np
import paddle.v2.fluid as fluid            # paddle.fluid in later releases
import paddle.v2.fluid.profiler as profiler

# Build the network; the cost, predict and label layers are elided here.
image = fluid.layers.data(name='x', shape=[784], dtype='float32')
# ...
avg_cost = fluid.layers.mean(x=cost)
optimizer = fluid.optimizer.Momentum(learning_rate=0.001, momentum=0.9)
opts = optimizer.minimize(avg_cost)
accuracy = fluid.evaluator.Accuracy(input=predict, label=label)

place = fluid.CPUPlace()  # or fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
accuracy.reset(exe)

# Profile 10 iterations; drop the records of the first two warm-up iterations.
with profiler.profiler('CPU', 'total') as prof:
    for iter in range(10):
        if iter == 2:
            profiler.reset_profiler()
        x = np.random.random((32, 784)).astype("float32")
        y = np.random.randint(0, 10, (32, 1)).astype("int64")

        outs = exe.run(fluid.default_main_program(),
                       feed={'x': x, 'y': y},
                       fetch_list=[avg_cost] + accuracy.metrics)
        acc = np.array(outs[1])
        pass_acc = accuracy.eval(exe)

@kuke (Contributor) left a comment:

Great. Thx.

and GPU program.

Args:
state (string) : The profiler state, It should be 'CPU' or 'GPU'.
A reviewer (Contributor) commented:

I feel it is necessary to add more explanation about the state parameter. Users may be confused about why a state is needed after the place is given.

@qingqing01 (Author) replied:

Added more explanation here.


Args:
state (string) : The profiler state, It should be 'CPU' or 'GPU'.
sorted_key (string) : If None, the profiler results will be printed
A reviewer (Contributor) commented:

If None, the profiler results will be printed without sorting. -> If None, the profiling results will be printed in the order of first end time of events.

profiler results -> profiling results, the same below.

@qingqing01 (Author) replied:

Done.

without sorting. Otherwise, the profiler results will be sorted
by the this flag. This flag should be one of 'calls', 'total',
'max', 'min' or 'ave'.
The `calls` means sorting by the calling counter.
A reviewer (Contributor) commented:

the calling counter -> number of calls

@qingqing01 (Author) replied:

Done.

@contextmanager
def profiler(state, sorted_key=None):
"""The profiler interface.
Different from cuda_profiler, this fuction can be used to profile both CPU
A reviewer (Contributor) commented:

function -> profiler
Maybe CPU and GPU program -> CPU and GPU operator kernels is better.

@qingqing01 (Author) replied:

Modified the comments.


platform::DeviceContextPool& pool = platform::DeviceContextPool::Instance();
auto dev_ctx = const_cast<platform::DeviceContext*>(pool.Get(place_));
platform::RecordEvent record_event(op->Type(), dev_ctx);
@chengduoZH (Contributor) commented:

RecordEvent should not always be called, only when ProfilerState is not kDisabled.

@qingqing01 (Author) replied:

@chengduoZH Thanks for your review. Whether to record the timeline is judged in the constructor of RecordEvent: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/platform/profiler.cc#L130

ParseEvents(all_events, sorted_key);
ResetProfiler();
}

A reviewer (Contributor) commented:

I think there should be a ProfilerState GetProfilerState(), and this function should be called in executor.cc.

@qingqing01 (Author) replied:

As in the reply above: the fewer the operations, the better.

@qingqing01 (Contributor, Author) left a comment:

@kuke @chengduoZH Thanks!


        self.profiler('CPU')

    def not_test_cuda_profiler(self):
        self.profiler('GPU')
A reviewer (Contributor) commented:

The profiler is not executed during the unit tests, right?

@qingqing01 (Author) replied:

Fixed. Thanks!
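To show how the not_test_ prefix in the hunk above keeps the CUDA case out of the regular unit-test run, here is a rough, hypothetical reconstruction of the test layout. Only the two method names shown in the diff are taken from the PR; the rest (including test_cpu_profiler, the class name, and the tiny program) is an illustrative guess:

import unittest
import numpy as np
import paddle.v2.fluid as fluid            # paddle.fluid in later releases
import paddle.v2.fluid.profiler as profiler

# A tiny program shared by both cases (illustrative only).
x = fluid.layers.data(name='x', shape=[784], dtype='float32')
hidden = fluid.layers.fc(input=x, size=10)

class TestProfiler(unittest.TestCase):
    def profiler(self, state):
        # Shared helper: run the tiny program a few times under the profiler.
        place = fluid.CPUPlace() if state == 'CPU' else fluid.CUDAPlace(0)
        exe = fluid.Executor(place)
        exe.run(fluid.default_startup_program())
        with profiler.profiler(state, 'total'):
            for i in range(5):
                data = np.random.random((32, 784)).astype('float32')
                exe.run(fluid.default_main_program(),
                        feed={'x': data},
                        fetch_list=[hidden])

    def test_cpu_profiler(self):        # hypothetical name for the CPU case
        self.profiler('CPU')

    def not_test_cuda_profiler(self):   # the not_ prefix hides it from unittest discovery
        self.profiler('GPU')

if __name__ == '__main__':
    unittest.main()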

@kuke (Contributor) left a comment:

Maybe the comments need a bit of improvement. Thx.

to add more records.

Args:
state (string) : The profiling state, It should be 'CPU' or 'GPU'.
A reviewer (Contributor) commented:

-> The profiling state, which should be CPU or GPU, telling the profiler to use CPU timer or GPU timer for profiling.

@qingqing01 (Author) replied:

Done.


Args:
state (string) : The profiling state, It should be 'CPU' or 'GPU'.
Although users may define CPUPlace or CUDAPlace when using Fluid,
@kuke (Contributor) commented on Jan 23, 2018:

Although users may define CPUPlace or CUDAPlace when using Fluid, the profiler doesn't get the state based on this Place. Since the implementation is an independent part from the Fluid.
->
Although users may have already specified the execution place (CPUPlace/CUDAPlace) in the beginning, for flexibility the profiler doesn't inherit this place.

@qingqing01 (Author) replied:

Done.

@kuke (Contributor) commented on Jan 23, 2018:

LGTM

@chengduoZH (Contributor) previously approved these changes on Jan 23, 2018, commenting: LGTM

@chengduoZH (Contributor) left a comment:

LGTM

@qingqing01 merged commit 750299f into PaddlePaddle:develop on Jan 24, 2018.
@qingqing01 deleted the profiling_py branch on November 14, 2019.