
Performance in batch mode #21

Open · VolkerH opened this issue Mar 14, 2019 · 5 comments

Comments

VolkerH (Owner) commented Mar 14, 2019

For this particular dataset, the naive approach takes about 1 s per image, including read/write. I can see the GPU utilization going up and down as well.

[screenshot]

In my command-line batch tool, processing the same dataset is almost a factor of 4 slower:

[screenshot]

That code does an additional affine transform and maximum intensity projection (MIP), but that should not make a significant difference. Maybe the overhead is due to passing around partially evaluated functions.
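For reference, a minimal sketch of the naive per-volume loop described above (read_volume, deconvolve and write_volume are hypothetical placeholders standing in for the actual I/O and GPU calls, not functions from this repository):

# Minimal sketch of the naive batch loop; read_volume, deconvolve and
# write_volume are hypothetical placeholders, not functions from this repo.
import time
from pathlib import Path

def run_naive_batch(input_dir, output_dir):
    for in_file in sorted(Path(input_dir).glob("*.tif")):
        start = time.perf_counter()
        volume = read_volume(in_file)                           # disk read
        result = deconvolve(volume)                             # GPU work, blocks until done
        write_volume(Path(output_dir) / in_file.name, result)   # disk write
        print(f"{in_file.name}: {time.perf_counter() - start:.2f} s")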

VolkerH (Owner, Author) commented Mar 14, 2019

Partial function evaluation does indeed incur a performance penalty:
https://stackoverflow.com/questions/17388438/python-functools-partial-efficiency
Still, the slowdown is almost a factor of 4, which is difficult to explain by function call overhead alone.
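A rough micro-benchmark (a sketch, not code from this repository) shows the per-call penalty of functools.partial is tiny compared to ~1 s of GPU work per volume, which supports that conclusion:

# Micro-benchmark sketch: direct call vs. functools.partial.
import timeit
from functools import partial

def process(volume, niter=10):
    return volume  # stand-in for the real work

p = partial(process, niter=10)

print("direct :", timeit.timeit(lambda: process(0, niter=10), number=1_000_000))
print("partial:", timeit.timeit(lambda: p(0), number=1_000_000))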

VolkerH (Owner, Author) commented Mar 15, 2019

Profiling with cProfile. This is for deconvolving 100 volumes with 10 iterations each. The actual deconvolution takes about 0.75 s per frame. The next largest contributor is a pyopencl call (probably related to deskew/rotate) taking nearly 0.4 s. numpy's astype takes up a considerable amount of processing time, as do reading and writing. Some of this could probably be parallelized.

Fri Mar 15 11:57:54 2019    process_stats

         2275854 function calls (2263132 primitive calls) in 347.478 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100   75.846    0.758   75.846    0.758 {built-in method _pywrap_tensorflow_internal.TF_SessionRun_wrapper}
      101   39.725    0.393   39.725    0.393 {built-in method pyopencl._cl._enqueue_read_buffer}
      227   36.042    0.159   36.042    0.159 {built-in method numpy.core.multiarray.concatenate}
      922   34.942    0.038   34.942    0.038 {method 'astype' of 'numpy.ndarray' objects}
      101   32.304    0.320   32.305    0.320 /home/vhil0002/anaconda3/envs/newllsm/lib/python3.6/site-packages/pyopencl/__init__.py:872(image_init)
    10252   20.804    0.002   20.804    0.002 {method 'readinto' of '_io.BufferedReader' objects}
      101   20.215    0.200   20.215    0.200 {method 'tofile' of 'numpy.ndarray' objects}
      217   19.790    0.091   19.790    0.091 {built-in method numpy.core.multiarray.copyto}
      101   13.144    0.130   13.144    0.130 {method 'clip' of 'numpy.ndarray' objects}
      101   12.960    0.128   12.960    0.128 {built-in method pyopencl._cl.enqueue_nd_range_kernel}
      100    8.832    0.088  343.422    3.434 /home/vhil0002/anaconda3/envs/newllsm/lib/python3.6/site-packages/l

I have run similar profiling for the gputools Richardson-Lucy (RL) deconvolution. There, most of the time is spent in np.astype.
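For reference, stats like the above can be produced along these lines (batch_process is a hypothetical stand-in for the actual batch entry point):

# Sketch: profile a batch run, dump the stats to a file and print the
# functions sorted by internal time, as in the table above.
import cProfile
import pstats

cProfile.run("batch_process()", "process_stats")
pstats.Stats("process_stats").sort_stats("tottime").print_stats(15)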

VolkerH (Owner, Author) commented Mar 15, 2019

Some more comments about the gputools deconvolution (separate from flowdec). There are some obvious improvements to be made, e.g. an FFT plan is calculated in the gputools implementation but never used. Also, the PSF is pre-processed and sent to the GPU each time the deconvolution is called. Separating the deconvolution into an init step and a run step would allow the PSF to be processed once and left on the GPU (see the sketch below).
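A sketch of that init/run split, using numpy FFTs on the CPU as a stand-in for the gputools/OpenCL code (so this is not the actual implementation):

# Sketch of the init/run split: the PSF is pre-processed once in __init__
# and reused for every volume in run(). numpy FFTs stand in for the GPU code.
import numpy as np

class RLDeconvolver:
    def __init__(self, psf, shape, n_iter=10):
        # init step (once per PSF): pad, normalise and precompute the OTF
        self.n_iter = n_iter
        padded = np.zeros(shape, dtype=np.float32)
        padded[tuple(slice(0, s) for s in psf.shape)] = psf / psf.sum()
        padded = np.roll(padded, [-(s // 2) for s in psf.shape], axis=(0, 1, 2))
        self.otf = np.fft.rfftn(padded)
        self.otf_conj = np.conj(self.otf)

    def run(self, volume):
        # run step (once per volume): only the image data is new
        estimate = np.full(volume.shape, volume.mean(), dtype=np.float32)
        for _ in range(self.n_iter):
            blurred = np.fft.irfftn(np.fft.rfftn(estimate) * self.otf, volume.shape)
            ratio = volume / (blurred + 1e-6)
            correction = np.fft.irfftn(np.fft.rfftn(ratio) * self.otf_conj, volume.shape)
            estimate = estimate * correction
        return estimate

With this structure, constructing RLDeconvolver(psf, vol.shape) is paid once, and only run(vol) happens per volume.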

TODO: check whether flowdec sends the PSF to the GPU each time (I believe it does). Maybe that can be optimized as well.

VolkerH (Owner, Author) commented Mar 15, 2019

Rewrote maweigert's gputools-based deconvolution to reuse the FFT plan, the pre-processed PSF (which remains in GPU RAM) and the temporary GPU buffers, and removed an unnecessary duplicate .astype(np.complex64).
See https://github.com/VolkerH/Lattice_Lightsheet_Deskew_Deconv/blob/benchmarking/lls_dd/deconv_gputools_rewrite.py

Major speed improvement. The actual deconvolution is now much faster than the overall time per iteration of the batch loop, so reading and writing from disk will have to happen in separate threads (see the sketch below).
[screenshot]
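A sketch of how the disk I/O could be overlapped with the GPU work (read_volume and write_volume are hypothetical placeholders, deconv is an object with a run() method as in the rewrite above, and files is a list of pathlib.Path objects):

# Sketch: overlap disk I/O with GPU work by reading ahead and writing back
# on worker threads while the main thread drives the deconvolution.
from concurrent.futures import ThreadPoolExecutor

def run_batch(files, deconv, out_dir):
    if not files:
        return
    with ThreadPoolExecutor(max_workers=2) as pool:
        next_read = pool.submit(read_volume, files[0])   # prefetch first volume
        pending_write = None
        for i, f in enumerate(files):
            volume = next_read.result()                  # wait for the prefetch
            if i + 1 < len(files):
                next_read = pool.submit(read_volume, files[i + 1])  # read ahead
            result = deconv.run(volume)                  # GPU work
            if pending_write is not None:
                pending_write.result()                   # finish the previous write
            pending_write = pool.submit(write_volume, out_dir / f.name, result)
        if pending_write is not None:
            pending_write.result()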

VolkerH (Owner, Author) commented Mar 15, 2019

cProfile stats for the above three runs:
gputools rewrite: [screenshot of cProfile stats]
gputools: [screenshot of cProfile stats]
flowdec: [screenshot of cProfile stats]
