ARIMA: pre-allocation of temporary memory to reduce latencies #3895

Nyrio · 2021-05-25T16:14:53Z

This PR can speed up the evaluation of the log-likelihood in ARIMA by 5x for non-seasonal datasets (the impact is smaller for seasonal datasets). It achieves this by pre-allocating all the temporary memory once instead of at every iteration, and by handing out all the pointers into that memory with very low overhead through a dedicated structure. Additionally, I removed some unnecessary copies.
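The pre-allocation idea described above can be sketched on the host side as follows. This is a minimal illustration, not the PR's code: `TempMemory`, its three fields, and the `carve` helper are hypothetical names, and the real structure works on device memory. The sketch assumes a two-pass design in which the same constructor first computes the required size (with a null buffer) and then assigns aligned pointers into a buffer that was allocated once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a pre-allocation helper. The names and
// fields are illustrative and do not mirror cuML's actual ARIMAMemory.
template <typename T, std::size_t ALIGN = 256>
struct TempMemory {
  T* params    = nullptr;  // batch_size * n_params values
  T* residuals = nullptr;  // batch_size * n_obs values
  T* loglike   = nullptr;  // batch_size values
  std::size_t size = 0;    // total bytes required, each array padded to ALIGN

  // Pass buf == nullptr to only compute `size`; pass a real buffer of at
  // least `size` bytes to also fill in the pointers.
  TempMemory(int batch_size, int n_params, int n_obs, char* buf) : buf_(buf) {
    assert(batch_size > 0 && n_params > 0 && n_obs > 0);
    const std::size_t b = static_cast<std::size_t>(batch_size);
    carve(params, b * static_cast<std::size_t>(n_params));
    carve(residuals, b * static_cast<std::size_t>(n_obs));
    carve(loglike, b);
  }

  // First pass: how many bytes must the caller allocate?
  static std::size_t compute_size(int batch_size, int n_params, int n_obs) {
    return TempMemory(batch_size, n_params, n_obs, nullptr).size;
  }

 private:
  char* buf_;

  // Assign the next aligned slice of the buffer to `ptr` (if a buffer was
  // given) and advance the running offset by the padded byte count.
  void carve(T*& ptr, std::size_t n_elems) {
    if (buf_ != nullptr) ptr = reinterpret_cast<T*>(buf_ + size);
    const std::size_t bytes = n_elems * sizeof(T);
    size += (bytes + ALIGN - 1) / ALIGN * ALIGN;
  }
};
```

A caller would call `compute_size` once, allocate a single buffer of that size before the optimization loop, construct `TempMemory` on it, and then reuse the same pointers at every iteration. Constructing the structure only does pointer arithmetic, which is why the per-iteration overhead is small.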

Regarding the unnecessary synchronizations, I'll fix those later in a separate PR. Note that non-seasonal ARIMA performance is now even more limited by the Python-side solver bottleneck.

One problem is that batched matrix operations are quite memory-hungry, so I've duplicated or refactored some bits to avoid allocating extra memory there. That leaves some code duplication that I'm not entirely happy with. Both the ARIMA code and the batched matrix prims are due for some refactoring.


tfeher

Hi @Nyrio, thanks for this PR! I am halfway through the review, so I will share what I have now.

Overall it looks good; I mostly have smaller comments. I see a potential issue with the lifetime management of the ARIMAMemory, but that can be easily fixed.

I hope to finish reviewing the rest later today.

cpp/src_prims/linalg/batched/matrix.cuh

python/cuml/tsa/arima.pyx

tfeher

Hi Louis, I have finished my review. I just have a few additional questions; overall it looks great.

cpp/src_prims/sparse/batched/csr.cuh

cpp/src/arima/batched_arima.cu

cpp/src/arima/batched_kalman.cu


tfeher

Thanks Louis for the updates, the PR looks good to me!

Nyrio · 2021-05-28T17:44:29Z

@rapidsai/cuml-python-codeowners can you review this PR?
Also, there are CI failures that appear unrelated. Is that a known issue?

dantegd · 2021-05-28T22:34:53Z

rerun tests

codecov-commenter · 2021-05-29T01:33:58Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.06@92484fb).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.06    #3895   +/-   ##
===============================================
  Coverage                ?   85.44%           
===============================================
  Files                   ?      226           
  Lines                   ?    17306           
  Branches                ?        0           
===============================================
  Hits                    ?    14787           
  Misses                  ?     2519           
  Partials                ?        0

| Flag | Coverage Δ |
|---|---|
| dask | `48.90% <0.00%> (?)` |
| non-dask | `77.43% <0.00%> (?)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 92484fb...70fb880.

dantegd · 2021-05-29T12:46:10Z

@Nyrio it is a bit hard to find, but from the log the error is a small doxygen issue:

Generating /workspace/cpp/include/cuml/tsa/arima_common.h:342: error: argument 'in_buff' of command @param is not found in the argument list of ML::ARIMAMemory< T, ALIGN >::ARIMAMemory(const ARIMAOrder &order, int batch_size, int n_obs, char *in_buf) (warning treated as error, aborting now)
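Doxygen reports this error when an `@param` name does not match any argument of the documented function. Here the comment says `in_buff` while the constructor argument is `in_buf`. The snippet below is a hedged sketch of a matching comment: only the constructor signature comes from the log above, while the `ARIMAOrder` stand-in, the `ALIGN` type and default, the stored members, and all parameter descriptions are assumptions made for illustration.

```cpp
// Hypothetical stand-in: the real ARIMAOrder in cuML has more fields.
struct ARIMAOrder {
  int p, d, q;
};

// Simplified stand-in for the documented class; the body is illustrative only.
template <typename T, int ALIGN = 256>
struct ARIMAMemory {
  /**
   * Constructor (parameter descriptions below are assumed, not from the PR).
   *
   * @param[in] order       ARIMA order
   * @param[in] batch_size  Number of series in the batch
   * @param[in] n_obs       Number of observations per series
   * @param[in] in_buf      Pre-allocated buffer (the failing comment
   *                        spelled this name `in_buff`)
   */
  ARIMAMemory(const ARIMAOrder& order, int batch_size, int n_obs, char* in_buf)
    : order(order), batch_size(batch_size), n_obs(n_obs), buf(in_buf) {}

  ARIMAOrder order;
  int batch_size;
  int n_obs;
  char* buf;
};
```

Because the CI build treats Doxygen warnings as errors, a one-letter mismatch like this is enough to fail the job.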

dantegd · 2021-06-01T14:37:24Z

@gpucibot merge

…ai#3895)

This PR can speed up the evaluation of the log-likelihood in ARIMA by 5x for non-seasonal datasets (the impact is smaller for seasonal datasets). It achieves this by pre-allocating all the temporary memory only once instead of every iteration and providing all the pointers with a very low overhead thanks to a dedicated structure. Additionally, I removed some unnecessary copies.

![arima_memory](https://user-images.githubusercontent.com/17441062/119530801-a44ff100-bd83-11eb-9278-3f9071521553.png)

Regarding the unnecessary synchronizations, I'll fix that later in a separate PR. Note that non-seasonal ARIMA performance is now even more limited by the python-side solver bottleneck:

![optimizer_bottleneck](https://user-images.githubusercontent.com/17441062/119531952-b8e0b900-bd84-11eb-88cc-b58497b283fc.png)

One problem is that batched matrix operations are quite memory-hungry so I've duplicated or refactored some bits to avoid allocating extra memory there, but that leads to some duplication that I'm not entirely happy with. Both the ARIMA code and batched matrix prims are due some refactoring.

Authors:
- Louis Sugy (https://github.com/Nyrio)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)
- Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#3895

ARIMA memory structure for pre-allocation of temporary memory

852c171

Nyrio requested review from a team as code owners May 25, 2021 16:14

github-actions bot added the `CUDA/C++` and `Cython / Python` labels May 25, 2021

Nyrio added the `3 - Ready for Review`, `improvement`, and `non-breaking` labels May 25, 2021

Nyrio added 2 commits May 25, 2021 09:42

Update copyright year + remove unnecessary coefficient beta in b_kron…

8192afb

… and add alpha to doxygen

flake8 style fix

2dc415f

tfeher self-assigned this May 26, 2021

tfeher requested changes May 26, 2021


cpp/src_prims/sparse/batched/csr.cuh (outdated)

cpp/src/arima/batched_arima.cu

cpp/src/arima/batched_kalman.cu

cpp/src/arima/batched_kalman.cu (outdated)

Nyrio added 3 commits May 27, 2021 07:46

Code improvements after review

06c2488

C++ code improvements after review

f5ccc27

Merge remote-tracking branch 'official/branch-21.06' into enh-arima-m…

70fb880

…emory

Nyrio requested a review from tfeher May 28, 2021 11:56

Nyrio added the `4 - Waiting on Reviewer` label and removed the `3 - Ready for Review` label May 28, 2021

tfeher approved these changes May 28, 2021


Nyrio unassigned tfeher May 28, 2021

dantegd added the `4 - Waiting on Author` label and removed the `4 - Waiting on Reviewer` label Jun 1, 2021

Fixed typo in doxygen

184cadb

Nyrio added the `4 - Waiting on Reviewer` label and removed the `4 - Waiting on Author` label Jun 1, 2021

dantegd approved these changes Jun 1, 2021


rapids-bot bot merged commit 94be76f into rapidsai:branch-21.06 Jun 1, 2021

This was referenced Jul 20, 2021

[ENH] Avoid unnecessary allocations and memory transfers in ARIMA #2233

Closed

ARIMA performance tracker #2912

Open
