
ARIMA: pre-allocation of temporary memory to reduce latencies #3895

Merged
7 commits merged into rapidsai:branch-21.06 on Jun 1, 2021

Conversation

@Nyrio (Contributor) commented May 25, 2021

This PR can speed up the evaluation of the log-likelihood in ARIMA by 5x for non-seasonal datasets (the impact is smaller for seasonal datasets). It achieves this by pre-allocating all the temporary memory once, instead of on every iteration, and providing all the pointers with very low overhead thanks to a dedicated structure. Additionally, I removed some unnecessary copies.

![arima_memory](https://user-images.githubusercontent.com/17441062/119530801-a44ff100-bd83-11eb-9278-3f9071521553.png)

Regarding the unnecessary synchronizations, I'll fix that later in a separate PR. Note that non-seasonal ARIMA performance is now even more limited by the python-side solver bottleneck:

![optimizer_bottleneck](https://user-images.githubusercontent.com/17441062/119531952-b8e0b900-bd84-11eb-88cc-b58497b283fc.png)

One problem is that batched matrix operations are quite memory-hungry, so I've duplicated or refactored some bits to avoid allocating extra memory there, but that leads to some code duplication I'm not entirely happy with. Both the ARIMA code and the batched matrix prims are due for some refactoring.
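To make the mechanism concrete, here is a minimal, hypothetical sketch of the pre-allocation pattern described above: one buffer is sized and allocated up front, a lightweight structure carves typed pointers out of it, and the per-iteration code reuses those pointers instead of allocating. The `Workspace` struct and its members are illustrative names only, not the actual `ARIMAMemory` API in cuML:

```cpp
#include <cstddef>
#include <vector>

// Illustrative only: a single buffer allocated once, with a lightweight
// structure that hands out typed pointers into it so per-iteration code
// never allocates.
struct Workspace {
  double* residuals;   // batch_size * n_obs values
  double* small_mats;  // batch_size * r * r values

  static std::size_t bytes_needed(int batch_size, int n_obs, int r) {
    return sizeof(double) * (static_cast<std::size_t>(batch_size) * n_obs +
                             static_cast<std::size_t>(batch_size) * r * r);
  }

  Workspace(char* buf, int batch_size, int n_obs, int r) {
    residuals  = reinterpret_cast<double*>(buf);
    small_mats = residuals + static_cast<std::size_t>(batch_size) * n_obs;
  }
};

// Stand-in for one log-likelihood evaluation: it only uses the pre-carved
// pointers and performs no allocation of its own.
void loglike_iteration(const Workspace& ws) { (void)ws; }

int main() {
  int batch_size = 100, n_obs = 500, r = 4;
  std::vector<char> buf(Workspace::bytes_needed(batch_size, n_obs, r));
  Workspace ws(buf.data(), batch_size, n_obs, r);          // built once
  for (int it = 0; it < 1000; ++it) loglike_iteration(ws); // reused every call
}
```

The point of the pattern is that building the pointer structure is cheap (pure pointer arithmetic), so it can be handed to every solver iteration without the allocation latency that dominated before.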

@Nyrio requested review from a team as code owners on May 25, 2021 16:14
@github-actions bot added the CUDA/C++ and Cython / Python labels on May 25, 2021
@Nyrio added the 3 - Ready for Review, improvement, and non-breaking labels on May 25, 2021
@tfeher self-assigned this on May 26, 2021
@tfeher (Contributor) left a comment

Hi @Nyrio, thanks for this PR! I am halfway through the review, so I will share what I have now.

Overall it looks good, I have mostly smaller comments. I see a potential issue with the lifetime management of the ARIMAMemory, but that can be easily fixed.
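For illustration only (the actual concern is discussed in the review threads below), here is a hypothetical sketch of the general class of lifetime hazard that can arise when a structure such as ARIMAMemory merely points into an externally owned buffer (its constructor takes a char* in_buf) rather than owning the memory itself:

```cpp
#include <vector>

// A view that only points into memory owned elsewhere.
struct View {
  double* data;
  explicit View(char* buf) : data(reinterpret_cast<double*>(buf)) {}
};

// BUG: the buffer dies when this function returns, so the returned View dangles.
View make_dangling_view() {
  std::vector<char> buf(1024);
  return View(buf.data());
}

int main() {
  // OK: the owning buffer outlives every View that points into it.
  std::vector<char> buf(1024);
  View v(buf.data());
  (void)v;
}
```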

I hope to finish reviewing the rest later today.

Review threads (resolved):
- cpp/src_prims/linalg/batched/matrix.cuh (5 threads, 3 outdated)
- python/cuml/tsa/arima.pyx (5 threads)
@tfeher (Contributor) left a comment

Hi Louis, I have finished my review. I just have a few additional questions; overall it looks great.

Review threads (resolved):
- cpp/src_prims/sparse/batched/csr.cuh (outdated)
- cpp/src/arima/batched_arima.cu
- cpp/src/arima/batched_kalman.cu (2 threads, 1 outdated)
@Nyrio requested a review from tfeher on May 28, 2021 11:56
@Nyrio added the 4 - Waiting on Reviewer label and removed the 3 - Ready for Review label on May 28, 2021
@tfeher (Contributor) left a comment

Thanks Louis for the updates; the PR looks good to me!

@Nyrio (Contributor, Author) commented May 28, 2021

@rapidsai/cuml-python-codeowners can you review this PR?
Also, there are CI failures that appear unrelated; is that a known issue?

@dantegd (Member) commented May 28, 2021

rerun tests

@codecov-commenter

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.06@92484fb).
The diff coverage is n/a.


@@               Coverage Diff               @@
##             branch-21.06    #3895   +/-   ##
===============================================
  Coverage                ?   85.44%           
===============================================
  Files                   ?      226           
  Lines                   ?    17306           
  Branches                ?        0           
===============================================
  Hits                    ?    14787           
  Misses                  ?     2519           
  Partials                ?        0           
| Flag | Coverage Δ |
|----------|--------------------|
| dask | 48.90% <0.00%> (?) |
| non-dask | 77.43% <0.00%> (?) |

Flags with carried forward coverage won't be shown.

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 92484fb...70fb880.

@dantegd (Member) commented May 29, 2021

@Nyrio it is a bit hard to find, but from the log the error is a small doxygen issue:

Generating /workspace/cpp/include/cuml/tsa/arima_common.h:342: error: argument 'in_buff' of command @param is not found in the argument list of ML::ARIMAMemory< T, ALIGN >::ARIMAMemory(const ARIMAOrder &order, int batch_size, int n_obs, char *in_buf) (warning treated as error, aborting now)
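The error message pins down the mismatch: the doc comment uses @param in_buff while the constructor's parameter is named in_buf. A hedged sketch of the kind of one-line fix this implies follows; the stub types and comment wording are illustrative, not the actual cuML declarations:

```cpp
struct ARIMAOrder {};  // stub for illustration only

template <typename T, int ALIGN>
struct ARIMAMemory {
  /**
   * @param[in] order      ARIMA hyper-parameters
   * @param[in] batch_size Number of series in the batch
   * @param[in] n_obs      Number of observations per series
   * @param[in] in_buf     Pre-allocated buffer the structure points into
   */
  ARIMAMemory(const ARIMAOrder& order, int batch_size, int n_obs, char* in_buf);
};
```

The key point is simply that the @param tag name must match the parameter name in the signature, otherwise Doxygen emits the warning that CI treats as an error.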

@dantegd added the 4 - Waiting on Author label and removed the 4 - Waiting on Reviewer label on Jun 1, 2021
@Nyrio added the 4 - Waiting on Reviewer label and removed the 4 - Waiting on Author label on Jun 1, 2021
@dantegd (Member) commented Jun 1, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 94be76f into rapidsai:branch-21.06 Jun 1, 2021
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
…ai#3895)

Authors:
  - Louis Sugy (https://github.com/Nyrio)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#3895