This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

[NSE-465] POC release memory using GC #466

Merged
merged 2 commits into from
Sep 1, 2021


zhztheplayer
Collaborator

What changes were proposed in this pull request?

See #465

How was this patch tested?

Still in POC

@github-actions

#465

@zhztheplayer
Collaborator Author

Dependent Arrow branch oap-project/arrow#31

@zhztheplayer zhztheplayer marked this pull request as draft August 12, 2021 00:51
@zhztheplayer zhztheplayer force-pushed the NSE-465 branch 3 times, most recently from 410cbd1 to 87a70e4 August 23, 2021 10:17
@zhztheplayer zhztheplayer force-pushed the NSE-465 branch 3 times, most recently from b119b5b to 55c5327 August 25, 2021 03:41
@zhztheplayer zhztheplayer changed the title [NSE-465][WIP] POC release memory using GC [NSE-465]POC release memory using GC Sep 1, 2021
@zhztheplayer zhztheplayer marked this pull request as ready for review September 1, 2021 05:03
@zhztheplayer zhztheplayer changed the title [NSE-465]POC release memory using GC [NSE-465] POC release memory using GC Sep 1, 2021
@zhztheplayer zhztheplayer merged commit c9bdb00 into oap-project:master Sep 1, 2021
@zhztheplayer
Collaborator Author

zhztheplayer commented Sep 1, 2021

To enable this feature in the TPC-DS benchmark, set spark.oap.sql.columnar.autorelease=true, then pass -XX:MaxDirectMemorySize=Xg to all executors.
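As a minimal sketch, both settings could be passed on the spark-submit command line. This assumes the JVM flag is delivered through Spark's standard spark.executor.extraJavaOptions property; the helper name autorelease_conf is hypothetical, and X is taken as an integer number of gigabytes.

```python
from typing import List


def autorelease_conf(direct_mem_gb: int) -> List[str]:
    """Return spark-submit --conf flags that enable GC-based buffer release.

    Assumption: the -XX:MaxDirectMemorySize flag reaches every executor JVM
    via spark.executor.extraJavaOptions.
    """
    if direct_mem_gb <= 0:
        raise ValueError("direct_mem_gb must be a positive number of GB")
    return [
        "--conf", "spark.oap.sql.columnar.autorelease=true",
        "--conf",
        f"spark.executor.extraJavaOptions=-XX:MaxDirectMemorySize={direct_mem_gb}g",
    ]


# Example: X = 54 (GB)
print(" ".join(autorelease_conf(54)))
```

Note that this sketch overwrites any other value of spark.executor.extraJavaOptions; if executors already receive JVM options that way, merge them into the same string.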

To determine X:

X = 576 / {executor instances} * ({scale factor} / 1000)

For example, with 16 executor instances running TPC-DS SF 1500:

X = 576 / 16 * (1500 / 1000) = 54

Then set -XX:MaxDirectMemorySize=54g.

If the executors don't have enough memory, replace 576 with a smaller value, but no less than 288.
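The sizing rule can be checked with a small sketch. The function names are hypothetical, and rounding down to whole gigabytes for the JVM flag is an assumption made here (the flag takes an integer size), not part of the original guidance.

```python
import math


def max_direct_memory_gb(executor_instances: int, scale_factor: int,
                         base: float = 576) -> float:
    """X = base / {executor instances} * ({scale factor} / 1000).

    base defaults to 576; it may be lowered when memory is tight,
    but not below 288.
    """
    if executor_instances <= 0:
        raise ValueError("executor_instances must be positive")
    if not 288 <= base <= 576:
        raise ValueError("base must be between 288 and 576")
    return base / executor_instances * (scale_factor / 1000)


def direct_memory_flag(executor_instances: int, scale_factor: int,
                       base: float = 576) -> str:
    # Assumption: round down so each executor never gets more than X GB.
    gb = math.floor(max_direct_memory_gb(executor_instances, scale_factor, base))
    if gb < 1:
        raise ValueError("computed limit is below 1 GB")
    return f"-XX:MaxDirectMemorySize={gb}g"


# Worked example from the text: 16 executors, TPC-DS SF 1500.
print(max_direct_memory_gb(16, 1500))  # 54.0
print(direct_memory_flag(16, 1500))    # -XX:MaxDirectMemorySize=54g
```

With the minimum base of 288, the same cluster would get 27 GB per executor.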

zhouyuan added a commit that referenced this pull request Sep 7, 2021
* Use the SparkMemoryUtil API to allocate the direct buffer and clean up code

* code refine

* clang format

* remove the close() call for record batch

* handling the data column by column in c++ side

* code refine and remove the delete logic

* support timestamp type

* fix failed UT

* [NSE-465] POC release memory using GC (#466)

* [NSE-465] Provide option to rely on JVM GC to release Arrow buffers in Java

* optimize arrow columnartorow in the for loop and use the DirectBuffer to allocate and free memory

* add the binary type support and add the failed UTs in the failed ut list

* fix the wrong hash code of the UnsafeRow

* fix NPE issue

* clang format

Co-authored-by: Hongze Zhang <hongze.zhang@intel.com>
Co-authored-by: Yuan <yuan.zhou@intel.com>