Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GLUTEN-8685][VL] Add null check to avoid core dump when rss push partition data size is large #8686

Merged
merged 3 commits into from
Feb 10, 2025

Conversation

zjuwangg
Copy link
Contributor

@zjuwangg zjuwangg commented Feb 7, 2025

What changes were proposed in this pull request?

When JavaRssClient push partition data once exceeds Int.MAX_VALUE or when executor memory is in a pressure situation, It's possible that https://github.com/apache/incubator-gluten/blob/main/cpp/core/jni/JniCommon.h#L500
array_ = env->NewByteArray(size); will return a null ptr, due to int64_t cast to a int32_t or there no enough memory, and following logic will cause a unexpected core dump as described in #8685 .

This PR split the bytes into serval chunk if the bytes length is very large and add the check logic to make sure throwing exception when get a null ptr array!

(Fixes: #8685)

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

manual tests in my own local environment.

@github-actions github-actions bot added the VELOX label Feb 7, 2025
Copy link

github-actions bot commented Feb 7, 2025

#8685

@zjuwangg zjuwangg marked this pull request as draft February 8, 2025 06:50
@zjuwangg
Copy link
Contributor Author

zjuwangg commented Feb 8, 2025

Thanks @jackylee-ch review. I just get a information that the bytes passed to pushPartitionData method can not be split into multiple blocks, which depends on the underlying rss client implementation.

So I planed here just add a null check here to avoid unexpected core dump!

@zjuwangg zjuwangg marked this pull request as ready for review February 8, 2025 07:33
@zjuwangg zjuwangg changed the title [GLUTEN-8685][VL]Fix coredump when rss push partition data size once exceeds int MAX_VALUE [GLUTEN-8685][VL]Add null check to avoid core dump when rss push partition data size is large Feb 8, 2025
@zhouyuan
Copy link
Contributor

CC: @kerwin-zk

@kerwin-zk
Copy link
Contributor

LGTM. Thanks! I just re-ran the CI, waiting for the CI to finish.

@kerwin-zk kerwin-zk changed the title [GLUTEN-8685][VL]Add null check to avoid core dump when rss push partition data size is large [GLUTEN-8685][VL] Add null check to avoid core dump when rss push partition data size is large Feb 10, 2025
@kerwin-zk kerwin-zk merged commit 4b741cf into apache:main Feb 10, 2025
52 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[VL] Gluten + rss shuffle(celeborn) encounter core dump in some situations
4 participants