TPC-DS causes OOM #594
Comments
And then the driver hangs with the following:
Logs from one of the executors:
Spark version is
@vaibhawvipul I am curious whether you see the same issue if you disable Comet shuffle?
I can try #600 once it's merged into main. The exception that led to the JNI error is logged as a WARNING, so I'm not sure it is the root cause. It seems to happen when an executor tries to fetch an intermediate result during a shuffle and the remote executor has already been killed due to
Shouldn't the question be whether it happens with Comet shuffle enabled? I am able to run the test when I disable Comet shuffle. I am using Kubernetes; if we had access to a YARN-based cluster, we could try running it with and without Comet shuffle.
Describe the bug
I initially started with a 3 TB dataset, which I then scaled down to 200 GB. This is the driver and executor configuration on my end:
Java options:
The Comet configurations are as described in the benchmarking section of the website.
I am running this with 40 executors and observe some OOM errors, which is surprising because the dataset is small.
Steps to reproduce
No response
Expected behavior
No OOM
Additional context