nexmark q5 long-running OOM due to unawareness of container memory limit #6615
Comments
Hi, does anyone have any ideas about the reason?
I'll help investigate it. 👀
One possible reason: in an EKS environment, the compute node is encapsulated in a Kubernetes pod, and one VM (EC2 instance) can host multiple pods. Resources are limited at the pod level, but the API fetches system information for the whole VM.
#6536
Is this the same as #6536?
Related: #6536. Some ideas:
This should be fixed by #6536, so I'm closing it now. If it happens again, feel free to reopen or create another issue. Thanks!
Describe the bug
When running nexmark q5, the RisingWave ComputeNode OOMs after about 30 minutes.
To Reproduce
The bug emerges in an EKS environment, but it should also happen in a local environment.
Use nexmark-bench to generate data through Kafka, then run nexmark q5 against it (a sketch of the source DDL follows).
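For context, here is a minimal sketch of how the Kafka-backed `bid` source might be declared in RisingWave. The topic name, broker address, schema, and encoding are assumptions; the real values depend on how nexmark-bench is configured.

```sql
-- Hypothetical source DDL: topic, broker, schema, and encoding are assumptions
-- that depend on the nexmark-bench configuration.
CREATE SOURCE bid (
    auction   BIGINT,
    bidder    BIGINT,
    price     BIGINT,
    date_time TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'nexmark-bid',
    properties.bootstrap.server = 'localhost:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;
```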
Expected behavior
No response
Additional context
Based on the metrics, the OOM happens on both of the two hash-aggregation fragments (max and count).
Here's the nexmark q5 (the original paste was not captured here, so a representative form follows):
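This sketch expresses Nexmark q5 (hot items: the auctions with the most bids per hopping window) against the `bid` source using a `HOP` window table function; the window sizes, column names, and exact syntax are assumptions based on the standard benchmark definition and may differ from the query actually run.

```sql
-- Representative Nexmark q5: per hopping window, find the auction(s) with the
-- highest bid count. The inner count(*) and the outer max() correspond to the
-- two hash-aggregation fragments observed to OOM.
SELECT AuctionBids.auction, AuctionBids.num
FROM (
    SELECT bid.auction, count(*) AS num, window_start AS starttime
    FROM HOP(bid, date_time, INTERVAL '2' SECOND, INTERVAL '10' SECOND)
    GROUP BY window_start, bid.auction
) AS AuctionBids
JOIN (
    SELECT max(CountBids.num) AS maxn, CountBids.starttime
    FROM (
        SELECT count(*) AS num, window_start AS starttime
        FROM HOP(bid, date_time, INTERVAL '2' SECOND, INTERVAL '10' SECOND)
        GROUP BY bid.auction, window_start
    ) AS CountBids
    GROUP BY CountBids.starttime
) AS MaxBids
ON AuctionBids.starttime = MaxBids.starttime
AND AuctionBids.num >= MaxBids.maxn;
```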