
eviction strategy can be aggressive sometimes unexpectedly #8065

Closed
Tracked by #7289
lmatz opened this issue Feb 20, 2023 · 7 comments
Labels: priority/high, type/enhancement (Improvements to existing implementation.), type/perf
Milestone: release-0.1.18

Comments

@lmatz
Contributor

lmatz commented Feb 20, 2023

[Screenshot: SCR-20230220-ukr]

Slack history:
https://risingwave-labs.slack.com/archives/C03GTKX3C8G/p1676901892312759
https://risingwave-labs.slack.com/archives/C03GTKX3C8G/p1676957678684269

You can see that the memory starts growing at about 00:53, which is when the watermark starts to rise to evict the cache. But the memory keeps growing until just past 00:55, so the eviction becomes aggressive to prevent OOM. The memory drops to 0 because the watermark hits its upper bound (the current epoch), so the entire cache is evicted.
— by @yuhao-su
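To illustrate the mechanism described above, here is a minimal, hypothetical sketch of watermark-based cache eviction. The names (`EvictableCache`, `advance_watermark`) and the doubling step size are made up for illustration; this is not RisingWave's actual implementation. Entries at or below the watermark epoch are evicted, the watermark advances faster while memory pressure persists, and once it reaches the current epoch the whole cache is dropped.

```rust
use std::collections::BTreeMap;

/// Hypothetical cache where each entry remembers the epoch it was last touched in.
struct EvictableCache {
    /// key -> (last-touched epoch, payload)
    entries: BTreeMap<u64, (u64, Vec<u8>)>,
    /// entries at or below this epoch are evicted
    watermark_epoch: u64,
}

impl EvictableCache {
    fn new() -> Self {
        Self {
            entries: BTreeMap::new(),
            watermark_epoch: 0,
        }
    }

    /// Drop every entry whose epoch is at or below the watermark.
    fn evict_below_watermark(&mut self) {
        let wm = self.watermark_epoch;
        self.entries.retain(|_, (epoch, _)| *epoch > wm);
    }
}

/// Advance the watermark, doubling the step while memory pressure persists.
/// The watermark is capped at `current_epoch`; reaching the cap evicts everything.
fn advance_watermark(cache: &mut EvictableCache, current_epoch: u64, under_pressure: bool, step: &mut u64) {
    if under_pressure {
        *step *= 2; // escalate: memory kept growing despite previous evictions
    } else {
        *step = 1; // relax back to the normal pace
    }
    cache.watermark_epoch = (cache.watermark_epoch + *step).min(current_epoch);
    cache.evict_below_watermark();
}

fn main() {
    let mut cache = EvictableCache::new();
    // Populate entries touched at epochs 0..=9; pretend the current epoch is 9.
    for epoch in 0..10u64 {
        cache.entries.insert(epoch, (epoch, vec![0u8; 1024]));
    }
    let mut step = 1u64;
    // Sustained pressure: the watermark catches up with the current epoch in a few
    // steps, at which point the cache is completely evicted ("memory drops to 0").
    for _ in 0..3 {
        advance_watermark(&mut cache, 9, true, &mut step);
        println!(
            "watermark = {}, entries left = {}",
            cache.watermark_epoch,
            cache.entries.len()
        );
    }
}
```

Under sustained pressure the watermark reaches the current epoch within a few steps, which matches the "evicts all cache" behavior seen in the graph above.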

@github-actions github-actions bot added this to the release-0.1.18 milestone Feb 20, 2023
@lmatz lmatz added the type/enhancement and type/perf labels Feb 20, 2023
@lmatz
Contributor Author

lmatz commented Feb 22, 2023

It OOMed when the memory allocated by Jemalloc hits 0.

@yuhao-su
Contributor

It OOMed when the memory allocated by Jemalloc hits 0.

Is it possible the memory drops to 0 because it got killed?

@lmatz
Contributor Author

lmatz commented Feb 22, 2023

It OOMed when the memory allocated by Jemalloc hits 0.

Is it possible the memory drops to 0 because it got killed?

I think so. It OOMed first, i.e. it used more memory than it requested from the risingwave-operator, and then it got killed, I suppose.
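For reference, the "memory allocated by Jemalloc" panel is normally fed by a counter read from inside the compute-node process itself. Below is a minimal sketch of reading it, assuming the `tikv-jemallocator` and `tikv-jemalloc-ctl` crates; the exact crates and metrics wiring RisingWave uses are an assumption here, not taken from this thread.

```rust
// Minimal sketch: reading jemalloc's "allocated"/"resident" counters in-process.
// Assumes the tikv-jemallocator and tikv-jemalloc-ctl crates; not RisingWave's
// actual metrics code.
use tikv_jemalloc_ctl::{epoch, stats};

// Route all allocations through jemalloc so its counters reflect this process.
#[global_allocator]
static ALLOC: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

fn main() {
    // jemalloc caches its statistics; advancing the epoch refreshes them.
    epoch::advance().unwrap();
    let allocated = stats::allocated::read().unwrap(); // bytes currently allocated
    let resident = stats::resident::read().unwrap(); // bytes physically resident
    println!("jemalloc allocated = {allocated} B, resident = {resident} B");
}
```

Because the counter is exported by the live process, a genuine reading of 0 is unlikely; if the pod is OOM-killed, the time series simply stops, which a dashboard can render as a drop to 0, consistent with the interpretation above.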

@lmatz
Contributor Author

lmatz commented Feb 22, 2023

I mean kube-bench never voluntarily kills a pod if the pod does nothing wrong. cc @huangjw806
And this phenomenon almost always happens (I manually checked 3-4 Grafana dashboards from the past month) in every performance testing pipeline, every day.

@huangjw806
Contributor

I mean kube-bench never voluntarily kills a pod if the pod does nothing wrong.

During the test, kube-bench will not actively kill the risingwave pod. It may be restarted by k8s only if it encounters issues.

@lmatz
Contributor Author

lmatz commented Feb 22, 2023

OOM may be caused by #8125

@lmatz
Contributor Author

lmatz commented Mar 15, 2023

Reopen if it appears again

@lmatz lmatz closed this as completed Mar 15, 2023