Would it be useful to use rate of swapin as a threshold? #34
Comments
Interesting, I have never run Linux with overcommit off. Why do you disable it? To prevent random processes from being killed when the system runs out of memory?
Yeah, that's right. Postgres will roll back transactions and deliver an error message to the user. But that requires allocations to fail, which on Linux virtually requires overcommit to be off.
Ok, I see. But if you run earlyoom, won't that put you back in the situation where processes are randomly killed?
On Wed, Oct 25, 2017 at 1:46 PM rfjakob ***@***.***> wrote:
> Ok, I see. But if you run earlyoom, won't that put you back in the situation
> where processes are randomly killed?
The error message will be bad (the protocol will report: "terminated by
administrator command" or something like this), but it won't crash the
entire server. Linux's OOM is much more severe than a SIGTERM, causing
every connection to crash and crash recovery to start.
I just had to check for myself: earlyoom also sends SIGKILL, so I'm afraid it would be about the same as the kernel OOM killer ( https://github.com/rfjakob/earlyoom/blob/master/main.c#L208 ). But it could be modified to send SIGTERM without too much trouble. Swap statistics are available through /proc/vmstat. We'd have to do our own averaging, but that should work.
Have you monitored whether the swap rate is a good indicator of things going south on your machines? For example, by letting
$ vmstat 1
run while imposing heavy load?
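The "own averaging" mentioned above could look roughly like the following. This is a minimal sketch, not earlyoom code: the function names and the smoothing weight `alpha` are made up for illustration. It relies only on `pswpin` and `pswpout` in /proc/vmstat being cumulative page counters, so a rate is the difference between two readings divided by the elapsed time.

```python
import time


def parse_swap_counters(vmstat_text):
    """Return (pswpin, pswpout) page counters from /proc/vmstat contents."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def pages_per_second(prev, curr, elapsed_s):
    """Rate of change between two readings of a cumulative counter."""
    if elapsed_s <= 0:
        return 0.0
    return (curr - prev) / elapsed_s


def smooth(previous_avg, sample, alpha=0.3):
    """Exponential moving average; alpha is an arbitrary illustrative weight."""
    if previous_avg is None:
        return sample
    return alpha * sample + (1 - alpha) * previous_avg


def read_vmstat(path="/proc/vmstat"):
    """Read the raw counter file (Linux only)."""
    with open(path) as f:
        return f.read()


def monitor(interval_s=1.0, samples=5):
    """Print the raw and smoothed swap-in rate a few times."""
    prev_in, _ = parse_swap_counters(read_vmstat())
    avg = None
    for _ in range(samples):
        time.sleep(interval_s)
        curr_in, _ = parse_swap_counters(read_vmstat())
        rate = pages_per_second(prev_in, curr_in, interval_s)
        avg = smooth(avg, rate)
        print(f"swap-in: {rate:.1f} pages/s (smoothed {avg:.1f})")
        prev_in = curr_in
```

Calling `monitor()` on a Linux machine while imposing load prints one line per interval, which gives roughly the same information as the swap-in column of `vmstat 1` plus a smoothed value.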
On Thu, Oct 26, 2017 at 7:19 AM rfjakob ***@***.***> wrote:
> Just had to check myself, earlyoom also sends SIGKILL, I'm afraid it would
> be about the same as the kernel oom killer (
> https://github.com/rfjakob/earlyoom/blob/master/main.c#L208 ). But could
> be modified to send SIGTERM without too much trouble.
That's interesting. I'm surprised SIGTERM isn't the default. One of the big
advantages earlyoom has is the system can still find memory to allow
processes to do clean-up.
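One way to get that clean-up window is to send SIGTERM first and escalate to SIGKILL only if the process is still around after a grace period. The sketch below is an assumption about how that could be done, not what earlyoom does; `terminate_gracefully`, `pid_exists`, and the default grace period are hypothetical names and values.

```python
import os
import signal
import time


def pid_exists(pid):
    """Check for a live pid with signal 0, which delivers nothing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def terminate_gracefully(pid, grace_s=5.0, poll_s=0.1, is_alive=pid_exists):
    """Send SIGTERM, wait up to grace_s, then fall back to SIGKILL.

    Returns the name of the signal that ended the wait. Note that a pid
    can be reused by the kernel; a real implementation would need to
    guard against signalling an unrelated process.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace_s
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return "SIGTERM"
        time.sleep(poll_s)
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "SIGTERM"  # it exited just as the grace period ran out
    return "SIGKILL"
```

The `is_alive` parameter exists because a child of the calling process stays visible as a zombie until it is reaped, so a caller that spawned the victim itself should pass a check that reaps it.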
> Swap statistics are available through /proc/vmstat. We'd have to do our
> own averaging but that should work.
$ cat /proc/vmstat | grep swp
pswpin 15886
pswpout 99679
> Have you monitored if the swap rate is a good indicator for things going
> south on your machines? Like letting
> $ vmstat 1
> run while imposing heavy load?
Not yet. On EC2 for real-time workloads, I am going to guess many people
run with swap disabled (indeed, the new "r4" line, popular for databases,
doesn't even have local disk to use for swap). Thus, I haven't been
gathering my own numbers for pathological swapping.
I'll close this for now.
Hello,
In addition to the absolute amount of memory swapped out, would it make sense to use the quantity of swap-in or the amount of time spent waiting on swap-in (if available) as a trigger for kill?
My use case is servers that have overcommit off, which means they are very conservative about how much memory can be committed, and the extent of concurrency across multiple processes winds up penalized. When one considers that simple Python, Go, or even C programs can allocate hundreds of megabytes of virtual memory that they never use (cat does this for me, in fact), this is a meaningful problem. Thus, the amount of swap I may wish to allocate may be somewhat large.
However, these servers have soft-real-time performance requirements. It's not desirable to actually let memory be swapped in. It would be better to start killing things if swap-in pressure gets beyond the residual.
Thoughts?
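To make the proposal concrete, a trigger based on swap-in rate might look like the sketch below. Everything here is hypothetical: the class name, the threshold, and the window length are illustrative choices, and the rule (fire only when every recent sample exceeds the threshold) is just one possible way to avoid reacting to a single short burst. The inputs are successive readings of the cumulative `pswpin` counter from /proc/vmstat.

```python
from collections import deque


class SwapInTrigger:
    """Fire when the swap-in rate stays above a threshold for a whole window."""

    def __init__(self, threshold_pages_per_s, window=3):
        self.threshold = threshold_pages_per_s
        self.recent = deque(maxlen=window)  # most recent rates, oldest dropped

    def update(self, prev_pswpin, curr_pswpin, elapsed_s):
        """Record one sample; return True if a kill should be considered."""
        rate = (curr_pswpin - prev_pswpin) / elapsed_s
        self.recent.append(rate)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and min(self.recent) > self.threshold
```

With a threshold of 100 pages/s and a window of 3, three consecutive one-second samples of 500 pages each would fire the trigger, while a single quiet second afterwards would reset it.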