Clean up allocs that previously failed cleanup #1497
Comments
I've also seen
Nomad version: 0.4.1
Operating system and Environment details: Both Ubuntu 14.04 and 16.04
Nomad Client logs (if appropriate)
Have you all upped the file descriptor limit for the Nomad process?
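For readers who want to check this on a client, here is a minimal sketch, assuming a Linux host with `/proc` mounted (as on the Ubuntu versions mentioned above). It is not taken from the thread or from Nomad itself. The function names are hypothetical, and the PID of the Nomad process is something you would look up yourself; by default the sketch inspects its own process.

```python
import os
import resource


def fd_limits():
    """Return the (soft, hard) open-file limits of the current process.

    A value equal to resource.RLIM_INFINITY means "unlimited".
    """
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count the file descriptors currently open in a process.

    Reads /proc/<pid>/fd, so it only works on Linux and needs permission
    to inspect the target process. Returns None if the directory cannot
    be read.
    """
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
    print(f"open descriptors: {open_fd_count()}")
```

Passing the Nomad client's PID to `open_fd_count` and comparing the result against that process's limit (visible in `/proc/<pid>/limits`) would show how close the client is to "too many open files".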
@dadgar We have the file descriptor limit maxed out and are still seeing it as of 0.4.2. We are trying to get up to 0.5.2 to check. In your opinion, should this be fixed in 0.5.2?
@jshaw86 0.5.3 should help, as it does client-side garbage collection.
@dadgar For us this is the #1 priority, as we keep having to manually rotate Nomad clients and clean up disk. Will we see no benefit if we upgrade from 0.4.2 to 0.5.2 right now? It seems Nomad is exhausting FDs when it tries to GC, or something like that. Again, we are still on 0.4.2; we are trying to expedite a 0.5.2 upgrade, but the breaking job spec change makes that pretty difficult. My current running theory is that Nomad tries to clean up too many files at once, the GC process crashes and loses track of the allocs that were marked for cleanup, and those allocs remain on the filesystem forever while Nomad keeps provisioning new chroots, creating more inodes and eventually crashing the systems. Our monitoring also shows a correlation between a Nomad client reporting "too many open files" and inode exhaustion. FD exhaustion logs below.
@jshaw86 I would upgrade to Nomad 0.5.3. It does garbage collection client side, which should avoid the thundering herd of GCs that causes so many FDs to be opened.
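To illustrate the idea of avoiding a thundering herd of cleanups, here is a minimal sketch of bounded-concurrency directory removal. It is an illustration only and not how Nomad's client-side GC is implemented. The function name `gc_alloc_dirs`, the `max_parallel` parameter, and the choice to return failed paths for a later retry are all assumptions made for this example.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor


def gc_alloc_dirs(alloc_dirs, max_parallel=2):
    """Remove each directory in alloc_dirs, at most max_parallel at a time.

    Returns the list of paths whose removal failed, so the caller can
    retry them later instead of losing track of them.
    """

    def remove(path):
        try:
            shutil.rmtree(path)
            return None  # removed successfully
        except OSError:
            # Covers errors such as "too many open files" or a missing path.
            return path

    # The pool size caps how many removals run concurrently, which limits
    # how many descriptors the cleanup holds open at once.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(remove, alloc_dirs))

    return [path for path in results if path is not None]
```

Keeping the failed paths around, rather than dropping them, is the part that addresses the orphaning described in this issue: a cleanup that fails once stays on the list and can be attempted again.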
@dadgar OK, we will try to get to 0.5.3 ASAP. I was also able to catch this thundering herd on one of the boxes and have attached a detailed syslog below, in case you want to validate that this is the same issue.
@jshaw86 Thanks! Let us know how 0.5.3 works out for GCing. By the way, please do not do an in-place update of Nomad on the clients.
Is that because of #2256? If so, can you please update https://www.nomadproject.io/docs/upgrade/upgrade-specific.html?
Doing some issue cleanup; this one was closed by the addition of GC.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Sometimes allocs fail to clean up due to too many open files. These allocs are then orphaned by Nomad and have to be cleaned up by hand. See below logs:
It would be nice if Nomad kept track of these allocs instead of orphaning them and still cleaned them up.