build post-commit container names are not unique #8266
Comments
(by "stopped my system", i mean cleaned out etcd/etc., which is why my build sequence id went back to "1" and caused the conflict) |
reducing to p3 because i think i hit this via a timing window (I killed my setup while the container was running, whereas normally the code would have removed the container after it finished). so it's less likely someone hits this under normal circumstances. |
I suspect the "clean up" left some dirty state? It is possible that if you interrupt the server while a build hook is running, the hook container will not be cleaned up, leading to the situation described in the issue. Perhaps now our clean up scripts should remove not only And we should also fix openshift/source-to-image#433 to "namespace" containers created by S2I, and remove them in the clean up script. Here's the original discussion where it was decided that we don't want the random suffix: #6715 (comment) |
right, that's what i was implying in my follow up comment.
perhaps, but i still think we should be using unique names. k8s/openshift do not have this problem for other containers they launch (you don't have to clean up dead containers to rerun something).
+1
thanks, it's a fair point i suppose, but if you've got two concurrent builds running w/ the same name, running two instances of the post-commit image seems like the least of your worries (they are unlikely to interfere with each other anyway). Not to mention it could happen if the two builds were running on distinct nodes anyway, so using the same name doesn't actually protect you from all that much. @smarterclayton ? |
I would expect the post-commit container names to be correlated to the On Tue, Mar 29, 2016 at 8:50 AM, Ben Parees notifications@github.com
|
Looks like we have at least two ways out here:
|
personally i like the random suffix
|
(and we already have pruning to take care of old containers in case we accidentally leave one around...though i'm not sure if pruning applies to these containers?) |
Most likely it does not apply, since nothing outside of the builder knows about it... |
right, i'm not sure how pruning identifies containers for pruning. I actually think it may prune any stopped containers not associated w/ a pod, in which case these would qualify. If not, our assemble containers would also not be subject to pruning, which is a bigger problem. @ncdc can you comment on how containers are identified for pruning? |
Containers are garbage collected periodically. The default settings are to keep at most 100 dead containers across all pods, and at most 2 dead containers per pod. It won't start pruning them until you've reached these thresholds. |
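To make the two thresholds concrete, here is a minimal sketch of threshold-based selection. It is a simplified model, not the kubelet's actual algorithm: the `DeadContainer` type, its field names, and `select_for_removal` are all hypothetical, and only the default limits (2 per pod, 100 in total) come from the comment above.

```python
from collections import defaultdict
from typing import NamedTuple


class DeadContainer(NamedTuple):
    pod: str          # owning pod (hypothetical field)
    name: str         # container name
    finished_at: int  # completion time; larger means newer


def select_for_removal(dead, max_per_pod=2, max_total=100):
    """Return the sorted names of dead containers this simplified GC would evict."""
    by_pod = defaultdict(list)
    for c in dead:
        by_pod[c.pod].append(c)

    removed, kept = [], []
    # Per-pod limit: keep only the newest max_per_pod dead containers of each pod.
    for containers in by_pod.values():
        containers.sort(key=lambda c: c.finished_at, reverse=True)
        kept.extend(containers[:max_per_pod])
        removed.extend(containers[max_per_pod:])

    # Global limit: if the survivors still exceed max_total, evict the oldest.
    kept.sort(key=lambda c: c.finished_at, reverse=True)
    removed.extend(kept[max_total:])
    return sorted(c.name for c in removed)
```

Nothing is selected until a threshold is exceeded, which matches the point that pruning does not start before the limits are reached. Note that this model only sees containers that belong to a pod, so it says nothing about containers started outside of one.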
@ncdc but how are the containers identified? By the prefix "k8s_" in the name? |
Please don't add random suffixes. If two builds are running at the same On Wed, Mar 30, 2016 at 10:26 AM, Andy Goldstein notifications@github.com
|
Looks like the first listing comes from So, @bparees, all the containers we're starting with "fluffy_panda" 🐼 names or other prefixes won't be taken into account by the pruning process 😢 |
@rhcarvalho it does look at other containers - those are the |
that means if i delete my buildconfig and recreate it, i'll never be able to run the build on that node again because the name will conflict, unless we explicitly remove containers with conflicting names before we start a new container. Also, relying on the build
seems like a strange way to catch duplicated builds. if duplicated builds are happening, we need a better way to prevent that.... and we do, it's the name of the build object itself which has to be unique. (I don't know how our assemble container is named today, i think it's always unique, it should be updated to match the naming convention used by the post-commit container, whatever we decide) |
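The "explicitly remove containers with conflicting names before we start a new container" option could look roughly like the sketch below. Everything here is an assumption made for illustration: `FakeRuntime` is an in-memory stand-in rather than the Docker API, and `start_post_commit` and the `openshift_post-commit_` name format are hypothetical, not the builder's actual code.

```python
class FakeRuntime:
    """In-memory stand-in for a container runtime (hypothetical)."""

    def __init__(self):
        self.containers = {}  # name -> "running" or "exited"

    def create(self, name):
        # Mimic a runtime that rejects duplicate container names.
        if name in self.containers:
            raise RuntimeError(f"name conflict: {name!r} already exists")
        self.containers[name] = "running"

    def remove(self, name):
        del self.containers[name]


def start_post_commit(runtime, build_name):
    """Start a post-commit container under a deterministic name.

    A leftover stopped container with the same name is removed first;
    a running one is treated as a genuine duplicate and rejected.
    """
    name = f"openshift_post-commit_{build_name}"  # hypothetical naming scheme
    state = runtime.containers.get(name)
    if state == "running":
        raise RuntimeError(f"{name!r} is already running")
    if state == "exited":
        runtime.remove(name)  # clean up the dead container left by an earlier run
    runtime.create(name)
    return name
```

With this approach a dead container left behind by an interrupted build no longer blocks the next run on the same node, but it only handles the one name being reused; other leftovers still need some pruning mechanism.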
If you give the post commit container a deterministic name, and you come in ALL of our container names should be deterministic, period, full stop. On Wed, Mar 30, 2016 at 10:39 AM, Ben Parees notifications@github.com
|
these are orthogonal statements. If things are working correctly, then the reentrant "clean up when we detect a collision" logic would never get triggered anyway, so that's not a useful way to deal with the dead container cleanup problem. Dead containers should be handled via pruning mechanisms, just like the rest of k8s does today. (whether that is actually true or not is something we're currently trying to determine...) Which means the names can be as unique as we want. |
It is always unique, but completely arbitrary, generated by Docker, see openshift/source-to-image#395 |
So i feel like the summary here is:
Note that none of this is a huge problem today because we rm the container when we're done with it, so it only gets left around if the system gets killed in the middle of a build/post-commit operation. |
Which can happen today pretty easily just by a) deleting a build or b) On Wed, Mar 30, 2016 at 10:54 AM, Ben Parees notifications@github.com
|
Basically, if you're going to use a random suffix, you still need to define On Wed, Mar 30, 2016 at 11:04 AM, Clayton Coleman ccoleman@redhat.com
|
They are included |
Oops, they aren't included. Misread the code. |
still an orthogonal issue. not using a random suffix solves very little of the cleanup problem. almost none of it. cleanup needs to be handled explicitly (as you say, either by us using a common prefix and adding logic, or by updating the k8s logic to include our containers). |
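As a rough illustration of the "common prefix plus our own logic" option, a cleanup pass could pick out stopped containers by name. This is a sketch under assumptions: the `openshift_build_` prefix, the `(name, state)` tuples, and `stale_build_containers` are hypothetical and do not reflect any existing cleanup script.

```python
BUILD_PREFIX = "openshift_build_"  # hypothetical common prefix for build containers


def stale_build_containers(containers, prefix=BUILD_PREFIX):
    """Return names of stopped containers that carry the build prefix.

    `containers` is an iterable of (name, state) pairs, where state is
    "running" or "exited".
    """
    return [
        name
        for name, state in containers
        if name.startswith(prefix) and state == "exited"
    ]
```

Containers with Docker-generated names or the `k8s_` prefix are ignored by this filter, so it would only ever touch containers the build system itself named.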
(or by rearchitecting the entire build system to use managed pods instead of manual container launches) |
Sure - the issues then are that a) we don't guarantee that we clean up On Wed, Mar 30, 2016 at 11:29 AM, Ben Parees notifications@github.com
|
we do prevent simultaneous execution. you cannot create two build objects w/ the same name. |
@smarterclayton and I chatted offline, path forward is:
|
+1, #8316 and openshift/source-to-image#395 (working on a fix)
+1, kubernetes/kubernetes#23640
#8306 and openshift/source-to-image#452 This makes code more complicated and doesn't always guarantee the container gets removed. We really need a way to hook into the platform pruning mechanisms (2), or the long term goal of having better platform support so that builds don't start their own unmanaged containers. |
Every container on the system will ALWAYS receive SIGTERM, get at least 2 On Thu, Mar 31, 2016 at 8:36 AM, Rodolfo Carvalho notifications@github.com
|
I ran a build, stopped my system, started it again, ran the build again, so now i get this error:
users shouldn't have to clean up all their dead containers to be able to use this function. I think we need to append a unique suffix to the container name, if we're going to explicitly name them.
@mfojtik @rhcarvalho
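For reference, appending a unique suffix to an explicit container name could be as small as the sketch below. The `unique_container_name` helper and the `openshift_post-commit_` name format are hypothetical, chosen only to illustrate the idea, and are not taken from the actual builder code.

```python
import secrets


def unique_container_name(build_name, suffix_bytes=4):
    """Build a container name from the build name plus a random hex suffix."""
    suffix = secrets.token_hex(suffix_bytes)  # two hex characters per byte
    return f"openshift_post-commit_{build_name}_{suffix}"
```

The build name stays visible in the container name, so the container can still be correlated with its build, while the suffix avoids a conflict with a dead container left over from an earlier run.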