Assumptions about queue message for a server are broken in pods #19957
Comments
So I think I like this approach in general, I am just wondering if it makes sense to proactively look at more ways to increase the boot speed for …. Though, speaking specifically about using this approach for …, … Anyway, just the first thing that popped into my head after reading that.
I think the assumption is that …. In my mind I think we need a MonitoredProcess abstraction which can then have different backends for pods (Kube Jobs) and appliances (Processes), and callers can just go through that abstraction. I haven't fully thought it through though, and I'm not sure how we'd handle things like the smart state cases.
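A minimal sketch of what that abstraction might look like; everything here except `MiqEnvironment::Command.is_podified?` is hypothetical naming, not existing ManageIQ code:

```ruby
# Hypothetical abstraction: callers start a monitored unit of work and
# poll it, without caring whether it is a local process or a Kube Job.
class MonitoredProcess
  def self.backend_class
    # is_podified? is the existing check ManageIQ uses to detect
    # a kubernetes deployment.
    MiqEnvironment::Command.is_podified? ? KubeJobBackend : LocalProcessBackend
  end

  def self.start(command, env = {})
    backend_class.new(command, env).tap(&:start)
  end
end

class LocalProcessBackend
  def initialize(command, env)
    @command, @env = command, env
  end

  def start
    @pid = Process.spawn(@env, *@command)
  end

  def running?
    Process.getpgid(@pid) # raises if the process is gone
    true
  rescue Errno::ESRCH
    false
  end
end

class KubeJobBackend
  def initialize(command, env)
    @command, @env = command, env
  end

  def start
    # Create a batch/v1 Job through the orchestrator and remember its
    # name; see the kubeclient sketch later in this thread.
  end

  def running?
    # Poll the Job's status instead of a PID.
  end
end
```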
I feel like that is still missing a step, since you need the repo checked out on the same container/volume/whatever, and that isn't part of the CLI call to …. That said, I am unsure of the capabilities of Kube Jobs, and whether they allow passing a filesystem of sorts along to the job to be executed, so my point might be moot if that is the case.
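For what it's worth, Kubernetes Jobs can share a filesystem with other pods via volumes (for example a PersistentVolumeClaim), so the checkout could live on shared storage. A hedged sketch of such a Job manifest as a Ruby hash; all of the names (claim, image, paths) are illustrative:

```ruby
# Sketch of a batch/v1 Job that mounts a shared PVC so the job pod can
# see a repo checked out by another pod. Names are illustrative.
job_spec = {
  :apiVersion => "batch/v1",
  :kind       => "Job",
  :metadata   => {:name => "ansible-playbook-run-123"},
  :spec       => {
    :template => {
      :spec => {
        :restartPolicy => "Never",
        :containers    => [{
          :name         => "runner",
          :image        => "ansible-runner:latest",
          :command      => ["ansible-runner", "run", "/repos/my_playbook_repo"],
          :volumeMounts => [{:name => "repos", :mountPath => "/repos"}]
        }],
        :volumes => [{
          :name                  => "repos",
          :persistentVolumeClaim => {:claimName => "ansible-repos"}
        }]
      }
    }
  }
}
```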
Yeah, at the very start of development of ….
Currently when running ManageIQ in pods from https://github.com/ManageIQ/manageiq-pods, workers associated with the same "server" don't actually share a process space or filesystem, because they run in different containers.
This upends the assumption that process- or filesystem-level shared resources are accessible to all workers on the same server. We rely on that assumption whenever we queue messages targeted at a specific server.
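For context, pinning a message to a server happens by setting `server_guid` on the queue row. A simplified sketch (the class and method names are illustrative; `MiqQueue.put` and `MiqServer.my_server` are the real APIs):

```ruby
# Simplified sketch: a message that only workers belonging to the
# current server will dequeue, because server_guid is set on the row.
MiqQueue.put(
  :class_name  => "SomeJobClass",   # illustrative names
  :method_name => "signal",
  :args        => [:run_playbook],
  :server_guid => MiqServer.my_server.guid
)
```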
An example of this is the `AnsibleRunnerWorkflow` class. This job queues each of its states to avoid tying up a worker while the ansible playbook runs, but those queue messages are pinned to a specific server, not to a particular worker. This means that when running workers as separate containers we could clone the playbook repo in one container and then try to run a playbook from that repo in a separate one. In the course of a single playbook run, at least 3 messages are sent over the queue between cloning the repo and cleaning it up. It's very unlikely that all three messages get picked up by the same worker that queued the first one, so it's fair to consider this feature completely broken. A simplified sketch of that sequence follows.
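This is the shape of the problem, sketched from the workflow's structure (method names approximate the real class; this is not a verbatim excerpt):

```ruby
# Approximate shape of the workflow: each step re-queues the next
# signal, and each queue message may be picked up by a different
# worker process / container on the same "server".
def pre_execute
  checkout_git_repository      # clones the playbook repo to local disk
  queue_signal(:execute)       # message 1: may land on another worker
end

def execute
  launch_ansible_runner        # needs the checkout from pre_execute
  queue_signal(:poll_runner)   # message 2
end

def poll_runner
  if playbook_still_running?   # illustrative status check
    queue_signal(:poll_runner, :deliver_on => 1.minute.from_now)
  else
    queue_signal(:post_execute) # message 3
  end
end

def post_execute
  cleanup_git_repository       # deletes the local checkout
end
```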
I'm sure there are other examples of this kind of behavior, but I don't know where they might be. It seems like this might be an issue for smartstate if we are mounting something or trying to copy data to/from the local disk. Maybe V2V could also run into issues?
The best idea I have currently for this is to add a worker guid to queue messages when we need the message to be processed by a particular worker instance, and to use `deliver_on` to ensure the worker waits before consuming the message (think thread context switching). I don't like the idea of adding to the queue though, so this may not be the best way forward.
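A hedged sketch of that idea; `worker_guid` is the hypothetical new queue column and `MiqWorker.my_guid` a hypothetical helper, neither exists today:

```ruby
# Hypothetical: pin a message to one worker instance, not just a server.
MiqQueue.put(
  :class_name  => "SomeJobClass",
  :method_name => "signal",
  :args        => [:poll_runner],
  :server_guid => MiqServer.my_server.guid,
  :worker_guid => MiqWorker.my_guid   # hypothetical new column/helper
)

# Hypothetical dequeue-side check: a worker that grabs another worker's
# message puts it back with a deliver_on in the near future, much like
# a thread yielding and being rescheduled later.
def process_message(msg)
  if msg.worker_guid && msg.worker_guid != MiqWorker.my_guid
    msg.update!(:state => "ready", :deliver_on => 10.seconds.from_now)
  else
    msg.deliver
  end
end
```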
From a pods-first viewpoint, I would like to spin up `Job` instances for this kind of work, then poll for the status of the job. It would be that separate pod's responsibility to take care of most of what we're doing in the workflow class. That way we don't have to worry about separate workers trying to monitor a process. I'm not sure what we would implement on an appliance that would fulfill the same use case though.
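A rough sketch of that flow using the kubeclient gem (which ManageIQ already uses for its Kubernetes provider); the SSL/namespace details are elided, and `job_spec` is the kind of manifest sketched earlier in the thread:

```ruby
require "kubeclient"

# Sketch: create a batch/v1 Job for the playbook run, then poll its
# status from any worker instead of monitoring a local process.
batch = Kubeclient::Client.new(
  "https://kubernetes.default.svc/apis/batch", "v1",
  :auth_options => {
    :bearer_token_file => "/var/run/secrets/kubernetes.io/serviceaccount/token"
  }
)

job = batch.create_job(Kubeclient::Resource.new(job_spec))

# Any worker can poll the Job by name; no shared process space or
# filesystem is required between the queueing and polling workers.
def job_finished?(batch, name, namespace)
  status = batch.get_job(name, namespace).status
  status.succeeded.to_i.positive? || status.failed.to_i.positive?
end
```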
I made a quick fix to the `AnsibleRunnerWorkflow` case in #19956, but this bug report is really about the larger issue so we can get input about other features that might be affected.

cc @Fryguy @chessbyte @NickLaMuro