[REARCH] Container workers #15884
Conversation
cc @Fryguy @bdunne @gtanzillo @jrafanie @NickLaMuro Just a look at the work so far for OpenShift-based worker containers.
Force-pushed d978618 to 176d515
This pull request is not mergeable. Please rebase and repush.
Force-pushed 93b8ff8 to e7ad45a
Force-pushed 088f836 to 55d2d92
require_nested :Runner

include PerEmsTypeWorkerMixin

self.required_roles = ["ems_metrics_collector"]

def self.supports_container?
  true
end
Can we get rid of this now?
@@ -25,6 +25,7 @@ def ansible_secret_key
  def ansible_secret_key=(key)
    auth = authentication_for_type(ANSIBLE_SECRET_KEY_TYPE, "Ansible Secret Key")
    update_ansible_authentication(auth, :auth_key => key)
+   key
This is not necessary, as any setter method (ending in =) automatically returns the incoming param:
$ pry
[1] pry(main)> def blah=(x)
[1] pry(main)*   "HI"
[1] pry(main)* end
=> :blah=
[2] pry(main)> self.blah = "NOT HI"
=> "NOT HI"
lib/container_orchestrator.rb
Outdated
@@ -1,28 +1,84 @@
require 'kubeclient'

class ContainerOrchestrator
We may want to namespace this under ManageIQ, so as not to confuse any code from the container providers
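For illustration, the namespaced version could look like the following (a minimal sketch; only the module wrapper is the suggestion, the class body is unchanged):

module ManageIQ
  class ContainerOrchestrator
    # ... existing orchestrator code, unchanged ...
  end
end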
lib/container_orchestrator.rb
Outdated
  yield(definition) if block_given?
  connection.create_deployment_config(definition)
rescue KubeException => e
  raise unless e.message =~ /already exists/
I wonder if we can open this as an RFE on KubeClient where they deal with the "already exists" message and raise a different exception. Also, their exceptions should be namespaced (a second separate issue).
def create_deployment_config(name)
  ...
rescue KubeClient::AlreadyExistsException
  # suppress
end
Force-pushed 85b9f70 to b72c986
This pull request is not mergeable. Please rebase and repush.
Force-pushed dae0f95 to fd35242
This pull request is not mergeable. Please rebase and repush.
Force-pushed fd35242 to ce8125c
Force-pushed 29166d4 to 5052f7b
end

def container_port
  3001
Should this be 3000?
Nope. 3000 is exposed from the container, and since the UI needs to serve assets as well as run the worker, httpd listens on 3000 and points to 3001 for the worker.
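As a rough sketch of that routing (an assumed httpd configuration for illustration, not the actual file shipped in the image):

# Sketch (assumed): httpd listens on the exposed port 3000, serves static
# assets directly, and proxies all other requests to the worker on 3001.
Listen 3000
<VirtualHost *:3000>
  DocumentRoot /var/www/miq/vmdb/public
  ProxyPass /assets !
  ProxyPass / http://localhost:3001/
  ProxyPassReverse / http://localhost:3001/
</VirtualHost>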
end

def container_image_name
  "manageiq/manageiq-ui-worker"
Should you be able to change the org name?
Can we drop the "manageiq-" from the container name?
I want to put all of this in settings like we did for the awx stuff, but I think I'm going to change that in a follow-up.
As for the name, I don't really have a preference. Thoughts, @Fryguy?
@carbonin and I discussed moving this into Settings, but I wanted to get this in first, and do that as an incremental improvement.
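Purely as a sketch of where that follow-up could land (the Settings keys below are invented for illustration):

# Hypothetical: look the image name up in Settings instead of hardcoding it.
def container_image_name
  Settings.container_images.ui_worker
end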
def prepare
  super
  MiqApache::Control.start if MiqEnvironment::Command.is_container?
end
I thought this was going away.
Why?
We need it to serve static assets.
Forgot about that 👍
app/models/miq_worker.rb
Outdated
end

def self.create_worker_record(*params)
  w = init_worker_object(*params)
This could simplify to init_worker_object(*params).tap(&:save)
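Applied to the hunk above, the suggested form would be:

def self.create_worker_record(*params)
  init_worker_object(*params).tap(&:save)
end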
end

def container_image_name
  "manageiq/manageiq-base-worker"
Same here
end

def worker_deployment_name
  @worker_deployment_name ||= begin
lol
# Can be overridden by including classes
def container_image_name
  "manageiq/manageiq-webserver-worker"
Same
These types of things will probably not be needed when we use containers exclusively, but try to support both for now.
The idea is that each leaf worker class will include one of these concerns which will determine how it gets deployed. ReplicaPerWorker workers are more like cattle. If we need a new one we just add a new one; if we need one less we just delete one. We never need to know their identity. DeploymentPerWorker workers are more like pets, but instead of using StatefulSets, we are using separate deployments so that we can provide a particular environment to each of them. This will be used to give provider workers a provider specific queue name. ServiceWorker workers are like ReplicaPerWorker workers, but fronted by a service. As such they have readiness checks and must define the service's port.
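A condensed sketch of that pattern (module and class names follow the description above; the bodies are illustrative only):

# Illustrative only: three deployment concerns, and a leaf worker class
# opting in to exactly one of them.
class MiqWorker
  module ReplicaPerWorker; end    # cattle: just scale replicas up or down
  module DeploymentPerWorker; end # pets: one deployment each, custom env
  module ServiceWorker; end       # replicas fronted by a service
end

class MiqUiWorker < MiqWorker
  include MiqWorker::ServiceWorker

  # ServiceWorker classes must define the service's port; readiness
  # checks are wired up against it.
  def container_port
    3001
  end
end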
OpenShift does not allow names over 63 characters.
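One way that limit can be respected when generating deployment names (a sketch; generated_name is a hypothetical helper):

# Sketch: clamp generated deployment names to OpenShift's 63-character cap.
def worker_deployment_name
  @worker_deployment_name ||= generated_name[0, 63]
end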
In this container, httpd will listen on port 3000 and will serve assets and some error documents as well as the UI.
This will allow the worker (in the container) to create its own GUID. This is necessary for worker types with multiple replicas because we cannot tell an individual replica what its GUID is supposed to be. Also, we need the queue_name parameter for the deployment-per-worker model. In this case we want to be able to reference the queue name on the worker instance, but still don't want to save that instance to the database. We can't use create_worker_record as-is, because run_single_worker.rb also needs a way to save the worker record when it's running in a container.
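A simplified sketch of that flow inside the container (details assumed; the real option handling in run_single_worker.rb may differ):

# Sketch (assumed): the containerized worker builds and saves its own
# record, generating its own GUID rather than receiving one from the server.
require "securerandom"

worker = worker_class.init_worker_object(options)
worker.guid ||= SecureRandom.uuid
worker.save! # run_single_worker.rb performs the save when containerized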
Before, this was failing because the worker instance didn't have any of the required values set. Things like the pid, server, and guid will be set by the worker when it comes up in the separate container and adds the worker row.
The container entrypoint will provide the contents of the EMS_IDS environment variable to run_single_worker.rb if EMS_IDS is set.
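On the script side, consuming that variable might look roughly like this (assumed parsing; the real entrypoint may differ):

# Sketch (assumed): turn the comma-separated EMS_IDS env var into the
# ems_id options handed to the worker.
if ENV["EMS_IDS"]
  options[:ems_id] = ENV["EMS_IDS"].split(",").map(&:strip)
end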
Force-pushed aa2abc3 to 7f53a13
Just a few comments. Mostly questions and refactoring suggestions, but nothing seemed incorrect from my uneducated perspective.
require_nested :Runner

include PerEmsTypeWorkerMixin

self.required_roles = ["ems_metrics_collector"]

def self.supports_container?
  true
end
@Fryguy's comment here is a bit old, but this is along the same lines:
Is it possible that we can put this method into the MiqWorker::ReplicaPerWorker modules? Or as a shared module that the new modules you have created here inherit from?
Yeah, we could probably move this into the MiqWorker modules in some way rather than having it implemented explicitly in each worker.
Alternatively, I could just flip the default implementation in miq_worker.rb and override it where we still don't want to start a worker as a container, which might provide more useful information at this point. What do you think?
I think either is fine, and it is probably simpler to only have it in one spot. Though, if you do invert it in miq_worker.rb, I think maybe a disclaimer comment above the method might be worthwhile, just so those passing through know it is disabled on certain workers. With the other approach, since you would be adding it to things like MiqWorker::ReplicaPerWorker and including it in the worker class, I think that would be more easily assumed.
  end

  def self.start_worker(*params)
-   w = create_worker_record(*params)
+   w = containerized_worker? ? init_worker_object(*params) : create_worker_record(*params)
Okay, I think I see what you're doing here, and why, but curious what this means for the future:
- To clarify, we are not creating worker records because it is too hard to manage our DB records with the containers that are currently running on OpenShift?
- Does this mean scripts/tools like evm:status would have to be completely re-tooled to work when running in a container env?
- Is the plan just to eventually drop these tables?
- Does this mess with any other portions of MiqServer#monitor_loop, now that there aren't DB records? (Looks like you covered stop_worker, but curious if there are other pieces I am missing.)
To be clear, we're still going to have worker records, it's just that they need to be created by the worker container rather than by the server.
This is really because of the worker guid. We can't generate the guid here because we can't tell a replica what guid it should be using. The solution is to have the worker itself create the guid, and by extension the entire record.
it's just that they need to be created by the worker container rather than by the server.
Doh... I even wrote the code that does that and forgot... okay, guess that pretty much answers most of the questions then huh?
We can't generate the guid here because we can't tell a replica what guid it should be using.
Forgive my lack of OpenShift knowledge, but I assume we can't even pass a replica an ENV var if we were to make a change to support that in the run_single_worker.rb script as well? I assume no, but just thought I would check and make sure.
I assume we can't even pass a replica an ENV var if we were to make a change to support that in the run_single_worker.rb script as well?
Correct, the environment for the container is defined at the deployment config level, so each replica will have the same set of environment variables.
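In other words, the variables live on the deployment config's pod template and are shared by every replica; roughly (hash structure assumed):

# Sketch (assumed): env is declared once on the pod template inside the
# deployment config, so all replicas receive identical values.
definition[:spec][:template][:spec][:containers].first[:env] << {
  :name  => "EMS_IDS",
  :value => ems_ids.join(",")
}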
Checked commits carbonin/manageiq@7c2818b~...7f53a13 with ruby 2.3.3, rubocop 0.52.0, haml-lint 0.20.0, and yamllint 1.10.0
As discussed, let's do separate PRs for:
- moving the hardcoded images into Settings
- getting rid of self.supports_container?, or defaulting differently
This will allow us to use the G-release architecture for the H-release container images without undoing all of the work in ManageIQ#15884 https://www.pivotaltracker.com/story/show/157774394
Run workers in OpenShift containers.
This change allows the server to start workers by talking to OpenShift, creating deployments, and scaling them as needed.
It also stubs out some existing functionality which is process-specific and doesn't apply to workers running in containers.
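In kubeclient terms, the scaling half of that might reduce to something like the following (a sketch under assumed names; my_namespace is a hypothetical helper):

# Sketch (assumed): scale an existing deployment config by patching its
# replica count through kubeclient.
def scale(deployment_config_name, replica_count)
  connection.patch_deployment_config(
    deployment_config_name,
    {:spec => {:replicas => replica_count}},
    my_namespace
  )
end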