[RFC] EmbeddedAnsible with ansible-runner-based implementation #45
Do we want to run ...? Right now I don't think we'll ever fetch the contents of the ... I believe that AWX would do this, but we can try to confirm that.
Using this comment as an excuse to get myself tagged as a participant on this issue, but a list of the currently in-flight and completed PRs for this effort can be found with this pull request search: ... Probably wouldn't hurt adding the ...
ManageIQ/manageiq-appliance#240 handles making the roles provided by plugins available for playbook runs. It doesn't actually have anything to do with the initial git repo we create; that's for playbooks, and we don't actually provide any of those currently. So we could solve this one:
by removing the consolidated repo and leaving the roles ...
Opened ManageIQ/manageiq#19056 to remove the default consolidated playbook repo thing.
So based on https://github.com/ansible/awx/blob/128fa8947ac620add275a15cb07577178745a849/awx/playbooks/project_update.yml#L141-L165 it looks like pulling the roles down from ansible galaxy was a part of the project update process in AWX. That said, I'm not sure when/how we should do this. They kept the whole project repo on disk, which meant that they could install roles into that directory directly. This, I assume, led to things like the "clean" option, which would remove the role files. Since we're using bare repos, and since the playbook lives somewhere other than where we're executing ansible-runner, this becomes a bit more difficult. My first thought was to run it from ...
@carbonin I saw this yesterday, but I was going to dig into it a bit more this morning since I do have to understand the playbook specifics that you linked (but also the parts around it that set up some of the conditionals), so I will get back to you on that. Though, I don't know that we need to make this a playbook like they did; in fact, I think using ... I think if I am reading that playbook right, they are always doing a ... Anyway, I think the plan of doing a check for a ...
Updated the OP with Fedora/CentOS instructions.
Created https://bugzilla.redhat.com/show_bug.cgi?id=1737149 to track the issue of ansible.cfg files included in repos. Originally raised here: ManageIQ/manageiq#19079 (comment)
Completed in ManageIQ/manageiq#18687 and subsequent PRs.
Architecture
General approach
The current AWX implementation works by creating a provider that talks to an AWX instance and uses the provider refresh to pull data into the database. CRUD operations on AWX objects go through the provider API, where the object is created in AWX and then brought in via EMS refresh. After that, callers use the ManageIQ models to do whatever they need to with the data.
As such, all of the ManageIQ callers use the provider API as an abstraction layer, and we can take advantage of that. Instead of having provider CRUD operations go out to an external provider, we can write the data directly into the database tables as if a "refresh" had occurred immediately.
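As an illustration of that idea, here is a minimal sketch of creating a repository record directly in the database rather than in AWX. The manager lookup and the attribute names (`:name`, `:scm_url`) are assumptions for illustration, not the final schema or API:

```ruby
# Hypothetical sketch only -- the lookup and attributes below are assumptions.
manager = ManageIQ::Providers::EmbeddedAnsible::AutomationManager.first

repo = ManageIQ::Providers::EmbeddedAnsible::AutomationManager::ConfigurationScriptSource.create!(
  :manager => manager,
  :name    => "my-playbooks",
  :scm_url => "https://example.com/my/playbooks.git"
)

# A follow-up sync step (queued in the real implementation) would clone the repo
# and create the associated Playbook records, as described in the next section.
repo.sync if repo.respond_to?(:sync)
```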
Repositories
A repository is created as a `ManageIQ::Providers::EmbeddedAnsible::AutomationManager::ConfigurationScriptSource` (< `ConfigurationScriptSource`). For the implementation in this PR, the git repos are cloned into `Rails.root.join("tmp/git_repos/:id")`. This works great for a single appliance, but will not work as well for federated appliances, nor for appliances that can't access the internet directly. As such, a different design is needed, which is described below in the git repo management section.

Once the repository is cloned, the playbooks are each synced as a `ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Playbook` (< `ConfigurationScriptPayload` < `ConfigurationScriptBase`, table name `configuration_scripts`). In this PR I've also pulled in the "name" attribute as the playbook description, though I'm not sure if this is correct or not.
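To make the sync step concrete, here is a simplified, illustrative playbook-discovery sketch: look for YAML files at the top of the clone that contain a list of plays referencing `hosts`. The clone path and the heuristic are hypothetical; the real sync logic may differ:

```ruby
# Illustrative only: naive playbook discovery in a cloned repo.
require "yaml"

repo_path = "/var/www/miq/vmdb/tmp/git_repos/42" # hypothetical clone location

playbooks = Dir.glob(File.join(repo_path, "*.{yml,yaml}")).select do |file|
  begin
    content = YAML.safe_load(File.read(file))
    # A playbook is (roughly) a YAML array of plays, each of which names "hosts".
    content.kind_of?(Array) && content.any? { |play| play.kind_of?(Hash) && play.key?("hosts") }
  rescue Psych::SyntaxError
    false
  end
end

# Each discovered file would become a Playbook record during the sync.
playbooks.each { |path| puts File.basename(path) }
```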
Service Template

When designing a service, the service template is saved as a `ManageIQ::Providers::EmbeddedAnsible::AutomationManager::ConfigurationScript`, which is a subclass of `ConfigurationScript`, which is a subclass of `ConfigurationScriptBase` (table name `configuration_scripts`).

CONFUSION NOTE: Both service templates and playbooks are stored in the same table, but with different subclasses and different column usage. Additionally confusing is that, unlike playbooks, whose subclass is named after the native term, the class here is `ConfigurationScript` instead of the native term `JobTemplate`, yet some of the relationships use the term `job_template`.
For the purposes of this PoC, I've stored some of the options for the service template in the `variables` column, but I don't believe that is the correct way to do it. We will have to go back to the original design to see where the Tower provider stores those values during refresh.

Service execute
When an ansible service template is ordered, a `ServiceTemplateProvisionRequest` (< `MiqRequest`) is started, which goes through automate, and ultimately an instance of a `ServiceAnsiblePlaybook` (< `Service`) is executed. In the general Service flow there are 2 main methods that need to be implemented, `execute` and `check_completed`. In the `execute` method, a `ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job` (< `OrchestrationStack`) is created as a resource for this service and "launched", moving on to the `check_completed` step.
step.Launching ansible-runner
For launching ansible-runner, we are using the `ManageIQ::Providers::AnsibleRunnerWorkflow` class, which will eventually use the `Ansible::Runner` helper class. (Note: this workflow class was created as a helper for provider authors to create ansible-based operations; however, the code itself is not provider specific, and it should be moved out of the providers namespace and into the `Ansible::Runner` namespace instead.)

CONFUSION NOTE: The workflow class is a subclass of `::Job`, which is our generic state machine using `MiqTask`s. This is completely unrelated to `ManageIQ::Providers::EmbeddedAnsible::AutomationManager::Job`, which is just a resource representation for the service.

The `AnsibleRunnerWorkflow`, being a self-contained `Job`, will launch ansible-runner with JSON output, asynchronously poll whether the ansible-runner execution has completed, and, once it detects completion, grab the results, store them in the `MiqTask` context, and clean up the ansible-runner execution temp directory.
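For reference, a minimal sketch of that launch-and-poll pattern outside of the workflow class. This is not the actual `AnsibleRunnerWorkflow` code; the private data dir layout, playbook name, and ident are hypothetical:

```ruby
# Hypothetical sketch: spawn ansible-runner, poll without blocking, read results.
private_data_dir = "/tmp/ansible-runner-demo" # assumes project/hello.yml exists inside

pid = Process.spawn("ansible-runner", "run", private_data_dir,
                    "-p", "hello.yml", "--json", "--ident", "demo")

# Asynchronous poll: the real workflow re-queues itself as a state-machine step;
# here we simply check the process without blocking.
loop do
  break if Process.waitpid(pid, Process::WNOHANG)
  sleep 5
end

# ansible-runner leaves its results under artifacts/<ident>/ in the private data
# dir; the JSON event stream also lands in artifacts/<ident>/job_events/.
artifacts = File.join(private_data_dir, "artifacts", "demo")
status = File.read(File.join(artifacts, "status")).strip # e.g. "successful" or "failed"
rc     = File.read(File.join(artifacts, "rc")).strip.to_i
stdout = File.read(File.join(artifacts, "stdout"))

puts "status=#{status} rc=#{rc} (#{stdout.lines.count} lines of output)"
```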
Service check_completed

In the meantime, the `check_completed` step of the `ServiceAnsiblePlaybook` is run every so often. In this implementation, the `MiqTask` associated with the `AnsibleRunnerWorkflow` is watched for completion. Once it has been marked as finished, the service can move on with its post-execution steps.

Services page
The services page shows the details of the `ServiceAnsiblePlaybook`, and the user can drill into the provision details. One of those details is the ansible `stdout`. In the AWX-based implementation, this was one of the few places where the database records were not used; instead, an asynchronous call would be made to AWX directly to fetch the stdout on demand. In the new ansible-runner design we don't have that option. For now, in this implementation, we happen to have this information already stored in the `AnsibleRunnerWorkflow`'s associated `MiqTask`, and since we have a relationship between the `ServiceAnsiblePlaybook` and the `MiqTask`, we can get the data directly from the database. We may not want to store this information in the MiqTask permanently, so a better design might be needed, which I'll elaborate on in the Ansible stdout section.

The `stdout` is extracted from the stored JSON records; however, it has ANSI character codes for terminal colors embedded. In the previous implementation, one could ask AWX for the HTML version, but we don't have that option here. So, instead, we use the `terminal` ruby gem, which converts the raw terminal output to HTML, replacing ANSI escape sequences with CSS classes. For this PoC, I've used the default CSS file that comes with the `terminal` gem, which styles the HTML by wrapping it in a div and scoping that style to the wrapper div. We will likely want the UI team to have the freedom to style this directly, so we can forego the built-in CSS in favor of styles directly in our ManageIQ stylesheets.
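To show roughly what that conversion does, here is a toy stand-in (this is not the `terminal` gem's implementation; the CSS class names are made up and only a few single-parameter color codes are handled):

```ruby
# Toy ANSI-to-HTML conversion, for illustration only.
require "cgi"

ANSI_CLASSES = {
  "31" => "term-fg-red",
  "32" => "term-fg-green",
  "33" => "term-fg-yellow"
}.freeze

def ansi_to_html(text)
  CGI.escapeHTML(text).gsub(/\e\[(\d+)m/) do
    code = Regexp.last_match(1)
    if code == "0"                             # reset -> close the span
      "</span>"
    else
      klass = ANSI_CLASSES[code]
      klass ? %(<span class="#{klass}">) : ""  # unknown codes are dropped
    end
  end
end

puts ansi_to_html("ok: [localhost] \e[32mchanged=0\e[0m")
# => ok: [localhost] <span class="term-fg-green">changed=0</span>
```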
Installing ansible-runner

git repo management
@mkanoor and I had started on a federated git repo management design back when we had the idea that the automate models would work better stored in git repos, allowing us to run them at any point in time, as well as giving us history tracking, auditing, and reverting capabilities.
The premise was that an appliance would be given the `git_owner` role, which would behave much like the `db_owner` role. This appliance would be allowed internet access and thus could clone from public locations like GitHub and/or private git instances. A record would be put into the `git_repositories` table, so that if we needed to fail over the appliance we could re-clone.

All other appliances, if they needed to access something about the git repository, would `git clone`/`fetch` from the appliance with the `git_owner` role. This would allow non-internet-connected appliances to get at the data in an on-demand fashion.

Some of these classes already exist, such as the GitRepository, GitReference, GitBranch, and GitTag models, as well as the GitWorktree class, which manages the on-disk repositories using the `rugged` gem.

The work that still needs to occur is to:
- ... (`MiqServer.api_system_auth_token_for_region`?)
- ... `MiqRegion#remote_ui_miq_server`
Once these are completed, we can ensure a git repo by checking whether our on-disk clone exists; if it doesn't, we git clone from the git_owner appliance, and if it exists but is not up to date (checked by comparing against the expected SHA stored in the git_repositories table), we git fetch from the git_owner appliance.
Additionally, this would allow us to support things like "Update on Launch", because we would know the expected SHA for launching and can ensure we use that SHA; when doing an Update on Launch we git fetch first and then update the expected SHA.
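A rough sketch of that "ensure the repo matches the expected SHA" idea, using the git CLI for brevity. The real implementation would go through GitWorktree/`rugged`, and the paths, URL, and SHA below are hypothetical:

```ruby
# Illustrative only: clone if missing, fetch if the on-disk SHA doesn't match.
repo_path     = "/var/www/miq/vmdb/tmp/git_repos/42"
git_owner_url = "https://git-owner-appliance.example.com/repos/42.git"
expected_sha  = "0123456789abcdef0123456789abcdef01234567" # from the git_repositories table

unless Dir.exist?(File.join(repo_path, ".git")) || Dir.exist?(File.join(repo_path, "objects"))
  # No clone yet (regular or bare) -- clone from the git_owner appliance.
  system("git", "clone", git_owner_url, repo_path, exception: true)
end

current_sha = `git -C #{repo_path} rev-parse HEAD`.strip

if current_sha != expected_sha
  # Out of date -- fetch from the git_owner appliance to pick up the expected SHA.
  system("git", "-C", repo_path, "fetch", "--all", exception: true)
end
```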
As an extra bonus, once all of this is done, @mkanoor and I will be able to realize our git-based automate design 😄
Seeding
I'm not sure we need to seed any more than what's in the PR (i.e. default credentials for "localhost"). The original code had to create defaults for a number of things in order to please AWX, but those aren't necessarily needed for the new implementation. Even so, we need to research each one of those. (cc @carbonin)
Ansible stdout
In this implementation ansible stdout is stored in the `MiqTask` and its associated `AnsibleRunnerWorkflow` job. (cc @agrare) These stdouts can get really big, so it's probably best to only store them once. We probably also do not want to store them in the MiqTask, as those records could get cleaned up eventually, so it's probably better to hang a binary_blob entry off of the `ServiceAnsiblePlaybook` instance.

Another complication here is how the UI is implemented, since this was originally special-cased for asynchronously fetching the stdout from AWX on demand. In the original implementation, the backend code would start a special MiqTask specifically to get the output as HTML and temporarily store it in the task. Then, the UI would wait_for_task and, when it was done, delete the MiqTask.
None of this is needed anymore, and I think the backend code could be changed such that when the `AnsibleRunnerWorkflow` is completed, the data is extracted from the MiqTask and stored as a binary_blob. Later, when the UI asks for the output, no MiqTask is needed, as the data is already in the database and can just be served directly. Even better, this can probably be done as a normal controller action, where the controller just asks the model for the raw output and the TerminalToHtml call is done in the controller (since that's the more logical place to convert raw data to presentation HTML).
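A sketch of what that proposed flow might look like. None of this is existing code: the context key, the `stdout_blob` association, and the `TerminalToHtml.render` call signature are all assumptions for illustration:

```ruby
# Hypothetical sketch of the proposed flow, not existing code.

# 1) When the AnsibleRunnerWorkflow finishes, persist the output off of the task.
def persist_stdout(service, task)
  blob = BinaryBlob.new(:name => "ansible_stdout")
  blob.binary = task.context_data[:ansible_runner_stdout].to_s # assumed context key
  service.update!(:stdout_blob => blob)                        # hypothetical association
end

# 2) Later, a plain controller action serves it, converting to HTML at render time.
def playbook_stdout
  service = Service.find(params[:id])
  render :html => TerminalToHtml.render(service.stdout_blob.binary).html_safe # call signature assumed
end
```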
Automate methods that are playbooks directly (without the service/service catalog)

Automate methods that are playbooks directly can use the `AnsiblePlaybookWorkflow` directly. Unlike the Service modeling, which had its own `execute` and `check_completed` callouts, the automate methods do not.
TODO
Credential management
TODO
- run phase: only have a single credential type, allowing the user to define how to map credential details to playbook env vars and/or extra vars. Then we can get rid of the specialized types. A slight alternative: have 2 types, a mappable type and a key pair type, where the latter would map to SSH machine creds. (A rough sketch of the mapping idea follows this list.)
- crawl phase (or "ran out of time" phase): we can keep the specialized types, and do the mapping in code a la ...

This section will likely need UI work.
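Purely as an illustration of the "single mappable credential type" idea from the run phase above, here is a sketch where the field names, variable names, and mapping format are all made up:

```ruby
# Illustrative only: a user-defined mapping turns credential fields into
# env vars and extra vars for the playbook run.
credential = {
  "userid"   => "admin",
  "password" => "smartvm",
  "host"     => "vcenter.example.com"
}

# A user-supplied mapping stored with the credential (hypothetical format).
mapping = {
  :env_vars   => {"MY_PROVIDER_USER" => "userid", "MY_PROVIDER_PASSWORD" => "password"},
  :extra_vars => {"provider_host" => "host"}
}

env_vars   = mapping[:env_vars].transform_values   { |field| credential[field] }
extra_vars = mapping[:extra_vars].transform_values { |field| credential[field] }

# These would then be passed along to the ansible-runner invocation.
puts env_vars.inspect
puts extra_vars.inspect
```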
Some settings in the service, such as logging, verbosity
TODO
Using the embedded_ansible or perhaps automate role
TODO
Upgrades
TODO
Tests
TODO