Unable to use custom EE image #9917

Closed
lemmy04 opened this issue Apr 15, 2021 · 48 comments

@lemmy04

lemmy04 commented Apr 15, 2021

ISSUE TYPE
  • Bug Report
SUMMARY

I have built a custom EE image (quay.io/mhomann/awx-ee-community), but when I try to use it for a job, the job fails with a very uninformative "RuntimeError".

ENVIRONMENT
  • AWX version: 19.0.0
  • AWX install method: k3s, awx-operator
  • Ansible version: (unknown)
  • Operating System: debian buster
  • Web Browser: Firefox
STEPS TO REPRODUCE
  • create a custom EE from these requirement files:
    requirements.yml:
---
collections:
# With just the collection name
- community.general
- ansible.windows

requirements.txt:

dnspython
winrm
  • build the image and upload to quay.io:
ansible-builder build --tag quay.io/mhomann/awx-ee-community:0.0.1 --context ./context --container-runtime docker
docker push quay.io/mhomann/awx-ee-community:0.0.1
  • Edit your instance group to use that image instead of the default
  • run an ad-hoc job with a module that should be supported now, for example win_ping

Observe the job to fail with the following error:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 1397, in run
    res = receptor_job.run()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 2957, in run
    return self._run_internal(receptor_ctl)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 3008, in _run_internal
    raise RuntimeError(detail)
RuntimeError: Pod Running
EXPECTED RESULTS

The job should execute and return a meaningful result.

ACTUAL RESULTS
Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 1397, in run
    res = receptor_job.run()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 2957, in run
    return self._run_internal(receptor_ctl)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 3008, in _run_internal
    raise RuntimeError(detail)
RuntimeError: Pod Running
ADDITIONAL INFORMATION

There is nothing in the system journal about why the job failed, except for something that looks like a failed TCP connection.
If I manually start a container with a shell from the image I've built, I can execute ansible -m win_ping and it works (within the limits of not having an inventory, etc.).

@shanemcd
Member

Can you try:

ansible-builder build --tag quay.io/mhomann/awx-ee-community:0.0.1 --context ./context --container-runtime docker --build-arg ANSIBLE_RUNNER_IMAGE=quay.io/ansible/awx-ee:0.1.1

@lemmy04
Author

lemmy04 commented Apr 15, 2021

The build process fails at the end:


Step 8/8 : RUN ansible-galaxy collection install $ANSIBLE_GALAXY_CLI_COLLECTION_OPTS -r requirements.yml --collections-path /usr/share/ansible/collections
 ---> Running in 3c58edcaaddd
Starting galaxy collection install process
Process install dependency map
Starting collection install process
Downloading https://galaxy.ansible.com/download/ansible-windows-1.5.0.tar.gz to /home/runner/.ansible/tmp/ansible-local-1oodge4fc/tmpujgux4rn/ansible-windows-1.5.0-yeilc26g
Installing 'ansible.windows:1.5.0' to '/usr/share/ansible/collections/ansible_collections/ansible/windows'
ERROR! Unexpected Exception, this is probably a bug: [Errno 2] No such file or directory: '/usr/share/ansible/collections/ansible_collections/ansible/windows'
to see the full traceback, use -vvv
The command '/bin/sh -c ansible-galaxy collection install $ANSIBLE_GALAXY_CLI_COLLECTION_OPTS -r requirements.yml --collections-path /usr/share/ansible/collections' returned a non-zero code: 250

An error occured (rc=250), see output line(s) above for details.

@lemmy04
Author

lemmy04 commented Apr 15, 2021

Note: I have ansible-builder 0.6.0 here; 1.0.0a1 can't be installed, see ansible/ansible-builder#204.

@lemmy04
Author

lemmy04 commented Apr 15, 2021

With ansible-builder 1.0.0, the build fails in the same way at the same step.

@shanemcd
Member

Can you post your execution-environment.yml?

@lemmy04
Author

lemmy04 commented Apr 15, 2021

mathias@appsrv:~/work/k3s-appsrv.eregion.home/AWX/awx-ee-community$ cat execution-environment.yml
---
version: 1
dependencies:
  galaxy: requirements.yml
  python: requirements.txt

@shanemcd
Member

I just tried with your inputs and it worked fine for me. Can you please share which version of Docker you are using?

@lemmy04
Author

lemmy04 commented Apr 15, 2021

lemmy@kumiko:~> docker --version
Docker version 19.03.15, build 99e3ed89195c

@shanemcd
Member

Hmmm...

$ docker --version
Docker version 20.10.5, build 55c4c88

Can you try updating and see what happens?

@lemmy04
Author

lemmy04 commented Apr 15, 2021

Sending build context to Docker daemon  6.144kB
Step 1/8 : ARG ANSIBLE_RUNNER_IMAGE=quay.io/ansible/ansible-runner:devel
Step 2/8 : ARG PYTHON_BUILDER_IMAGE=quay.io/ansible/python-builder:latest
Step 3/8 : FROM $ANSIBLE_RUNNER_IMAGE as galaxy
 ---> 92e89bac126f
Step 4/8 : ARG ANSIBLE_GALAXY_CLI_COLLECTION_OPTS=
 ---> Using cache
 ---> 325895fddd64
Step 5/8 : ADD _build /build
 ---> Using cache
 ---> e3da740645d2
Step 6/8 : WORKDIR /build
 ---> Using cache
 ---> cc241773248b
Step 7/8 : RUN ansible-galaxy role install -r requirements.yml --roles-path /usr/share/ansible/roles
 ---> Using cache
 ---> 599710bbcdbb
Step 8/8 : RUN ansible-galaxy collection install $ANSIBLE_GALAXY_CLI_COLLECTION_OPTS -r requirements.yml --collections-path /usr/share/ansible/collections
 ---> Running in 311f1060b21e
Starting galaxy collection install process
Process install dependency map
Starting collection install process
Downloading https://galaxy.ansible.com/download/ansible-windows-1.5.0.tar.gz to /home/runner/.ansible/tmp/ansible-local-17jval6mn/tmpnh5ihdav/ansible-windows-1.5.0-sd0kweki
Installing 'ansible.windows:1.5.0' to '/usr/share/ansible/collections/ansible_collections/ansible/windows'
ERROR! Unexpected Exception, this is probably a bug: [Errno 2] No such file or directory: '/usr/share/ansible/collections/ansible_collections/ansible/windows'
to see the full traceback, use -vvv
The command '/bin/sh -c ansible-galaxy collection install $ANSIBLE_GALAXY_CLI_COLLECTION_OPTS -r requirements.yml --collections-path /usr/share/ansible/collections' returned a non-zero code: 250

An error occured (rc=250), see output line(s) above for details.

lemmy@kumiko:~/Work/k3s-appsrv.eregion.home/AWX/awx-ee-community> docker --version
Docker version 20.10.5-ce, build 363e9a88a11b

@shanemcd
Member

Alright, I'm stumped. Now we bring out the hammer:

$ sudo systemctl stop docker
$ sudo rm -rf /var/lib/docker
$ sudo systemctl start docker

@lemmy04
Author

lemmy04 commented Apr 15, 2021

I used an even bigger hammer:

sudo systemctl stop docker
sudo rm -rf /var/lib/docker
sudo systemctl reboot

lemmy@kumiko:~> docker --version
Docker version 20.10.5-ce, build 363e9a88a11b
lemmy@kumiko:~/Work/k3s-appsrv.eregion.home/AWX/awx-ee-community> ansible-builder build --tag quay.io/mhomann/awx-ee-community:0.0.1 --context ./context --container-runtime docker --build-arg ANSIBLE_RUNNER_IMAGE=quay.io/ansible/awx-ee:0.1.1
Running command:
  docker build -f ./context/Dockerfile -t quay.io/mhomann/awx-ee-community:0.0.1 --build-arg=ANSIBLE_RUNNER_IMAGE=quay.io/ansible/awx-ee:0.1.1 ./context
Sending build context to Docker daemon  6.144kB
Step 1/8 : ARG ANSIBLE_RUNNER_IMAGE=quay.io/ansible/ansible-runner:devel
Step 2/8 : ARG PYTHON_BUILDER_IMAGE=quay.io/ansible/python-builder:latest
Step 3/8 : FROM $ANSIBLE_RUNNER_IMAGE as galaxy
0.1.1: Pulling from ansible/awx-ee
(...)
Digest: sha256:76ecd20bd375c1cc4c1fd2fc9f43e7a321136905b45f3c0fdd0304de59467b93
Status: Downloaded newer image for quay.io/ansible/awx-ee:0.1.1
 ---> 92e89bac126f
Step 4/8 : ARG ANSIBLE_GALAXY_CLI_COLLECTION_OPTS=
 ---> Running in 1dadaa00b876
Removing intermediate container 1dadaa00b876
 ---> fbb77840e5cc
Step 5/8 : ADD _build /build
 ---> 380d6f5618d7
Step 6/8 : WORKDIR /build
 ---> Running in ba10ae7470b9
Removing intermediate container ba10ae7470b9
 ---> edff9b649944
Step 7/8 : RUN ansible-galaxy role install -r requirements.yml --roles-path /usr/share/ansible/roles
 ---> Running in c93f741c43d5
Skipping install, no requirements found
Removing intermediate container c93f741c43d5
 ---> 98d3ac98a186
Step 8/8 : RUN ansible-galaxy collection install $ANSIBLE_GALAXY_CLI_COLLECTION_OPTS -r requirements.yml --collections-path /usr/share/ansible/collections
 ---> Running in caa76fa4effe
Starting galaxy collection install process
Process install dependency map
Starting collection install process
Downloading https://galaxy.ansible.com/download/ansible-windows-1.5.0.tar.gz to /home/runner/.ansible/tmp/ansible-local-1yvjrkyvo/tmp314c4lj0/ansible-windows-1.5.0-l756uygm
Installing 'ansible.windows:1.5.0' to '/usr/share/ansible/collections/ansible_collections/ansible/windows'
to see the full traceback, use -vvv
ERROR! Unexpected Exception, this is probably a bug: [Errno 2] No such file or directory: '/usr/share/ansible/collections/ansible_collections/ansible/windows'
The command '/bin/sh -c ansible-galaxy collection install $ANSIBLE_GALAXY_CLI_COLLECTION_OPTS -r requirements.yml --collections-path /usr/share/ansible/collections' returned a non-zero code: 250

An error occured (rc=250), see output line(s) above for details.
lemmy@kumiko:~/Work/k3s-appsrv.eregion.home/AWX/awx-ee-community>

@shanemcd
Member

Can you paste the output of the rendered Dockerfile/Containerfile? Your output is showing up differently from mine.

I would expect to see output formatted similarly to:

#8 [5/5] RUN ansible-galaxy collection install  -r requirements.yml --collections-path /usr/share/ansible/collections
#8 sha256:7471de7f55af79b2cbffa2a678876af1458be2a1449da1b3ea34848b1d2f5d22
#8 0.685 [WARNING]: You are running the development version of Ansible. You should only
#8 0.685 run Ansible from "devel" if you are modifying the Ansible engine, or trying out
#8 0.685 features under development. This is a rapidly changing source of code and can
#8 0.685 become unstable at any point.
#8 0.691 Starting galaxy collection install process
#8 0.691 Process install dependency map
#8 6.208 Starting collection install process
#8 6.208 Downloading https://galaxy.ansible.com/download/ansible-windows-1.5.0.tar.gz to /home/runner/.ansible/tmp/ansible-local-1m_5tgh30/tmpyswpo6vt/ansible-windows-1.5.0-644gpws6
#8 7.097 Installing 'ansible.windows:1.5.0' to '/usr/share/ansible/collections/ansible_collections/ansible/windows'
#8 7.317 ansible.windows:1.5.0 was installed successfully
#8 7.317 Downloading https://galaxy.ansible.com/download/community-general-2.5.1.tar.gz to /home/runner/.ansible/tmp/ansible-local-1m_5tgh30/tmpyswpo6vt/community-general-2.5.1-yx975hjv
#8 7.711 Installing 'community.general:2.5.1' to '/usr/share/ansible/collections/ansible_collections/community/general'
#8 16.35 community.general:2.5.1 was installed successfully
#8 DONE 16.5s

@shanemcd
Member

Wait a sec... I'm running with DOCKER_BUILDKIT=1 set in my environment, that's why the output looks different.

@shanemcd
Member

With DOCKER_BUILDKIT unset, I see:

Step 8/8 : RUN ansible-galaxy collection install $ANSIBLE_GALAXY_CLI_COLLECTION_OPTS -r requirements.yml --collections-path /usr/share/ansible/collections
 ---> Running in 4cc46ab484b7
[WARNING]: You are running the development version of Ansible. You should only
run Ansible from "devel" if you are modifying the Ansible engine, or trying out
features under development. This is a rapidly changing source of code and can
become unstable at any point.
Starting galaxy collection install process
Process install dependency map
Starting collection install process
Downloading https://galaxy.ansible.com/download/ansible-windows-1.5.0.tar.gz to /home/runner/.ansible/tmp/ansible-local-1bqgj8wit/tmpxoh_3d4o/ansible-windows-1.5.0-kuvou1yv
Installing 'ansible.windows:1.5.0' to '/usr/share/ansible/collections/ansible_collections/ansible/windows'
ansible.windows:1.5.0 was installed successfully
Downloading https://galaxy.ansible.com/download/community-general-2.5.1.tar.gz to /home/runner/.ansible/tmp/ansible-local-1bqgj8wit/tmpxoh_3d4o/community-general-2.5.1-95l5x5r5
Installing 'community.general:2.5.1' to '/usr/share/ansible/collections/ansible_collections/community/general'
community.general:2.5.1 was installed successfully

@lemmy04
Author

lemmy04 commented Apr 15, 2021

I'd almost think it could be something in my ~/.ansible.cfg, but my local user on my k3s host doesn't have one.

Right now I am having the very same problem with three different Docker versions spread across two Linux distributions: Docker 18.x on Debian Buster, and 19.x and 20.x on openSUSE...

@lemmy04
Author

lemmy04 commented Apr 15, 2021

Make that four different container runtimes on three different operating systems: podman 2.2.1 on RHEL 8.4 fails exactly the same way, at the same step.

[mathias@rhel8vm awx-ee-community]$ ansible-builder build --tag quay.io/mhomann/awx-ee-community:0.0.1 --context ./context --container-runtime podman --build-arg ANSIBLE_RUNNER_IMAGE=quay.io/ansible/awx-ee:0.1.1
Running command:
  podman build -f ./context/Containerfile -t quay.io/mhomann/awx-ee-community:0.0.1 --build-arg=ANSIBLE_RUNNER_IMAGE=quay.io/ansible/awx-ee:0.1.1 ./context
STEP 1: FROM quay.io/ansible/awx-ee:0.1.1 AS galaxy
STEP 2: ARG ANSIBLE_GALAXY_CLI_COLLECTION_OPTS=
--> Using cache 70bfad45b4c15b4e1a695544490744346816af7348882d018af1e88c22b53aae
--> 70bfad45b4c
STEP 3: ADD _build /build
--> Using cache d009fc833756f06404892a828759f912d87a888a04f0959a5a4911ddc86aa69d
--> d009fc83375
STEP 4: WORKDIR /build
--> Using cache 93821df157877f65b223684e119fdcfc25304433c3b0d943c5cf6b1c9fff2ae4
--> 93821df1578
STEP 5: RUN ansible-galaxy role install -r requirements.yml --roles-path /usr/share/ansible/roles
--> Using cache 1d161b4731516119dc42cab11624e3183acb862f9fec9b7482c1889158c99234
--> 1d161b47315
STEP 6: RUN ansible-galaxy collection install $ANSIBLE_GALAXY_CLI_COLLECTION_OPTS -r requirements.yml --collections-path /usr/share/ansible/collections
Starting galaxy collection install process
Process install dependency map
Starting collection install process
Downloading https://galaxy.ansible.com/download/ansible-windows-1.5.0.tar.gz to /home/runner/.ansible/tmp/ansible-local-1fntbc0sk/tmp6rr5nf8v/ansible-windows-1.5.0-a70aqmuo
Installing 'ansible.windows:1.5.0' to '/usr/share/ansible/collections/ansible_collections/ansible/windows'
ERROR! Unexpected Exception, this is probably a bug: [Errno 2] No such file or directory: '/usr/share/ansible/collections/ansible_collections/ansible/windows'
to see the full traceback, use -vvv
Error: error building at STEP "RUN ansible-galaxy collection install $ANSIBLE_GALAXY_CLI_COLLECTION_OPTS -r requirements.yml --collections-path /usr/share/ansible/collections": error while running runtime: exit status 250

An error occured (rc=125), see output line(s) above for details.

And I created that user specifically for this test and haven't done anything with containers on that VM before...

@lemmy04
Author

lemmy04 commented Apr 15, 2021

Just out of curiosity:

lemmy@kumiko:~> docker run -ti quay.io/ansible/awx-ee:0.1.1 /bin/bash --login
[runner@7a3866dcd72f runner]$ ls -l /usr/share/ansible/collections/ansible_collections/
amazon/     ansible/    awx/        azure/      community/  google/     openstack/  ovirt/      theforeman/ 
[runner@7a3866dcd72f runner]$ ls -l /usr/share/ansible/collections/ansible_collections/ansible/windows
ls: cannot access '/usr/share/ansible/collections/ansible_collections/ansible/windows': No such file or directory
[runner@7a3866dcd72f runner]$

That explains the "No such file or directory" error, at least...

@lemmy04
Author

lemmy04 commented Apr 15, 2021

facepalm

The base container "awx-ee" uses a non-root user. If I start that container "normally", I can't manually install collections; if I run the container with "-u root", I can:

As a normal user:

lemmy@kumiko:~> docker run -ti quay.io/ansible/awx-ee:0.1.1 /bin/bash --login
[runner@7a3866dcd72f runner]$ ansible-galaxy collection install ansible.windows --collections-path /usr/share/ansible/collections
Starting galaxy collection install process
Process install dependency map
Starting collection install process
Downloading https://galaxy.ansible.com/download/ansible-windows-1.5.0.tar.gz to /home/runner/.ansible/tmp/ansible-local-4027aaebax/tmpwezipm1o/ansible-windows-1.5.0-e2iplgcn
Installing 'ansible.windows:1.5.0' to '/usr/share/ansible/collections/ansible_collections/ansible/windows'
ERROR! Unexpected Exception, this is probably a bug: [Errno 2] No such file or directory: '/usr/share/ansible/collections/ansible_collections/ansible/windows'
to see the full traceback, use -vvv
[runner@7a3866dcd72f runner]$

As root:

lemmy@kumiko:~> docker run -ti -u root quay.io/ansible/awx-ee:0.1.1 /bin/bash --login
[root@92fd74c62c5d runner]# id -a
uid=0(root) gid=0(root) groups=0(root)
[root@92fd74c62c5d runner]# cd
[root@92fd74c62c5d ~]# ansible-galaxy collection install ansible.windows --collections-path /usr/share/ansible/collections
Starting galaxy collection install process
Process install dependency map
Starting collection install process
Downloading https://galaxy.ansible.com/download/ansible-windows-1.5.0.tar.gz to /home/runner/.ansible/tmp/ansible-local-32p2nyxy37/tmpbf554sfl/ansible-windows-1.5.0-3xatn8gv
Installing 'ansible.windows:1.5.0' to '/usr/share/ansible/collections/ansible_collections/ansible/windows'
ansible.windows:1.5.0 was installed successfully

@shanemcd
Member

shanemcd commented Apr 15, 2021

@lemmy04 The awx-ee doesn't flip the USER to 1000 until the very end of the Containerfile: https://github.com/ansible/awx-ee/blob/268fe3d79f99e2a44a11df7a216cb6cfdd43be9f/Containerfile#L29

You can see this by manually patching the generated Containerfile to run whoami before the step that installs the collections:

$ podman build -f context/Dockerfile context
STEP 1: FROM quay.io/ansible/ansible-runner:devel AS galaxy
STEP 2: ARG ANSIBLE_GALAXY_CLI_COLLECTION_OPTS=
--> Using cache f8f1f2959d3d9db5192fad572532344c939417fd6d7d551cf7ccd7bb58c2b865
--> f8f1f2959d3
STEP 3: ADD _build /build
--> f2695ce03ce
STEP 4: WORKDIR /build
--> 213926fa044
STEP 5: RUN whoami
root

Edit: works with Docker too:

$ docker build -f context/Dockerfile context
Sending build context to Docker daemon  6.144kB
Step 1/16 : ARG ANSIBLE_RUNNER_IMAGE=quay.io/ansible/ansible-runner:devel
Step 2/16 : ARG PYTHON_BUILDER_IMAGE=quay.io/ansible/python-builder:latest
Step 3/16 : FROM $ANSIBLE_RUNNER_IMAGE as galaxy
 ---> e7564e256ad6
Step 4/16 : ARG ANSIBLE_GALAXY_CLI_COLLECTION_OPTS=
 ---> Using cache
 ---> 395a49e52b99
Step 5/16 : ADD _build /build
 ---> Using cache
 ---> 778e8c97f823
Step 6/16 : WORKDIR /build
 ---> Using cache
 ---> a1a7a0b73e06
Step 7/16 : RUN whoami
 ---> Running in 23007f5437e0
root

@lemmy04
Author

lemmy04 commented Apr 15, 2021

When I do the same thing (manually inserting "RUN whoami"), I get "command not found"; if I put in "RUN id -a" instead, I see that it is already running as UID 1000 just before the galaxy install steps...

lemmy@kumiko:~/Work/k3s-appsrv.eregion.home/AWX/awx-ee-community> cat context/Dockerfile 
ARG ANSIBLE_RUNNER_IMAGE=quay.io/ansible/ansible-runner:devel
ARG PYTHON_BUILDER_IMAGE=quay.io/ansible/python-builder:latest

FROM $ANSIBLE_RUNNER_IMAGE as galaxy

ARG ANSIBLE_GALAXY_CLI_COLLECTION_OPTS=
ADD _build /build

WORKDIR /build
RUN id -a
RUN ansible-galaxy role install -r requirements.yml --roles-path /usr/share/ansible/roles
RUN ansible-galaxy collection install $ANSIBLE_GALAXY_CLI_COLLECTION_OPTS -r requirements.yml --collections-path /usr/share/ansible/collections
lemmy@kumiko:~/Work/k3s-appsrv.eregion.home/AWX/awx-ee-community> docker build -f ./context/Dockerfile -t quay.io/mhomann/awx-ee-community:0.0.1 --build-arg=ANSIBLE_RUNNER_IMAGE=quay.io/ansible/awx-ee:0.1.1 ./context
Sending build context to Docker daemon  6.144kB
Step 1/9 : ARG ANSIBLE_RUNNER_IMAGE=quay.io/ansible/ansible-runner:devel
Step 2/9 : ARG PYTHON_BUILDER_IMAGE=quay.io/ansible/python-builder:latest
Step 3/9 : FROM $ANSIBLE_RUNNER_IMAGE as galaxy
 ---> 92e89bac126f
Step 4/9 : ARG ANSIBLE_GALAXY_CLI_COLLECTION_OPTS=
 ---> Using cache
 ---> fbb77840e5cc
Step 5/9 : ADD _build /build
 ---> Using cache
 ---> 380d6f5618d7
Step 6/9 : WORKDIR /build
 ---> Using cache
 ---> edff9b649944
Step 7/9 : RUN id -a
 ---> Using cache
 ---> 428feef5979a
Step 8/9 : RUN ansible-galaxy role install -r requirements.yml --roles-path /usr/share/ansible/roles
 ---> Using cache
 ---> 028433caaa04
Step 9/9 : RUN ansible-galaxy collection install $ANSIBLE_GALAXY_CLI_COLLECTION_OPTS -r requirements.yml --collections-path /usr/share/ansible/collections
 ---> Running in 56b2eb7f54c4
Starting galaxy collection install process
Process install dependency map
Starting collection install process
Downloading https://galaxy.ansible.com/download/ansible-windows-1.5.0.tar.gz to /home/runner/.ansible/tmp/ansible-local-1r_woka1i/tmpuxrh3bh7/ansible-windows-1.5.0-ysdj4v2p
Installing 'ansible.windows:1.5.0' to '/usr/share/ansible/collections/ansible_collections/ansible/windows'
ERROR! Unexpected Exception, this is probably a bug: [Errno 2] No such file or directory: '/usr/share/ansible/collections/ansible_collections/ansible/windows'
to see the full traceback, use -vvv
The command '/bin/sh -c ansible-galaxy collection install $ANSIBLE_GALAXY_CLI_COLLECTION_OPTS -r requirements.yml --collections-path /usr/share/ansible/collections' returned a non-zero code: 250

@shanemcd
Member

Oh, shit. It's because you're using awx-ee as the base image. 🤦

My bad, should have realized that.

I thought we added some code to ansible-builder to explicitly put USER root as the first build step, but I'm not seeing it. I'll try to make that happen soon.

In the meantime, you can hack around this by putting USER root before the ansible-galaxy commands in the Dockerfile and running docker build manually.
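For anyone applying this workaround by hand, here is a minimal shell sketch of that Dockerfile edit. It assumes the generated Dockerfile layout shown earlier in this thread, where the first FROM line opens the "galaxy" stage that runs the ansible-galaxy commands. The function name add_user_root_to_galaxy_stage is hypothetical and not part of ansible-builder; it only patches the galaxy stage, as suggested above.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper, not part of ansible-builder): insert "USER root"
# right after the first FROM line of a generated Dockerfile, i.e. the "galaxy"
# stage, so the ansible-galaxy RUN steps do not inherit the base image's
# non-root user (UID 1000 in awx-ee).
set -euo pipefail

add_user_root_to_galaxy_stage() {
  local dockerfile="$1"
  local tmp
  tmp="$(mktemp)"
  # Print every line unchanged; after the first line starting with "FROM ",
  # emit "USER root" exactly once.
  awk '{ print } /^FROM / && !done { print "USER root"; done = 1 }' \
    "$dockerfile" > "$tmp"
  # Write back through the original file to keep its permissions.
  cat "$tmp" > "$dockerfile"
  rm -f "$tmp"
}

# Usage (path assumed from the build commands in this thread):
#   add_user_root_to_galaxy_stage ./context/Dockerfile
#   docker build -f ./context/Dockerfile -t quay.io/mhomann/awx-ee-community:0.0.1 \
#     --build-arg=ANSIBLE_RUNNER_IMAGE=quay.io/ansible/awx-ee:0.1.1 ./context
```

Rewriting through awk rather than an in-place sed keeps the helper portable between GNU and BSD tools.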

@lemmy04
Author

lemmy04 commented Apr 16, 2021

That finishes now, but I think ansible-builder does two more steps after that "docker build"? Now, if I push the resulting image and re-run my failed job, it fails like this:

cripple-windows | FAILED! => {
"msg": "winrm or requests is not installed: No module named 'winrm'"
}

That is, the previously missing collections are there now, but the Python modules listed in requirements.txt didn't get installed...

@lemmy04
Author

lemmy04 commented Apr 16, 2021

I manually patched the ansible-builder main.py file with the change from ansible/ansible-builder#205...
...and now the build fails in a completely different way:

lemmy@kumiko:~/Work/k3s-appsrv.eregion.home/AWX/awx-ee-community> ansible-builder build --tag quay.io/mhomann/awx-ee-community:0.0.1 --context ./context --container-runtime docker --build-arg ANSIBLE_RUNNER_IMAGE=quay.io/ansible/awx-ee:0.1.1
Running command:
  docker build -f ./context/Dockerfile -t quay.io/mhomann/awx-ee-community:0.0.1 --build-arg=ANSIBLE_RUNNER_IMAGE=quay.io/ansible/awx-ee:0.1.1 ./context
Running command:
  docker run --rm -v /usr/lib/python3.6/site-packages/ansible_builder:/ansible_builder_mount:Z quay.io/mhomann/awx-ee-community:0.0.1 python3 /ansible_builder_mount/introspect.py
File ./context/_build/bindep_combined.txt had modifications and will be rewritten
File ./context/_build/requirements_combined.txt had modifications and will be rewritten
Running command:
  docker build -f ./context/Dockerfile -t quay.io/mhomann/awx-ee-community:0.0.1 --build-arg=ANSIBLE_RUNNER_IMAGE=quay.io/ansible/awx-ee:0.1.1 ./context
...showing last 20 lines of output...
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.8
    creating build/lib.linux-x86_64-3.8/curl
    copying python/curl/__init__.py -> build/lib.linux-x86_64-3.8/curl
    running build_ext
    building 'pycurl' extension
    creating build/temp.linux-x86_64-3.8
    creating build/temp.linux-x86_64-3.8/src
    gcc -pthread -Wno-unused-result -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -D_GNU_SOURCE -fPIC -fwrapv -fPIC -DPYCURL_VERSION="7.43.0.6" -DHAVE_CURL_SSL=1 -DHAVE_CURL_OPENSSL=1 -DHAVE_CURL_SSL=1 -I/usr/include/python3.8 -c src/docstrings.c -o build/temp.linux-x86_64-3.8/src/docstrings.o
    In file included from src/docstrings.c:4:
    src/pycurl.h:5:10: fatal error: Python.h: No such file or directory
     #include <Python.h>
              ^~~~~~~~~~
    compilation terminated.
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-16wp_vrq/pycurl_07cebe73bbfc4e458fe8a17a161be72e/setup.py'"'"'; __file__='"'"'/tmp/pip-install-16wp_vrq/pycurl_07cebe73bbfc4e458fe8a17a161be72e/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-3zgxikkd/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.8/pycurl Check the logs for full command output.
The command '/bin/sh -c assemble' returned a non-zero code: 1

An error occured (rc=1), see output line(s) above for details.

I think this whole shebang should move over to a bug on ansible-builder; I just filed ansible/ansible-builder#206.

@shanemcd
Member

@tvo318 @beeankha Let's chat about this next week and how we can document what went wrong here / improve the UX.

@weiyentan

weiyentan commented Apr 24, 2021

@lemmy04 Were you able to get a build done successfully? I have been following this thread and went through the same steps you did.
I tried to use version 0.1.1 of awx-ee, but it's erroring out at one of the collections.

I pulled version 0.1.1 of awx-ee, ran it, and pushed it up to Docker Hub using podman. Then I used ansible-builder to build an image, and I have been getting this in my AWX output:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 1397, in run
    res = receptor_job.run()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 2957, in run
    return self._run_internal(receptor_ctl)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 3008, in _run_internal
    raise RuntimeError(detail)
RuntimeError: Pod Running

and I managed to capture a snippet of the pod output before it terminated:

{"status": "error", "job_explanation": "Failed to extract private data directory on worker.", "result_traceback": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.8/site-packages/ansible_runner/streaming.py\", line 107, in run\n    unstream_dir(self._input, data['zipfile'], self.private_data_dir)\n  File \"/usr/local/lib/python3.8/site-packages/ansible_runner/utils/streaming.py\", line 52, in unstream_dir\n    with zipfile.ZipFile(tmp.name, 'r') as archive:\n  File \"/usr/lib64/python3.8/zipfile.py\", line 1268, in __init__\n    self._RealGetContents()\n  File \"/usr/lib64/python3.8/zipfile.py\", line 1335, in _RealGetContents\n    raise BadZipFile(\"File is not a zip file\")\nzipfile.BadZipFile: File is not a zip file\n"}
{"eof": true}

Sorry @shanemcd for hijacking this thread... I thought I would add it here as it may provide a bit more context...

@techBeck03

techBeck03 commented Apr 25, 2021

I'm also experiencing the exact behavior described by @weiyentan. Previously (3-4 days ago) I was able to build EE images just fine using this repo, simply by modifying the requirements.yml and execution-environment.yml files. Now, every time I build an image, I get the same RuntimeError: Pod Running.

The only thing I can think of that has changed is the ansible-runner image. I cleared my local Docker images at some point, so I assume it pulled a newer version of the image, considering the repo shows frequent updates to those tags. Would it make more sense to push newer images under different tags to the https://quay.io/repository/ansible/ansible-runner repo?

EDIT: I realized I was using:

build_arg_defaults:
  ANSIBLE_RUNNER_IMAGE: 'quay.io/ansible/ansible-runner:stable-2.11-devel'

I changed this to --build-arg ANSIBLE_RUNNER_IMAGE=quay.io/ansible/awx-ee:0.1.1 and then to --build-arg ANSIBLE_RUNNER_IMAGE=quay.io/ansible/awx-ee:0.2.0, but still got the same error.

Thanks!

@Bbett

Bbett commented Apr 29, 2021

So I've been able to get past the RuntimeError: Pod Running message by doing the following. Something seems to be wrong with the ansible-runner install in the image, and reinstalling it seems to work.

Run the ansible-builder build command as outlined in the documentation.
Once it completes, edit context/Dockerfile and add the following line after FROM $ANSIBLE_RUNNER_IMAGE:

RUN pip3 uninstall --yes ansible-runner && pip3 install ansible-runner==2.0.0a1

Then rerun the docker build command that ansible-builder printed:
docker build -f ./context/Dockerfile -t image:tag ./context

This rebuilds the image and reinstalls ansible-runner.

After I do that I no longer get the RuntimeError: Pod Running message and the EE container runs as expected.
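The manual Dockerfile edit above can also be scripted. This is a minimal sketch under two assumptions not confirmed in the comment: that the Dockerfile left in ./context after a full ansible-builder build has a final stage starting with a bare FROM $ANSIBLE_RUNNER_IMAGE line (unlike the FROM $ANSIBLE_RUNNER_IMAGE as galaxy line of the first stage), and that this bare line is the one meant above. The helper name add_runner_reinstall is hypothetical; the pinned ansible-runner==2.0.0a1 is copied from the comment.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): add the ansible-runner reinstall right after
# the bare "FROM $ANSIBLE_RUNNER_IMAGE" line of a generated Dockerfile.
set -euo pipefail

add_runner_reinstall() {
  local dockerfile="$1"
  local tmp
  tmp="$(mktemp)"
  # Match only a FROM line whose single argument is $ANSIBLE_RUNNER_IMAGE;
  # "FROM $ANSIBLE_RUNNER_IMAGE as galaxy" has 4 fields and is left alone.
  # awk does not expand "$..." inside string literals, so this is a literal match.
  awk '{ print }
       $1 == "FROM" && $2 == "$ANSIBLE_RUNNER_IMAGE" && NF == 2 {
         print "RUN pip3 uninstall --yes ansible-runner && pip3 install ansible-runner==2.0.0a1"
       }' "$dockerfile" > "$tmp"
  # Write back through the original file to keep its permissions.
  cat "$tmp" > "$dockerfile"
  rm -f "$tmp"
}

# Usage, after "ansible-builder build" has finished:
#   add_runner_reinstall ./context/Dockerfile
#   docker build -f ./context/Dockerfile -t image:tag ./context
```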

@weiyentan

You are a lifesaver, @Bbett. I owe you a coffee/beer. That fixed my problem.

@DrackThor

@Bbett, at this point I'm honestly confused about which steps to take and which not to take.
Could you please post your execution-environment.yml and all other modified files?
Thanks in advance :)

@weiyentan

Hi @DrackThor,

I would love to help while you wait. Which part is confusing you? Maybe I can add some clarity.

@Bbett

Bbett commented Apr 30, 2021

@DrackThor, here are my files.

execution-environment.yml

---
version: 1
dependencies:
  galaxy: requirements.yml
  python: requirements.txt
  system: bindep.txt
additional_build_steps:
  append:
    - RUN alternatives --set python /usr/bin/python3
    - COPY --from=quay.io/project-receptor/receptor:0.9.7 /usr/bin/receptor /usr/bin/receptor
    - RUN mkdir -p /var/run/receptor
    - ADD run.sh /run.sh
    - CMD /run.sh
    - USER 1000
    - RUN git lfs install

requirements.yml

---
collections:
  - community.general

requirements.txt
urllib3

bindep.txt

python38-devel [platform:rpm compile]
subversion [platform:rpm]
subversion [platform:dpkg]
git-lfs [platform:rpm]

and finally context/run.sh

#! /bin/bash
ansible-runner worker --private-data-dir=/runner

Be sure to make run.sh executable:

chmod +x context/run.sh
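Creating the entrypoint script and marking it executable can also be done in one step. This is a minimal sketch, assuming the build context directory is ./context as in the commands in this comment; the helper name `write_run_sh` is hypothetical.

```shell
# Hypothetical helper: write the run.sh that the ADD/CMD steps in
# execution-environment.yml expect into a build context directory, then make
# it executable.
write_run_sh() {
  context_dir="${1:-./context}"
  mkdir -p "$context_dir" || return 1
  # Quoted 'EOF' keeps the script body literal (no variable expansion).
  cat > "$context_dir/run.sh" <<'EOF'
#! /bin/bash
ansible-runner worker --private-data-dir=/runner
EOF
  chmod +x "$context_dir/run.sh"
}
```

Run it before `ansible-builder build`, since `ADD run.sh /run.sh` copies the file from the build context.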

Then build the container with the following command.

ansible-builder build --tag yourtagname --context ./context --container-runtime docker

When I attempt to use the created container in AWX 19.0.0 I get the RuntimeError: Pod Running message.

However, if I edit context/Dockerfile and add the following line after FROM $ANSIBLE_RUNNER_IMAGE:
RUN pip3 uninstall --yes ansible-runner && pip3 install ansible-runner==2.0.0a1

and run
docker build -f ./context/Dockerfile -t yourtagname ./context

Then when I run the container in AWX 19.0.0 it works as expected.

@stasonspb

You can add the ansible-runner reinstall to execution-environment.yml instead:

version: 1
dependencies:
  galaxy: requirements.yml
  python: requirements.txt
  system: bindep.txt
additional_build_steps:
  append:
    - RUN alternatives --set python /usr/bin/python3
    - RUN pip3 uninstall --yes ansible-runner && pip3 install ansible-runner==2.0.0a1
    - COPY --from=quay.io/project-receptor/receptor:0.9.7 /usr/bin/receptor /usr/bin/receptor
    - RUN mkdir -p /var/run/receptor
    - ADD run.sh /run.sh
    - CMD /run.sh
    - USER 1000
    - RUN git lfs install

Then build:
ansible-builder build --tag yourtagname --context ./context --container-runtime docker

@Bbett

Bbett commented Apr 30, 2021

@stasonspb that is correct, I didn't think to do that after I found that doing a reinstall of ansible-runner was correcting the issue.

@saxx0n

saxx0n commented Apr 30, 2021

@Bbett I owe you several beers. That fix just solved a week of banging my head against a wall.

@DrackThor

Hi @weiyentan what's still unclear to me:

  • when and why to use ANSIBLE_RUNNER_IMAGE=quay.io/ansible/ansible-runner:devel or ANSIBLE_RUNNER_IMAGE=quay.io/ansible/awx-ee:0.1.1?
  • I now have my requirements.txt, requirements.yml and bindep.txt files in my awx-ee repo root folder; do I need to copy the pre-existing values from _build/requirements.* and _build/bindep.txt as well? (I assume yes.)

@Bbett thank you very much - also works for me now! I also struggled with this for almost three days now...

@weiyentan

You need bindep.txt. I wish someone could explain to me how I can add what I want from different repositories. In requirements.yml you choose what you want; these are Galaxy content. requirements.txt holds the Python libraries you used to install into the old venvs.

@weiyentan

weiyentan commented May 1, 2021

The runner image relates to the engine that drives Ansible. The 2.0 runner installed with pip3 seems to work with AWX.

@DrackThor

I've just tried to install awx 19.1.0 and use my previously built ee container, which brought me this error

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 1377, in run
    res = receptor_job.run()
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 2904, in run
    return self._run_internal(receptor_ctl)
  File "/var/lib/awx/venv/awx/lib64/python3.8/site-packages/awx/main/tasks.py", line 2962, in _run_internal
    raise RuntimeError(detail)
RuntimeError: Sending stdin to pod

steps to reproduce:

  • install AWX 19.1.0 via awx operator 0.9.0
  • build custom EE using awx-ee repo and ansible-builder (1.0.0a1)
  • create Execution Environment with this image
  • run simple "ping" playbook on this EE
    • same behaviour for all my playbooks

@shanemcd should I add additional information (eg. my config files etc) here, or should we open a new issue?

@shanemcd
Member

shanemcd commented May 5, 2021

@DrackThor Did you rebuild your EE on top of AWX EE 0.2.0?

@DrackThor

DrackThor commented May 5, 2021

No, I've used this execution-environment.yaml:

---
version: 1
build_arg_defaults:  
  ANSIBLE_RUNNER_IMAGE: quay.io/ansible/ansible-runner:devel
  PYTHON_BUILDER_IMAGE: quay.io/ansible/python-builder:latest
dependencies:
  galaxy: requirements.yml
  system: bindep.txt
  python: requirements.txt
additional_build_steps:
  append:
    - RUN alternatives --set python /usr/bin/python3
    - RUN pip3 uninstall --yes ansible-runner && pip3 install ansible-runner==2.0.0a1
    - COPY --from=docker.avl.com/project-receptor/receptor:0.9.7 /usr/bin/receptor /usr/bin/receptor
    - RUN mkdir -p /var/run/receptor
    - ADD run.sh /run.sh
    - CMD /run.sh
    - RUN update-ca-trust force-enable
    - ADD certificates/*.crt /etc/pki/ca-trust/source/anchors/
    - RUN chmod 644 /etc/pki/ca-trust/source/anchors/*.crt && update-ca-trust extract
    - COPY krb5.conf /etc/krb5.conf
    - USER 1000
    - RUN git lfs install

I'll try using quay.io/ansible/awx-ee:0.2.0 instead of quay.io/ansible/ansible-runner:devel asap.

@shanemcd
Member

shanemcd commented May 5, 2021

@DrackThor ahhh I think you need to bump to ansible-runner 2.0.0a2

@pabelanger
Contributor

There shouldn't be a need to have

    - RUN pip3 uninstall --yes ansible-runner && pip3 install ansible-runner==2.0.0a1

We should be pulling in the latest version by default. If not, then there is a problem somewhere.
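Whether the pin is needed depends on which ansible-runner actually ends up in the image, and that can be checked directly. This is a minimal sketch, assuming the image accepts `ansible-runner --version` as its command and that docker (or whatever `CONTAINER_RUNTIME` names) is installed; the helper name and the `CONTAINER_RUNTIME` variable are hypothetical.

```shell
# Hypothetical helper: print the ansible-runner version installed in an image.
# CONTAINER_RUNTIME defaults to docker; set it to podman to use that instead.
runner_version_in_image() {
  image="$1"
  runtime="${CONTAINER_RUNTIME:-docker}"
  "$runtime" run --rm "$image" ansible-runner --version
}
```

For example, `runner_version_in_image yourtagname` should report whatever version the reinstall step pinned. If it reports an older alpha without any pin, that would point to the default-version problem described in this comment.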

@DrackThor

I've got it running by building upon awx-ee:0.2.0 like mentioned in #10060
This image, when used in an Execution Environment on AWX 19.1.0, works for me:

FROM quay.io/ansible/awx-ee:0.2.0

USER root

# install OS binaries
RUN yum -y install \
ca-certificates \
gcc \
git \
git-lfs \
krb5-devel \
krb5-libs \
krb5-workstation \
libcurl-devel \
libxml2-devel \
openssl-devel \
python3-jmespath \
python3-netaddr \
python3-passlib \
python3-pycurl \
python38-devel \
python38-pytz \
python38-pyyaml \
python38-requests \
qemu-img

# add Python dependencies and Ansible
# Galaxy dependencies
ADD requirements.yml /tmp/requirements.yml
ADD requirements.txt /tmp/requirements.txt

# upgrade pip
RUN /usr/bin/python3 -m pip install --upgrade pip

# install Ansible Galaxy collections
RUN ansible-galaxy collection install -r /tmp/requirements.yml --collections-path /usr/share/ansible/collections

# install Python dependencies
RUN pip install -r /tmp/requirements.txt

# add certificates
RUN update-ca-trust force-enable
ADD certificates/*.crt /etc/pki/ca-trust/source/anchors/
RUN chmod 644 /etc/pki/ca-trust/source/anchors/*.crt && update-ca-trust extract

# add Kerberos conf
COPY krb5.conf /etc/krb5.conf

USER 1000

This way Kerberos support works as well 😃

  • I assume this is just a workaround and the use of ansible-builder should be the way to go?
  • thanks for your help so far! 😄

@pabelanger
Contributor

Yeah, it looks like you didn't use ansible-builder to create your Dockerfile. That's fine, but you now have development headers in your final image.

https://github.com/ansible/network-ee is another EE, which has a more minimal dockerfile

@DrackThor

@pabelanger exactly, now I just extended the awx-ee:0.2.0 image.
I wasn't able to build a working image using ansible-builder for awx 19.1.0 so far.
I know that this approach has some overhead, but as long as it's working, that's fine with me for now. I'll look into optimizing and building the EE with ansible-builder later on.

So imho there are two workarounds for this issue so far:

  • for 19.0.0: build your EE from scratch by adapting the awx-ee source and using ansible-builder
  • for 19.1.0: extend awx-ee:0.2.0 as needed

The open topic is making ansible-builder work as intended for 19.x.x.

@weiyentan

weiyentan commented May 6, 2021 via email

@shanemcd
Member

shanemcd commented Jun 1, 2021

Thanks for the discussion here folks. I think we learned a lot here. We're actively working on refining documentation and prioritizing UX-related issues as things around Execution Environments begin to stabilize.

AWX 19.2.0 will be out later today, along with a newer version of the awx-ee. This is largely a bugfix release. See the changelog for more details. The protocols should be compatible with awx-ee 0.2.0, but in case you run into any issues, try building on top of the newest awx-ee to see if it helps.
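Building on a newer awx-ee with ansible-builder comes down to overriding ANSIBLE_RUNNER_IMAGE, as tried earlier in this thread with `--build-arg`. This is a minimal sketch: the helper name and the `ANSIBLE_BUILDER` override are hypothetical, the context directory and runtime follow the commands used in this thread, and the caller supplies the awx-ee tag because the newest tag isn't named here (e.g. `build_ee_on_awx_ee yourtagname 0.2.0`).

```shell
# Hypothetical helper: rebuild an EE with ansible-builder on top of a given
# awx-ee tag. Both the image tag and the awx-ee tag must be supplied.
build_ee_on_awx_ee() {
  tag="$1"
  awx_ee_tag="$2"
  if [ -z "$tag" ] || [ -z "$awx_ee_tag" ]; then
    echo "usage: build_ee_on_awx_ee IMAGE_TAG AWX_EE_TAG" >&2
    return 2
  fi
  # ANSIBLE_BUILDER lets you point at a different ansible-builder binary.
  builder="${ANSIBLE_BUILDER:-ansible-builder}"
  "$builder" build \
    --tag "$tag" \
    --context ./context \
    --container-runtime docker \
    --build-arg "ANSIBLE_RUNNER_IMAGE=quay.io/ansible/awx-ee:${awx_ee_tag}"
}
```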

If you encounter any specific issues in the future please open a new issue. Seriously, thank you!

@shanemcd shanemcd closed this as completed Jun 1, 2021