
[zuul] Use extracted CRC nodes in stf-base #531

Merged: 14 commits merged into master from efoley-zuul-crc-no-nested-virt on Jan 13, 2024

Conversation

elfiesmelfie
Collaborator

@elfiesmelfie elfiesmelfie commented Nov 13, 2023

This updates the stf-base job to use a two-node/extracted CRC setup instead of a single-node/nested CRC setup.

Based on the example here: https://ci-framework.readthedocs.io/en/latest/cookbooks/zuul-job-nodeset.html

Depends-On: openstack-k8s-operators/ci-framework#837
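
For illustration, the two-node layout follows the pattern from that cookbook: one controller node that runs the test playbooks and one node hosting the extracted CRC cluster. A minimal Zuul nodeset of that shape might look like this (the nodeset name and labels are assumptions, not the exact ones defined in this PR):

- nodeset:
    name: stf-crc-extracted-2node      # illustrative name only
    nodes:
      - name: controller               # runs the stf-run-ci playbooks
        label: cloud-centos-9-stream   # illustrative label
      - name: crc                      # hosts the extracted CRC / OCP cluster
        label: coreos-crc-extracted    # illustrative label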


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/d9cfe732a32b4761b24b7c82abf5dd5f

stf-crc-latest-nightly_bundles ERROR Project github.com/openstack-k8s-operators/dataplane-operator does not have the default branch master in 8s
stf-crc-latest-local_build ERROR Project github.com/openstack-k8s-operators/dataplane-operator does not have the default branch master in 9s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/03c063a0ffba4e22bf4771b73fa6c2d1

stf-crc-latest-nightly_bundles ERROR Project github.com/openstack-k8s-operators/infra-operator does not have the default branch master in 10s
stf-crc-latest-local_build ERROR Project github.com/openstack-k8s-operators/infra-operator does not have the default branch master in 8s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/342aafb88cf747358cbe84ed629e08ca

stf-crc-latest-nightly_bundles ERROR Project github.com/openstack-k8s-operators/openstack-operator does not have the default branch master in 9s
stf-crc-latest-local_build ERROR Project github.com/openstack-k8s-operators/openstack-operator does not have the default branch master in 9s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/b0643abc04e0461f89f66a7fee8c6b48

stf-crc-latest-nightly_bundles RETRY_LIMIT in 29m 56s
stf-crc-latest-local_build RETRY_LIMIT in 29m 57s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/bd659b4c62304c888023989c2205c070

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 29m 57s
stf-crc-ocp_412-local_build RETRY_LIMIT in 29m 54s
stf-crc-ocp_413-nightly_bundles RETRY_LIMIT in 29m 54s
stf-crc-ocp_413-local_build RETRY_LIMIT in 29m 52s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/ca13fccbe33c4b829b652d01224225f3

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 18m 23s
stf-crc-ocp_412-local_build RETRY_LIMIT in 18m 29s
stf-crc-ocp_413-nightly_bundles RETRY_LIMIT in 18m 00s
stf-crc-ocp_413-local_build RETRY_LIMIT in 18m 22s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/dda0eac383e042cf8ee6fe753754e65e

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 19m 57s
stf-crc-ocp_412-local_build RETRY_LIMIT in 19m 48s
stf-crc-ocp_413-nightly_bundles RETRY_LIMIT in 19m 55s
stf-crc-ocp_413-local_build RETRY_LIMIT in 21m 54s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/a1d705a987cb4668ae565ede59c4892d

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 18m 56s
stf-crc-ocp_412-local_build RETRY_LIMIT in 21m 34s
stf-crc-ocp_413-nightly_bundles RETRY_LIMIT in 18m 13s
stf-crc-ocp_413-local_build RETRY_LIMIT in 22m 38s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/86b2f6ae2b5f4f9ab3bd073069f78610

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 22m 41s
stf-crc-ocp_412-local_build RETRY_LIMIT in 19m 45s
stf-crc-ocp_413-nightly_bundles RETRY_LIMIT in 18m 27s
stf-crc-ocp_413-local_build RETRY_LIMIT in 20m 16s

ci/prepare.yml Outdated (resolved)

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/7de6de2ab4bc4a1da2c6673f28f353dc

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 20m 47s
stf-crc-ocp_412-local_build RETRY_LIMIT in 20m 22s
stf-crc-ocp_413-nightly_bundles RETRY_LIMIT in 21m 02s
stf-crc-ocp_413-local_build NODE_FAILURE Node request 200-0006663150 failed in 0s

ci/prepare.yml Outdated (resolved)

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/420e26b4bb074c04bc40d31dcd34ef42

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 18m 50s
stf-crc-ocp_412-local_build NODE_FAILURE Node request 200-0006663175 failed in 0s
stf-crc-ocp_413-nightly_bundles NODE_FAILURE Node request 200-0006663176 failed in 0s
stf-crc-ocp_413-local_build NODE_FAILURE Node request 200-0006663177 failed in 0s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/bdb9ef305f214ab0aec5386dd9cd6f6e

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 15m 34s
stf-crc-ocp_412-local_build RETRY_LIMIT in 17m 38s
stf-crc-ocp_413-nightly_bundles RETRY_LIMIT in 19m 04s
stf-crc-ocp_413-local_build RETRY_LIMIT in 14m 46s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/1d8dae4607a4480aa38c2755843d5874

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 18m 35s
stf-crc-ocp_412-local_build RETRY_LIMIT in 18m 07s
stf-crc-ocp_413-nightly_bundles RETRY_LIMIT in 18m 01s
stf-crc-ocp_413-local_build RETRY_LIMIT in 17m 47s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/539fa75e30414094a072f8eb613895aa

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 18m 07s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/f5c824f6c533489e8a15bc8a8979eeef

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 17m 07s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/34adc64f56384ef69a926b3e323ce7c2

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 23m 26s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/d68ffe5588c2494aa9d82c59220670d7

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 18m 46s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/a08dc9fbacd545cda7eb9dc9aa8b7953

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 18m 41s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/c9f301ab949e455cac58490743960de8

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 20m 02s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/a7f7a2edc30946a38f4eda9f96915651

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 21m 38s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/5cd1b8d1764a416f8d044554f4f57ca9

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 22m 34s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/ea08ceb09d1243488fc400436bee8311

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 18m 54s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/9a41715c089d4cc386d1b4ae38952ba4

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 19m 37s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/180383007cc54cceabadab53e5cd4fed

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 18m 48s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/9f989376d5534808830693ffc178d9d5

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 21m 38s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/cb4af1d46f7745728ebb204104e420c1

✔️ stf-crc-ocp_412-nightly_bundles SUCCESS in 26m 34s
stf-crc-ocp_413-nightly_bundles RETRY_LIMIT in 5m 45s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/39643623510c4d2e8cd72dd404335334

stf-crc-ocp_412-nightly_bundles NODE_FAILURE Node request 100-0006738815 failed in 0s
stf-crc-ocp_413-nightly_bundles NODE_FAILURE Node request 100-0006738816 failed in 0s

@elfiesmelfie
Collaborator Author

recheck


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/c2f534d308b44cc3a835dfe416a6c74d

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 5m 00s (non-voting)
stf-crc-ocp_413-nightly_bundles FAILURE in 31m 00s (non-voting)
stf-crc-ocp_414-nightly_bundles RETRY_LIMIT in 5m 12s (non-voting)
✔️ stf-crc-ocp_412-local_build SUCCESS in 37m 24s
✔️ stf-crc-ocp_413-local_build SUCCESS in 37m 21s
stf-crc-ocp_414-local_build FAILURE in 25m 35s
stf-crc-ocp_412-local_build-index_deploy FAILURE in 29m 05s
stf-crc-ocp_413-local_build-index_deploy FAILURE in 24m 24s
✔️ stf-crc-ocp_414-local_build-index_deploy SUCCESS in 41m 28s

* use ci-framework infra playbook
* add make targets to do set-up
* link the kubeconfig files
* Remove pre-get_kubeconfig.yml; the script is no longer used
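
The kubeconfig linking listed above could be a simple symlink task; a sketch using the variable names that appear later in this review, not the exact committed task:

- name: Link the CRC kubeconfig into the default location
  ansible.builtin.file:
    src: "{{ cifmw_openshift_kubeconfig }}"
    dest: "{{ ansible_env.HOME }}/.kube/config"
    state: link
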
@elfiesmelfie
Collaborator Author

recheck


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/3bd7ad274bb84bf2b47036178b2bc948

stf-crc-ocp_412-nightly_bundles NODE_FAILURE Node request 100-0006803096 failed in 0s (non-voting)
stf-crc-ocp_413-nightly_bundles NODE_FAILURE Node request 100-0006803097 failed in 0s (non-voting)
stf-crc-ocp_414-nightly_bundles NODE_FAILURE Node request 100-0006803098 failed in 0s (non-voting)
stf-crc-ocp_412-local_build NODE_FAILURE Node request 100-0006803099 failed in 0s
stf-crc-ocp_413-local_build NODE_FAILURE Node request 100-0006803100 failed in 0s
stf-crc-ocp_414-local_build NODE_FAILURE Node request 100-0006803101 failed in 0s
stf-crc-ocp_412-local_build-index_deploy NODE_FAILURE Node request 100-0006803102 failed in 0s
stf-crc-ocp_413-local_build-index_deploy NODE_FAILURE Node request 100-0006803103 failed in 0s
stf-crc-ocp_414-local_build-index_deploy NODE_FAILURE Node request 100-0006803104 failed in 0s

@elfiesmelfie
Collaborator Author

recheck


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/485fa479f858497f88c0d7eb319842f3

stf-crc-ocp_412-nightly_bundles NODE_FAILURE Node request 100-0006803109 failed in 0s (non-voting)
stf-crc-ocp_413-nightly_bundles NODE_FAILURE Node request 100-0006803110 failed in 0s (non-voting)
stf-crc-ocp_414-nightly_bundles NODE_FAILURE Node request 100-0006803111 failed in 0s (non-voting)
stf-crc-ocp_412-local_build NODE_FAILURE Node request 100-0006803112 failed in 0s
stf-crc-ocp_413-local_build NODE_FAILURE Node request 100-0006803113 failed in 0s
stf-crc-ocp_414-local_build NODE_FAILURE Node request 100-0006803114 failed in 0s
stf-crc-ocp_412-local_build-index_deploy NODE_FAILURE Node request 100-0006803115 failed in 0s
stf-crc-ocp_413-local_build-index_deploy NODE_FAILURE Node request 100-0006803116 failed in 0s
stf-crc-ocp_414-local_build-index_deploy NODE_FAILURE Node request 100-0006803117 failed in 0s

@danpawlik
Contributor

recheck


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/830ce37f3a6c49a89a6821eacc61177c

stf-crc-ocp_412-nightly_bundles RETRY_LIMIT in 5s (non-voting)
stf-crc-ocp_413-nightly_bundles FAILURE in 32m 34s (non-voting)
stf-crc-ocp_414-nightly_bundles FAILURE in 31m 21s (non-voting)
✔️ stf-crc-ocp_412-local_build SUCCESS in 35m 21s
stf-crc-ocp_413-local_build RETRY_LIMIT in 5s
✔️ stf-crc-ocp_414-local_build SUCCESS in 34m 51s
stf-crc-ocp_412-local_build-index_deploy RETRY_LIMIT in 4s
stf-crc-ocp_413-local_build-index_deploy RETRY_LIMIT in 6s
✔️ stf-crc-ocp_414-local_build-index_deploy SUCCESS in 43m 47s

@danpawlik
Contributor

recheck

@danpawlik
Contributor

Now it seems to be fine.

@elfiesmelfie
Collaborator Author

Now it seems to be fine.

Thanks Daniel!

ci/pre-2node.yml Outdated
Comment on lines 35 to 38
- name: Check for the kubeconfig file
  ansible.builtin.shell:
    cmd: |
      ls {{ cifmw_openshift_kubeconfig }}
Member


I'm a bit confused by this play, because there is no register etc. on it. Also, for checking file existence, wouldn't stat make more sense here?

Collaborator Author


This is a task that's there for the user who is debugging; it can technically be removed. If there is no kubeconfig file at the expected location, the later linking task will fail.

Collaborator Author


Actually, this task should fail too, because you get a non-zero return code when querying a non-existent file:

$ ls kubeconfig
kubeconfig
$ echo $?
0
$ ls kubeconfig2
ls: cannot access 'kubeconfig2': No such file or directory
$ echo $?
2
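
For reference, the stat-based check the reviewer suggests could look roughly like this (a sketch, not the code that was merged):

- name: Check for the kubeconfig file
  ansible.builtin.stat:
    path: "{{ cifmw_openshift_kubeconfig }}"
  register: kubeconfig_stat

- name: Fail early when the kubeconfig is missing
  ansible.builtin.fail:
    msg: "No kubeconfig found at {{ cifmw_openshift_kubeconfig }}"
  when: not kubeconfig_stat.stat.exists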

ci/pre-2node.yml Outdated
Comment on lines 46 to 49
- name: Check for the kubeconfig file
  ansible.builtin.shell:
    cmd: |
      ls -l {{ ansible_env.HOME }}/.kube/config
Member


Also a bit confused here, and by the -l, which wasn't on the other play.

Collaborator Author


This is to make sure that we can see the link has been created; it's not strictly needed and is only used for debugging.
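
A stat-based equivalent that records whether the link exists could look like this (again a sketch, assuming the same path):

- name: Inspect the linked kubeconfig
  ansible.builtin.stat:
    path: "{{ ansible_env.HOME }}/.kube/config"
  register: kube_config_stat

- name: Show where the link points (debugging aid)
  ansible.builtin.debug:
    msg: "islnk={{ kube_config_stat.stat.islnk | default(false) }} target={{ kube_config_stat.stat.lnk_source | default('n/a') }}"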

ci/deploy_stf.yml Outdated (resolved)
ci/pre-2node.yml Outdated (resolved)
ci/pre-2node.yml Outdated (resolved)
ci/prepare.yml Outdated (resolved)
ci/test_stf.yml Outdated (resolved)
@elfiesmelfie
Collaborator Author

test

Collaborator

@vkmc vkmc left a comment


LGTM, thanks Emma!

@vkmc vkmc merged commit 665577e into master Jan 13, 2024
11 checks passed
@vkmc vkmc deleted the efoley-zuul-crc-no-nested-virt branch January 13, 2024 10:06
vkmc added a commit that referenced this pull request Feb 14, 2024
* Add gitleaks.toml for rh-gitleaks (#510)

Add a .gitleaks.toml file to avoid the false positive leak for the
example certificate when deploying for Elasticsearch.

* [stf-collect-logs] Move describe build|pod from ci/ to the role (#505)

* [stf-run-ci] Fix check to include bool filter (#511)

Update the check to use the bool filter instead of a bare var.
By default, Ansible parses vars as strings, and without the | bool
filter this check always resolves to true, since a non-empty string is
truthy. Other instances of the same check already used the filter, but
this one was missed.
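
A minimal sketch of the difference (the variable name is hypothetical):

- name: Step guarded by a string flag
  ansible.builtin.debug:
    msg: "running the optional step"
  when: run_optional_step | default('false') | bool   # without | bool, the string "false" is non-empty and therefore truthy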

* [allow_skip_clone] Allow skipping of the cloning stages (#512)

* [allow_skip_clone] Use <repo>_dir instead of hardcoding all directories relative to base_dir

This will allow configuration of the repo clone destination, so we can
use pre-cloned dirs instead of explicitly cloning the dirs each time.

This is essential for CI systems like Zuul, which set up the repos with
particular versions/branches prior to running the test scripts.

* [zuul] List the other infrawatch repos as required for the job

* [zuul] Set the {sgo,sg-bridge,sg-core,prometheus-webhook-snmp}_dir vars

Add in the repo dir locations where the repos should be pre-cloned by
zuul

* Replace base_dir with sto_dir

* set sto_dir relative to base_dir if it isn't already set

* [ci] use absolute dir for requirements.txt

* [ci] Update sto_dir using explicit reference

zuul.project.src_dir refers to the current project dir. When using the jobs
in another infrawatch project, this becomes invalid.
Instead, sto_dir is explicitly set using
zuul.projects[<project_name>].src_dir, the same way that the other repo dirs
are set in vars-zuul-common
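
A sketch of what that explicit reference could look like (the canonical project key is an assumption, and the src_dir is joined with the work dir as usual):

sto_dir: "{{ ansible_user_dir }}/{{ zuul.projects['github.com/infrawatch/service-telemetry-operator'].src_dir }}"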

---------

Co-authored-by: Chris Sibbitt <csibbitt@redhat.com>

* Fix qdr auth one_time_upgrade label check (#518)

* Fix qdr auth one_time_upgrade label check

* Fix incorrect variable naming on one_time_upgrade label check

* Adjust QDR authentication password generation (#520)

Adjust the passwords being generated for QDR authentication since
certain characters (such as a colon) will cause a failure in the parsing
routine within qpid-dispatch. Updates the lookup function to only use
ascii_letters and digits and increases the length to 32 characters.
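
A sketch of a lookup restricted that way (the variable name is illustrative):

qdr_auth_password: "{{ lookup('password', '/dev/null chars=ascii_letters,digits length=32') }}"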

---------

Co-authored-by: Leif Madsen <lmadsen@redhat.com>

* Add docs for skip_clone (#515)

* [allow_skip_clone] Add docs for clone_repos and *_dir vars

* Align README table column spacing (#516)

* Align README table column spacing

* Update build/stf-run-ci/README.md

---------

Co-authored-by: Emma Foley <elfiesmelfie@users.noreply.github.com>

---------

Co-authored-by: Leif Madsen <lmadsen@redhat.com>

* [zuul] Add STO to required repos (#524)

It appears that STO is not included explicitly when running jobs from
SGO [1]. This will be the case in all the other repos.
This change explicitly adds it, in case it's not already included by
Zuul.

[1] https://review.rdoproject.org/zuul/build/edd8f17bfdac4360a94186b46c4cea3f

* QDR Auth in smoketest (#525)

* QDR Auth in smoketest

* Added qdr-test as a mock of the OSP-side QDR
* Connection from qdr-test -> default-interconnect is TLS+Auth
* Collectors point at qdr-test instead of default-interconnect directly
* Much more realistic than the existing setup
* Eliminated a substitution in sensubility config
* Used default QDR basic auth in Jenkinsfile

* QDR Auth for infrared 17.1 script (#517)

* QDR Auth for infrared 17.1 script

* Fix missing substitution for AMQP_PASS in infrared script

* [zuul] Define a project template for stf-crc-jobs (#514)

* [allow_skip_clone] Use <repo>_dir instead of hardcoding all directories relative to base_dir

This will allow configuration of the repo clone destination, so we can
use pre-cloned dirs instead of explicitly cloning the dirs each time.

This is essential for CI systems like Zuul, which set up the repos with
particular versions/branches prior to running the test scripts.

* [zuul] List the other infrawatch repos as required for the job

* [zuul] Set the {sgo,sg-bridge,sg-core,prometheus-webhook-snmp}_dir vars

Add in the repo dir locations where the repos should be pre-cloned by
zuul

* Replace base_dir with sto_dir

* set sto_dir relative to base_dir if it isn't already set

* [ci] use absolute dir for requirements.txt

* [ci] Update sto_dir using explicit reference

zuul.project.src_dir refers to the current project dir. When using the jobs
in another infrawatch project, this becomes invalid.
Instead, sto_dir is explicitly set using
zuul.projects[<project_name>].src_dir, the same way that the other repo dirs
are set in vars-zuul-common

* [zuul] Define a project template for stf-crc-jobs

Instead of listing all the jobs for each project in-repo, and needing to update the list every time
a new job is added, the project template can be updated and the changes propagated to the
other infrawatch projects.

* [zuul] don't enable using the template

* Revert "[zuul] don't enable using the template"

This reverts commit 56e2009.

---------

Co-authored-by: Chris Sibbitt <csibbitt@redhat.com>

* Restart QDR after changing the password (#530)

* Restart QDR after changing the password

* Fixes bug reported here: #517 (comment)
* Avoids an extra manual step when changing password
* Would affect users who upgrade from earlier STF and subsequently enable basic auth
* Also users who need to change their passwords

* Fixing ansible lint

* Update roles/servicetelemetry/tasks/component_qdr.yml

* Adjust QDR restarts to account for HA

* [smoketest] Wait for qdr-test to be Running

* [smoketest] Wait for QDR password upgrade

* Remove zuul QDR auth override

* [zuul] Add jobs to test with different versions of OCP (#432)


* Add crc_ocp_bundle value to select OCP version
* zuul: add log collection post-task to get crc logs
* Add OCP v4.13 and a timeout to the job

* Update README for 17.1 IR test (#533)

* Update README for 17.1 IR test

Update the 17.1 infrared test script README to show how to deploy a
virtualized workload on the deployed overcloud infrastructure. Helps
with testing by providing additional telemetry to STF required in
certain dashboards.

* Update tests/infrared/17.1/README.md

Co-authored-by: Chris Sibbitt <csibbitt@redhat.com>

* Update tests/infrared/17.1/README.md

---------

Co-authored-by: Chris Sibbitt <csibbitt@redhat.com>

* Support OCP v4.12 through v4.14 (#535)

Support STF 1.5.3 starting at OpenShift 4.12, since 4.11 is incompatible
because of its dependency requirements. Our primary target is support of
OCP EUS releases.

Closes: STF-1632

* [stf-collect-logs] Add ignore_errors to task (#529)

The "Question the deployment" task didn't have
ignore_errors: true set, so when the task fails, the play
is finished. This means that we don't get to the
"copy logs" task and can't see the job logs in zuul.

ignore_errors is set to true to be consistent with other tasks

* Mgirgisf/stf 1580/fix log commands (#526)

* update stf-collect-logs tasks
* Update log path
* solve log bugs in stf-run-ci tasks
* create log directory

* Adjust Operator dependency version requirements (#538)

Adjust the operator package dependency requirements to align to known
required versions. Primarily reduce the version of
openshift-cert-manager from 1.10 to 1.7 in order to support the
tech-preview channel which was previously used.

Lowering the version requirement allows the previously installed
openshift-cert-manager-operator to be used during the STF 1.5.2 to 1.5.3
update, so the update is no longer blocked.

Related: STF-1636

* Clean up stf-run-ci for OCP 4.12 minimum version (#539)

Update the stf-run-ci base setup to no longer need testing against OCP
4.10 and earlier, meaning we can rely on a single workflow for
installation. Also update the deployment to use
cluster-observability-operator via the redhat-operators CatalogSource
for installation via use_redhat and use_hybrid strategies.

* [zuul] Add job to build locally and do an index-based deployment (#495)

* [zuul] Add job to build locally and do an index-based deployment

* Only require Interconnect and Smart Gateway (#541)

* Only require Interconnect and Smart Gateway

Update the dependency management within Service Telemetry Operator to
only require AMQ Interconnect and Smart Gateway Operator, which is
enough to deploy STF with observabilityStrategy: none. Other Operators
can be installed in order to satisfy data storage of telemetry and
events.

Installation of cert-manager is also required, but needs to be
pre-installed similar to Cluster Observability Operator, either as a
cluster-scoped operator with the tech-preview channel, or a single time
on the cluster as a namespace scoped operator, which is how the
stable-v1 channel installs.

Documentation will be updated to adjust for this change.

Related: STF-1636

* Perform CI update to match docs install changes (#542)

* Perform CI update to match docs install changes

Update the stf-run-ci scripting to match the documented installation
procedures which landed in
infrawatch/documentation#513. These changes are
also reflected in #541.

* Update build/stf-run-ci/tasks/setup_base.yml

Co-authored-by: Emma Foley <elfiesmelfie@users.noreply.github.com>

---------

Co-authored-by: Emma Foley <elfiesmelfie@users.noreply.github.com>

* Also drop cert-manager project

The cert-manager project gets created with workload items when deploying
the cert-manager from the cert-manager-operator project. When removing
cert-manager this project is not cleaned up, so we need to delete it as
well.

---------

Co-authored-by: Emma Foley <elfiesmelfie@users.noreply.github.com>

* [stf-run-ci] Explicitly check the validate_deployment was successful (#545)

In [1], the validate_deployment step is successful, despite the
deployment not being successful.
This causes the job to time out because the following steps continue to
run despite an invalid state.

To get the expected behaviour, the output should be checked for a string
indicating success, i.e. "* [info] CI Build complete. You can now run tests."
[2] shows the output for a successful run.

[1] https://review.rdoproject.org/zuul/build/245ae63e41884dc09353d938ec9058d7/console#5/0/144/controller
[2] https://review.rdoproject.org/zuul/build/802432b23da24649b818985b7b1633bb/console#5/0/82/controller
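
A sketch of such a check (the registered variable name is an assumption, not the one used in stf-run-ci):

- name: Confirm stf-run-ci reported a completed build
  ansible.builtin.assert:
    that:
      - "'[info] CI Build complete. You can now run tests.' in build_output.stdout"
    fail_msg: "stf-run-ci did not report a successful build"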

* Implement dashboard management (#548)

* Implement dashboard management

Implement a new configuration option graphing.grafana.dashboards.enabled
which results in dashboards objects being created for the Grafana
Operator. Previously loading dashboards would be done manually via 'oc
apply' using instructions from documentation.

The new CRD parameters to the ServiceTelemetry object allows the Service
Telemetry Operator to now make the GrafanaDashboard objects directly.

Related: OSPRH-825

* Drop unnecessary cluster roles

* Update CSV for owned parameter
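
The new option sits under the graphing section of the ServiceTelemetry CR; a minimal sketch, assuming the usual infra.watch/v1beta1 API group and default object name:

apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
spec:
  graphing:
    enabled: true
    grafana:
      dashboards:
        enabled: true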

* Remove basic-auth method from grafana (#550)

* Only openshift auth will be allowed

* Adjust Alertmanager SAR to be more specific

* This matches recent changes in prometheus[1] and grafana[2]

[1] https://github.com/infrawatch/service-telemetry-operator/pull/549/files#diff-2cf84bcf66f12393c86949ec0d3f16c473a650173d55549bb02556d23aa22bd2R46
[2] https://github.com/infrawatch/service-telemetry-operator/pull/550/files#diff-ae71801975adb4f8dd4aa5479a66ad46e46f17de40f9d147b2e09e13ce26633eR45

* Revert "Adjust Alertmanager SAR to be more specific"

This reverts commit 0f94fd5.

* Auth to prometheus using token instead of basicauth (#549)

* Auth to prometheus using token instead of basicauth

* Add present/absent logic to prometheus-reader resources

* s/password/token in smoketest output

* [zuul] Make nightly_bundles jobs non-voting (#551)

---------

Co-authored-by: Emma Foley <elfiesmelfie@users.noreply.github.com>

* Fix branch co-ordination in stf-run-ci (#555)

I think it got broken by an oops recently[1].

Since that change, working_branch (`branch` at that point) is never used because version_branches.sgo has a default value.

This breaks the branch co-ordination in Jenkins[2] and in local testing[3].

[1] https://github.com/infrawatch/service-telemetry-operator/pull/512/files#diff-c073fe1e346d08112920aa0bbc8a7453bbd3032b7a9b09ae8cbc70df4db4ea2dR19
[2] https://github.com/infrawatch/service-telemetry-operator/blob/0f94fd577617aee6a85fc4141f98ebdfc49a9f92/Jenkinsfile#L157
[3] https://github.com/infrawatch/service-telemetry-operator/blob/0f94fd577617aee6a85fc4141f98ebdfc49a9f92/README.md?plain=1#L62

* Adjust Alertmanager SAR to be more specific (#553)

* This matches recent changes in prometheus[1] and grafana[2]

[1] https://github.com/infrawatch/service-telemetry-operator/pull/549/files#diff-2cf84bcf66f12393c86949ec0d3f16c473a650173d55549bb02556d23aa22bd2R46
[2] https://github.com/infrawatch/service-telemetry-operator/pull/550/files#diff-ae71801975adb4f8dd4aa5479a66ad46e46f17de40f9d147b2e09e13ce26633eR45

* Add optional spec.replaces field to CSV for update graph compliance

The way we generate our CSVs uses OLM's skipRange functionality. This is fine,
but using only this leads to older versions becoming unavailable after the
fact -- see the warning at [1].

By adding an optional spec.replaces to our CSV we allow update testing as
well as actual production updates for downstream builds that leverage it.

Populating the field requires knowledge of the latest-released bundle,
so we take it from an environment variable to be provided by the
builder. If this is unset we don't include the spec.replaces field at
all -- leaving previous behavior unchanged.

Resolves #559
Related: STF-1658

[1] https://olm.operatorframework.io/docs/concepts/olm-architecture/operator-catalog/creating-an-update-graph/#skiprange

* Stop using ephemeral storage for testing (#547)

Update the __service_telemetry_storage_persistent_storage_class to use CRC PVs
Use the default value (false) for __service_telemetry_storage_ephemeral_enabled

* [zuul] Use extracted CRC nodes in stf-base (#531)

* [zuul] Update base job for stf-base

* Add in required projects: dataplane-operator, infra-operator, openstack-operator

* Remove nodeset from stf-base
  it overrides the nodeset set in the base job.
  The nodeset is going to be used to select the OCP version

* [zuul] define nodesets for easy reuse

* Define the nodeset
* Rename the base
* Select OCP version with the nodeset

* [zuul] Add a login command to get initial kubeconfig file

* [stf-run-ci] Add retries to pre-clean

* Update galaxy requirements

* [ci] Add retry to login command

* [ci] Configure kubeconfig for rhol_crc role

* Apply suggestions from code review

* Zuul: Update how we get the initial kubeconfig (#558)

* use ci-framework infra playbook
* add make targets to do set-up
* link the kubeconfig files
* Remove pre-get_kubeconfig.yml; the script is no longer used

* [ci] Add common-tasks.yml to cover the tasks that setup every play (#556)

* [zuul] Update the labels used for extracted CRC

* Remove non-default cifmw_rhol_crc_kubeconfig value

* Implement support for Grafana Operator v5 (#561)

* Implement support for Grafana Operator v5

Implement changes to support Grafana Operator v5 when the new
grafana.integreatly.org CRD is available. Use the new CRDs as default
when they are available. Fallover to deploying with Grafana Operator v4
when the Grafana Operator v5 CRDs are not available, thereby providing
backwards compatibility to allow administrators time to migrate.

Additionally, the polystat plugin has been removed from the rhos-cloud
dashboard due to compatibility issues with grafana-cli usage when
dynamically loading plugins. Usage of Grafana Operator v5 is also a
target for disconnected support, and dynamically loading plugins in
these environments is expected to be a problem.

Related: OSPRH-2577
Closes: STF-1667

* Default Grafana role set to Admin

In order to match the previous (Grafana Operator v4) role, set
auto_assign_org_role to the Admin value. Default is Viewer.

* Remove old vendored operator_sdk/util collection (#563)

Remove the old 0.1.0 vendored collection operator_sdk/util from the
upstream Dockerfile and repository. Instead use the default
operator_sdk/util in the base image which is a newer version of 0.4.0.

We only use the util collection for one call to k8s_status when
ephemeral storage is enabled. The newer collection also provides a
k8s_event module which could be useful in the future.

Closes: STF-1683

* Add nightly_bundle jobs to periodic pipeline (#564)

The nightly_bundle jobs will run once a day

* Remove hard-coded Prometheus version in template (#565)

Remove the hard-coded Prometheus version in the Prometheus template when
using observabilityStrategy use_redhat, which uses Cluster Observability
Operator to manage the Prometheus instance requests.

Previously this value was hard-coded to prevent a potential rollback
when moving from Community Prometheus Operator to Cluster Observability
Operator.

Resolves: JIRA#OSPRH-2140

* Set features.operators.openshift.io/disconnected to True (#570)

STF can now be deployed in disconnected mode. This change updates
the features.operators.openshift.io/disconnected annotation to
reflect this.

* [stf-run-ci] Update validation check for bundle URLs (#571)

* [stf-run-ci] Update validation check for bundle URLs

An empty string passed as the bundle URL will pass the existing test
of "is defined" and "is not None" and still be invalid.

The validation for the bundle URL can be done in one check per var:

* If the var is undefined, it becomes "", and the check fails, because of length
* If the var is None, there's an error because None does not have a length
* If the var is an empty string, the check fails because of the length

This simplifies the check and improves readability
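
A sketch of the simplified per-variable check (the variable name is hypothetical):

- name: Validate that a usable bundle URL was provided
  ansible.builtin.assert:
    that:
      - stf_bundle_image_path | default('') | length > 0
    fail_msg: "bundle URL is undefined or empty"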

* Prefer Grafana 9 workload (#575)

Prefer usage of the Grafana 9 container image from RHCC. Grafana 7 is EOL
upstream and receives no security support, while Grafana 9 is still
supported.

---------

Co-authored-by: Leif Madsen <lmadsen@redhat.com>
Co-authored-by: Emma Foley <elfiesmelfie@users.noreply.github.com>
Co-authored-by: Chris Sibbitt <csibbitt@redhat.com>
Co-authored-by: Marihan Girgis <102027102+mgirgisf@users.noreply.github.com>
Co-authored-by: Miguel Garcia <migarcia@redhat.com>