Add index-based deployment method to stf-run-ci #428
Create bundle images using BuildConfigs through the internal registry after generating contents using generate_bundles. Create a file-based index image from the bundles created and available from the internal registry. Deploy STF using a local CatalogSource that references the internal index image, which allows OLM to stand up dependencies using properties.yaml within the Service Telemetry Operator metadata. Skips over pre-deployment artifacts and uses only data available via CatalogSource for dependency validation.
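To illustrate the BuildConfig-based bundle build described above, here is a minimal sketch of what such an object could look like. It is not the template used by stf-run-ci: the object name, namespace, Dockerfile path, and the choice of a binary (local directory) source are all assumptions made for illustration.

```yaml
# Sketch only: name, namespace, dockerfilePath, and the Binary source type
# are assumptions, not taken from the stf-run-ci role.
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: service-telemetry-operator-bundle
  namespace: service-telemetry
spec:
  source:
    # A Binary source lets the generated bundle directory be uploaded at build time.
    type: Binary
    binary: {}
  strategy:
    type: Docker
    dockerStrategy:
      dockerfilePath: bundle.Dockerfile   # hypothetical file name
  output:
    # Pushing to an ImageStreamTag places the image in the internal registry.
    to:
      kind: ImageStreamTag
      name: service-telemetry-operator-bundle:latest
```

With a definition like this, a build could be started from a local directory with `oc start-build service-telemetry-operator-bundle --from-dir=<generated bundle directory> --follow`. The target ImageStream has to exist beforehand, for example via `oc create imagestream service-telemetry-operator-bundle`.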
Initial working code to allow bundle builds to be created in support of index-based (CatalogSource) deployments of STF using local builds. Updates the create_builds.yml logic so that it allows deployments to proceed when builds have already been created. If local builds are enabled and BuildConfigs have already been created, the role looks up the latest Build object and sets the internal image path so that the deployment can continue. The primary use is iterative development. Stubbed-out functionality to start creating the index image is available, but still needs to be developed fully.
Update generate_bundle.sh to return a JSON map so that it can be consumed by Ansible in stf-run-ci.
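As a rough illustration of how Ansible can consume a JSON map printed by a script, the tasks below register the script's stdout and parse it with the `from_json` filter. The script path, the registered variable name, and the `bundle_info` fact name are assumptions; the actual keys returned by generate_bundle.sh are not shown in this PR description, so none are referenced here.

```yaml
# Sketch only: the script path and variable names are hypothetical.
- name: Generate bundle contents
  ansible.builtin.command: ./generate_bundle.sh
  register: generate_bundle_result

- name: Parse the JSON map printed by the script
  ansible.builtin.set_fact:
    bundle_info: "{{ generate_bundle_result.stdout | from_json }}"

- name: Show the parsed map
  ansible.builtin.debug:
    var: bundle_info
```

Returning a single JSON document on stdout keeps the parsing on the Ansible side to one filter call, which only works if the script prints nothing else to stdout.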
Completed the initial implementation of a local bundle build and an index image created from bundle images. The index is generated via opm and loaded in a CatalogSource that allows stf-run-ci to Subscribe to the service-telemetry-operator package. Created to allow testing of the properties.yaml now included in Service Telemetry Operator, which allows dependencies to be resolved without pre-Subscribing to the Operators. Closes: STF-1362
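The CatalogSource and Subscription flow described above can be sketched with the manifests below. This is an assumption-laden example rather than the objects created by stf-run-ci: the object names, display name, publisher, channel, image tag, and the internal registry hostname are illustrative values.

```yaml
# Sketch only: names, channel, image tag, and registry hostname are assumptions.
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: service-telemetry-operator-group
  namespace: service-telemetry
spec:
  targetNamespaces:
    - service-telemetry
---
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: service-telemetry-framework-index
  namespace: service-telemetry
spec:
  sourceType: grpc
  displayName: Service Telemetry Framework (local index)
  publisher: stf-run-ci
  # Index image served from the cluster's internal registry.
  image: image-registry.openshift-image-registry.svc:5000/service-telemetry/service-telemetry-framework-index:latest
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: service-telemetry-operator
  namespace: service-telemetry
spec:
  name: service-telemetry-operator            # package name in the catalog
  channel: stable-1.5
  source: service-telemetry-framework-index   # must match the CatalogSource name
  sourceNamespace: service-telemetry
  installPlanApproval: Automatic
```

Only the Service Telemetry Operator is subscribed to directly; the remaining Operators are expected to be pulled in by OLM's dependency resolution.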
- clean up some ordering of object creation
- update index name in a couple of spots (service-telemetry-framework vs service-telemetry-operator)
- more checks to allow better idempotency when running stf-run-ci multiple times
- create CatalogSource
- syntax error on a couple of plays
- add some more clean up to make re-running deployments without full artifact builds possible
- always need OperatorGroup from CLI... had a check because testing was done from UI incorrectly
Testing is done via this command:

  command: oc create secret generic -n {{ namespace }} service-telemetry-framework-index-dockercfg --from-file=.dockerconfigjson=working/service-telemetry-framework-index/config.json --type=kubernetes.io/dockerconfigjson

- name: Create ImageStream for ose-operator-registry
  command: oc import-image -n {{ namespace }} ose-operator-registry:v4.12 --from=registry.redhat.io/openshift4/ose-operator-registry:v4.12 --confirm
TODO: set version tag in defaults/main.yml at the very least
Are you referring to the ose-operator-registry:v4.12 version?
Yes I was. I have made these dynamic now with https://github.com/infrawatch/service-telemetry-operator/pull/428/files#diff-1df29b4ac391c466ca2f73f5e32b2bcdd5862a9b527c6eb85f6c574c89dad927R37-R38
sgo_bundle_image_path: "{{ __internal_registry_path }}/{{ namespace }}/smart-gateway-operator-bundle:{{ sgo_bundle_image_tag }}"
sto_bundle_image_path: "{{ __internal_registry_path }}/{{ namespace }}/service-telemetry-operator-bundle:{{ sto_bundle_image_tag }}"
stf_index_image_path: "{{ __internal_registry_path }}/{{ namespace }}/service-telemetry-framework-index:{{ stf_index_image_tag }}"

- name: Fail on mutually exclusive flags
  fail:
    msg: __deploy_from_bundles_enabled not currently supported with __local_build_enabled (but should be)
I think this is going to set us up to do this[1]... I also have an idea for a "fast mode" where we can just oc import-image from upstream quay.io instead of building artifacts every time. If we're using content from master it's basically available to us through release-automation repo. Only build images when same-named branch logic is in use.
[1] msg: __deploy_from_bundles_enabled not currently supported with __local_build_enabled (but should be)
Yeah, I commented above that this TODO should be mostly taken care of now. I'm not convinced this will work exactly as-is, though. It looks to me like the values you set here for the bundle_image_paths are only valid if you have already run with __deploy_from_index_enabled, otherwise I don't see anywhere that causes those bundles to be built and pushed. We generate the bundle directory contents (generate_bundle.sh) from setup_stf_local_build.yml whenever __local_build_enabled, which is good, but the builds of that content themselves only get created and pushed when __deploy_from_index_enabled. https://github.com/infrawatch/service-telemetry-operator/pull/428/files#diff-6e6fe792e5d5d5aa5631eb6f5ebf17906260e91c22f2dcf635c5c5d4b3d5025aR98
I think the solution is to move the bundle builds out of that block and instead put them in setup_stf_local_build.yml. That way we'd build the bundles immediately after generating the contents for them, so they'll always be available in the internal registry. Note that I haven't actually run this yet to test my theories. Maybe later today. :)
As a reminder to myself and other readers, the reason we need to keep the __deploy_from_bundles_enabled functionality is for doing testing of downstream builds. The container build process sends bundle image URLs to our CI system, not a catalog index URL.
Yeah, the TODO is not totally covered here for sure. This is out of bounds for this particular PR, but I am going to create an issue to come back to this, hopefully in the short term. I don't want to overcomplicate this PR any further, so I'll defer this change.
As long as this is tested with __deploy_from_bundles_enabled like in downstream CI, you can defer the rest.
Ah fair point. I haven't verified I didn't regress something here.
Related PR based on some testing at #436
Frankly at this point, I'm kind of tired of looking at this and moving things around. I'm going to open a separate issue about moving this out and getting local builds of bundles + bundle deployments to work, but I don't want to do it here and now. You've convinced me there is a use case for it, so I'll get to it, but I need to get this landed and move onto some other pressing documentation issues.
Tracking this in #434
I'm having issues with my deployment because persistent storage (PVC) stopped working for some reason. However it gets pretty far and I'm confident it shows this working now. I will re-test with ephemeral to get a full deployment. Here is the environment though (note Pods are pending due to PVC requests Pending):
I ran this again as-is before doing any additional testing/development against the branch to verify it would work in QuickCluster lab. Confirmed that it works. Currently, an as-is deployment takes ~15 mins.
oc new-project service-telemetry ; ansible-playbook -e __service_telemetry_storage_ephemeral_enabled=true -e __deploy_from_index_enabled=true build/run-ci.yaml
[...]
PLAY RECAP ***************************************************************************************************************************************************************************************************************************************
localhost : ok=151 changed=59 unreachable=0 failed=0 skipped=16 rescued=0 ignored=0
Could I convince you to test w/
I was hoping to avoid the setup for that scenario :) I suppose I could give it the ol' college try.
Test results:
oc new-project service-telemetry ; ansible-playbook -e __service_telemetry_storage_ephemeral_enabled=true -e __service_telemetry_observability_strategy=use_community -e __local_build_enabled=false -e __deploy_from_bundles_enabled=true -e __service_telemetry_bundle_image_path=registry-proxy.engineering.redhat.com/rh-osbs/stf-service-telemetry-operator-bundle:latest -e __smart_gateway_bundle_image_path=registry-proxy.engineering.redhat.com/rh-osbs/stf-smart-gateway-operator-bundle:latest -e pull_secret_registry=brew.registry.redhat.io -e pull_secret_user='...' -e pull_secret_pass='...' build/run-ci.yaml
TASK [stf-run-ci : debug] **************************************************************************************************************************************************
ok: [localhost] => {
"validate_deployment.stdout_lines": [
"Already on project \"service-telemetry\" on server \"https://api.stf15ocp412.lab.upshift.rdu2.redhat.com:6443\".",
"",
"* [info] Waiting for QDR deployment to complete",
"",
"Waiting for deployment \"default-interconnect\" rollout to finish: 0 of 1 updated replicas are available...",
"deployment \"default-interconnect\" successfully rolled out",
"",
"* [info] Waiting for prometheus deployment to complete",
"",
"Waiting for 1 pods to be ready...",
"statefulset rolling update complete 1 pods at revision prometheus-default-54bf9b8986...",
"",
"* [info] Waiting for elasticsearch deployment to complete ",
"",
"",
"* [info] Waiting for alertmanager deployment to complete",
"",
"statefulset rolling update complete 1 pods at revision alertmanager-default-5c9b8b6dbf...",
"",
"* [info] Waiting for smart-gateway deployment to complete",
"",
"deployment \"default-cloud1-coll-meter-smartgateway\" successfully rolled out",
"deployment \"default-cloud1-coll-event-smartgateway\" successfully rolled out",
"deployment \"default-cloud1-ceil-event-smartgateway\" successfully rolled out",
"deployment \"default-cloud1-ceil-meter-smartgateway\" successfully rolled out",
"deployment \"default-cloud1-sens-meter-smartgateway\" successfully rolled out",
"",
"* [info] Waiting for all pods to show Ready/Complete",
"",
"prometheus-default-0 0/3 PodInitializing 0 25s",
"",
"* [info] CI Build complete. You can now run tests."
]
}
PLAY RECAP *****************************************************************************************************************************************************************
localhost : ok=48 changed=19 unreachable=0 failed=0 skipped=17 rescued=0 ignored=0
oc get pods,sub,csv,catalogsource
NAME READY STATUS RESTARTS AGE
pod/95e46c1268be7c2c50b84fb08b8401533692ae5f2b9f9bc5c0640fc8c8gllbv 0/1 Completed 0 6m30s
pod/alertmanager-default-0 3/3 Running 0 5m31s
pod/dcb3e47a2a038738a77d7a4822ed5c48edd3ae05b481b0ded74e9d6732mbh9n 0/1 Completed 0 7m4s
pod/default-cloud1-ceil-event-smartgateway-7d8666dcf6-7n4z7 2/2 Running 1 (4m5s ago) 4m38s
pod/default-cloud1-ceil-meter-smartgateway-669c6cdcf9-rpnbb 3/3 Running 0 5m1s
pod/default-cloud1-coll-event-smartgateway-58c65dc7dc-82kmf 2/2 Running 1 (4m17s ago) 4m51s
pod/default-cloud1-coll-meter-smartgateway-585855c59d-zbhrq 3/3 Running 0 5m1s
pod/default-cloud1-sens-meter-smartgateway-6f8dffb645-8whlm 3/3 Running 0 5m1s
pod/default-interconnect-6994ff546-ftv69 1/1 Running 0 5m50s
pod/default-snmp-webhook-5bb9fdf947-tdrk9 1/1 Running 0 5m35s
pod/elastic-operator-6fcc4bb777-jx9kv 1/1 Running 0 7m15s
pod/elasticsearch-es-default-0 1/1 Running 0 5m20s
pod/ing-redhat-com-rh-osbs-stf-smart-gateway-operator-bundle-latest 1/1 Running 0 7m20s
pod/interconnect-operator-646bfc886c-dzfjm 1/1 Running 0 7m5s
pod/prometheus-default-0 3/3 Running 0 4m24s
pod/prometheus-operator-54d644d8d7-zqnsc 1/1 Running 0 7m7s
pod/redhat-com-rh-osbs-stf-service-telemetry-operator-bundle-latest 1/1 Running 0 6m39s
pod/service-telemetry-operator-5d65bb69c6-dw264 1/1 Running 0 6m17s
pod/smart-gateway-operator-544c4c4c4c-m7lm8 1/1 Running 0 6m48s
NAME PACKAGE SOURCE CHANNEL
subscription.operators.coreos.com/amq7-interconnect-operator amq7-interconnect-operator redhat-operators 1.10.x
subscription.operators.coreos.com/elasticsearch-eck-operator-certified elasticsearch-eck-operator-certified certified-operators stable
subscription.operators.coreos.com/prometheus prometheus community-operators beta
subscription.operators.coreos.com/service-telemetry-operator-v1-5-1680516659-sub service-telemetry-operator service-telemetry-operator-catalog stable-1.5
subscription.operators.coreos.com/smart-gateway-operator-v5-0-1680516659-sub smart-gateway-operator smart-gateway-operator-catalog stable-1.5
NAME DISPLAY VERSION REPLACES PHASE
clusterserviceversion.operators.coreos.com/amq7-interconnect-operator.v1.10.15 Red Hat Integration - AMQ Interconnect 1.10.15 amq7-interconnect-operator.v1.10.4 Succeeded
clusterserviceversion.operators.coreos.com/elasticsearch-eck-operator-certified.v2.8.0 Elasticsearch (ECK) Operator 2.8.0 elasticsearch-eck-operator-certified.v2.7.0 Succeeded
clusterserviceversion.operators.coreos.com/observability-operator.v0.0.23-230605234749 Observability Operator 0.0.23-230605234749 observability-operator.v0.0.22 Succeeded
clusterserviceversion.operators.coreos.com/prometheusoperator.0.56.3 Prometheus Operator 0.56.3 prometheusoperator.0.47.0 Succeeded
clusterserviceversion.operators.coreos.com/service-telemetry-operator.v1.5.1680516659 Service Telemetry Operator 1.5.1680516659 Succeeded
clusterserviceversion.operators.coreos.com/smart-gateway-operator.v5.0.1680516659 Smart Gateway Operator 5.0.1680516659 Succeeded
NAME DISPLAY TYPE PUBLISHER AGE
catalogsource.operators.coreos.com/service-telemetry-operator-catalog service-telemetry-operator grpc operator-sdk 6m40s
catalogsource.operators.coreos.com/smart-gateway-operator-catalog smart-gateway-operator grpc operator-sdk 7m21s
(Note: I'm not really sure yet why the Observability Operator CSV shows up here. Perhaps there is a bug in the logic for
Add a fail check to make sure we're not deploying from both index images and bundle images at the same time.
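A check like the one described above could look roughly like the following Ansible task. The two flag names come from this PR; the task name and message wording are illustrative, not the text that was merged.

```yaml
# Sketch only: task name and message wording are illustrative.
- name: Fail when index and bundle deployments are both enabled
  ansible.builtin.fail:
    msg: __deploy_from_index_enabled and __deploy_from_bundles_enabled are mutually exclusive; enable only one.
  when:
    # Both conditions must be true for the task to fire.
    - __deploy_from_index_enabled | bool
    - __deploy_from_bundles_enabled | bool
```

Placing the check near the top of the plays makes the run stop before any builds or cluster objects are created.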
OK I'm merging this down now. I think I've done all I can based on feedback (thank you for going through it several times!). This will land into another feature branch so we have another opportunity to approve/reject landing this all the way down to master.
* Manage Operator dependencies with properties.yaml

  Use the properties.yaml to manage the packages we require when deploying Service Telemetry Operator. Allows us to reference the Operator name, which you can find via 'oc get packagemanifests' and reviewing the packageName value of the packagemanifest (which should just match the name listed in the oc get packagemanifests output). Constraints allow the use of versions as well, setting a target of >= the current version(ish).

  Per https://olm.operatorframework.io/docs/concepts/olm-architecture/dependency-resolution/#nested-compound-constraints:

  > A nested compound constraint, one that contains at least one child compound constraint along with zero or more simple constraints, is evaluated from the bottom up following the procedures described for each above.

  For Prometheus, we set our ordered list from bottom to top, with a preference for Observability Operator, followed by RHODS Prometheus Operator, and finally Prometheus Operator from the Community Catalog.

  Closes: STF-1356

* Move smart-gateway-operator dependency to properties.yaml

* Update deploy/olm-catalog/service-telemetry-operator/metadata/properties.yaml

  Co-authored-by: Chris Sibbitt <csibbitt@redhat.com>

* Add index-based deployment method to stf-run-ci (#428)

* Create bundle and index builds for OLM deployment

  Create bundle images using BuildConfigs through the internal registry after generating contents using generate_bundles. Create a file-based index image from the bundles created and available from the internal registry. Deploy STF using a local CatalogSource that references the internal index image, which allows OLM to stand up dependencies using properties.yaml within the Service Telemetry Operator metadata. Skips over pre-deployment artifacts and uses only data available via CatalogSource for dependency validation.

* Add bundle builds to stf-run-ci for index-based deployments

  Initial working code to allow bundle builds to be created in support of index-based (CatalogSource) deployments of STF using local builds. Updates the create_builds.yml logic so that it allows deployments to proceed when builds have already been created. If local builds are enabled and BuildConfigs have already been created, the role looks up the latest Build object and sets the internal image path so that the deployment can continue. The primary use is iterative development. Stubbed-out functionality to start creating the index image is available, but still needs to be developed fully.

* Update generate_bundle.sh to return a JSON map

  Update generate_bundle.sh to return a JSON map so that it can be consumed by Ansible in stf-run-ci.

* Working development for index-based deployment

  Completed the initial implementation of a local bundle build and an index image created from bundle images. The index is generated via opm and loaded in a CatalogSource that allows stf-run-ci to Subscribe to the service-telemetry-operator package. Created to allow testing of the properties.yaml now included in Service Telemetry Operator, which allows dependencies to be resolved without pre-Subscribing to the Operators.

  Closes: STF-1362

* Test and Tune

  - clean up some ordering of object creation
  - update index name in a couple of spots (service-telemetry-framework vs service-telemetry-operator)
  - more checks to allow better idempotency when running stf-run-ci multiple times
  - create CatalogSource
  - syntax error on a couple of plays
  - add some more clean up to make re-running deployments without full artifact builds possible
  - always need OperatorGroup from CLI... had a check because testing was done from UI incorrectly

* Fix syntax error for pre-Subscription

* Remove unused template file

* Move OCP version lookup to top of plays (#433)

* Make debug output of generate_bundle consistent

* Clean up block usage for BuildConfig creation

* Set operator base image and tag as parameters

* Fix typo in annotation

* Add check if index and bundle deploys both enabled (#438)

  Add a fail check to make sure we're not deploying from both index images and bundle images at the same time.

* Drop rhods-prometheus-operator from satisfying STO

  Don't allow rhods-prometheus-operator package to satisfy for installation of Service Telemetry Operator as it is expected to go away.

---------

Co-authored-by: Chris Sibbitt <csibbitt@redhat.com>
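For readers unfamiliar with the nested compound constraints mentioned in the commit message above, the following is a hedged sketch of a bundle `metadata/properties.yaml` using OLM's `olm.constraint` property. It is not the file from this repository: the `observability-operator` package name is inferred from the CSV name in the test output, and every version range and failure message is a placeholder.

```yaml
# Sketch only: package names and version ranges are illustrative placeholders,
# not the values shipped in Service Telemetry Operator.
properties:
  - type: olm.constraint
    value:
      failureMessage: Service Telemetry Operator requires Smart Gateway Operator and a Prometheus provider
      all:
        constraints:
          # Simple constraint: this package must be installable.
          - package:
              packageName: smart-gateway-operator
              versionRange: '>=5.0.0'
          # Child compound constraint: any one of these packages satisfies it.
          - any:
              constraints:
                - package:
                    packageName: observability-operator
                    versionRange: '>=0.0.1'
                - package:
                    packageName: prometheus
                    versionRange: '>=0.56.0'
```

The outer `all` holds one simple constraint and one child `any`, which is the "nested compound constraint" shape the quoted OLM documentation refers to.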
Depends-On: infrawatch/smart-gateway-operator#141
Provides an implementation in stf-run-ci that allows you to locally generate and build bundle images for Service Telemetry Operator and Smart Gateway Operator, then create an index image in your OCP cluster. Creates a CatalogSource so that you can load the custom packagemanifests, and then Subscribe to Service Telemetry Operator without having to pre-Subscribe to other dependent Operators.
Builds on the properties.yaml changes to Service Telemetry Operator, which describe the Operator dependencies (other than Elasticsearch), so that the pre-deployed Operators are no longer necessary.