Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1794466: modules/installation-mirror-repository: Pivot to by-digest pullspecs #19266

Closed
wants to merge 1 commit into from

Conversation

wking
Copy link
Member

@wking wking commented Jan 23, 2020

The OCP_RELEASE stuff landed in ced98c6 (#16678). Since then, 4.2.14 and later (including some 4.1 releases) have been suffixing the Quay tags with architecture information. By pivoting to digests pulled from the release errata, we decouple these docs from tag names. Also pull the repo portion of the pullspec out of the errata, so there's less hard-coding in the docs.

@openshift-ci-robot
Copy link

@wking: This pull request references Bugzilla bug 1794466, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Bug 1794466: modules/installation-mirror-repository: Pivot to by-digest pullspecs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 23, 2020
@kalexand-rh kalexand-rh added this to the Next Release milestone Jan 24, 2020
Copy link
Contributor

@kalexand-rh kalexand-rh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wking, I have a few text requests/suggestions.

modules/installation-mirror-repository.adoc Outdated Show resolved Hide resolved
modules/installation-mirror-repository.adoc Outdated Show resolved Hide resolved
modules/installation-mirror-repository.adoc Outdated Show resolved Hide resolved
modules/installation-mirror-repository.adoc Outdated Show resolved Hide resolved
modules/installation-mirror-repository.adoc Outdated Show resolved Hide resolved
@kalexand-rh
Copy link
Contributor

@jianlinliu, will you PTAL?

@wking
Copy link
Member Author

wking commented Jan 24, 2020

Suggestions all committed; thanks :). Will squash back down to one commit shortly.

@mrobson
Copy link

mrobson commented Jan 24, 2020

@kalexand-rh Per Luke, 4.1.30+ also changes the arch, so this should go back into 4.1 as well.

wking added a commit to wking/snc that referenced this pull request Jan 24, 2020
Add a MIRROR variable in case folks want to point at a different
mirror for whatever reason.  If neither OPENSHIFT_VERSION nor a Git
tag are available, use the latest/ entry on the mirror to fetch the
latest release (note that this is not pinned to a particular 4.y, it's
just the latest stable release).  Drop get_openshift_version and just
set up OPENSHIFT_RELEASE_VERSION instead; no need to delay that.  Echo
the chosen release version and how it was determined to be more
transparent and reduce surprise.  Drop empty-OPENSHIFT_RELEASE_VERSION
checks from the rest of the script, now that OPENSHIFT_RELEASE_VERSION
is always set.

Pull the release pullspec from the mirror's release.txt for the chosen
version.  This decouples us from the Quay tagging scheme [1,2],
decouples us from the particular repository (doesn't even have to get
pushed to Quay), and protects us from malicious registries that could
point the mutable tag at a malicious image.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1794466
[2]: openshift/openshift-docs#19266
@wking
Copy link
Member Author

wking commented Jan 24, 2020

Squashed back down to a single commit.

@@ -22,38 +22,41 @@ Complete the following steps on the bastion host:
. Review the
link:https://access.redhat.com/downloads/content/290/[{product-title} downloads page]
to determine the version of {product-title} that you want to install.
The image release advisory includes the pullspec repository and digest. For example, the link:https://access.redhat.com/errata/product/290/ver=4.3/rhel---7/x86_64/RHBA-2020:0062[put the title of the page that loads] release uses `quay.io/openshift-release-dev/ocp-release` for the pullspec repository and `sha256:3a516480dfd68e0f87f702b4d7bdd6f6a0acfdac5cd2e9767b838ceede34d70d` for the x86-64 4.3.0 release image digest.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If both image tag and image digest is showing on the same advisory page, why here we have to replace image tag with image digest. We just need tell user to get correct image tag form advisory page, no need to modify the following detailed steps. Do I miss something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just need tell user to get correct image tag form advisory page, no need to modify the following detailed steps.

Then you have to worry about divergence between the advisory and actual tag names. Obviously we'd like to keep those synchronized, but didn't always do so over the multi-arch transition. Also, users got used to the tag == version pattern we had been using for amd64 and were surprised when arch suffixes broke that pattern. By-digest flows avoid implying a tag-name pattern. Personally, I'd rather remove tag names from Errata text entirely, and this change sets the stage for that too.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tag name is easier reading, and I saw this in another PR - #19380.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tag name is easier reading...

Do you need to read this? You just copy/paste out of the errata.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am a customer, I want to get know which version my cluster is running on, which version my cluster will go forward to. But digest can not clearly tell me that. Of course, this is only my personal thoughts and preference.

Copy link

@jianlinliu jianlinliu Feb 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this change make me remember my original testing mentioned in https://issues.redhat.com/browse/CORS-1105?focusedCommentId=13851937&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13851937. Do you mean this change will not cause that issue any more?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...But digest can not clearly tell me that...

You just copied the digest off a specific release's errata page. So you should remember for the duration of the copy/paste.

And this change make me remember my original testing...

Copying out here, your issue seems to have been:

# oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release@sha256:ce9cc2ba29ab270091266f1f20240cfe6e6dcb11ccc61d5c717ca6b812925613 --force
Updating to release image registry.svc.ci.openshift.org/ocp/release@sha256:ce9cc2ba29ab270091266f1f20240cfe6e6dcb11ccc61d5c717ca6b812925613
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-08-26-202352   True        False         8m55s   Cluster version is registry.svc.ci.openshift.org/ocp/release@sha256:ce9cc2ba29ab270091266f1f20240cfe6e6dcb11ccc61d5c717ca6b812925613

First, in case anyone is tempted to copy/paste, that should use --allow-explicit-upgrade and not --force. About the digest in the status, seems like a CVO bug to me. As you can see from the version column, the CVO is able to retrieve the version string compiled into the release image. And that compiled-in version is covered by the image signature, so it's more reliable than the tag, which is not. Would addressing the status-string issue in the CVO cover your concerns with this doc change?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would addressing the status-string issue in the CVO cover your concerns with this doc change?

Yeah, if the status string can clearly show version info too, that would be better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 4.6.0-0.nightly-2020-09-28-125756 -> 4.6.0-0.nightly-2020-09-28-171716, the e2e logs have:

Sep 28 18:04:05.430: INFO: Starting upgrade to version= image=registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-09-28-171716
STEP: continuously hitting pods through the service's LoadBalancer
Sep 28 18:04:35.732: INFO: cluster upgrade is Progressing: Working towards 4.6.0-0.nightly-2020-09-28-171716: 0% complete

And the final ClusterVersion includes:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1310630479774355456/artifacts/e2e-aws-upgrade/clusterversion.json | jq '.items[] | {spec: {desiredUpdate: .spec.desiredUpdate}, status}'
{
  "spec": {
    "desiredUpdate": {
      "force": true,
      "image": "registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-09-28-171716",
      "version": ""
    }
  },
  "status": {
    "availableUpdates": null,
    "conditions": [
      {
        "lastTransitionTime": "2020-09-28T18:00:46Z",
        "message": "Done applying 4.6.0-0.nightly-2020-09-28-171716",
        "status": "True",
        "type": "Available"
      },
      {
        "lastTransitionTime": "2020-09-28T18:00:46Z",
        "status": "False",
        "type": "Failing"
      },
      {
        "lastTransitionTime": "2020-09-28T18:13:49Z",
        "message": "Cluster version is 4.6.0-0.nightly-2020-09-28-171716",
        "status": "False",
        "type": "Progressing"
      },
      {
        "lastTransitionTime": "2020-09-28T17:31:47Z",
        "message": "Unable to retrieve available updates: currently reconciling cluster version 4.6.0-0.nightly-2020-09-28-171716 not found in the \"stable-4.6\" channel",
        "reason": "VersionNotFound",
        "status": "False",
        "type": "RetrievedUpdates"
      }
    ],
    "desired": {
      "image": "registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-09-28-171716",
      "version": "4.6.0-0.nightly-2020-09-28-171716"
    },
    "history": [
      {
        "completionTime": "2020-09-28T18:13:49Z",
        "image": "registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-09-28-171716",
        "startedTime": "2020-09-28T18:04:33Z",
        "state": "Completed",
        "verified": false,
        "version": "4.6.0-0.nightly-2020-09-28-171716"
      },
      {
        "completionTime": "2020-09-28T18:00:46Z",
        "image": "registry.svc.ci.openshift.org/ocp/release@sha256:5721030240b241996401f633506b91c7ad1f61acb0c45b669489df3b3917cc8c",
        "startedTime": "2020-09-28T17:31:47Z",
        "state": "Completed",
        "verified": false,
        "version": "4.6.0-0.nightly-2020-09-28-125756"
      }
    ],
    "observedGeneration": 2,
    "versionHash": "WtyYPY_PXQ0="
  }
}

So I think the --to-image version name handling was fixed by openshift/cluster-version-operator#419.

praveenkumar pushed a commit to crc-org/snc that referenced this pull request Feb 12, 2020
Add a MIRROR variable in case folks want to point at a different
mirror for whatever reason.  If neither OPENSHIFT_VERSION nor a Git
tag are available, use the latest/ entry on the mirror to fetch the
latest release (note that this is not pinned to a particular 4.y, it's
just the latest stable release).  Drop get_openshift_version and just
set up OPENSHIFT_RELEASE_VERSION instead; no need to delay that.  Echo
the chosen release version and how it was determined to be more
transparent and reduce surprise.  Drop empty-OPENSHIFT_RELEASE_VERSION
checks from the rest of the script, now that OPENSHIFT_RELEASE_VERSION
is always set.

Pull the release pullspec from the mirror's release.txt for the chosen
version.  This decouples us from the Quay tagging scheme [1,2],
decouples us from the particular repository (doesn't even have to get
pushed to Quay), and protects us from malicious registries that could
point the mutable tag at a malicious image.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1794466
[2]: openshift/openshift-docs#19266
@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 16, 2020
@openshift-bot
Copy link

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 28, 2020
@wking
Copy link
Member Author

wking commented Sep 28, 2020

/remove-lifecycle stale

I still think we want this, but I need to look at version reporting in --to-image updates.

@openshift-ci-robot openshift-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 28, 2020
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 28, 2020
@wking
Copy link
Member Author

wking commented Sep 28, 2020

Rebased onto master and updated to include changes to the very-similar modules/update-mirror-repository.adoc with 3649282d2d -> 241b8a515c.

@kalexand-rh
Copy link
Contributor

@jianlinliu, is this ready to merge now?

@jianlinliu
Copy link

jianlinliu commented Sep 30, 2020

QE already switched tag name to digest in some part of internal testing, but several CEE guys have asked some questions, about digest not easy reading, not having arch info and version info from its naming, where to get digest (if not read from advisories), which is similar concern raised in my previous comments.
I am okay with this. Maybe we can publish this officially. let us see what customer say.

@wking
Copy link
Member Author

wking commented Sep 30, 2020

...not having arch info and version info from its naming...

These can be extracted with oc adm release info $PULLSPEC.

...where to get digest (if not read from advisories)...

For updates, clusters extract this from Update Service Cincinnati graphs, e.g.

$ curl -sH Accept:application/json 'https://api.openshift.com/api/upgrades_info/v1/graph?channel=fast-4.5&arch=amd64' | jq -r '.nodes[] | select(.version == "4.4.23").payload'
quay.io/openshift-release-dev/ocp-release@sha256:0455e0201f475a836f2474d4af7864a55208a33eb6932027f63109bbbd821b65

But advisory content from the mirrors is signed with GPG, which is more secure than trusting that the Update Service is uncompromised (even if you trust the TLS between you and the Update Service). Still, it's an option, and if you follow up and check that the by-digest pullspec has a valid GPG signature and the name/arch you expected, then you are protected from any Update Service shenanigans. In this space is the in-flight openshift/cluster-version-operator#431.

@jianlinliu
Copy link

Thanks for your detailed explanation. As QE, I know how to get the image version/name/arch info from the pullspec, such as: you mentioned oc amd release info.
I just raised these concern standing from customer perspective. Maybe they even did not know too much about these tricks, heard form some customer bug, sometimes customer even did not install oc command, especially in their planing phase.
Maybe we can list these tricks on our official document for better friendly work experience.

The OCP_RELEASE stuff landed in ced98c6 (osdocs-626 preparing for
disconnected installation, 2019-08-26, openshift#16678).  Since then, 4.2.14
and later (including some 4.1 releases) have been suffixing the Quay
tags with architecture information [1].  By pivoting to digests pulled
from the release errata, we decouple these docs from tag names.  Also
pull the repo portion of the pullspec out of the errata, so there's
less hard-coding in the docs.

Checking the to-disk flow using the 4.5.11 oc to mirror the 4.5.11
release image:

  $ oc version --client
  Client Version: 4.5.11
  $ oc adm release mirror --to-dir=/tmp/mirror quay.io/openshift-release-dev/ocp-release@sha256:4d048ae1274d11c49f9b7e70713a072315431598b2ddbb512aee4027c422fe3e
  info: Mirroring 110 images to file://openshift/release ...
  ...
  Update image:  openshift/release:4.5.11

  To upload local images to a registry, run:

      oc image mirror --from-dir=/tmp/mirror 'file://openshift/release:4.5.11*' REGISTRY/REPOSITORY

  Configmap signature file /tmp/mirror/config/signature-sha256-4d048ae1274d11c4.yaml created

I've gone with :4.* in the 'image mirror' examples, because I don't
want folks to have to bother creating an environment variable, and we
have no roadmap for a backwards-incompatible OpenShift v5 today.

The reference to 'oc adm release info ...' should mitigate concerns
about folks who want to map from by-digest pullspecs back to the
release metadata [2].  I'm not linking to an existing section on that
command, because the only section we have around that today is
_unused_topics/exploring-cvo.adoc.  There are some other mentions,
e.g. with --changelog in modules/cli-administrator-other.adoc and with
--commits in modules/querying-operator-status-after-installation.adoc,
but an inline comment seemed simpler here than trying to link to any
of those locations with enough guidance for the reader to re-orient
once they followed the link.

I also updated the very similar modules/update-mirror-repository.adoc
with content that minimally diverged from the installation module.  It
might be worth unifying these modules with conditionals for
imageContentSources vs. ImageContentSourcePolicy, but I'm punting on
that for now.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1794466
[2]: openshift#19266 (comment)
@wking
Copy link
Member Author

wking commented Sep 30, 2020

Maybe they even did not know too much about these tricks...

I've pushed 241b8a515c -> 62bad59 to mention oc adm release info ... after both OCP_RELEASE_DIGEST definitions.

...sometimes customer even did not install oc command...

Both of these flows are setting up oc adm release mirror ... invocations, so I think we can safely assume folks have installed oc. Since 4.6, folks who encounter the by-digest pullspec in other locations should also be receiving the version string in close proximity. If there are remaining use-cases where you expect folks to encounter by-digest release image pullspecs without the context of a version number, can you list them?

@openshift-bot
Copy link

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 13, 2021
@openshift-bot
Copy link

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 12, 2021
@openshift-bot
Copy link

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot
Copy link

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link

@wking: This pull request references Bugzilla bug 1794466. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state.

In response to this:

Bug 1794466: modules/installation-mirror-repository: Pivot to by-digest pullspecs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch/enterprise-4.3 branch/enterprise-4.4 branch/enterprise-4.5 branch/enterprise-4.6 bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants