MCO-395: daemon: Completely remove imageInspect code #3821

Closed

Conversation

cgwalters
Member

Actually, I don't believe any of this logic is necessary anymore.

  • We can now unambiguously rely on the container image being bootable;
    if it's not, it should be rpm-ostree that errors out! There's
    no value in us "second guessing" it and having a duplicate fetch
    path; it just creates more problems
  • Now crucially, checkOS had a special case for comparing the
    ostree commit digest, I think to try to deal with the firstboot
    problem. But actually, what we've been doing for a while for
    scaleup of old bootimages is a double reboot, and hence we don't
    need special case logic for this!

This entirely deletes a lot of unnecessary code, more fully
deferring OS updates to rpm-ostree.
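
As a rough illustration of what the check reduces to once the remote-image comparison is gone, here is a minimal sketch. It is not the actual MCO code: `bootedState`, its `ContainerImage` field, and `checkOSSketch` are hypothetical names, and the pullspecs are placeholders. The only point is that the comparison uses locally known state (which image the host booted from) and never inspects the registry or compares ostree commits.

```go
package main

import "fmt"

// bootedState is a hypothetical, simplified view of the booted deployment.
// In a real daemon this would be derived from rpm-ostree's own status
// output; the type and field name here are illustrative only.
type bootedState struct {
	// ContainerImage is the pullspec the host was booted from, or ""
	// if the booted deployment did not come from a container image.
	ContainerImage string
}

// checkOSSketch reports an error when the booted image differs from the
// desired one. It deliberately does no remote inspection and has no
// special case for matching ostree commits.
func checkOSSketch(booted bootedState, desired string) error {
	if booted.ContainerImage == "" {
		return fmt.Errorf("host is not booted from a container image (want %q)", desired)
	}
	if booted.ContainerImage != desired {
		return fmt.Errorf("booted image %q does not match desired image %q",
			booted.ContainerImage, desired)
	}
	return nil
}

func main() {
	// Placeholder pullspecs, not real images.
	desired := "registry.example.com/os@sha256:aaaa"
	candidates := []bootedState{
		{ContainerImage: "registry.example.com/os@sha256:aaaa"},
		{ContainerImage: "registry.example.com/os@sha256:bbbb"},
		{}, // e.g. an old bootimage before its first update
	}
	for _, b := range candidates {
		if err := checkOSSketch(b, desired); err != nil {
			fmt.Println("mismatch:", err)
			continue
		}
		fmt.Println("ok:", b.ContainerImage)
	}
}
```

In this sketch a node that has not yet been updated onto a container image simply fails the check; this assumes the double reboot described above has already brought such nodes onto the target image before the check matters.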

@openshift-ci
Contributor

openshift-ci bot commented Jul 25, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 25, 2023
@cgwalters
Member Author

cgwalters commented Jul 25, 2023

This one builds on #3820

This is definitely deleting code which is not dead - we call checkOS a lot. But as I say in the commit message, I think the "compare the ostree commit against remote image" logic was something we developed in the 4.12 development cycle, where we struggled with the "4.11 -> 4.12+" upgrade flow.

Digging through git a bit, we can see the code here was introduced in 9c6b021
"Make verification pass if commit hashes match"
CommitDate: Tue Sep 27 13:58:43 2022 -0500

But...this was pretty close in time to b63b319
"MCO-356: daemon/firstboot: Do a secondary in-place update if rpm-ostree is too old"
CommitDate: Tue Sep 27 13:58:43 2022 -0500

Ah yes, these ended up being rolled into the same PR in #3317

But I would bet that we didn't realize at the time that the latter logic of doing an extra reboot obviates the first commit.

And with all this combined, the MCD is not forking off skopeo/podman inspect at all anymore!

@cgwalters
Member Author

Verifying this needs testing scaleup from old bootimages specifically.

@cgwalters
Member Author

Interesting: in that e2e-gcp-op run the test itself passed; only the gather-extra step timed out:

INFO[2023-07-25T23:11:11Z] Running step e2e-gcp-op-test.
INFO[2023-07-26T00:36:25Z] Step e2e-gcp-op-test succeeded after 1h25m14s.
INFO[2023-07-26T00:36:25Z] Step phase test succeeded after 1h25m14s.
INFO[2023-07-26T00:36:25Z] Running multi-stage phase post
INFO[2023-07-26T00:36:25Z] Running step e2e-gcp-op-gather-gcp-console.
INFO[2023-07-26T00:36:59Z] Step e2e-gcp-op-gather-gcp-console succeeded after 34s.
INFO[2023-07-26T00:36:59Z] Running step e2e-gcp-op-gather-must-gather.
INFO[2023-07-26T00:42:19Z] Step e2e-gcp-op-gather-must-gather succeeded after 5m19s.
INFO[2023-07-26T00:42:19Z] Running step e2e-gcp-op-gather-extra.
{"component":"entrypoint","file":"k8s.io/test-infra/prow/entrypoint/run.go:164","func":"k8s.io/test-infra/prow/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 4h0m0s timeout","severity":"error","time":"2023-07-26T00:43:09Z"}
INFO[2023-07-26T00:43:09Z] Received signal. signal=interrupt
INFO[2023-07-26T00:43:09Z] error: Process interrupted with signal interrupt, cancelling execution...

That doesn't seem like it can be related to this change.
/retest

@cgwalters
Member Author

/retest

@cgwalters
Member Author

xref openshift/enhancements#1432 which I think wants this

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 19, 2023
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 21, 2023
@openshift-ci
Contributor

openshift-ci bot commented Sep 6, 2023

@cgwalters: all tests passed!


@cgwalters
Member Author

There are no hits in CI search for "technically in the right image" so I think this is safe.

@cgwalters cgwalters changed the title daemon: Completely remove imageInspect code MCO-395: daemon: Completely remove imageInspect code Sep 28, 2023
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Sep 28, 2023
@openshift-ci-robot
Contributor

openshift-ci-robot commented Sep 28, 2023

@cgwalters: This pull request references MCO-395 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.15.0" version, but no target version was set.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 28, 2023
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 27, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 27, 2024
@openshift-merge-robot
Contributor

PR needs rebase.

@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Feb 27, 2024
Contributor

openshift-ci bot commented Feb 27, 2024

@openshift-bot: Closed this PR.
