
Run storage upgrade pre and post master upgrade #4476

Merged (1 commit) on Jun 19, 2017

Conversation

@mtnbikenc (Member)

A storage upgrade (oc adm migrate storage) is performed prior to the master upgrade as well as after it. This PR also moves the task into a common playbook for all versions of upgrades, currently 3.3 through 3.6.

See: openshift/origin#14625 (comment)

@smarterclayton Are these CLI options still valid and does this need to be run for all versions of upgrades?

Trello: https://trello.com/c/x2fUWNbK
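
For reference, the migration step reduces to a single CLI invocation, run once against a master both before and after the control-plane upgrade (the command and flags below are the ones captured in the test log later in this thread):

    oc adm --config=/etc/origin/master/admin.kubeconfig \
        migrate storage --include=jobs --confirm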

@sdodson (Member)

sdodson commented Jun 16, 2017

Implicit here is that oc adm migrate storage does the right thing, assuming the version of the command being executed matches the version running in the environment. So on a 3.5 to 3.6 upgrade it will run the 3.5 version prior to upgrading to 3.6, and then it will run the 3.6 version after upgrading.

It seems to me that if we can't rely on that, this is a critical flaw in the migrate command.

@sdodson (Member) left a review comment

LGTM, need clayton to verify options for migrate command.

@mtnbikenc (Member, Author)

Since this is not covered under CI, I'm testing each version of the upgrade playbooks currently in the repo.

@sdodson (Member)

sdodson commented Jun 16, 2017

OK, I'd focus first on 3.5 to 3.6, then 3.4 to 3.5, but this change wouldn't land there unless we backport it anyway. I wouldn't bother going any further back than that.

@mtnbikenc (Member, Author)

aos-ci-test

@sdodson (Member)

sdodson commented Jun 16, 2017

@mfojtik you're the only other significant contributor to oadm migrate storage; can you validate we're doing the right thing here?

@mfojtik (Contributor)

mfojtik commented Jun 16, 2017 via email

@openshift-bot

success: "aos-ci-jenkins/OS_3.5_NOT_containerized, aos-ci-jenkins/OS_3.5_NOT_containerized_e2e_tests" for b51d9e9 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.5_containerized, aos-ci-jenkins/OS_3.5_containerized_e2e_tests" for b51d9e9 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for b51d9e9 (logs)

@openshift-bot

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for b51d9e9 (logs)

@mtnbikenc (Member, Author)

Tested 3.5->3.6 upgrade:

2017-06-16 14:40:18,457 p=17669 u=rteague |  PLAY [Pre master upgrade - Upgrade job storage] **********************************************************************************************
2017-06-16 14:40:18,520 p=17669 u=rteague |  TASK [Gathering Facts] ***********************************************************************************************************************
2017-06-16 14:40:19,524 p=17669 u=rteague |  ok: [ec2-52-204-32-41.compute-1.amazonaws.com]
2017-06-16 14:40:19,555 p=17669 u=rteague |  META: ran handlers
2017-06-16 14:40:19,561 p=17669 u=rteague |  TASK [Upgrade job storage] *******************************************************************************************************************
2017-06-16 14:40:19,561 p=17669 u=rteague |  task path: /home/rteague/dev/clusters/aws-c1/openshift-ansible/playbooks/common/openshift-cluster/upgrades/upgrade_control_plane.yml:11
2017-06-16 14:40:21,761 p=17669 u=rteague |  changed: [ec2-52-204-32-41.compute-1.amazonaws.com] => {
    "changed": true,
    "cmd": [
        "oc",
        "adm",
        "--config=/etc/origin/master/admin.kubeconfig",
        "migrate",
        "storage",
        "--include=jobs",
        "--confirm"
    ],
    "delta": "0:00:01.198273",
    "end": "2017-06-16 14:40:21.725065",
    "rc": 0,
    "start": "2017-06-16 14:40:20.526792"
}

STDOUT:

summary: total=0 errors=0 ignored=0 unchanged=0 migrated=0

@smarterclayton (Contributor)

oadm migrate storage does nothing client-side. Instead, it talks to the server and does a blind update. So the options only apply to filtering things out, and it should be both reentrant and safe.

I'm not sure why you got 0 total.

@smarterclayton (Contributor)

--include=jobs does not do what you think it does. It only runs migrations on jobs.

@smarterclayton (Contributor)

smarterclayton commented Jun 16, 2017

You want to run oadm migrate storage --include=*,jobs
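
To make the distinction concrete (both invocations appear in this thread; reading * as the default resource set follows the comments above):

    # Runs the migration against jobs only:
    oc adm migrate storage --include=jobs --confirm

    # Runs the migration against the default resource set plus jobs:
    oc adm migrate storage --include='*,jobs' --confirm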

@enj (Contributor)

enj commented Jun 17, 2017

> if we can't rely on that this is a critical flaw in the migrate command

No, it is just a bug, like any other programmer mistake. There is no way for us to guarantee that we will know everything in advance, but I suppose we can always backport fixes (I assume upgrades would be run with the latest matching minor version).

@smarterclayton (Contributor)

There is a bug fix going into origin which corrects the case where you see resources already modified: the serialization order of protobuf wasn't stable, which was also a bug upstream.

@sdodson (Member)

sdodson commented Jun 18, 2017

I just meant that as a pattern, the playbooks will run the current version of the storage migration before upgrading and the current version after upgrading.
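
Rendered against the 3.5 to 3.6 example from earlier in the thread (the version annotations are assumptions about which client binary is installed at each step):

    # Before the control-plane upgrade: oc is still the 3.5 client, so
    # the 3.5 migration logic runs against the 3.5 cluster.
    oc adm migrate storage --include=jobs --confirm

    # ... masters are upgraded to 3.6 ...

    # After the control-plane upgrade: oc is the 3.6 client, so the 3.6
    # migration logic runs against the upgraded cluster.
    oc adm migrate storage --include=jobs --confirm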

@smarterclayton (Contributor)

smarterclayton commented Jun 19, 2017 via email

@sdodson (Member)

sdodson commented Jun 19, 2017

ACK, I'll update the PR and get it merged ASAP. Thanks.

@sdodson sdodson changed the title [WIP] Run storage upgrade pre and post master upgrade Run storage upgrade pre and post master upgrade Jun 19, 2017
@sdodson sdodson merged commit 9545204 into openshift:master Jun 19, 2017
@smarterclayton (Contributor)

Well the good news is that this caught a bug:

https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/2382/testReport/junit/(root)/Post%20master%20upgrade%20-%20Upgrade%20job%20storage/_localhost__Upgrade_job_storage/

The bad news is that it's now blocking the merge queue and the test queue. @deads2k needs to take a look at the symptom and help assess whether ClusterPolicy should be:

  1. excluded from migration (it's just broken and end users should do manual steps)
  2. automatically updated to not have this error (in practice a user manually has to do this, which is why we have migrations)
  3. left to the user to decide how to handle before upgrading

If I'm not mistaken, the error here will block any declarative flow an end user might have (i.e. if you apply your cluster policy, you're broken as well), and this generally fails our "we don't break old APIs on upgrade" rule.

In the meantime we should probably disable this task on those jobs temporarily. @stevekuznetsov

@smarterclayton (Contributor)

smarterclayton commented Jun 19, 2017

It's also possible the post-upgrade errors should be flagged for user attention but not be blocking, because the cluster is not necessarily broken at this point, just operating as best it can.

@enj (Contributor)

enj commented Jun 19, 2017

So the error is caused by the tighter validation I added to policy objects. It is not supposed to complain about attribute restrictions if the old and new objects are the same, but the specifics of the ratcheting are complex...

@smarterclayton (Contributor)

This is on upgrade from 3.5; were any fields defaulted? I don't know if this is before or after reconcile.

@sdodson (Member)

sdodson commented Jun 19, 2017

@smarterclayton do you mean disable the upgrade test or revert this PR temporarily?

@deads2k (Contributor)

deads2k commented Jun 19, 2017

> So the error is caused by the tighter validation I added to policy objects. It is not supposed to complain about attribute restrictions if the old and new objects are the same

Must be a bug in the logic somewhere. Maybe it only fails on the container Policy objects? Also, @smarterclayton I really hate virtual storage.

@sdodson (Member)

sdodson commented Jun 19, 2017

#4491 reverts to migrating only jobs.

@smarterclayton (Contributor)

I know you hate virtual storage :)
