move etcd to openshift-etcd #648

Merged
merged 1 commit into from
Apr 23, 2019

deads2k
Contributor

@deads2k deads2k commented Apr 19, 2019

Moves etcd to openshift-etcd namespace. This is related to API freeze, where we commit to names and namespaces for the foreseeable future.

After this, operators make the switch permanently (many are bilingual right now), and we remove the old dns names from the cert.

10 successful CI installs so far

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 19, 2019
@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Apr 19, 2019
@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 19, 2019
@deads2k
Contributor Author

deads2k commented Apr 19, 2019

@runcom is there any log on the bootstrap host (or better yet a directory) containing the content used to create the initial hash to make this easier to debug? If we had a directory, we'd be able to pull it out with install-gather.

@runcom
Member

runcom commented Apr 19, 2019

@runcom is there any log on the bootstrap host (or better yet a directory) containing the content used to create the initial hash to make this easier to debug? If we had a directory, we'd be able to pull it out with install-gather.

The directories in this file are what you need: https://github.com/openshift/machine-config-operator/blob/master/manifests/bootstrap-pod-v2.yaml

Basically /etc/mcc/bootstrap, /etc/mcs/bootstrap, /etc/ssl/mcs and /etc/mcs/kubeconfig.

Everything in those dirs is what the bootstrap serves, so if you have both the rendered in-cluster MC and the one from the bootstrap, you can diff them.

Also, I believe that with #612, if there is drift from now on between the current bootstrap config and the current in-cluster config, one could simply $(diff /var/machine-config-daemon/currentconfig rendered-<pool>-<hash>) to find out what drifted.
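
For anyone who wants that diff to ignore key ordering, here is a minimal sketch. It assumes both the on-node current config and the rendered config have been saved as JSON text; that format is an assumption, and the helper names `normalized_lines` and `config_drift` are made up for illustration, not part of any existing tool.

```python
import difflib
import json
from typing import List


def normalized_lines(raw: str) -> List[str]:
    """Parse JSON text and re-serialize it with sorted keys, one field per line."""
    return json.dumps(json.loads(raw), indent=2, sort_keys=True).splitlines()


def config_drift(current_raw: str, rendered_raw: str) -> List[str]:
    """Return unified-diff lines between two JSON configs (empty list if equal).

    Assumption: both inputs are JSON documents. Normalizing first means a
    difference in key order alone is not reported as drift.
    """
    return list(
        difflib.unified_diff(
            normalized_lines(current_raw),
            normalized_lines(rendered_raw),
            fromfile="currentconfig",
            tofile="rendered",
            lineterm="",
        )
    )
```

Reading the two files and printing `"\n".join(config_drift(a, b))` would give output close to plain diff, minus the noise from reordered keys.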

@deads2k
Contributor Author

deads2k commented Apr 19, 2019

Also, I believe that with #612, if there is drift from now on between the current bootstrap config and the current in-cluster config, one could simply $(diff /var/machine-config-daemon/currentconfig rendered-<pool>-<hash>) to find out what drifted.

@runcom @sdodson That seems like it would be really valuable for install-gather.

@runcom
Member

runcom commented Apr 19, 2019

@runcom @sdodson That seems like it would be really valuable for install-gather.

@deads2k what you quoted must be gathered on the nodes, not on the bootstrap host. The other dirs I showed above are what's on the bootstrap host (and can be gathered with install-gather).

@deads2k
Contributor Author

deads2k commented Apr 19, 2019

@deads2k what you quoted must be gathered on the nodes, not on the bootstrap host. The other dirs I showed above are what's on the bootstrap host (and can be gathered with install-gather).

@runcom install-gather punches through and collects from the nodes too.

@runcom
Member

runcom commented Apr 19, 2019

@runcom install-gather punches through and collects from the nodes too.

oh ok cool, I'm sure the dirs from the bootstrap node would contain what we need to debug when there's a skew, though. I'm not entirely sure that dir on the nodes contains anything if the drift happens right at boot. I think gathering /etc/mcc/bootstrap, /etc/mcs/bootstrap, /etc/ssl/mcs and /etc/mcs/kubeconfig from the bootstrap would definitely help, as that's really what's used/served at bootstrap. Diffing that with what's been generated in cluster would shed some light.
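
A rough sketch of that comparison, assuming the gathered bootstrap dirs and the in-cluster content have each been copied into a local directory with the same relative layout. That layout, and the names `tree_digests` and `tree_drift`, are assumptions for illustration only; install-gather does not necessarily produce this structure.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def tree_drift(bootstrap_root: Path, cluster_root: Path) -> Dict[str, List[str]]:
    """Report files present on one side only and files whose contents differ."""
    boot = tree_digests(bootstrap_root)
    clus = tree_digests(cluster_root)
    return {
        "only_bootstrap": sorted(boot.keys() - clus.keys()),
        "only_cluster": sorted(clus.keys() - boot.keys()),
        "changed": sorted(k for k in boot.keys() & clus.keys() if boot[k] != clus[k]),
    }
```

The files listed under "changed" are the ones worth opening with a real diff; hashing only narrows down where the skew is.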

@deads2k
Contributor Author

deads2k commented Apr 19, 2019

oh ok cool, I'm sure the dirs from the bootstrap node would contain what we need to debug when there's a skew, though. I'm not entirely sure that dir on the nodes contains anything if the drift happens right at boot. I think gathering /etc/mcc/bootstrap, /etc/mcs/bootstrap, /etc/ssl/mcs and /etc/mcs/kubeconfig from the bootstrap would definitely help, as that's really what's used/served at bootstrap. Diffing that with what's been generated in cluster would shed some light.

Thanks. It'll keep until Monday, but I have ended up in this spot many times and getting out of it has always been guess-and-check. In this case, it must be something to do with the etcd pod or certs that has namespace references and isn't in the files I touched here. The files I touched clearly come out correctly, because everything directly using etcd is working fine.

@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 20, 2019
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 20, 2019
@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 22, 2019
@kikisdeliveryservice kikisdeliveryservice removed the request for review from ashcrow April 22, 2019 20:00
@deads2k
Contributor Author

deads2k commented Apr 23, 2019

New app flake only

/retest

@deads2k
Contributor Author

deads2k commented Apr 23, 2019

Unauthorized

/retest

@deads2k
Contributor Author

deads2k commented Apr 23, 2019

network flake I've seen in other pulls

            "message": "(combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_installer-1-ip-10-0-153-95.ec2.internal_openshift-kube-controller-manager_ad847397-65ba-11e9-8292-129bab5f859e_0(28f64991ec9ef53ca109bd7e19880dfb19867338856c95cd7a4696a0dd0feddc): netplugin failed but error parsing its diagnostic message \"\": unexpected end of JSON input",

/retest

@deads2k
Contributor Author

deads2k commented Apr 23, 2019

green and conflicts. rebasing

3 green CI installs so far.

@deads2k
Contributor Author

deads2k commented Apr 23, 2019

@deads2k
Contributor Author

deads2k commented Apr 23, 2019

another install success

/test e2e-aws

@sdodson
Member

sdodson commented Apr 23, 2019

infra timeout
/test e2e-aws

@deads2k
Contributor Author

deads2k commented Apr 23, 2019

/test e2e-aws-op

@deads2k deads2k changed the title from "[WIP] experiment with location tweaks" to "move etcd to openshift-etcd" Apr 23, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 23, 2019
@kikisdeliveryservice
Contributor

@deads2k is there a BZ/open issue somewhere for this change?

@deads2k
Contributor Author

deads2k commented Apr 23, 2019

@deads2k is there a BZ/open issue somewhere for this change?

It's related to API freeze and committing to names and namespaces "forever".

/hold

waiting for more e2e results.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 23, 2019
@kikisdeliveryservice
Contributor

@deads2k great! can we add that to the commit message body?? 😃

@deads2k
Contributor Author

deads2k commented Apr 23, 2019

@deads2k great! can we add that to the commit message body??

@kikisdeliveryservice It's you again! :) Sure, after these tests complete.

Moves etcd to openshift-etcd namespace. This is related to API freeze, where we commit to names and namespaces for the foreseeable future.

After this, operators make the switch permanently (many are bilingual right now), and we remove the old dns names from the cert.
@deads2k
Contributor Author

deads2k commented Apr 23, 2019

Now with updated commit message.

@runcom ptal. I think you lgtm'd via slack before. I plan to release the hold for merge early tomorrow AM.

@deads2k
Contributor Author

deads2k commented Apr 23, 2019

/hold cancel
/assign @runcom

Tag/merge at will in your tomorrow (Wednesday) AM.

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 23, 2019
@runcom
Member

runcom commented Apr 23, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 23, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 23, 2019
@runcom
Member

runcom commented Apr 23, 2019

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 23, 2019
@deads2k
Contributor Author

deads2k commented Apr 23, 2019

Service catalog pull is ready and lgtm'd, notifications made, got the ok to merge tonight. Let's go!

/hold cancel

Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.
size/S: Denotes a PR that changes 10-29 lines, ignoring generated files.
6 participants