etcd bootstrapping using dns #143
/test e2e-aws
/cc @crawford @aaronlevy Does this seem reasonable?
FWIW, @sdemos opened a request asking why e2e-aws hangs for 2+ hours and then fails. We've been seeing it on other PRs as well over the last 3 days.
/test e2e-aws
/retest Please review the full test history for this PR and help us cut down flakes.
/retest
/test e2e-aws
/retest
Follows the guide https://github.com/etcd-io/etcd/blob/583763261f1c843e07c1bf7fea5fb4cfb684fe87/Documentation/op-guide/clustering.md#dns-discovery
Also adds `setup-etcd-environment.service`, which uses the `setup-etcd-environment` CLI to set up the discovery params.
e2e-aws tests completed, but teardown hung :(
2018/10/27 07:48:54 Container setup in pod e2e-aws completed successfully
2018/10/27 08:01:07 Container test in pod e2e-aws completed successfully
2018/10/27 10:41:04 Copying artifacts from e2e-aws into /logs/artifacts/e2e-aws
2018/10/27 10:41:05 error: unable to signal to artifacts container to terminate in pod e2e-aws, triggering deletion: could not run remote command: unable to upgrade connection: container not found ("artifacts")
2018/10/27 10:41:05 error: unable to retrieve artifacts from pod e2e-aws: could not read gzipped artifacts: unable to upgrade connection: container not found ("artifacts")
E1027 10:41:09.973225 11 event.go:200] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:".1561711212047510", GenerateName:"", Namespace:"ci-op-zp6jy1rz", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, InvolvedObject:v1.ObjectReference{Kind:"", Namespace:"ci-op-zp6jy1rz", Name:"", UID:"", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"CiJobFailed", Message:"Running job pull-ci-openshift-machine-config-operator-master-e2e-aws for PR https://github.com/openshift/machine-config-operator/pull/143 in namespace ci-op-zp6jy1rz from author abhinavdahiya", Source:v1.EventSource{Component:"ci-op-zp6jy1rz", Host:""}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbeed2e1179670310, ext:11024912575830, loc:(*time.Location)(0x1973400)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbeed2e1179670310, ext:11024912575830, loc:(*time.Location)(0x1973400)}}, Count:1, Type:"Warning", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events ".1561711212047510" is forbidden: unable to create new content in namespace ci-op-zp6jy1rz because it is being terminated' (will not retry!)
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: abhinavdahiya, crawford
/retest Please review the full test history for this PR and help us cut down flakes.
level=fatal msg="Error executing openshift-install: RequestLimitExceeded: Request limit exceeded.\n\tstatus code: 503, request id: 363ae6fb-81bc-40c2-b837-cc5b03692f4e"
2018/10/27 19:21:20 Container setup in pod e2e-aws failed, exit code 1, reason Error
Another process exited
What's up with our CI? :(
/retest
Yeah, something is up. We've been trying to merge a PR for a bit over a week :-(
…ent binary
openshift/machine-config-operator#143 added a new binary that the MachineConfigOperator uses for etcd bootstrapping. This new component needs to be built through the CI pipeline.
The last consumer was removed by 4cc7988 (server: remove etcd_index GET param, 2018-10-26, openshift#143).
This outputs an environment file with variables that allow etcd to discover its IP address and the corresponding DNS name from the discovery-srv records.
It looks up the `_etcd-server-ssl._tcp.<domain specified by --discovery-srv>` SRV records to perform a reverse lookup of its own DNS name.
An example of such a file is
when `--discovery-srv` was `tt.testing`.
The command retries the reverse lookup for up to 5 minutes to allow time for DNS to become available.
Follows the guide https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/clustering.md#dns-discovery
Also adds `setup-etcd-environment.service`, which uses the `setup-etcd-environment` CLI to set up the discovery params.
This requires openshift/installer#526