etcd bootstrapping using DNS #143

Merged: 5 commits merged into openshift:master from the etcd_dns branch on Oct 27, 2018

abhinavdahiya
Contributor

@abhinavdahiya abhinavdahiya commented Oct 23, 2018

  1. cmd: add setup-etcd-environment

This outputs an environment file with variables that let etcd discover its IP address
and the corresponding DNS name from the discovery SRV records.

It looks up the SRV records at _etcd-server-ssl._tcp.<domain specified by --discovery-srv> and uses them
to find its own DNS name by reverse lookup.

An example of such a file:

$ cat /etc/etcd.env/etcd-environment
ETCD_IPV4_ADDRESS=192.168.126.11
ETCD_DNS_NAME=adahiya-0-etcd-0.tt.testing

with --discovery-srv set to tt.testing and the following SRV record:

$ dig +noall +answer _etcd-server-ssl._tcp.tt.testing SRV
_etcd-server-ssl._tcp.tt.testing. 0 IN  SRV     0 10 2380 adahiya-0-etcd-0.tt.testing.
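
To make the lookup concrete, here is a minimal Python sketch of one way the matching could work. It is not the code added by this PR (the real command is a separate binary), and it assumes the match is done by resolving each SRV target and comparing the addresses with the node's own IP. The helper names `find_own_dns_name` and `render_env`, the second etcd host, and its address are hypothetical; DNS is replaced by an in-memory table so the example runs offline.

```python
from typing import Callable, Iterable, Optional


def find_own_dns_name(
    srv_targets: Iterable[str],
    resolve: Callable[[str], Iterable[str]],
    own_ip: str,
) -> Optional[str]:
    """Return the SRV target whose resolved addresses include own_ip, if any."""
    for target in srv_targets:
        # SRV answers are fully qualified and end with a trailing dot.
        name = target.rstrip(".")
        if own_ip in resolve(name):
            return name
    return None


def render_env(ip: str, dns_name: str) -> str:
    """Render the two variables shown in the example environment file."""
    return f"ETCD_IPV4_ADDRESS={ip}\nETCD_DNS_NAME={dns_name}\n"


# Hypothetical records standing in for real DNS answers.
records = {
    "adahiya-0-etcd-0.tt.testing": ["192.168.126.11"],
    "adahiya-0-etcd-1.tt.testing": ["192.168.126.12"],  # invented second member
}

name = find_own_dns_name(
    ["adahiya-0-etcd-1.tt.testing.", "adahiya-0-etcd-0.tt.testing."],
    lambda host: records.get(host, []),
    "192.168.126.11",
)
if name is not None:
    print(render_env("192.168.126.11", name), end="")
```

Passing the resolver in as a function keeps the matching logic testable without a DNS server; a real implementation would call the system resolver instead.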

The command retries the reverse lookup for up to 5 minutes to allow time for DNS to become available.
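
The retry behaviour can be sketched as a deadline loop. This is an illustration only: the 5 minute limit comes from the description above, while the 5 second polling interval, the function name `retry_until`, and the host name in the demo are assumptions. The clock and sleep functions are injectable so the example finishes instantly.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until(
    attempt: Callable[[], Optional[T]],
    timeout_s: float = 300.0,  # 5 minutes, per the PR description
    interval_s: float = 5.0,   # assumed polling interval; not stated in the PR
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call attempt() until it returns a non-None value or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        result = attempt()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"no result after {timeout_s} seconds")
        sleep(interval_s)


# Demo: the lookup fails twice (DNS not ready yet), then succeeds.
# The sleep function is a no-op here so the demo does not actually wait.
answers = iter([None, None, "etcd-0.example.test"])  # hypothetical host name
found = retry_until(lambda: next(answers), sleep=lambda seconds: None)
print(found)
```

Using a monotonic clock for the deadline avoids surprises if the wall clock is adjusted while a freshly booted node is still waiting for DNS.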

  2. templates: bootstrap etcd using DNS discovery

This follows the etcd DNS discovery guide: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/clustering.md#dns-discovery

Also adds setup-etcd-environment.service, which uses the setup-etcd-environment CLI to set up the discovery parameters.
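
As a rough illustration of how the generated environment file could feed etcd's DNS discovery flags, the Python sketch below parses the file and builds an argument list modelled on the example in the linked clustering guide. It is not the template from this PR: the helper names `parse_env` and `etcd_args`, the use of the DNS name as `--name`, client port 2379, listening on 0.0.0.0, and the cluster token value are all assumptions, and the TLS certificate flags that an `_etcd-server-ssl` setup needs are omitted.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key] = value
    return env


def etcd_args(env: Dict[str, str], discovery_srv: str) -> List[str]:
    """Build etcd flags for DNS discovery (a sketch; TLS flags are omitted)."""
    name = env["ETCD_DNS_NAME"]
    return [
        f"--name={name}",  # assumption: member name equals its DNS name
        f"--discovery-srv={discovery_srv}",
        f"--initial-advertise-peer-urls=https://{name}:2380",
        f"--advertise-client-urls=https://{name}:2379",
        "--listen-peer-urls=https://0.0.0.0:2380",
        "--listen-client-urls=https://0.0.0.0:2379",
        "--initial-cluster-state=new",
        "--initial-cluster-token=etcd-cluster-1",  # hypothetical token value
    ]


env_text = (
    "ETCD_IPV4_ADDRESS=192.168.126.11\n"
    "ETCD_DNS_NAME=adahiya-0-etcd-0.tt.testing\n"
)
for flag in etcd_args(parse_env(env_text), "tt.testing"):
    print(flag)
```

With DNS discovery, each member advertises the name published in the SRV records, which is why no member index or initial member count has to be passed in.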

  3. Clean up the controller, server, and operator to drop the etcd index and the etcd initial count.

This requires openshift/installer#526

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 23, 2018
@openshift openshift deleted a comment from openshift-ci-robot Oct 23, 2018
@ashcrow
Member

ashcrow commented Oct 24, 2018

/test e2e-aws

@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 24, 2018
@abhinavdahiya abhinavdahiya changed the title from "WIP: etcd bootstrapping using dns" to "etcd bootstrapping using dns" Oct 24, 2018
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 24, 2018
@abhinavdahiya
Contributor Author

/cc @crawford @aaronlevy

Does this seem reasonable?

@ashcrow
Member

ashcrow commented Oct 25, 2018

FWIW @sdemos opened up a request as to why e2e-aws hangs for 2+ hours and then fails. We're seeing it on other PRs as well over the last 3 days.

@ashcrow
Member

ashcrow commented Oct 26, 2018

/test e2e-aws

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 27, 2018
@openshift-merge-robot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@abhinavdahiya
Contributor Author

/retest

@abhinavdahiya
Contributor Author

/test e2e-aws

@abhinavdahiya
Contributor Author

/retest

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Oct 27, 2018
@abhinavdahiya
Contributor Author

e2e-aws tests completed, teardown hung :(

2018/10/27 07:48:54 Container setup in pod e2e-aws completed successfully
2018/10/27 08:01:07 Container test in pod e2e-aws completed successfully
2018/10/27 10:41:04 Copying artifacts from e2e-aws into /logs/artifacts/e2e-aws
2018/10/27 10:41:05 error: unable to signal to artifacts container to terminate in pod e2e-aws, triggering deletion: could not run remote command: unable to upgrade connection: container not found ("artifacts")
2018/10/27 10:41:05 error: unable to retrieve artifacts from pod e2e-aws: could not read gzipped artifacts: unable to upgrade connection: container not found ("artifacts")
E1027 10:41:09.973225      11 event.go:200] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:".1561711212047510", GenerateName:"", Namespace:"ci-op-zp6jy1rz", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, InvolvedObject:v1.ObjectReference{Kind:"", Namespace:"ci-op-zp6jy1rz", Name:"", UID:"", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"CiJobFailed", Message:"Running job pull-ci-openshift-machine-config-operator-master-e2e-aws for PR https://github.com/openshift/machine-config-operator/pull/143 in namespace ci-op-zp6jy1rz from author abhinavdahiya", Source:v1.EventSource{Component:"ci-op-zp6jy1rz", Host:""}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbeed2e1179670310, ext:11024912575830, loc:(*time.Location)(0x1973400)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbeed2e1179670310, ext:11024912575830, loc:(*time.Location)(0x1973400)}}, Count:1, Type:"Warning", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events ".1561711212047510" is forbidden: unable to create new content in namespace ci-op-zp6jy1rz because it is being terminated' (will not retry!)

/retest

@crawford
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 27, 2018
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, crawford

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,crawford]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@abhinavdahiya
Contributor Author

level=fatal msg="Error executing openshift-install: RequestLimitExceeded: Request limit exceeded.\n\tstatus code: 503, request id: 363ae6fb-81bc-40c2-b837-cc5b03692f4e"
2018/10/27 19:21:20 Container setup in pod e2e-aws failed, exit code 1, reason Error
Another process exited

What's up with our CI? :(

/retest

@ashcrow
Member

ashcrow commented Oct 27, 2018

Yeah something is up. We've been trying to merge a PR for a bit over a week :-(.

@openshift-merge-robot openshift-merge-robot merged commit 038d67d into openshift:master Oct 27, 2018
@abhinavdahiya abhinavdahiya deleted the etcd_dns branch October 27, 2018 21:07
abhinavdahiya added a commit to abhinavdahiya/release that referenced this pull request Oct 29, 2018
…ent binary

openshift/machine-config-operator#143 added a new binary that is
used to do etcd bootstrapping by MachineConfigOperator. Need to build this new component through ci
pipeline.
wking added a commit to wking/machine-config-operator that referenced this pull request Dec 16, 2018
The last consumer was removed by 4cc7988 (server: remove etcd_index
GET param, 2018-10-26, openshift#143).
wking added a commit to wking/machine-config-operator that referenced this pull request Dec 17, 2018
The last consumer was removed by 4cc7988 (server: remove etcd_index
GET param, 2018-10-26, openshift#143).
osherdp pushed a commit to osherdp/machine-config-operator that referenced this pull request Apr 13, 2021
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
5 participants