Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange enpoint target during kubespray installation #14145

Closed
webchi opened this issue Jun 23, 2022 · 9 comments
Closed

Strange enpoint target during kubespray installation #14145

webchi opened this issue Jun 23, 2022 · 9 comments

Comments

@webchi
Copy link

webchi commented Jun 23, 2022

What happened?

Can't get etcd working witch strange hash into endpoint - etcd-endpoints://0xc0000ea700/127.0.0.1:2379 is it ok?

etcdctl member list
{"level":"warn","ts":"2022-06-22T18:47:58.361+0300","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000ea700/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection closed"}
Error: context deadline exceeded

What did you expect to happen?

Healthy etcd cluster

How can we reproduce it (as minimally and precisely as possible)?

Install etcd with kubespray

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
etcd Version: 3.5.3
Git SHA: 0452feec7
Go Version: go1.16.15
Go OS/Arch: linux/amd64


$ etcdctl version
# paste output here
etcdctl version: 3.5.3
API version: 3.5

Etcd configuration (command line flags or environment variables)

Environment file for etcd v3.5.3

ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://10.1.0.100:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://10.1.0.100:2380
ETCD_INITIAL_CLUSTER_STATE=new
ETCD_METRICS=basic
ETCD_LISTEN_CLIENT_URLS=https://10.1.0.100:2379,https://127.0.0.1:2379
ETCD_ELECTION_TIMEOUT=5000
ETCD_HEARTBEAT_INTERVAL=250
ETCD_INITIAL_CLUSTER_TOKEN=k8s_etcd
ETCD_LISTEN_PEER_URLS=https://10.1.0.100:2380
ETCD_NAME=etcd1
ETCD_PROXY=off
ETCD_INITIAL_CLUSTER=etcd1=https://10.1.0.100:2380,etcd2=https://10.1.0.101:2380,etcd3=https://10.1.0.102:2380
ETCD_AUTO_COMPACTION_RETENTION=8
ETCD_SNAPSHOT_COUNT=10000

Flannel need etcd v2 API

ETCD_ENABLE_V2=true

TLS settings

ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_CERT_FILE=/etc/ssl/etcd/ssl/member-kub-ctr1.pem
ETCD_KEY_FILE=/etc/ssl/etcd/ssl/member-kub-ctr1-key.pem
ETCD_CLIENT_CERT_AUTH=true

ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_PEER_CERT_FILE=/etc/ssl/etcd/ssl/member-kub-ctr1.pem
ETCD_PEER_KEY_FILE=/etc/ssl/etcd/ssl/member-kub-ctr1-key.pem
ETCD_PEER_CLIENT_CERT_AUTH=True

CLI settings

ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-kub-ctr1-key.pem
ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-kub-ctr1.pem

ETCD 3.5.x issue

https://groups.google.com/a/kubernetes.io/g/dev/c/B7gJs88XtQc/m/rSgNOzV2BwAJ?utm_medium=email&utm_source=footer

ETCD_EXPERIMENTAL_INITIAL_CORRUPT_CHECK=True

Etcd debug information (please run commands blow, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
etcdctl member list -w table
{"level":"warn","ts":"2022-06-23T10:42:00.234+0300","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0000ea700/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection closed"}
Error: context deadline exceeded


$ etcdctl --endpoints=<member list> endpoint status -w table
{"level":"warn","ts":"2022-06-23T10:42:43.300+0300","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003be700/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection closed"}
Failed to get the status of endpoint 127.0.0.1:2379 (context deadline exceeded)
+----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+----------+----+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Relevant log output

No response

@ahrtr
Copy link
Member

ahrtr commented Jun 24, 2022

Please make sure a correct list of endpoints are configured in ETCDCTL_ENDPOINTS or the command-line parameter --endpoints.

CLI settings

ETCDCTL_ENDPOINTS=https://127.0.0.1:2379

Can you confirm the endpoint https://127.0.0.1:2379 is reachable?

@vivekpatani
Copy link
Contributor

vivekpatani commented Jul 13, 2022

I'm seeing something similar to the user error mentioned above. I'm using the go client library. Here's an easy way I was able to reproduce the issue. The programme just creates N=300 clients and each client creates 1 key.

ETCDCTL_ENDPOINTS is empty (not that it should matter)

 ➜ echo $ETCDCTL_ENDPOINTS

go.mod

module example.com/m/v2

go 1.18

require (
	go.etcd.io/etcd/client/v3 v3.5.4
	go.uber.org/zap v1.21.0
)

main.go

package main

import (
	"context"
	"go.etcd.io/etcd/client/v3"
	"go.uber.org/zap"
	"strconv"
	"time"
)

const (
	// dialTimeout sets an expiration
	defaultDialTimeout = 4 * time.Second
	// N number of clients
	N = 300
	// ep endpoint
	ep = "https://localhost:2379"
)

var (
	// lg logger
	lg *zap.Logger
)

func main() {
	lg = zap.NewExample()
	lg.Info("starting...")

	for i := 0; i < N; i++ {
		lg.Info(strconv.Itoa(i))
		go func(i int) {
			cli := createCli()
			lg.Info("creating client", zap.Int("number", i))
			put, err := cli.Put(context.TODO(), strconv.Itoa(i), "")
			if err != nil {
				lg.Error("cannot put...")
			} else {
				lg.Info(put.Header.String())
			}
		}(i)
	}
}

// createCli creates a new client
func createCli() clientv3.KV {
	// Create a cli and return
	cli, err := clientv3.New(clientv3.Config{
		DialTimeout: defaultDialTimeout,
		Endpoints:   []string{ep},
		Logger:      lg,
	})
	if cli == nil || err != nil {
		lg.Error("error creating etcd client")
	}
	// return
	kvc := clientv3.NewKV(cli)
	return kvc
}

log.out (pasted an excerpt)

{"level":"debug","msg":"retrying of unary invoker","target":"etcd-endpoints://0x14000ac6700/localhost:2379","attempt":0}
{"level":"info","msg":"creating client","number":289}
{"level":"debug","msg":"retrying of unary invoker","target":"etcd-endpoints://0x140007116c0/localhost:2379","attempt":0}
{"level":"debug","msg":"retrying of unary invoker","target":"etcd-endpoints://0x1400061c540/localhost:2379","attempt":0}
{"level":"info","msg":"creating client","number":287}
{"level":"debug","msg":"retrying of unary invoker","target":"etcd-endpoints://0x140009216c0/localhost:2379","attempt":0}
{"level":"debug","msg":"retrying of unary invoker","target":"etcd-endpoints://0x14000711880/localhost:2379","attempt":0}
{"level":"info","msg":"creating client","number":127}
{"level":"debug","msg":"retrying of unary invoker","target":"etcd-endpoints://0x14000c90a80/localhost:2379","attempt":0}
{"level":"debug","msg":"retrying of unary invoker","target":"etcd-endpoints://0x14000750e00/localhost:2379","attempt":0}
{"level":"debug","msg":"retrying of unary invoker","target":"etcd-endpoints://0x14000863180/localhost:2379","attempt":0}
{"level":"info","msg":"creating client","number":139}
{"level":"debug","msg":"retrying of unary invoker","target":"etcd-endpoints://0x14000494e00/localhost:2379","attempt":0}
{"level":"info","msg":"creating client","number":296}
{"level":"debug","msg":"retrying of unary invoker","target":"etcd-endpoints://0x140009bf180/localhost:2379","attempt":0}
{"level":"info","msg":"297"}
{"level":"info","msg":"298"}
{"level":"info","msg":"299"}
{"level":"info","msg":"creating client","number":89}
{"level":"debug","msg":"retrying of unary invoker","target":"etcd-endpoints://0x140006948c0/localhost:2379","attempt":0}
➜ ./bin/etcdctl put 1 hello
OK

etcd server: goreman -f Procfile start
server logs (pasted an excerpt)

17:56:50 etcd1 | {"level":"debug","ts":"2022-07-12T17:56:50.112-0700","caller":"v3rpc/interceptor.go:182","msg":"request stats","start time":"2022-07-12T17:56:50.071-0700","time spent":"41.152958ms","remote":"127.0.0.1:63810","response type":"/etcdserverpb.KV/Put","request count":1,"request size":10,"response count":0,"response size":28,"request content":"key:\"1\" value_size:5 "}

Please let me know if you need more debug info, thanks. @ahrtr

Maybe not a bug or maybe a misunderstanding: https://github.com/etcd-io/etcd/blob/0bac6096325c20b0709a3565fbf5808aa7fbf917/CHANGELOG/CHANGELOG-3.5.md#package-clientv3-1

@vivekpatani
Copy link
Contributor

I could not reproduce this behaviour in 3.3.27

go.mod

module github.pie.apple.com/vpatani/etcd-load-cluster

go 1.17

require go.uber.org/zap v1.21.0

require go.etcd.io/etcd v3.3.27+incompatible

require (
	github.com/coreos/bbolt v0.0.0-00010101000000-000000000000 // indirect
	github.com/coreos/etcd v3.3.27+incompatible // indirect
	github.com/coreos/go-semver v0.3.0 // indirect
	github.com/coreos/go-systemd v0.0.0-20191104093116-d3cd4ed1dbcf // indirect
	github.com/coreos/pkg v0.0.0-20220709002704-04386ae12ed0 // indirect
	github.com/dgrijalva/jwt-go v3.2.0+incompatible // indirect
	github.com/dustin/go-humanize v1.0.0 // indirect
	github.com/gogo/protobuf v1.3.2 // indirect
	github.com/golang/protobuf v1.5.2 // indirect
	github.com/google/uuid v1.3.0 // indirect
	github.com/grpc-ecosystem/go-grpc-middleware v1.3.0 // indirect
	github.com/grpc-ecosystem/go-grpc-prometheus v1.2.0 // indirect
	github.com/grpc-ecosystem/grpc-gateway v1.16.0 // indirect
	github.com/jonboulle/clockwork v0.3.0 // indirect
	github.com/pkg/errors v0.9.1 // indirect
	github.com/prometheus/client_golang v1.12.2 // indirect
	github.com/soheilhy/cmux v0.1.5 // indirect
	github.com/tmc/grpc-websocket-proxy v0.0.0-20220101234140-673ab2c3ae75 // indirect
	github.com/xiang90/probing v0.0.0-20190116061207-43a291ad63a2 // indirect
	go.uber.org/atomic v1.7.0 // indirect
	go.uber.org/multierr v1.6.0 // indirect
	golang.org/x/net v0.0.0-20211123203042-d83791d6bcd9 // indirect
	golang.org/x/sys v0.0.0-20220114195835-da31bd327af9 // indirect
	golang.org/x/text v0.3.6 // indirect
	google.golang.org/genproto v0.0.0-20200825200019-8632dd797987 // indirect
	google.golang.org/grpc v1.33.1 // indirect
	google.golang.org/protobuf v1.26.0 // indirect
	gopkg.in/yaml.v2 v2.4.0 // indirect
	sigs.k8s.io/yaml v1.3.0 // indirect
)

replace github.com/coreos/bbolt => go.etcd.io/bbolt v1.3.6
replace google.golang.org/grpc => google.golang.org/grpc v1.19.0

main.go

package main

import (
	"context"
	"go.etcd.io/etcd/clientv3"
	"go.uber.org/zap"
	"strconv"
	"time"
)

const (
	// dialTimeout sets an expiration
	defaultDialTimeout = 4 * time.Second
	// N number of clients
	N = 300
	// ep endpoint
	ep = "localhost:2379"
	// offset off set
	offset = 100
)

var (
	// lg logger
	lg *zap.Logger
	// defaultEndpoints default endpoints
	//defaultEndpoints = []string{"http://localhost:2379", "http://localhost:22379", "http://localhost:32379"}
)

func main() {
	lg = zap.NewExample()
	lg.Info("starting...")

	for i := 0; i < N; i++ {
		cli := createCli()
		lg.Info("creating client", zap.Int("number", i))
		put, err := cli.Put(context.Background(), strconv.Itoa(i+offset), "")
		lg.Info("requested")
		if err != nil {
			lg.Error("cannot put...")
		} else {
			lg.Info(put.Header.String())
		}
	}
}

// createCli creates a new client
func createCli() clientv3.KV {
	// Create a cli and return
	cli, err := clientv3.New(clientv3.Config{
		DialTimeout: defaultDialTimeout,
		Endpoints:   []string{ep},
	})
	if cli == nil || err != nil {
		lg.Error("error creating etcd client")
	}
	// return
	kvc := clientv3.NewKV(cli)
	return kvc
}

The run was successful.

@ahrtr
Copy link
Member

ahrtr commented Jul 13, 2022

@vivekpatani I do not see any issue at all, what you provided are just debug log messages.

@vivekpatani
Copy link
Contributor

@ahrtr the first line of the log posted above, the endpoint is prepended with "target":"etcd-endpoints://0x14000ac6700", which is why can't connect to an endpoint of etcd (I'm assuming).

"target":"etcd-endpoints://0x14000ac6700/localhost:2379","

@serathius
Copy link
Member

serathius commented Jul 15, 2022

This is a custom schema that grpc loadbalancer which should get translated to http://localhost:2379. It should not cause connection problems.

Breakdown: etcd-endpoints://0x14000ac6700/localhost:2379

  • etcd-endpoints - replaces http and is the custom schema that etcd use to override the endpoint resolve algorithm adding custom loadbalancing logic
  • 0x14000ac6700 - is just a unique identifier (address of client struct)
  • localhost:2379 is the real target.

Yes, this is horrible, but we cannot implement it another way due to grpc limitations.

@vivekpatani
Copy link
Contributor

@webchi feel free to close this, works as intended.

@webchi webchi closed this as completed Jul 21, 2022
@serathius
Copy link
Member

Note: We reported this issue to grpc some time ago. If someone is interested they could check if this workaround is no longer needed.

@serathius
Copy link
Member

Context:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants