Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clientv3: clientv3.New() won't return error when no endpoint is available #9877

Closed
yudai opened this issue Jun 21, 2018 · 7 comments
Closed

Comments

@yudai
Copy link
Contributor

yudai commented Jun 21, 2018

What's observed

With clientv3 on the master branch (7648841), clientv3.New() won't return any error even when no endpoint is available to connect.

The following reproduction code doesn't panic with master, while the same code with v3.3.x panics.

conf := clientv3.Config{
    Endpoints:   []string{"invalid_endpoint:2379"},
    DialTimeout: 10 * time.Second,
}
client, err := clientv3.New(conf)
if err != nil {
    panic("connection failed")
}

Breaking Change?

I assume this behavior change has been introduced along with the new load balancer. It makes sense that the load balancer internally handles connection errors and hide them from users, because the load balancer is responsible for endpoint selection and reconnection.
(I myself is fine with this behavior, since following requests will fail when no endpoint is available and we can detect the problem at that timing)

However, this looks a breaking change for users. I assume some code has different error handlers when no endpoint is available at the startup of applications (like just die instead of retry).

I therefore think that at least this change is mentioned in the CHANGELOG as a breaking change.

@gyuho
Copy link
Contributor

gyuho commented Jun 21, 2018

@yudai Yeah, thanks for pointing out. We may have overlooked that change. Now balancer has internal routine for endpoint resolver, that could delay initial connection start. For now, you may need DialOptions: []grpc.DialOption{grpc.WithBlock()},. Will improve docs and change logs. /cc @jpbetz

@jpbetz
Copy link
Contributor

jpbetz commented Jun 21, 2018

grpc.WithBlock() is going to be required for sure. Should we set that by default to attempt to preserve 3.3- behavior?

That said, I'm not certain the right thing happens even with that option set.

@gyuho
Copy link
Contributor

gyuho commented Jun 21, 2018

Confirmed DialOptions: []grpc.DialOption{grpc.WithBlock()}, would return errors like old client, but without this dial option, connect happens in background returning the client object before initial connection.

We plan to document this clearly (ref #9829). Also, need more research about enabling it by default (since you cannot disable WithBlock once configured).

@gyuho gyuho added this to the etcd-v3.4 milestone Jun 25, 2018
mitsutaka pushed a commit to cybozu-go/cke that referenced this issue Jul 27, 2018
- master HEAD has a bug that DialTimeout does not work

  etcd-io/etcd#9829
  etcd-io/etcd#9877
@philippgille
Copy link
Contributor

Using DialOptions: []grpc.DialOption{grpc.WithBlock()}, in my code leads to the following compile error: cannot use []"google.golang.org/grpc".DialOption literal (type []"google.golang.org/grpc".DialOption) as type []"go.etcd.io/etcd/vendor/google.golang.org/grpc".DialOption in field value (note: I'm a Go beginner)

@jpbetz
Copy link
Contributor

jpbetz commented Nov 25, 2018

@philippgille Looks like you've hit an issue with go dependency management in your project. In particular, it looks like a vendoring issue since google.golang.org/grpc.DialOption is the correct type for DialOption but I see it referenced in the compile error as go.etcd.io/etcd/vendor/google.golang.org/grpc.DialOption. Not knowing your you're doing go dependency management, I can't say much more, but ensure you always importing with the name google.golang.org/grpc and then review how your managing your dependencies if that doesn't fix it.

@bblay
Copy link

bblay commented Jan 29, 2019

@philippgille, based on this comment, I solved the problem like this:

	clientConfig := clientv3.Config{
		Endpoints:   []string{"http://" + *etcdHost},
		DialTimeout: 2 * time.Second,
	}

	etcd, err := clientv3.New(clientConfig)
	if err != nil {
		return nil, errors.Wrap(err, "could not connect to etcd")
	}

	timeoutCtx, cancel := context.WithTimeout(context.Background(), 2 * time.Second)
	defer cancel()
	_, err = etcd.Status(timeoutCtx, clientConfig.Endpoints[0])
	if err != nil {
		return nil, errors.Wrapf(err, "error checking etcd status: %v", err)
	}

funlake pushed a commit to funlake/gopkg that referenced this issue Apr 18, 2019
@gyuho gyuho modified the milestones: etcd-v3.4, etcd-v3.5 Aug 5, 2019
@gyuho
Copy link
Contributor

gyuho commented Aug 5, 2019

Closing via 4b0af5b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

5 participants