Support helm install for k3s and minikube without a values override to use 1 replica for Consul server #1234

Closed
pankaj-dahiya-devops opened this issue May 23, 2022 · 8 comments
Labels: area/chart-only (Related to changes that simply require yaml Helm chart changes, e.g. exposing a new field), type/enhancement (New feature or request)

Comments

@pankaj-dahiya-devops

pankaj-dahiya-devops commented May 23, 2022

Just two steps -

  1. helm pull hashicorp/consul --untar
  2. helm install consul consul

Error logs:

k logs consul-consul-server-0
==> Starting Consul agent...
           Version: '1.12.0'
           Node ID: '32edf32e-be0e-8a78-c3ec-fc8576644f20'
         Node name: 'consul-consul-server-0'
        Datacenter: 'dc1' (Segment: '<all>')
            Server: true (Bootstrap: false)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
      Cluster Addr: 10.42.0.38 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

2022-05-23T19:44:24.387Z [WARN]  agent: bootstrap_expect > 0: expecting 3 servers
2022-05-23T19:44:24.492Z [WARN]  agent.auto_config: bootstrap_expect > 0: expecting 3 servers
2022-05-23T19:44:24.588Z [INFO]  agent.server.raft: initial configuration: index=0 servers=[]
2022-05-23T19:44:24.588Z [INFO]  agent.server.raft: entering follower state: follower="Node at 10.42.0.38:8300 [Follower]" leader=
2022-05-23T19:44:24.589Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: consul-consul-server-0.dc1 10.42.0.38
2022-05-23T19:44:24.590Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: consul-consul-server-0 10.42.0.38
2022-05-23T19:44:24.591Z [INFO]  agent.router: Initializing LAN area manager
2022-05-23T19:44:24.592Z [INFO]  agent.server: Adding LAN server: server="consul-consul-server-0 (Addr: tcp/10.42.0.38:8300) (DC: dc1)"
2022-05-23T19:44:24.593Z [INFO]  agent.server: Handled event for server in area: event=member-join server=consul-consul-server-0.dc1 area=wan
2022-05-23T19:44:24.593Z [WARN]  agent: [core]grpc: addrConn.createTransport failed to connect to {dc1-10.42.0.38:8300 consul-consul-server-0 <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.42.0.38:8300: operation was canceled". Reconnecting...
2022-05-23T19:44:24.687Z [INFO]  agent.server.autopilot: reconciliation now disabled
2022-05-23T19:44:24.689Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=udp
2022-05-23T19:44:24.690Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=tcp
2022-05-23T19:44:24.691Z [INFO]  agent: Starting server: address=[::]:8500 network=tcp protocol=http
2022-05-23T19:44:24.787Z [INFO]  agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
2022-05-23T19:44:24.787Z [INFO]  agent: Joining cluster...: cluster=LAN
2022-05-23T19:44:24.787Z [INFO]  agent: (LAN) joining: lan_addresses=[consul-consul-server.default.svc:8301]
2022-05-23T19:44:24.787Z [INFO]  agent: started state syncer
2022-05-23T19:44:24.787Z [INFO]  agent: Consul agent running!
2022-05-23T19:44:24.933Z [INFO]  agent: (LAN) joined: number_of_nodes=1
2022-05-23T19:44:24.933Z [INFO]  agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=1
2022-05-23T19:44:25.491Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: ubuntu 10.42.0.36
2022-05-23T19:44:30.436Z [WARN]  agent.server.raft: no known peers, aborting election
2022-05-23T19:44:31.839Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2022-05-23T19:44:50.534Z [ERROR] agent: Coordinate update error: error="No cluster leader"
2022-05-23T19:45:06.207Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2022-05-23T19:45:20.475Z [ERROR] agent: Coordinate update error: error="No cluster leader"
2022-05-23T19:45:33.835Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No cluster leader"
2022-05-23T19:45:50.287Z [ERROR] agent: Coordinate update error: error="No cluster leader"




k logs consul-consul-client-nkp65
==> Starting Consul agent...
           Version: '1.12.0'
           Node ID: '72a19962-621b-dfd3-8b77-033e1dac07ea'
         Node name: 'ubuntu'
        Datacenter: 'dc1' (Segment: '')
            Server: false (Bootstrap: false)
       Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
      Cluster Addr: 10.42.0.36 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false

==> Log data will now stream in as it occurs:

2022-05-23T19:44:25.193Z [INFO]  agent.client.serf.lan: serf: EventMemberJoin: ubuntu 10.42.0.36
2022-05-23T19:44:25.194Z [INFO]  agent.router: Initializing LAN area manager
2022-05-23T19:44:25.194Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=udp
2022-05-23T19:44:25.194Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=tcp
2022-05-23T19:44:25.388Z [INFO]  agent: Starting server: address=[::]:8500 network=tcp protocol=http
2022-05-23T19:44:25.388Z [INFO]  agent: Started gRPC server: address=[::]:8502 network=tcp
2022-05-23T19:44:25.388Z [INFO]  agent: Retry join is supported for the following discovery methods: cluster=LAN discovery_methods="aliyun aws azure digitalocean gce k8s linode mdns os packet scaleway softlayer tencentcloud triton vsphere"
2022-05-23T19:44:25.388Z [INFO]  agent: Joining cluster...: cluster=LAN
2022-05-23T19:44:25.388Z [INFO]  agent: (LAN) joining: lan_addresses=[consul-consul-server-0.consul-consul-server.default.svc:8301, consul-consul-server-1.consul-consul-server.default.svc:8301, consul-consul-server-2.consul-consul-server.default.svc:8301]
2022-05-23T19:44:25.388Z [WARN]  agent.router.manager: No servers available
2022-05-23T19:44:25.388Z [ERROR] agent.http: Request error: method=GET url=/v1/kv/vault/core/migration from=10.42.0.1:59314 error="No known Consul servers"
2022-05-23T19:44:25.487Z [INFO]  agent: started state syncer
2022-05-23T19:44:25.487Z [INFO]  agent: Consul agent running!
2022-05-23T19:44:25.488Z [WARN]  agent.router.manager: No servers available
2022-05-23T19:44:25.488Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2022-05-23T19:44:25.491Z [INFO]  agent.client.serf.lan: serf: EventMemberJoin: consul-consul-server-0 10.42.0.38
2022-05-23T19:44:25.492Z [INFO]  agent.client: adding server: server="consul-consul-server-0 (Addr: tcp/10.42.0.38:8300) (DC: dc1)"
2022-05-23T19:44:25.612Z [WARN]  agent.client.memberlist.lan: memberlist: Failed to resolve consul-consul-server-1.consul-consul-server.default.svc:8301: lookup consul-consul-server-1.consul-consul-server.default.svc on 10.43.0.10:53: no such host
2022-05-23T19:44:25.664Z [WARN]  agent.client.memberlist.lan: memberlist: Failed to resolve consul-consul-server-2.consul-consul-server.default.svc:8301: lookup consul-consul-server-2.consul-consul-server.default.svc on 10.43.0.10:53: no such host
2022-05-23T19:44:25.664Z [INFO]  agent: (LAN) joined: number_of_nodes=1
2022-05-23T19:44:25.664Z [INFO]  agent: Join cluster completed. Synced with initial agents: cluster=LAN num_agents=1
2022-05-23T19:44:34.229Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:44:34.229Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
2022-05-23T19:44:34.687Z [ERROR] agent.client: RPC failed to server: method=KVS.Get server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:44:34.687Z [ERROR] agent.http: Request error: method=GET url=/v1/kv/vault/core/migration from=10.42.0.1:59314 error="rpc error making call: No cluster leader"
2022-05-23T19:44:44.009Z [ERROR] agent.client: RPC failed to server: method=KVS.Get server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:44:44.009Z [ERROR] agent.http: Request error: method=GET url=/v1/kv/vault/core/migration from=10.42.0.1:59314 error="rpc error making call: No cluster leader"
2022-05-23T19:44:53.100Z [ERROR] agent.client: RPC failed to server: method=KVS.Get server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:44:53.100Z [ERROR] agent.http: Request error: method=GET url=/v1/kv/vault/core/migration from=10.42.0.1:59314 error="rpc error making call: No cluster leader"
2022-05-23T19:44:55.510Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:44:55.510Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
2022-05-23T19:45:00.955Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:45:00.955Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
2022-05-23T19:45:02.430Z [ERROR] agent.client: RPC failed to server: method=KVS.Get server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:45:02.430Z [ERROR] agent.http: Request error: method=GET url=/v1/kv/vault/core/migration from=10.42.0.1:59314 error="rpc error making call: No cluster leader"
2022-05-23T19:45:11.490Z [ERROR] agent.client: RPC failed to server: method=KVS.Get server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:45:11.491Z [ERROR] agent.http: Request error: method=GET url=/v1/kv/vault/core/migration from=10.42.0.1:59314 error="rpc error making call: No cluster leader"
2022-05-23T19:45:20.506Z [ERROR] agent.client: RPC failed to server: method=KVS.Get server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:45:20.507Z [ERROR] agent.http: Request error: method=GET url=/v1/kv/vault/core/migration from=10.42.0.1:59314 error="rpc error making call: No cluster leader"
2022-05-23T19:45:23.385Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:45:23.385Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
2022-05-23T19:45:29.565Z [ERROR] agent.client: RPC failed to server: method=KVS.Get server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:45:29.566Z [ERROR] agent.http: Request error: method=GET url=/v1/kv/vault/core/migration from=10.42.0.1:59314 error="rpc error making call: No cluster leader"
2022-05-23T19:45:36.741Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:45:36.741Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
2022-05-23T19:45:38.592Z [ERROR] agent.client: RPC failed to server: method=KVS.Get server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:45:38.592Z [ERROR] agent.http: Request error: method=GET url=/v1/kv/vault/core/migration from=10.42.0.1:59314 error="rpc error making call: No cluster leader"
2022-05-23T19:45:45.572Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:45:45.572Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
2022-05-23T19:45:47.699Z [ERROR] agent.client: RPC failed to server: method=KVS.Get server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:45:47.699Z [ERROR] agent.http: Request error: method=GET url=/v1/kv/vault/core/migration from=10.42.0.1:59314 error="rpc error making call: No cluster leader"
2022-05-23T19:45:56.779Z [ERROR] agent.client: RPC failed to server: method=KVS.Get server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:45:56.779Z [ERROR] agent.http: Request error: method=GET url=/v1/kv/vault/core/migration from=10.42.0.1:59314 error="rpc error making call: No cluster leader"
2022-05-23T19:46:05.986Z [ERROR] agent.client: RPC failed to server: method=KVS.Get server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:46:05.986Z [ERROR] agent.http: Request error: method=GET url=/v1/kv/vault/core/migration from=10.42.0.1:59314 error="rpc error making call: No cluster leader"
2022-05-23T19:46:09.544Z [ERROR] agent.client: RPC failed to server: method=Catalog.NodeServiceList server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:46:09.544Z [ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: No cluster leader"
2022-05-23T19:46:15.045Z [ERROR] agent.client: RPC failed to server: method=KVS.Get server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:46:15.045Z [ERROR] agent.http: Request error: method=GET url=/v1/kv/vault/core/migration from=10.42.0.1:59314 error="rpc error making call: No cluster leader"
2022-05-23T19:46:22.054Z [ERROR] agent.client: RPC failed to server: method=Coordinate.Update server=10.42.0.38:8300 error="rpc error making call: No cluster leader"
2022-05-23T19:46:22.054Z [ERROR] agent: Coordinate update error: error="rpc error making call: No cluster leader"
@pankaj-dahiya-devops pankaj-dahiya-devops added the type/bug Something isn't working label May 23, 2022
@pankaj-dahiya-devops
Author

Whoever broke this Consul chart should be seriously questioned! We aren't talking about some unmanaged, unpopular, or random chart. We are using the "official" Helm chart for Consul and, guess what, it is broken!

@kschoche
Contributor

Hi @pankaj-dahiya, thanks for filing this issue, and I'm sorry that you're running into trouble using it.
I was able to reproduce your issue by using the following commands on a single node Kubernetes cluster:

helm repo update hashicorp
helm install consul hashicorp/consul

In this case, our default installation attempts to install a Consul cluster with 3 server replicas, and because of the podAntiAffinity rules, 2 of the server pods cannot be scheduled on a single node. [link to values.yaml explaining the behaviour]

You can find more information about this behaviour under the server.replicas stanza in the Helm chart; however, this setting has not changed since 2018.
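
To make the failure mode concrete, here is a rough shell sketch of the arithmetic. It assumes, as described above, that the anti-affinity rules allow at most one server pod per node, and that a server only bootstraps once it sees as many servers as it expects (the `expecting 3 servers` warning in the logs points that way). The function name `can_bootstrap` is hypothetical and is not part of Helm, kubectl, or Consul.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: can_bootstrap NODE_COUNT SERVER_REPLICAS
# Assumptions: at most one server pod per node (pod anti-affinity),
# and the number of expected servers equals the replica count.
can_bootstrap() {
  nodes=$1
  replicas=$2
  # Only as many server pods get scheduled as there are nodes.
  if [ "$nodes" -lt "$replicas" ]; then
    scheduled=$nodes
  else
    scheduled=$replicas
  fi
  if [ "$scheduled" -ge "$replicas" ]; then
    echo "ok: $scheduled/$replicas servers scheduled"
  else
    echo "stuck: $scheduled/$replicas servers scheduled, no leader will be elected"
    return 1
  fi
}

can_bootstrap 1 3 || true   # single node with the default of 3 replicas
can_bootstrap 1 1           # single node with server.replicas=1
```

On a live cluster, the node count could be taken from something like `kubectl get nodes --no-headers | wc -l` instead of being passed by hand.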

I see in your logs that you are also running on a single node, in which case you should see the same problem:
2022-05-23T19:44:24.933Z [INFO] agent: (LAN) joined: number_of_nodes=1

The following command should get you up and running:

helm install consul hashicorp/consul --set server.replicas=1
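
The same override can also be kept in a values file rather than on the command line. A minimal sketch: the file name `values-single-node.yaml` is an arbitrary choice for this example, and the `helm` call is left commented out because it needs a reachable cluster.

```shell
# Write a values override that asks for a single Consul server.
# The file name is arbitrary (an assumption for this example).
cat > values-single-node.yaml <<'EOF'
server:
  replicas: 1
EOF

# Install with the override file (needs a reachable cluster):
# helm install consul hashicorp/consul -f values-single-node.yaml
```

Keeping the override in a file makes it easier to attach to an issue if something still goes wrong.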

If that does not work, we'd be happy to help. In that case, could you please attach any custom changes to the Helm chart or values.yaml overrides, along with the server logs and the output of kubectl get pods, so we can confirm the readiness of the pods?

Thanks!
~Kyle

@kschoche kschoche added the waiting-reply Waiting on the issue creator for a response before taking further action label May 23, 2022
@pankaj-dahiya-devops
Author

Okay, got it. But I think the default should be 1, with a recommendation of 3, so that the chart always runs with the default values.yaml on any number of nodes.

@kschoche kschoche added area/chart-only Related to changes that simply require yaml Helm chart changes, e.g. exposing a new field and removed type/bug Something isn't working waiting-reply Waiting on the issue creator for a response before taking further action labels May 27, 2022
@david-yu
Contributor

Hi @pankaj-dahiya, we do set 1 replica in our Minikube examples, as shown here: https://learn.hashicorp.com/tutorials/consul/kubernetes-minikube#create-a-values-file. Could you tell me a little more about the environment you are installing on? (We do ask for these details in our issue template.) I believe it's tricky to determine the actual environment from Helm alone, so you would need to know ahead of time that this is perhaps a demo environment and tune the values file appropriately. We do want to default to 3 because that is how we recommend running servers in production. How could we make the guidance for setting replicas to 1 more discoverable for you?
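
One way to handle this outside the chart is a small wrapper that picks the replica count from the node count before calling Helm. This is only a rough sketch and not something the chart provides: `pick_server_replicas` is a hypothetical name, and the threshold of 3 simply mirrors the production recommendation above.

```shell
#!/bin/sh
# Hypothetical helper: choose a value for server.replicas
# from the number of Kubernetes nodes.
pick_server_replicas() {
  nodes=$1
  if [ "$nodes" -ge 3 ]; then
    echo 3   # enough nodes for the production default
  else
    echo 1   # demo or single-node environment
  fi
}

# With a live cluster (commented out, needs kubectl and helm):
# nodes=$(kubectl get nodes --no-headers | wc -l | tr -d ' ')
# helm install consul hashicorp/consul \
#   --set server.replicas="$(pick_server_replicas "$nodes")"
```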

@pankaj-dahiya-devops
Author

pankaj-dahiya-devops commented May 27, 2022

Well, under this line:
We're looking for feedback on how folks are using Consul on Kubernetes. Please fill out our brief [survey]

Write in bold: This Helm chart will fail on a cluster with fewer than 3 nodes when using default values, so ditch this and start using the Bitnami Helm chart!

See, sarcasm aside, a Helm chart should be able to run in as many environments as possible with its default values. So rather than defaulting the replica count to 3, 1 is better, with a recommendation to use at least 3 for production. It is damn obvious that a DevOps engineer running this chart in production will increase the replica count, and even if they forget, it is not a big task to increase it afterwards.
People are now using k3s on their personal machines for testing and just running helm install with default values so that they can test their apps. These people are not Helm chart experts, so our main motto should be to let people from other backgrounds run charts with default values in their favourite environments.

@david-yu
Contributor

I agree with your point that the default experience on k3s (which is your environment) or minikube is less than optimal without a values override file that specifies 1 replica, and most folks who want to try out the initial experience may try things out locally. Just for completeness, it does sound like the README is where you would look for installation steps.

@david-yu david-yu changed the title from "Consul not starting at day-zero" to "Support helm install for k3s and minikube without a values override to use 1 replica for Consul server" May 27, 2022
@david-yu david-yu added the type/enhancement New feature or request label May 27, 2022
@david-yu
Contributor

Hi @pankaj-dahiya, we'll be defaulting to 1 server replica in a future release. Thanks for your patience on this one, and I do agree that it is a better getting-started experience for users. This will be a breaking change, so it will be timed with our next major Consul release.

@david-yu
Contributor

david-yu commented Oct 7, 2022

Closing as this is addressed by #1551.

@david-yu david-yu closed this as completed Oct 7, 2022
3 participants