Lightweight consul agent #9534
Hi @ltagliamonte-dd, This is something we've been talking about on the team for quite a while now. I'd like to address some of your points, as there's a lot to unpack.
Before diving too far into this, I want to ask - It sounds like your motivation here is to make Consul agents better able to run inside each pod. Is that right?
I think the issues here are less about "scale" than about pod ephemerality, or churn. A lot of the things you mention removing are powered by Serf, which we've found to be highly scalable across many large deployments. We have had users running 5k+ nodes in a Consul DC in prod for years on the current architecture, and not that many folks run more than 5-10k pods in a cluster. That said, Kubernetes brings its own challenges over VMs, and we're open to assessing other, lighter-weight solutions as we go forward!

So we don't recommend running the Consul agent as a sidecar for every pod, due to churn. 5k-node clusters are fine when they are mostly unchanging, but 5k pods can cause stability issues, such as during deployments or rolling restarts. We already see this today with very short-lived containers causing constant gossip churn. That's why our Helm chart installs the Consul agent as a DaemonSet. We do know of a few pretty large users who are successfully running Consul agents inside pods at scale, so it's possible, but it's not what we suggest.

The DaemonSet pattern has a whole host of issues too, though, and we certainly do want to move to a more lightweight architecture for Kubernetes in the near future. The exact design of that is something we're thinking about right now. Your proposal here is one option we are considering, but it may not be the best next step given some of the other closely related issues we also need to solve. For example, in Consul today, agents are the source of truth for service registrations: some things you can only configure by talking directly to the agent, rather than through a central API like Kubernetes operators are used to. That said, other options can potentially provide an even cleaner and lighter solution. We're considering how to address all of these problems and in what sequence to iterate so we provide the most value and solve as many problems as possible.
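To make the DaemonSet pattern mentioned above concrete, here is a minimal sketch of running one Consul client per Kubernetes node instead of per pod. All names here (namespace, labels, image tag, retry-join address) are illustrative assumptions, not taken from the thread or from the official Helm chart:

```yaml
# Hedged sketch of the "Consul client as a DaemonSet" pattern:
# one agent per node, shared by all pods on that node.
# Namespace, image tag, and the retry-join target are placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: consul-client
  namespace: consul
spec:
  selector:
    matchLabels:
      app: consul-client
  template:
    metadata:
      labels:
        app: consul-client
    spec:
      hostNetwork: true   # expose the agent's HTTP/DNS ports on the node address
      containers:
        - name: consul
          image: hashicorp/consul:1.9.0   # illustrative tag
          args:
            - "agent"
            - "-retry-join=consul-server.consul.svc"   # placeholder server address
            - "-client=0.0.0.0"
```

Pods then reach the local agent via the node IP, which is what keeps gossip membership proportional to nodes rather than pods.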
If you have any more context on scaling issues you've seen, or how you imagine this could work, we'd love to hear that! Looking forward to hearing back from you on this!
@jsosulska thank you for getting back to me. Today I run Consul as a DaemonSet, and my environment is very dynamic: all my k8s clusters (several) are autoscaling, and we do a lot of deploys a day (microservice owners can deploy at will). As you noted, this dynamism puts stress on the Consul infra, both clients and servers. Today I use the Consul DNS interface to power service discovery among my Kubernetes clusters (we use the AWS CNI, so our network is flat and we run multiple clusters as if they were a single big one). I believe that dropping part of this entropy would help scale Consul even further, and these aren't really features I would use in a Kubernetes environment.
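The cross-cluster DNS setup described above is typically wired up by forwarding the Consul domain from the cluster DNS to the agent's DNS interface. A minimal sketch of a CoreDNS server block for this, where the forward address is a placeholder for whatever IP exposes Consul DNS in a given cluster:

```
# Hedged sketch: forward the "consul" domain from CoreDNS to Consul's DNS
# interface so pods can resolve names like web.service.consul.
# 10.0.0.10 is a placeholder, not an address from this thread.
consul:53 {
    errors
    cache 30
    forward . 10.0.0.10
}
```

With a flat network across clusters, every cluster's pods can resolve the same `*.service.consul` names through their local forwarder, which is one way to get the "single addressable domain" effect.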
This doesn't apply in Kubernetes; sync-catalog takes care of registering/deregistering services. I'm really happy that the group is active on this front, and I'd like to help the project better support larger scale and different use cases like mine.
Hello @jsosulska, any updates to share from your internal discussions with the team?
@jsosulska I'd like to hear from you if you have any news.
Hi @ltagliamonte-dd, we are definitely interested in supporting a lightweight / no-agent deployment model. I'd be happy to chat with you further to better understand your requirements. On a related note, we recently discovered the
Hello @blake, thank you for the follow-up, and thank you for pointing me to the I run several k8s clusters in a flat network, and I use the consul domain "to merge" all the clusters into a single addressable domain. For scalability, I'd like to turn off everything that isn't the local service cache in the agents.
Nowadays, especially in Kubernetes-based infrastructures, there are some features that the Consul agents/servers implement that IMHO are just a scalability burden in big Consul installations (2-3k nodes and above).
Specifically, it would be nice to have a Consul agent that just offers the HTTP/gRPC API cache and the DNS interface, and drops everything else.
Basically, have an agent that acts just like a local DB that tools like service mesh/discovery can leverage to read from.
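To illustrate the "local read-only DB" idea, here is a small Python sketch of a client that reads healthy endpoints through the agent's HTTP API using Consul's `?cached` query parameter, which lets the agent serve the response from its local cache. The helper names, base address, service name, and sample payload are all made-up illustrations; only the endpoint path and `cached` parameter come from Consul's documented HTTP API:

```python
import json

# Hypothetical helper: build the URL for Consul's health endpoint with the
# agent-cache flag so reads can be served from the agent's local cache.
# The base address and service name in the usage below are placeholders.
def cached_service_url(base: str, service: str) -> str:
    return f"{base}/v1/health/service/{service}?cached"

# Parse a health-endpoint JSON payload into (address, port) pairs,
# preferring the service address and falling back to the node address,
# mirroring the documented response shape.
def healthy_endpoints(payload: str) -> list[tuple[str, int]]:
    entries = json.loads(payload)
    return [
        (e["Service"]["Address"] or e["Node"]["Address"], e["Service"]["Port"])
        for e in entries
    ]

# Illustrative sample payload (shape only; values are invented).
sample = json.dumps([
    {
        "Node": {"Address": "10.0.0.5"},
        "Service": {"Address": "10.32.1.7", "Port": 8080},
    }
])

print(cached_service_url("http://127.0.0.1:8500", "web"))
print(healthy_endpoints(sample))
```

A tool doing service discovery would fetch that URL from the node-local agent and load-balance over the returned pairs, which is exactly the "read from a local cache" role the proposal describes.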
I'd love to start a discussion with the team here to understand whether something like a lightweight agent could even be implemented, or whether adding feature-gate flags to the existing agent is a possible alternative.