add "grpc load balancing on kubernetes without tears" blog post (#10891)
* add grpc load balancing with linkerd blog post
* remove date from filenames I can follow instructions.
* update links with new date-less paths
* Update grpc-load-balancing-with-linkerd.md
1 parent d65e179, commit 08d0e1b
Showing 7 changed files with 172 additions and 0 deletions.
content/en/blog/_posts/grpc-load-balancing-with-linkerd.md (172 changes: 172 additions & 0 deletions)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,172 @@ | ||
--- | ||
layout: blog | ||
title: 'gRPC Load Balancing on Kubernetes without Tears' | ||
date: 2018-11-07 | ||
--- | ||
|
||
**Author**: William Morgan (Buoyant) | ||
|
||
Many new gRPC users are surprised to find that Kubernetes's default load | ||
balancing often doesn't work out of the box with gRPC. For example, here's what | ||
happens when you take a [simple gRPC Node.js microservices | ||
app](https://github.com/sourishkrout/nodevoto) and deploy it on Kubernetes: | ||
|
||
![](/images/blog/grpc-load-balancing-with-linkerd/Screenshot2018-11-0116-c4d86100-afc1-4a08-a01c-16da391756dd.34.36.png) | ||
|
||
While the `voting` service displayed here has several pods, it's clear from | ||
Kubernetes's CPU graphs that only one of the pods is actually doing any | ||
work—because only one of the pods is receiving any traffic. Why? | ||
|
||
In this blog post, we describe why this happens, and how you can easily fix it | ||
by adding gRPC load balancing to any Kubernetes app with | ||
[Linkerd](https://linkerd.io), a [CNCF](https://cncf.io) service mesh and service sidecar. | ||
|
||
# Why does gRPC need special load balancing? | ||
|
||
First, let's understand why we need to do something special for gRPC. | ||
|
||
gRPC is an increasingly common choice for application developers. Compared to | ||
alternative protocols such as JSON-over-HTTP, gRPC can provide some significant | ||
benefits, including dramatically lower (de)serialization costs, automatic type | ||
checking, formalized APIs, and less TCP management overhead. | ||
|
||
However, gRPC also breaks the standard connection-level load balancing, | ||
including what's provided by Kubernetes. This is because gRPC is built on | ||
HTTP/2, and HTTP/2 is designed to have a single long-lived TCP connection, | ||
across which all requests are *multiplexed*—meaning multiple requests can be | ||
active on the same connection at any point in time. Normally, this is great, as | ||
it reduces the overhead of connection management. However, it also means that | ||
(as you might imagine) connection-level balancing isn't very useful. Once the | ||
connection is established, there's no more balancing to be done. All requests | ||
will get pinned to a single destination pod, as shown below: | ||
|
||
![](/images/blog/grpc-load-balancing-with-linkerd/Mono-8d2e53ef-b133-4aa0-9551-7e36a880c553.png) | ||
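
To make the pinning effect concrete, here is a minimal simulation sketch. It is not how Kubernetes is implemented: the pod names, the `ConnectionLevelBalancer` class, and the `send_multiplexed` helper are all hypothetical, and the balancer is assumed to pick a pod at random once per new connection.

```python
import random
from collections import Counter

# Hypothetical pod names, used only for illustration.
PODS = ["voting-0", "voting-1", "voting-2"]


class ConnectionLevelBalancer:
    """Picks a backend once per new connection (assumed random choice)."""

    def __init__(self, pods, seed=0):
        self._pods = list(pods)
        self._rng = random.Random(seed)

    def connect(self):
        # The only balancing decision happens here, at connect time.
        return self._rng.choice(self._pods)


def send_multiplexed(balancer, num_requests):
    """HTTP/2-style client: one connection, every request multiplexed on it."""
    pod = balancer.connect()
    hits = Counter()
    for _ in range(num_requests):
        hits[pod] += 1  # every request lands on the pod chosen at connect time
    return hits


if __name__ == "__main__":
    print(send_multiplexed(ConnectionLevelBalancer(PODS), 1000))
```

However many requests we send, the counter only ever contains a single pod, which is the behavior the CPU graphs above suggest.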
|
||
# Why doesn't this affect HTTP/1.1? | ||
|
||
This problem doesn't occur in HTTP/1.1, which also has the | ||
concept of long-lived connections, because HTTP/1.1 has several features | ||
that naturally result in cycling of TCP connections. Because of this, | ||
connection-level balancing is "good enough", and for most HTTP/1.1 apps we | ||
don't need to do anything more. | ||
|
||
To understand why, let's take a deeper look at HTTP/1.1. In contrast to HTTP/2, | ||
HTTP/1.1 cannot multiplex requests. Only one HTTP request can be active at a | ||
time per TCP connection. The client makes a request, e.g. `GET /foo`, and then | ||
waits until the server responds. While that request-response cycle is | ||
happening, no other requests can be issued on that connection. | ||
|
||
Usually, we want lots of requests happening in parallel. Therefore, to have | ||
concurrent HTTP/1.1 requests, we need to make multiple HTTP/1.1 connections, | ||
and issue our requests across all of them. Additionally, long-lived HTTP/1.1 | ||
connections typically expire after some time, and are torn down by the client | ||
(or server). These two factors combined mean that HTTP/1.1 requests typically | ||
cycle across multiple TCP connections, and so connection-level balancing works. | ||
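
The two factors described above can be sketched as a small simulation. This is an illustration under stated assumptions, not a model of any real HTTP client: `RoundRobinConnector`, `send_http1`, the pool size of 4, and the limit of 10 requests per connection are all made up, and the connection-level balancer is assumed to hand out pods in round-robin order so the result is deterministic.

```python
import itertools
from collections import Counter


class RoundRobinConnector:
    """Connection-level balancer: each new connection goes to the next pod."""

    def __init__(self, pods):
        self._cycle = itertools.cycle(list(pods))

    def connect(self):
        return next(self._cycle)


def send_http1(connector, num_requests, pool_size=4, max_requests_per_conn=10):
    """HTTP/1.1-style client: several connections, each recycled after a while."""
    # Each slot holds [pod, requests_served_on_this_connection].
    pool = [[connector.connect(), 0] for _ in range(pool_size)]
    hits = Counter()
    for i in range(num_requests):
        conn = pool[i % pool_size]  # one in-flight request per connection
        if conn[1] >= max_requests_per_conn:
            # The connection "expired": tear it down and open a new one,
            # which gives the balancer another chance to pick a pod.
            conn[0], conn[1] = connector.connect(), 0
        hits[conn[0]] += 1
        conn[1] += 1
    return hits


if __name__ == "__main__":
    print(send_http1(RoundRobinConnector(["voting-0", "voting-1", "voting-2"]), 300))
```

Because the client opens several connections and keeps replacing them, every pod ends up serving some of the requests even though the balancer only ever acts at connect time.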
|
||
# So how do we load balance gRPC? | ||
|
||
Now back to gRPC. Since we can't balance at the connection level, in order to | ||
do gRPC load balancing, we need to shift from connection balancing to *request* | ||
balancing. In other words, we need to open an HTTP/2 connection to each | ||
destination, and balance *requests* across these connections, as shown below: | ||
|
||
![](/images/blog/grpc-load-balancing-with-linkerd/Stereo-09aff9d7-1c98-4a0a-9184-9998ed83a531.png) | ||
|
||
In network terms, this means we need to make decisions at L5/L7 rather than | ||
L3/L4, i.e. we need to understand the protocol sent over the TCP connections. | ||
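
As a rough sketch of what request balancing means, the following keeps one long-lived connection per destination and spreads individual requests across them. The `RequestBalancer` class is hypothetical, the "connections" are just placeholder strings, and plain round-robin is assumed as the policy.

```python
import itertools
from collections import Counter


class RequestBalancer:
    """Keeps one long-lived connection per pod and balances per request."""

    def __init__(self, pods):
        # Placeholder for real HTTP/2 connections: one entry per destination.
        self._connections = {pod: f"conn-to-{pod}" for pod in pods}
        self._next_pod = itertools.cycle(list(self._connections))

    def send(self, request):
        # The balancing decision is made for every request, not per connection.
        pod = next(self._next_pod)
        # A real client would write `request` onto self._connections[pod] here.
        return pod


if __name__ == "__main__":
    balancer = RequestBalancer(["voting-0", "voting-1", "voting-2"])
    print(Counter(balancer.send(i) for i in range(300)))
```

Making the decision per request requires knowing where one request ends and the next begins on the wire, which is why the balancer has to understand the protocol.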
|
||
How do we accomplish this? There are a couple of options. First, our application | ||
code could manually maintain its own load balancing pool of destinations, and | ||
we could configure our gRPC client to [use this load balancing | ||
pool](https://godoc.org/google.golang.org/grpc/balancer). This approach gives | ||
us the most control, but it can be very complex in environments like Kubernetes | ||
where the pool changes over time as Kubernetes reschedules pods. Our | ||
application would have to watch the Kubernetes API and keep itself up to date | ||
with the pods. | ||
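
To give a feel for the bookkeeping this first option involves, here is a minimal sketch of an application-maintained pool. The `EndpointPool` class and the addresses are hypothetical; the code that watches the Kubernetes API and calls `update` is deliberately left out, and it is the hard part in practice.

```python
import threading


class EndpointPool:
    """Thread-safe round-robin pool whose membership can change at any time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._endpoints = []
        self._index = 0

    def update(self, endpoints):
        """Replace the pool with the latest pod addresses (e.g. from a watch)."""
        with self._lock:
            self._endpoints = sorted(set(endpoints))
            self._index = 0

    def pick(self):
        """Return the next endpoint in round-robin order."""
        with self._lock:
            if not self._endpoints:
                raise RuntimeError("no endpoints available")
            endpoint = self._endpoints[self._index % len(self._endpoints)]
            self._index += 1
            return endpoint


if __name__ == "__main__":
    pool = EndpointPool()
    pool.update(["10.0.0.1:8080", "10.0.0.2:8080"])
    print([pool.pick() for _ in range(4)])
    pool.update(["10.0.0.3:8080"])  # pods were rescheduled
    print(pool.pick())
```

Even this toy version needs locking and a policy for an empty pool. A real one would also have to manage the connections themselves as pods come and go.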
|
||
Alternatively, in Kubernetes, we could deploy our app as [headless | ||
services](https://kubernetes.io/docs/concepts/services-networking/service/#headless-services). | ||
In this case, Kubernetes [will create multiple A | ||
records](https://kubernetes.io/docs/concepts/services-networking/service/#headless-services) | ||
in the DNS entry for the service. If our gRPC client is sufficiently advanced, | ||
it can automatically maintain the load balancing pool from those DNS entries. | ||
But this approach restricts us to certain gRPC clients, and it's rarely | ||
possible to only use headless services. | ||
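
A sketch of how a client might build its pool from those A records follows. The `resolve_pool` helper is hypothetical, and real gRPC clients that support this do the resolution internally. The only library call used is the standard `socket.getaddrinfo`, which the helper accepts as a parameter so it can be swapped out when experimenting.

```python
import socket


def resolve_pool(hostname, port, getaddrinfo=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses behind a DNS name."""
    infos = getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (ip, port) tuple.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # For a headless service this would be the service's cluster DNS name;
    # "localhost" is used here only so the sketch runs anywhere.
    print(resolve_pool("localhost", 8080))
```

A client would need to re-run this resolution periodically, since the set of A records changes as pods are rescheduled.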
|
||
Finally, we can take a third approach: use a lightweight proxy. | ||
|
||
# gRPC load balancing on Kubernetes with Linkerd | ||
|
||
[Linkerd](https://linkerd.io) is a [CNCF](https://cncf.io)-hosted *service | ||
mesh* for Kubernetes. Most relevant to our purposes, Linkerd also functions as | ||
a *service sidecar*, where it can be applied to a single service—even without | ||
cluster-wide permissions. What this means is that when we add Linkerd to our | ||
service, it adds a tiny, ultra-fast proxy to each pod, and these proxies watch | ||
the Kubernetes API and do gRPC load balancing automatically. Our deployment | ||
then looks like this: | ||
|
||
![](/images/blog/grpc-load-balancing-with-linkerd/Linkerd-8df1031c-cdd1-4164-8e91-00f2d941e93f.io.png) | ||
|
||
Using Linkerd has several advantages. First, it works with services written in | ||
any language, with any gRPC client, and any deployment model (headless or not). | ||
Because Linkerd's proxies are completely transparent, they auto-detect HTTP/2 | ||
and HTTP/1.x and do L7 load balancing, and they pass through all other traffic | ||
as pure TCP. This means that everything will *just work.* | ||
|
||
Second, Linkerd's load balancing is very sophisticated. Not only does Linkerd | ||
maintain a watch on the Kubernetes API and automatically update the load | ||
balancing pool as pods get rescheduled, Linkerd uses an *exponentially-weighted | ||
moving average* of response latencies to automatically send requests to the | ||
fastest pods. If one pod is slowing down, even momentarily, Linkerd will shift | ||
traffic away from it. This can reduce end-to-end tail latencies. | ||
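
To illustrate the general idea of latency-aware balancing with an exponentially-weighted moving average, here is a deliberately simplified sketch. It is not Linkerd's implementation: the `EwmaBalancer` class, the smoothing factor `alpha`, and the "pick the lowest average" policy are assumptions made for the example.

```python
class EwmaBalancer:
    """Tracks an EWMA of response latency per pod and prefers the fastest."""

    def __init__(self, pods, alpha=0.3):
        self._alpha = alpha  # weight given to the newest observation
        # Pods start at 0.0, so untried pods are preferred at first.
        self._latency_ms = {pod: 0.0 for pod in pods}

    def record(self, pod, latency_ms):
        """Fold a newly observed latency into the pod's moving average."""
        old = self._latency_ms[pod]
        self._latency_ms[pod] = self._alpha * latency_ms + (1 - self._alpha) * old

    def pick(self):
        """Return the pod with the lowest moving-average latency."""
        return min(self._latency_ms, key=self._latency_ms.get)


if __name__ == "__main__":
    balancer = EwmaBalancer(["voting-0", "voting-1"], alpha=0.5)
    balancer.record("voting-0", 100)
    balancer.record("voting-1", 10)
    print(balancer.pick())            # voting-1 is currently faster
    balancer.record("voting-1", 400)  # voting-1 slows down momentarily
    print(balancer.pick())            # traffic shifts back to voting-0
```

Because recent observations carry the most weight, a single slow response is enough to move the average, which is what lets a balancer of this kind react to momentary slowdowns.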
|
||
Finally, Linkerd's Rust-based proxies are incredibly fast and small. They | ||
introduce <1 ms of p99 latency and require <10 MB of RSS per pod, meaning that | ||
the impact on system performance will be negligible. | ||
|
||
# gRPC Load Balancing in 60 seconds | ||
|
||
Linkerd is very easy to try. Just follow the steps in the [Linkerd Getting | ||
Started Instructions](https://linkerd.io/2/getting-started/)—install the | ||
CLI on your laptop, install the control plane on your cluster, and "mesh" your | ||
service (inject the proxies into each pod). You'll have Linkerd running on your | ||
service in no time, and should see proper gRPC balancing immediately. | ||
|
||
Let's take a look at our sample `voting` service again, this time after | ||
installing Linkerd: | ||
|
||
![](/images/blog/grpc-load-balancing-with-linkerd/Screenshot2018-11-0116-24b8ee81-144c-4eac-b73d-871bbf0ea22e.57.42.png) | ||
|
||
As we can see, the CPU graphs for all pods are active, indicating that all pods | ||
are now taking traffic, without our having to change a line of code. Voilà, | ||
gRPC load balancing as if by magic! | ||
|
||
Linkerd also gives us built-in traffic-level dashboards, so we don't even need | ||
to guess what's happening from CPU charts any more. Here's a Linkerd graph | ||
that's showing the success rate, request volume, and latency percentiles of | ||
each pod: | ||
|
||
![](/images/blog/grpc-load-balancing-with-linkerd/Screenshot2018-11-0212-15ed0448-5424-4e47-9828-20032de868b5.08.38.png) | ||
|
||
We can see that each pod is getting around 5 RPS. We can also see that, while | ||
we've solved our load balancing problem, we still have some work to do on our | ||
success rate for this service. (The demo app is built with an intentional | ||
failure—as an exercise to the reader, see if you can figure it out by | ||
using the Linkerd dashboard!) | ||
|
||
# Wrapping it up | ||
|
||
If you're interested in a dead simple way to add gRPC load balancing to your | ||
Kubernetes services, regardless of what language they're written in, what gRPC | ||
client you're using, or how they're deployed, you can use Linkerd to add gRPC load | ||
balancing in a few commands. | ||
|
||
There's a lot more to Linkerd, including security, reliability, and debugging | ||
and diagnostics features, but those are topics for future blog posts. | ||
|
||
Want to learn more? We’d love to have you join our rapidly growing community! | ||
Linkerd is a [CNCF](https://cncf.io) project, [hosted on | ||
GitHub](https://github.com/linkerd/linkerd2), and has a thriving community | ||
on [Slack](https://slack.linkerd.io), [Twitter](https://twitter.com/linkerd), | ||
and the [mailing lists](https://lists.cncf.io/g/cncf-linkerd-users). Come and | ||
join the fun! |
Binary file added (+33.2 KB): ...load-balancing-with-linkerd/Linkerd-8df1031c-cdd1-4164-8e91-00f2d941e93f.io.png
Binary file added (+20.9 KB): .../grpc-load-balancing-with-linkerd/Mono-8d2e53ef-b133-4aa0-9551-7e36a880c553.png
Binary file added (+121 KB): ...h-linkerd/Screenshot2018-11-0116-24b8ee81-144c-4eac-b73d-871bbf0ea22e.57.42.png
Binary file added (+118 KB): ...h-linkerd/Screenshot2018-11-0116-c4d86100-afc1-4a08-a01c-16da391756dd.34.36.png
Binary file added (+86.1 KB): ...h-linkerd/Screenshot2018-11-0212-15ed0448-5424-4e47-9828-20032de868b5.08.38.png
Binary file added (+22.1 KB): ...rpc-load-balancing-with-linkerd/Stereo-09aff9d7-1c98-4a0a-9184-9998ed83a531.png