Servers with a gRPC proxyProtocol and HTTPRoutes cannot be upgraded past edge-24.7.1 #13507
Comments
We are also experimenting with changing the proxy protocol from gRPC to HTTP/2, although we are not fully sure of the implications. From what we understand, load balancing HTTP/2 should be the same as load balancing gRPC. There are some differences in metrics: in particular, the grpc_status_code label would not be published. As far as we understand, the classification label would work the same, since gRPC error status codes map to HTTP 4xx or 5xx status codes.
Thanks for reporting this @andrew-gropyus. We're looking into this so that we can advise you on the best way to do this upgrade without downtime.
Hey @andrew-gropyus, thanks for your patience on this one. I would recommend switching the proxyProtocol to HTTP/2, as you identified. This will allow you to continue to use the HTTPRoute resources that you have configured, and it won't change the way that failure classification or load balancing works. Once the proxyProtocol is set to HTTP/2, you should be free to upgrade Linkerd versions without issue. Once upgraded, you can either keep the HTTPRoutes and the HTTP/2 proxyProtocol or switch over to gRPC. If you do switch over, make sure to change the proxyProtocol to gRPC and create GRPCRoute resources to replace your current HTTPRoute resources, all in one operation. If done at the same time, you won't incur any downtime. Note that if you have both GRPCRoute and HTTPRoute resources targeting the same Server, the GRPCRoute resources will take precedence over the HTTPRoutes, but if the proxyProtocol is still set to HTTP/2, this means that all routes will be ignored. This is why it is important to change the proxyProtocol to gRPC at the same time as creating the GRPCRoutes. Alternatively, there is no major downside to continuing to use the HTTPRoutes and HTTP/2 proxyProtocol after upgrading.
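To make the precedence rule in the comment above easier to reason about, here is a small toy model in Python. It is only a sketch of the behavior as described in this thread, not Linkerd's actual implementation; the function name `selected_routes`, the `DEFAULT` placeholder, and the route names are all hypothetical.

```python
from typing import List

# Placeholder for "no user-defined route applies; the default route is used".
# Hypothetical name, used only for this illustration.
DEFAULT = ["<default route>"]


def selected_routes(proxy_protocol: str,
                    http_routes: List[str],
                    grpc_routes: List[str]) -> List[str]:
    """Toy model of post-upgrade route selection for one Server.

    Assumption: this mirrors the rules stated in the comment above and
    nothing more; it is not taken from the Linkerd source.
    """
    if grpc_routes:
        # GRPCRoutes take precedence over HTTPRoutes whenever any exist,
        # but they only apply when the Server's proxyProtocol is gRPC.
        # On an HTTP/2 Server this means every route ends up ignored.
        return list(grpc_routes) if proxy_protocol == "gRPC" else DEFAULT
    if proxy_protocol == "gRPC":
        # A gRPC Server without GRPCRoutes falls back to the default route.
        return DEFAULT
    # An HTTP/2 Server with only HTTPRoutes keeps using them.
    return list(http_routes) or DEFAULT


# Intermediate state to avoid: GRPCRoutes created, proxyProtocol still HTTP/2.
print(selected_routes("HTTP/2", ["authz-route"], ["grpc-authz-route"]))
# State after switching both in one operation.
print(selected_routes("gRPC", ["authz-route"], ["grpc-authz-route"]))
```

Under this model, the only combination that silently drops the configured routes is the half-finished one, which is why the proxyProtocol change and the GRPCRoute creation are recommended as a single operation.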
Thanks for the investigation @adleong. We're going to take the proxyProtocol: HTTP2 path. It's also good to know that there is a path to switch back over to grpc without downtime if needed.
What is the issue?
We are running edge 24.5.3 and upgrading to 24.11.8.
We use a default deny policy.
We currently run a number of grpc servers (Servers that have proxyProtocol: "gRPC") combined with http routes and authorization policies for fine grained authorization.
In 24.5.3 the behavior of linkerd when building the server model internally was to collect the http routes for grpc server.
In #12785 (released in edge-24.7.1) this behavior was changed to use grpc routes for a grpc server:
When a grpc server does not have any grpc routes, it defaults to a default grpc route.
The same PR also added support for adding grpc routes as target refs for authorization policies. Before this PR, an authorization policy that had a target ref pointing to a grpc route would fail in the policy admission controller.
As I understand it, this means that the following combination of components cannot be upgraded without downtime:
How can it be reproduced?
Before upgrading to 24.7.1 have a server, http route, and authorization policy that look like the following:
Then upgrade to or past 24.7.1.
Note that the server has not detected the http route, and is instead relying on a default route, with a default policy to authorize requests.
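The change in behavior can be sketched as a tiny Python model. This is only an illustration of what the issue describes, assuming that the sole relevant difference is which route kind a gRPC Server collects; the function names, the placeholder default route, and the route name `authz-route` are hypothetical and are not taken from Linkerd.

```python
from typing import List, Tuple

# edge-24.7.1 is the release that, per this issue, contains the change.
CHANGE_VERSION = (24, 7, 1)


def parse_edge(version: str) -> Tuple[int, ...]:
    """Turn an edge version such as '24.5.3' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def routes_for_grpc_server(edge_version: str,
                           http_routes: List[str],
                           grpc_routes: List[str]) -> List[str]:
    """Toy model: which routes a Server with proxyProtocol gRPC ends up using."""
    if parse_edge(edge_version) < CHANGE_VERSION:
        # Before the change: HTTP routes are collected for a gRPC Server.
        # (The empty case is not described in the issue and is not modeled.)
        return list(http_routes)
    # From the change onward: only gRPC routes are collected, with a
    # default gRPC route when none exist.
    return list(grpc_routes) or ["<default grpc route>"]


print(routes_for_grpc_server("24.5.3", ["authz-route"], []))
print(routes_for_grpc_server("24.11.8", ["authz-route"], []))
```

In this model the HTTP route attached before the upgrade is simply no longer selected afterwards, which matches the observation that the Server falls back to a default route and its default policy.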
Logs, error output, etc
No logs
output of linkerd check -o short
Only up to date warnings
Environment
Possible solution
Additional context
We are looking for an online migration path, we would rather not bring services down while we switched them from http routes to grpc routes.
Would you like to work on fixing this bug?
None