Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.42.0 regression in XDS #4924

Closed
howardjohn opened this issue Nov 2, 2021 · 8 comments
Closed

1.42.0 regression in XDS #4924

howardjohn opened this issue Nov 2, 2021 · 8 comments

Comments

@howardjohn
Copy link
Contributor

What version of gRPC are you using?

v1.42.0

What version of Go are you using (go version)?

1.17

What operating system (Linux, Windows, …) and version?

Linux

What did you do?

Ran Istio tests against our proxyless gRPC implementation.

What did you expect to see?

Behavior is the same as 1.41 (or really v1.42.0-dev.0.20211015201449-4757d0249e2d which we were on previously)

What did you see instead?

Regression in functionality caused by https://github.com/grpc/grpc-go/pull/4909/files. We verified setting GRPC_XDS_EXPERIMENTAL_RBAC=false resolves the issue; however an environment variable is not feasible for us since we cannot configure it on the control plane side.

The root cause seems to be some issues in the route processing.

Our HCMs do not set a RouteSpecifier: https://github.com/istio/istio/blob/c017654753cd8f4254abbe0567ff9c728d2034d2/pilot/pkg/networking/grpcgen/lds.go#L156

I think routeNamesToWatch is empty because of this. So handleRouteUpdate is called, so rdsUpdates is never set and route configuration pointer is nil is triggered is triggered. The code comments indicate this should never happen.

It seems like we can fix this by adding:

					RouteSpecifier: &hcm.HttpConnectionManager_RouteConfig{
						RouteConfig: &route.RouteConfiguration{
							Name: "inbound",
							VirtualHosts: []*route.VirtualHost{{
								Domains: []string{"*"},
								Routes: []*route.Route{{
									Match: &route.RouteMatch{
										PathSpecifier: &route.RouteMatch_Prefix{Prefix: "/"},
									},
									Action: &route.Route_NonForwardingAction{},
								}}}}},
					},

but I don't think we should be required to add this when it was not previously.

cc @zasweq @stevenctl

@zasweq
Copy link
Contributor

zasweq commented Nov 2, 2021

Hey, I messaged @ejona86 who is leading the RBAC effort. My code is faulty in its current state because a nil route specifier would lead to the error you see. Thus, I wanted to know if a nil route specifier was even allowed. According to the proto documentation, it is a required field: https://github.com/envoyproxy/envoy/blob/56e8c45b1b340c4a4f8f02ec2488354c31806d59/api/envoy/extensions/filters/network/http_connection_manager/v3/http_connection_manager.proto#L317. Thus, I will switch the behavior of a nil route specifier on server side to the same as it was on client side, and NACK any resource that comes in server side with a nil route specifier.

@howardjohn
Copy link
Contributor Author

It seems to require Route_NonForwardingAction which wasn't even added to Envoy until 1.19 (3 mo ago). We shouldn't expect the control plane is always newer than clients and can keep up with all of the changing requirements

@zasweq
Copy link
Contributor

zasweq commented Nov 2, 2021

In regards to the change for the configuration server side needed (aka empty RDS, unlike previously where empty route configuration would lead to RPC's proceeding as normal on server side): "If they don't have RouteConfiguration, then all RPCs fail, but the config is ACKed
That mainly happens when using RDS and RDS fails."

@ejona86
Copy link
Member

ejona86 commented Nov 2, 2021

I'm confused by this issue. It sounds like you are surprised. We spoke specifically about this issue on Friday with you in the internal email thread.

CC @lidizheng

@howardjohn
Copy link
Contributor Author

That one was about empty HTTP filter list, which broke us + was fixed a few weeks ago (since we were tracking the master branch). Very similar issue though, certainly.

This one needs something on gRPC side I think - now looking into it seems like a more explicit NACK (#4925) may be appropriate, but currently we hit code that says "this should never happen". We can/will also update the route config we send

@zasweq
Copy link
Contributor

zasweq commented Nov 2, 2021

Explicit NACK, and then some way for users to know that with this change RouteConfiguration needs to be valid and incoming RPCs need to match to type NFA on Server Side now that RBAC HTTP Filter is merged seems reasonable to me.

@ejona86
Copy link
Member

ejona86 commented Nov 2, 2021

That one was about empty HTTP filter list

Ah. You're right. Yeah, both are the Router/L7 plumbing.

was fixed a few weeks ago (since we were tracking the master branch)

Huh. I wonder why this wasn't caught sooner then. I'd have expected them to be noticed around the same time.

now looking into it seems like a more explicit NACK (#4925) may be appropriate

Yeah, @zasweq mentioned above it should NACK but isn't:

Thus, I will switch the behavior of a nil route specifier on server side to the same as it was on client side, and NACK any resource that comes in server side with a nil route specifier.

@dfawley
Copy link
Member

dfawley commented Nov 10, 2021

This should be addressed by #4925

@dfawley dfawley closed this as completed Nov 10, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 10, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants