Description
When running the EndpointPicker as an Envoy route filter under Istio Gateway, the EndpointPicker does not receive the ResponseBody gRPC messages. The code relies on these messages to gather metrics when the model isn't streaming results.
This happens because of the Envoy configuration generated by the Istio Gateway code.
This can be seen by connecting to port 15000 (the admin port) of the Istio pod and looking at a dump of the config, in which one sees:
"typed_per_filter_config": {
  "envoy.filters.http.ext_proc": {
    "@type": "type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExtProcPerRoute",
    "overrides": {
      "processing_mode": {
        "request_header_mode": "SEND",
        "request_body_mode": "FULL_DUPLEX_STREAMED"
      },
      "grpc_service": {
        "envoy_grpc": {
          "cluster_name": "outbound|9002||endpoint-picker.default.svc.cluster.local"
        }
      }
    }
  }
}
The EndpointPicker does get the ResponseHeader gRPC messages, because the default for "response_header_mode" is "SEND". However, the default for "response_body_mode" is "NONE", so the response body is never sent to it.
Note also that because "allow_mode_override" is not specified, its value defaults to false, so the EndpointPicker cannot override the processing mode itself, even if we wanted it to request the response body conditionally.
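For illustration, a per-route override that also asks Envoy to send the response body might look like the sketch below. This is not what Istio currently generates. The "response_body_mode" value of "STREAMED" is an assumption chosen only to show the field; "BUFFERED" and "FULL_DUPLEX_STREAMED" are other values of the same enum, and which one is appropriate is part of the open question. "response_header_mode" is spelled out only for clarity, since "SEND" is already its default. "allow_mode_override" is left out because, as far as I can tell, it is set on the filter-level ExternalProcessor configuration rather than in these per-route overrides.

```json
{
  "typed_per_filter_config": {
    "envoy.filters.http.ext_proc": {
      "@type": "type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExtProcPerRoute",
      "overrides": {
        "processing_mode": {
          "request_header_mode": "SEND",
          "request_body_mode": "FULL_DUPLEX_STREAMED",
          "response_header_mode": "SEND",
          "response_body_mode": "STREAMED"
        },
        "grpc_service": {
          "envoy_grpc": {
            "cluster_name": "outbound|9002||endpoint-picker.default.svc.cluster.local"
          }
        }
      }
    }
  }
}
```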
The above configuration matches the updates made to Istio's Gateway code in the branch experimental-gwapi-inference-extension, in the file pilot/pkg/networking/core/route/route.go, lines 527 to 531.
I assume we want this changed. The question is whether we always want to receive the response body, or only sometimes: for example, if we make gathering the statistics optional, or if there are plugins that want to process the response. (One optimization to prefix-aware routing is to add the response to the already indexed prefix, since in chats it will be part of future requests.)