Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grpc_http1_reverse_bridge fails to propagate connection errors in responses #37713

Open
epk opened this issue Dec 17, 2024 · 3 comments
Open
Labels

Comments

@epk
Copy link

epk commented Dec 17, 2024

grpc_http1_reverse_bridge fails to propagate connection errors in responses

Description:

grpc_http1_reverse_bridge when configured with either of the following options will fail to propagate connection errors in responses:

  • content_type to set anything other than application/grpc
  • response_size_header config option set

Repro steps:

  • Run Envoy with the example configurations in this issue (it is pointing to an non existent upstream socket)
  • Invoke a dummy gRPC client pointed at envoy:
    package main
    
    import (
        "context"
        "crypto/tls"
        "fmt"
        "log"
        "time"
    
        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials"
        "google.golang.org/grpc/health/grpc_health_v1"
    )
    
    func main() {
        // Set up a connection to the server
        conn, err := grpc.NewClient("0.0.0.0:8443",
      	  grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{})),
      	  grpc.WithAuthority("envoy.shopifycloud.io"),
        )
        if err != nil {
      	  log.Fatalf("failed to connect: %v", err)
        }
        defer conn.Close()
    
        conn.Connect()
        // Create a health client
        healthClient := grpc_health_v1.NewHealthClient(conn)
    
        // Set up a context with a timeout
        ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
        defer cancel()
    
        // Perform the health check
        resp, err := healthClient.Check(ctx, &grpc_health_v1.HealthCheckRequest{})
        if err != nil {
      	  log.Fatalf("failed to execute gRPC call: %v", err)
        }
    
        // Check the status
        if resp.Status == grpc_health_v1.HealthCheckResponse_SERVING {
      	  fmt.Println("Service is healthy")
        } else {
      	  fmt.Printf("Service is not healthy. Status: %v\n", resp.Status)
        }
    }

content_type: application/protobuf and response_size_header set

Config:

envoy --version
envoy  version: 7b8baff1758f0a584dcc3cb657b5032000bcb3d7/1.31.0/Distribution/RELEASE/BoringSSL
static_resources:
  listeners:
  - name: listener_tcp
    address:
      socket_address:
        protocol: TCP
        address: 0.0.0.0
        port_value: 8443
    filter_chains:
    - transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          common_tls_context:
            alpn_protocols: [ "h2,http/1.1" ]
            tls_certificates:
            - certificate_chain: {filename: "/opt/dev/var/ssl/shopifycloud.io/fullchain.cer"}
              private_key: {filename: "/opt/dev/var/ssl/shopifycloud.io/star.shopifycloud.io.key"}
      filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          codec_type: AUTO
          access_log:
            - name: envoy.access_loggers.stdout
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
                log_format:
                  text_format: |
                    [%START_TIME%] %REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% -> %RESPONSE_CODE% (%DURATION%ms) - %RESPONSE_FLAGS%
                    Response Headers:
                      grpc-status: %RESP(GRPC-STATUS)%
                      grpc-message: %RESP(GRPC-MESSAGE)%
                      grpc-status-details-bin: %RESP(GRPC-STATUS-DETAILS-BIN)%
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  prefix_rewrite: "/grpc/"
                  cluster: default
          http_filters:
          - name: envoy.filters.http.grpc_http1_reverse_bridge
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_http1_reverse_bridge.v3.FilterConfig
              content_type: application/protobuf
              response_size_header: "content-length"
              withhold_grpc_frames: true
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: default
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: default
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 1037

Logs:

[2024-12-17T19:54:10.123Z] POST /grpc.health.v1.Health/Check -> 200 (0ms) - UF
Response Headers:
  grpc-status: 2
  grpc-message: envoy reverse bridge: upstream responded with unsupported content-type application/grpc, status code 200
  grpc-status-details-bin: -

Client Logs:

go run main.go
2024/12/17 11:54:10 failed to execute gRPC call: rpc error: code = Unknown desc = envoy reverse bridge: upstream responded with unsupported content-type application/grpc, status code 200
exit status 1

content_type: application/grpc and response_size_header set

Config

          - name: envoy.filters.http.grpc_http1_reverse_bridge
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_http1_reverse_bridge.v3.FilterConfig
              content_type: application/grpc
              response_size_header: "content-length"
              withhold_grpc_frames: true

Logs

[2024-12-17T20:13:43.336Z] POST /grpc.health.v1.Health/Check -> 200 (0ms) - UF
Response Headers:
  grpc-status: 13
  grpc-message: envoy reverse bridge: upstream did not set content length
  grpc-status-details-bin: -

Client Logs:

go run main.go
2024/12/17 12:13:43 failed to execute gRPC call: rpc error: code = Internal desc = envoy reverse bridge: upstream did not set content length
exit status 1

content_type: application/grpc without response_size_header set

Config:

          - name: envoy.filters.http.grpc_http1_reverse_bridge
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_http1_reverse_bridge.v3.FilterConfig
              content_type: application/grpc
              withhold_grpc_frames: true

Logs:

^[[A[2024-12-17T20:21:25.136Z] POST /grpc.health.v1.Health/Check -> 200 (0ms) - UF
Response Headers:
  grpc-status: 14
  grpc-message: upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: delayed connect error: Connection refused
  grpc-status-details-bin: -

Client Logs

go run ./hack/example-client
Service is not healthy. Status: UNKNOWN

I would expect the first two cases to behave similarly as the last one in terms of propagating the underlying connection errors to the client (UNAVAILABLE | 14)

@epk epk added bug triage Issue requires triage labels Dec 17, 2024
@epk
Copy link
Author

epk commented Dec 17, 2024

prototyping a fix in Lua and the following works:

          http_filters:
          - name: envoy.filters.http.grpc_http1_reverse_bridge
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_http1_reverse_bridge.v3.FilterConfig
              content_type: application/protobuf
              response_size_header: "content-length"
              withhold_grpc_frames: true
          - name: envoy.filters.http.lua
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
              inline_code: |
                function envoy_on_response(response_handle)
                  local grpc_status = response_handle:headers():get("grpc-status")
                  local content_type = response_handle:headers():get("content-type")

                  -- if content type is application/grpc AND grpc-status is 14 (UNAVAILABLE), then
                  -- rewrite content-type to application/protobuf
                  -- add fixed content-length header
                  if content_type == "application/grpc" and grpc_status == "14" then
                    response_handle:headers():replace("content-type", "application/protobuf")
                    response_handle:headers():replace("content-length", "0")
                  end
                end
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

@adisuissa adisuissa removed the triage Issue requires triage label Dec 18, 2024
@adisuissa
Copy link
Contributor

cc @zuercher @mattklein123 as codeowners

@zuercher
Copy link
Member

zuercher commented Dec 19, 2024

I don't think there's a good way to detect that the upstream error was generated by Envoy, which is really the case where you'd want to change the behavior.

Absent that I think it would be ok to change the filter to return a grpc status code based on the upstream server's HTTP status code when the response length header isn't set or when the content-type doesn't match the expected value and return the content-type and response length header errors only for otherwise-successful responses. The reason is that for upstream-generated errors, we should continue to proxy the response as it might contain a google.rpc.Status or other valid encoded content. Probably needs to be done with a feature flag as well since the current behavior may now be expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants