golang.org/x/net/http2/h2c: GCP Cloud Run Sporadic 502 Bad Gateway #73343

Closed as not planned
@RPGillespie6

Description

Go version

go1.23.6 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/user/.cache/go-build'
GOENV='/home/user/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/user/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/user/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.6'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/user/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build460801396=/tmp/go-build -gno-record-gcc-switches'

What did you do?

I run many Cloud Run applications in GCP, all written in Go. I have HTTP/2 enabled on Cloud Run, which means my Go servers need to speak h2c (HTTP/2 over cleartext).

Around January of this year, all of my Go apps started sporadically producing 502 Bad Gateway responses. This suggests something changed on Google's side. However, through trial and error I've narrowed down at least part of the issue to Go's h2c package. I opened a GCP community discussion here, but the gist of it is:

  • The 502 Bad Gateways mainly seem to happen when uploading files (POST + request body)
  • A larger request body seems to exacerbate the issue
  • The issue doesn't happen frequently on my or any other user's PC, maybe 1 out of 20 uploads or fewer
  • The issue happens extremely frequently in GCP's "Cloud Build" environment (sometimes 50% of the time or more)
  • This is how I discovered the issue: starting in January of this year, uploads in our Cloud Build pipelines began failing with 502s at an annoyingly high rate (50% or more)
  • Cloud Build has very fast upload speeds (1 Gbps), so I wonder if that could be a factor
  • If I use HTTP/1 the problem goes away and I can't get any 502s
  • If I use Node.js instead of Go the problem also goes away and I can't get any 502s, no matter how large the uploaded files are

Here's the Go code I deployed to Cloud Run:

package main

import (
	"io"
	"log"
	"net/http"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
)

func main() {
	http.HandleFunc("/upload", uploadDataHandler)
	server := &http.Server{
		Addr:    ":8080",
		Handler: h2c.NewHandler(http.DefaultServeMux, &http2.Server{}),
	}
	log.Fatal(server.ListenAndServe())
}

func uploadDataHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "Invalid request method", http.StatusMethodNotAllowed)
		return
	}

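	// ParseMultipartForm treats this value as its in-memory limit (maxMemory),
	// not as a hard cap on the upload size.
	maxUploadSize := int64(100 * 1024 * 1024) // 100 MiB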
	err := r.ParseMultipartForm(maxUploadSize)
	if err != nil {
		http.Error(w, "Failed to parse multipart form", http.StatusBadRequest)
		log.Println("Error parsing form:", err)
		return
	}

	file, _, err := r.FormFile("file")
	if err != nil {
		http.Error(w, "Could not read 'file' form value in payload", http.StatusBadRequest)
		log.Println("Error reading form file:", err)
		return
	}
	defer file.Close()

	data, err := io.ReadAll(file)
	if err != nil {
		http.Error(w, "Failed to read file", http.StatusInternalServerError)
		log.Println("Error reading file:", err)
		return
	}

	log.Printf("Received file of size %d bytes\n", len(data))

	w.WriteHeader(http.StatusOK)
	w.Write([]byte(`{"message": "file uploaded successfully"}`))
}

Dockerfile:

FROM scratch

ADD ./h2cissue /h2cissue
EXPOSE 8080
ENTRYPOINT ["/h2cissue"]

Command to run from Cloud Build to trigger failure:

dd if=/dev/urandom of=random_file bs=1M count=5
curl -X POST -F "file=@random_file" https://h2cissue-664297616426.us-central1.run.app/upload

About 20% of the time you'll get:

upstream connect error or disconnect/reset before headers. reset reason: protocol error

This Node.js implementation, however, never fails:

const http2 = require('http2');
const PORT = process.env.PORT || 8080;

const server = http2.createServer();

server.on('stream', (stream, headers) => {
  const method = headers[':method'];
  const path = headers[':path'];

  if (method === 'POST' && path === '/upload') {
    // Read in form data file and print the length
    const chunks = [];
    stream.on('data', chunk => {
      chunks.push(chunk);
    });
    stream.on('end', () => {
      const data = Buffer.concat(chunks);
      console.log(`Received ${data.length} bytes of data`);
      stream.respond({ ':status': 200 });
      stream.end('File received');
    });
  }
});

server.listen(PORT, () => {
  console.log(`Server is listening on port ${PORT}`);
});

What did you see happen?

I'm seeing sporadic protocol errors coming out of Go's h2c implementation.

What did you expect to see?

I expected Go to be more robust than Node.js! In all seriousness though, I am eager to help investigate and debug further; any tips would be appreciated.
