Upgrading from v1.3.11 to v1.4.2 causes a goroutine stack overflow #9267

Closed
ybstaked opened this issue Jul 23, 2021 · 11 comments
Assignees
Labels
Bug Something isn't working

Comments

@ybstaked
Contributor

🐞 Bug Report

Description

Upgrading from v1.3.11 to v1.4.2 causes a goroutine stack overflow

Has this worked before in a previous version?

Yes, the previous version in which this bug was not present was:

v1.3.11

🔬 Minimal Reproduction

🔥 Error




time="2021-07-23 14:55:54" level=info msg="Starting API middleware" API middleware address="0.0.0.0:3501" prefix=gateway
time="2021-07-23 14:55:54" level=info msg="Starting gRPC gateway" address="0.0.0.0:4001" prefix=gateway
time="2021-07-23 14:55:55" level=info msg="Connected to eth1 proof-of-work chain" endpoint="https://rpc.eth-goerli.staging-us-east-2.staked.cloud:443" prefix=powchain
runtime: goroutine stack exceeds 1000000000-byte limit
runtime: sp=0xc075f84340 stack=[0xc075f84000, 0xc095f84000]
fatal error: stack overflow

runtime stack:
runtime.throw(0x5ed89b, 0xe)
        GOROOT/src/runtime/panic.go:1117 +0x72
runtime.newstack()
        GOROOT/src/runtime/stack.go:1069 +0x7ed
runtime.morestack()
        GOROOT/src/runtime/asm_amd64.s:458 +0x8f

goroutine 316 [running]:
runtime.rawstringtmp(0xc075f84488, 0x2e, 0x0, 0x0, 0x0, 0x0, 0x0)
        GOROOT/src/runtime/string.go:126 +0xad fp=0xc075f84350 sp=0xc075f84348 pc=0x133a4cd
runtime.concatstrings(0xc075f84488, 0xc075f84430, 0x2, 0x2, 0x2e, 0x7f79297ac5b8)
        GOROOT/src/runtime/string.go:50 +0xc5 fp=0xc075f843e8 sp=0xc075f84350 pc=0x1339ee5
runtime.concatstring2(0xc075f84488, 0xc0107ade00, 0x2d, 0xc00a755954, 0x1, 0x2e, 0x0)
        GOROOT/src/runtime/string.go:59 +0x47 fp=0xc075f84428 sp=0xc075f843e8 pc=0x133a167
net/http.(*ServeMux).shouldRedirectRLocked(0xc0001fee40, 0xc0107ade00, 0x2d, 0xc00a755954, 0x1, 0x1)
        GOROOT/src/net/http/server.go:2347 +0x94 fp=0xc075f844e8 sp=0xc075f84428 pc=0x1652a94
net/http.(*ServeMux).redirectToPathSlash(0xc0001fee40, 0xc0107ade00, 0x2d, 0xc00a755954, 0x1, 0xc0107b4cf0, 0xc01c36f6e8, 0x2)
        GOROOT/src/net/http/server.go:2333 +0x6b fp=0xc075f84538 sp=0xc075f844e8 pc=0x165288b
net/http.(*ServeMux).Handler(0xc0001fee40, 0xc025d83b00, 0x1652f7c, 0xc0001fee40, 0xc0107ade00, 0x2d)
        GOROOT/src/net/http/server.go:2404 +0x10d fp=0xc075f84690 sp=0xc075f84538 pc=0x1652d8d
net/http.(*ServeMux).ServeHTTP(0xc0001fee40, 0x95d308, 0xc00d0002a0, 0xc025d83b00)
        GOROOT/src/net/http/server.go:2447 +0x17b fp=0xc075f846f0 sp=0xc075f84690 pc=0x16535db
github.com/rs/cors.(*Cors).Handler.func1(0x95d308, 0xc00d0002a0, 0xc025d83b00)
        external/com_github_rs_cors/cors.go:219 +0x1b9 fp=0xc075f84748 sp=0xc075f846f0 pc=0x2217a59
net/http.HandlerFunc.ServeHTTP(0xc0242d2c80, 0x95d308, 0xc00d0002a0, 0xc025d83b00)
        GOROOT/src/net/http/server.go:2069 +0x44 fp=0xc075f84770 sp=0xc075f84748 pc=0x1651684
github.com/prysmaticlabs/prysm/beacon-chain/gateway.DefaultConfig.func1(0x93f760, 0xc0242d2c80, 0x95d308, 0xc00d0002a0, 0xc025d83b00)
        beacon-chain/gateway/helpers.go:69 +0x4f fp=0xc075f847a0 sp=0xc075f84770 pc=0x222050f
github.com/prysmaticlabs/prysm/shared/gateway.(*Gateway).Start.func1(0x95d308, 0xc00d0002a0, 0xc025d83b00)
        shared/gateway/gateway.go:135 +0x59 fp=0xc075f847d8 sp=0xc075f847a0 pc=0x221f339
net/http.HandlerFunc.ServeHTTP(0xc0242d2ca0, 0x95d308, 0xc00d0002a0, 0xc025d83b00)
        GOROOT/src/net/http/server.go:2069 +0x44 fp=0xc075f84800 sp=0xc075f847d8 pc=0x1651684
net/http.(*ServeMux).ServeHTTP(0xc0001fee40, 0x95d308, 0xc00d0002a0, 0xc025d83b00)
        GOROOT/src/net/http/server.go:2448 +0x1ad fp=0xc075f84860 sp=0xc075f84800 pc=0x165360d
github.com/rs/cors.(*Cors).Handler.func1(0x95d308, 0xc00d0002a0, 0xc025d83b00)
        external/com_github_rs_cors/cors.go:219 +0x1b9 fp=0xc075f848b8 sp=0xc075f84860 pc=0x2217a59
net/http.HandlerFunc.ServeHTTP(0xc0242d2c80, 0x95d308, 0xc00d0002a0, 0xc025d83b00)

🌍 Your Environment

Operating System:

  
Kubernetes
  

What version of Prysm are you running? (Which release)

  
v1.4.2
  

Anything else relevant (validator index / public key)?

@ahadda5
Contributor

ahadda5 commented Jul 23, 2021

In #9217 this was caused by manually calling the gRPC server using curl with bogus params/format, which the server couldn't handle.
So you get this from the get-go, as the log shows, correct?

@ybstaked
Contributor Author

I am not calling any curl commands manually. How can I check which params were passed or what was called? It seems logging could be improved for better triaging.

@ahadda5
Contributor

ahadda5 commented Jul 23, 2021

In that case, ignore what I said about the curl calls. This is a concerning issue.

@rkapka
Contributor

rkapka commented Jul 23, 2021

Thanks for reporting @ybstaked. Can you please provide the list of flags passed to the beacon node?

@rkapka rkapka added the Bug Something isn't working label Jul 23, 2021
@rkapka rkapka self-assigned this Jul 23, 2021
@ybstaked
Contributor Author

@rkapka here you go:

exec beacon-chain --accept-terms-of-use --datadir /home/prysm/data --rpc-host 0.0.0.0 --rpc-port 4000 --grpc-gateway-host 0.0.0.0 --grpc-gateway-port 4001 --log-file /tmp/ethereum2.log --block-batch-limit=512 --http-web3provider=https://rpc.eth-goerli.staging-us-east-2.cloud:443 --slots-per-archive-point 128 --pyrmont

@nisdas
Member

nisdas commented Jul 24, 2021

Hey @ybstaked,
were you running any other services/sidecars that might be pinging the gateway port? For Prysm, beacon node <-> validator communication goes through gRPC only, so any requests to the REST API could only come from an external process. I do agree we should have better logging for all incoming requests so it's easier to debug issues like this.
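
For anyone who wants this kind of request logging before it lands upstream, here is a minimal Go sketch of a logging wrapper around an `http.Handler`. It is only an illustration of the idea, not Prysm's implementation; `logRequests` and `logLineFor` are hypothetical names, and the log format is an assumption.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// logRequests wraps next and logs the method, request URI, and remote
// address of every request. The line is written BEFORE next runs, so it is
// still emitted if the wrapped handler later crashes the process.
func logRequests(logger *log.Logger, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		logger.Printf("incoming request method=%s uri=%q remote=%s",
			r.Method, r.URL.RequestURI(), r.RemoteAddr)
		next.ServeHTTP(w, r)
	})
}

// logLineFor sends one test request through the wrapper and returns the
// text that was logged for it.
func logLineFor(method, target string) string {
	var buf strings.Builder
	logger := log.New(&buf, "", 0) // no prefix or timestamp, for a stable demo
	h := logRequests(logger, http.NotFoundHandler())
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(method, target, nil))
	return buf.String()
}

func main() {
	fmt.Print(logLineFor(http.MethodGet, "/some/path?x=1"))
}
```

Logging on entry rather than on completion is deliberate here: a request that triggers a fatal error never reaches code placed after `next.ServeHTTP`, so only an entry log would show which caller and path were involved.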

@lenhartjames

lenhartjames commented Jul 24, 2021

I'm getting a similar error:

runtime: goroutine stack exceeds 1000000000-byte limit
runtime: sp=0xc0a573a340 stack=[0xc0a573a000, 0xc0c573a000]
fatal error: stack overflow

runtime stack:
runtime.throw(0x5ed89b, 0xe)
	GOROOT/src/runtime/panic.go:1117 +0x72
runtime.newstack()
	GOROOT/src/runtime/stack.go:1069 +0x7ed
runtime.morestack()
	GOROOT/src/runtime/asm_amd64.s:458 +0x8f

goroutine 10319363 [running]:

After running:

./prysm.sh beacon-chain --http-web3provider=http://localhost:8545 --pyrmont --grpc-gateway-host=0.0.0.0 --rpc-host=0.0.0.0

I am connecting to it with a remote validator.
I tried restarting with no outside connection and got the same result.

@lenhartjames

Hi guys, after some more troubleshooting I've managed to get it to run without issue.

Using the --grpc-gateway-host and --rpc-host flags required additional security and setup. My beacon node is on AWS, so I blocked all incoming requests over 4000 and whitelisted my local IP.

It has run normally ever since, even with my remote validator connected.

@rkapka
Contributor

rkapka commented Jul 26, 2021

I am afraid that the stack overflow error is hiding another issue. @ybstaked / @lenhartjames, can either of you please run the non-working node setup either from the Docker image prysmaticlabs/prysm-beacon-chain:latest or from source using the develop branch? Both should have the overflow bug fixed.

@ybstaked
Contributor Author

@rkapka unfortunately, we build our own images off of proper release tags in our setups, rather than the latest/develop branches. Can you please update this issue when the fix is tested and in a proper release?

Relates to #9246 and #9264

@prestonvanloon
Member

Duplicate of #9246

@prestonvanloon prestonvanloon marked this as a duplicate of #9246 Aug 3, 2021
6 participants