Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Application Server panic while running the JavaScript message decoder #2966

Closed
adriansmares opened this issue Jul 23, 2020 · 1 comment
Closed
Assignees
Labels
bug Something isn't working c/application server This is related to the Application Server

Comments

@adriansmares
Copy link
Contributor

adriansmares commented Jul 23, 2020

Summary

The Application Server may panic while running the JavaScript message decoder. This is obviously not the intended behavior but tracking the bug has proven to be non-trivial.

References https://github.com/TheThingsIndustries/lorawan-stack/issues/2262

Steps to Reproduce

  1. Run an uplink through the JavaScript message decoder.
  2. Have a particularly bad day.
  3. The AS panics and no recovers help it anymore.

What do you see now?

As part of the panic report, the following frames stand out:

1595501567843,runtime: unexpected return pc for go.thethings.network/lorawan-stack/v3/pkg/messageprocessors/javascript.(*host).Decode called from 0x0,ttes/ttes/267215b77e66421386fdaf295b67b1e0
1595501567843,fatal error: unknown caller pc,ttes/ttes/267215b77e66421386fdaf295b67b1e0

What do you want to see instead?

No panic, obviously.

Environment

Cloud Hosted, 3.8.7

How do you propose to implement this?

I don't have a good reproduction for this, so no good fix for this.

Go 1.14 introduced some optimizations regarding defers . A follow up to these changes has been golang/go#37664, which describes an issue that looks like ours: a defer/recover chain, which we do have as part of the JavaScript message processor:

defer func() {
if caught := recover(); caught != nil {
switch val := caught.(type) {
case error:
err = errRuntime.WithCause(val)
default:
err = errRuntime
}
return
}
}()
vm.Interrupt = make(chan func(), 1)
ctx, cancel := context.WithTimeout(ctx, j.options.Timeout)
defer cancel()
go func() {
<-ctx.Done()
vm.Interrupt <- func() {
panic(context.DeadlineExceeded)
}
}()

(this sequence is used for execution time limits, and it's encouraged by otto - https://github.com/robertkrimen/otto#halting-problem)

The (temporary) fix there was to disable optimizations, by compiling the binaries using -gcflags="all=-N". Note that we can make this part of goreleaser:

diff --git a/.goreleaser.yml b/.goreleaser.yml
index 4bf53dbb4..7070cee86 100644
--- a/.goreleaser.yml
+++ b/.goreleaser.yml
@@ -16,6 +16,8 @@ builds:
     - -X go.thethings.network/lorawan-stack/v3/pkg/version.BuildDate={{.Date}}
     - -X go.thethings.network/lorawan-stack/v3/pkg/version.GitCommit={{.ShortCommit}}
     - -X go.thethings.network/lorawan-stack/v3/pkg/version.TTN={{.Version}}
+    gcflags:
+      - all=-N
     env:
       - CGO_ENABLED=0
     goos:

golang/go#37664 has been closed and fixed as part of go 1.14.1, and 3.8.7 has been compiled with go 1.14.5, which means that we are probably encountering another issue. But golang/go#35158 claims some remnants still exist, so I'm not sure if it's fully fixed - if we encounter this crash too often we may want to experiment with optimizations-disabled builds. prysmaticlabs/prysm#5131 (comment) is another reference.

How do you propose to test this?

This is a heisenbug. Unless we have a good reproduction system we cannot really test it reliably.

Can you do this yourself and submit a Pull Request?

Yes, when additional information becomes available (more crashes, more details from the Go team due to someone else encountering this).

@adriansmares adriansmares added bug Something isn't working c/application server This is related to the Application Server labels Jul 23, 2020
@adriansmares adriansmares added this to the Backlog milestone Jul 23, 2020
@johanstokking
Copy link
Member

References #2670; this problem might disappear when we switch from otto to goja

@htdvisser htdvisser removed this from the Backlog milestone Jun 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working c/application server This is related to the Application Server
Projects
None yet
Development

No branches or pull requests

3 participants