Add support for the `recover()` builtin function #2331

aykevl · 2021-11-29T20:33:18Z

Not all architectures are supported yet. In particular, the current design doesn't support WebAssembly. Other architectures (avr, xtensa, riscv64) can be supported but are left as a TODO.

This PR does result in an increase in code size if either defer or panic is used. There is no way to turn this feature off with this PR, but I don't think we should do that for the following reasons:

The code size increase only happens when defer or panic is used (especially defer). If you really care about code size, you may want to avoid those.
Having another knob to configure the compiler is a maintenance burden. We already have quite a few (-gc, -scheduler).

To give an idea of the code size change, here is the difference in binary size before and after the last commit of this PR:

before  after   diff
 10796  10796      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/adxl345/main.go
 13708  13708      0  0.00%  tinygo build -size short -o ./build/test.hex -target=pybadge ./examples/amg88xx
  9088   9088      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/apa102/main.go
 10108  10108      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/apa102/itsybitsy-m0/main.go
  6524   6524      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/at24cx/main.go
 10516  10516      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/bh1750/main.go
  9672   9672      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/blinkm/main.go
 13772  13772      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/bmp180/main.go
 12380  12380      0  0.00%  tinygo build -size short -o ./build/test.hex -target=trinket-m0 ./examples/bmp388/main.go
  6724   6724      0  0.00%  tinygo build -size short -o ./build/test.hex -target=bluepill ./examples/ds1307/sram/main.go
  4396   4396      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/easystepper/main.go
 11096  11096      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/hcsr04/main.go
  5596   5596      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/hd44780/customchar/main.go
  5580   5580      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/hd44780/text/main.go
 11360  11360      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/hd44780i2c/main.go
 15640  15640      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/hub75/main.go
 10780  10780      0  0.00%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/ili9341/basic
 11804  11804      0  0.00%  tinygo build -size short -o ./build/test.hex -target=xiao ./examples/ili9341/basic
 29628  29628      0  0.00%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/ili9341/pyportal_boing
 10832  10832      0  0.00%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/ili9341/scroll
 11884  11884      0  0.00%  tinygo build -size short -o ./build/test.hex -target=xiao ./examples/ili9341/scroll
 12752  12752      0  0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-express ./examples/lis3dh/main.go
 15304  15304      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/lsm303agr/main.go
 12780  12780      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/lsm6ds3/main.go
 13036  13036      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mag3110/main.go
 10500  10500      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mcp3008/main.go
 10540  10540      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mma8653/main.go
 10388  10388      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mpu6050/main.go
  4444   4444      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/pcd8544/setbuffer/main.go
  4452   4452      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/pcd8544/setpixel/main.go
  2468   2468      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino ./examples/servo
  8732   8732      0  0.00%  tinygo build -size short -o ./build/test.hex -target=pybadge ./examples/shifter/main.go
  5072   5072      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/ssd1306/i2c_128x32/main.go
  5332   5332      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/ssd1306/spi_128x64/main.go
  5116   5116      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/ssd1331/main.go
  5696   5696      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/st7735/main.go
  5448   5448      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/st7789/main.go
 14160  14160      0  0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-express ./examples/thermistor/main.go
  8088   8088      0  0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-bluefruit ./examples/tone
 10592  10592      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/tm1637/main.go
 17064  17064      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/vl53l1x/main.go
  5828   5828      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/waveshare-epd/epd2in13/main.go
  5364   5364      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/waveshare-epd/epd2in13x/main.go
  5684   5684      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/waveshare-epd/epd4in2/main.go
  8692   8692      0  0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-express ./examples/ws2812
  1420   1420      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino   ./examples/ws2812
   720    720      0  0.00%  tinygo build -size short -o ./build/test.hex -target=digispark ./examples/ws2812
 24788  24788      0  0.00%  tinygo build -size short -o ./build/test.hex -target=trinket-m0 ./examples/bme280/main.go
 13528  13528      0  0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-express ./examples/microphone/main.go
 13124  13124      0  0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-express ./examples/buzzer/main.go
 15136  15136      0  0.00%  tinygo build -size short -o ./build/test.hex -target=trinket-m0 ./examples/veml6070/main.go
  8664   8664      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/l293x/simple/main.go
 10072  10072      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/l293x/speed/main.go
  8632   8632      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/l9110x/simple/main.go
 10104  10104      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/l9110x/speed/main.go
 15264  15264      0  0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-express ./examples/lis2mdl/main.go
 10512  10512      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/max72xx/main.go
  6094   6094      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino ./examples/keypad4x4/main.go
  8952   8952      0  0.00%  tinygo build -size short -o ./build/test.hex -target=xiao ./examples/pcf8563/clkout/
  9976   9976      0  0.00%  tinygo build -size short -o ./build/test.hex -target=feather-m0 ./examples/ina260/main.go
  8656   8656      0  0.00%  tinygo build -size short -o ./build/test.hex -target=nucleo-l432kc ./examples/aht20/main.go
  7492   7496      4  0.05%  tinygo build -size short -o ./build/test.hex -target=hifive1b ./examples/ssd1351/main.go
 21300  21332     32  0.15%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/espat/espconsole/main.go
 22460  22492     32  0.14%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/espat/esphub/main.go
 22492  22524     32  0.14%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/espat/espstation/main.go
 10912  10952     40  0.37%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mcp23017/main.go
 11356  11396     40  0.35%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mcp23017-multiple/main.go
  6312   6352     40  0.63%  tinygo build -size short -o ./build/test.hex -target=nucleo-f103rb ./examples/shiftregister/main.go
 10016  10072     56  0.56%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/touch/resistive/fourwire/main.go
 12824  12880     56  0.44%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/touch/resistive/pyportal_touchpaint/main.go
  6124   6184     60  0.98%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/microbitmatrix/main.go
 15844  15992    148  0.93%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/wifinina/webclient/main.go
 53660  54460    800  1.49%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/bmi160/main.go
 58404  59316    912  1.56%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/flash/console/spi
 56408  57336    928  1.65%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/flash/console/qspi
 61000  61976    976  1.60%  tinygo build -size short -o ./build/test.hex -target=feather-m4 ./examples/sdcard/console/
247992 249480   1488  0.60%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/ili9341/slideshow
 20004  21596   1592  7.96%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/wifinina/udpstation/main.go
 61028  62724   1696  2.78%  tinygo build -size short -o ./build/test.hex -target=p1am-100 ./examples/p1am/main.go
 52840  54748   1908  3.61%  tinygo build -size short -o ./build/test.hex -target=feather-m0 ./examples/gps/i2c/main.go
 52560  54468   1908  3.63%  tinygo build -size short -o ./build/test.hex -target=feather-m0 ./examples/gps/uart/main.go
 15196  17328   2132 14.03%  tinygo build -size short -o ./build/test.hex -target=bluepill ./examples/ds1307/time/main.go
 29552  31752   2200  7.44%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/wifinina/ntpclient/main.go
 54996  57332   2336  4.25%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mcp2515/main.go
 51460  53876   2416  4.69%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/wifinina/tcpclient/main.go
 61624  64300   2676  4.34%  tinygo build -size short -o ./build/test.hex -target=xiao ./examples/pcf8563/alarm/
 60232  62924   2692  4.47%  tinygo build -size short -o ./build/test.hex -target=xiao ./examples/pcf8563/time/
 61528  64220   2692  4.38%  tinygo build -size short -o ./build/test.hex -target=xiao ./examples/pcf8563/timer/
 49948  52684   2736  5.48%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/adt7410/main.go
 52092  54828   2736  5.25%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/bmp280/main.go
 44932  47668   2736  6.09%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/sht3x/main.go
 63672  67104   3432  5.39%  tinygo build -size short -o ./build/test.hex -target=feather-m0 ./examples/dht/main.go
139744 144244   4500  3.22%  tinygo build -size short -o ./build/test.hex -target=feather-m4 ./examples/sdcard/tinyfs/
 57976  62588   4612  7.96%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/ds3231/main.go
 93948  99040   5092  5.42%  tinygo build -size short -o ./build/test.hex -target=wioterminal ./examples/rtl8720dn/mqttsub/
122832 128836   6004  4.89%  tinygo build -size short -o ./build/test.hex -target=wioterminal ./examples/rtl8720dn/webserver/
124396 130624   6228  5.01%  tinygo build -size short -o ./build/test.hex -target=wioterminal ./examples/rtl8720dn/webclient/
sum: 67968 (2.72%)

What you can see here:

The overall code size increase is 2.72%.
Many examples (about two thirds) don't change in binary size at all. Probably because they don't use defer or panic.
Examples that increased in binary size are usually large examples, >25kB or so. It looks like they import the fmt package, which seems to use defer and recover.
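For anyone wanting to reproduce the diff and percentage columns for their own builds, the arithmetic is simply the size difference relative to the size before the change. A minimal sketch follows; `sizeDelta` is a hypothetical helper name and not part of TinyGo. The sample values are taken from the bluepill ds1307/time row above.

```go
package main

import "fmt"

// sizeDelta returns the absolute size change in bytes and the change as a
// percentage of the size before the change.
func sizeDelta(before, after int) (diff int, percent float64) {
	diff = after - before
	percent = float64(diff) / float64(before) * 100
	return diff, percent
}

func main() {
	// Values from the ds1307/time row in the table above.
	diff, pct := sizeDelta(15196, 17328)
	fmt.Printf("%d bytes (%.2f%%)\n", diff, pct)
}
```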

niaow · 2021-11-29T21:34:59Z

compiler/defer.go

+	//   * The return value (eax, rax, r0, etc) is set to zero in the inline
+	//     assembly but set to an unspecified non-zero value when jumping using
+	//     a longjmp.
+	asmType := llvm.FunctionType(b.uintptrType, []llvm.Type{b.deferFrame.Type()}, false)


Are we sure that this is safe? As is, I think the compiler is still permitted to assume that this returns once.

I have tried several alternatives, and this was the most reliable one so far.

I've tried:

setjmp at the beginning of a function. LLVM is smart enough to optimize away some defer instructions later after a longjmp, so this is not usable.

Using exception handling instructions together with blockaddress. Basically, act as if we're unwinding as usual with invoke and landingpad, but instead of actually unwinding, return back to the function by capturing the stack pointer at function entry and doing longjmp by jumping back to the basic block with the landingpad instruction with a captured blockaddress. Doesn't work because LLVM moves the landingpad instruction to a different basic block after which the entire system falls down.

The only other alternative is using setjmp directly, which would be much more expensive than it is already (because we need to do it before every function call that might panic).
I think this is safe because this is what LLVM says about returns_twice:

This attribute indicates that this function can return twice. The C setjmp is an example of such a function. The compiler disables some optimizations (like tail calls) in the caller of these functions.

Tail calls are not relevant to inline assembly. I'm not 100% convinced, but it appears to be mostly safe.
(Sidenote: I might investigate using the callbr instruction instead, which might be slightly less expensive by avoiding a compare and branch).

... actually, this gives me an idea. It might be possible to change this scheme to do the setjmp after every defer instruction, instead of before every call. Not entirely sure whether that's safe though.

Does gollvm support defer/recover? How does it convince llvm to do the Right Thing?

It looks like gollvm uses full C++ exceptions. I don't think that is practical here.

I might start using C++ exceptions on some platforms (Windows, Linux, MacOS) but wanted to start with something simple and mostly portable.

@niaow I'm afraid you are correct. It looks like the remaining issue is a tail call on RISC-V, which would probably have been prevented with a returns_twice. Now thinking what the best solution to this is...

Well that was easy. Apparently it's possible to add returns_twice as a call site attribute, even to inline assembly.

 	}
 	asm := llvm.InlineAsm(asmType, asmString, constraints, false, false, 0)
 	result := b.CreateCall(asm, []llvm.Value{b.deferFrame}, "setjmp")
+	result.AddCallSiteAttribute(-1, b.ctx.CreateEnumAttribute(llvm.AttributeKindID("returns_twice"), 0))
 	isZero := b.CreateICmp(llvm.IntEQ, result, llvm.ConstInt(b.uintptrType, 0, false), "setjmp.result")
 	continueBB := b.insertBasicBlock("")
 	b.CreateCondBr(isZero, continueBB, b.landingpad)

This fixes the miscompilation on RISC-V.

(Sidenote: I might investigate using the callbr instruction instead, which might be slightly less expensive by avoiding a compare and branch).

It seems like that might avoid more than just a compare and branch. I think something like this could theoretically work:

callbr void asm returns_twice "", "=~{rax},..."() to label %fallthrough [label %indirect]

Seems a bit cursed though.

I'm not sure what the code does there? I see no assembly.
My biggest hope is that I can manage to avoid clobbering all registers for every function call, and instead can hopefully do that in the %lpad block. That will certainly improve code quality. But I'd like to do that later, first I'd like to get this PR actually working.

ysoldak · 2021-12-07T20:21:44Z

Thank you for working on this.

Verified, works for my use case: reset system on panic.
Board verified: Nano RP2040 Connect ("nano-rp2040" target)

Stats for example code below

   code  rodata    data     bss |   flash     ram | package
------------------------------- | --------------- | -------
   5029     663       4    2264 |    5696    2268 | total <-- w/o change (dev branch head)
   5524     668       4    2264 |    6196    2268 | total <-- w/  change (recover3 branch head)

Example code

//go:build cortexm
// +build cortexm

package main

import (
	"device/arm"
	"time"
)

// This example shows how to reset system on panic.
// In your application you may: reset, blink an LED to indicate failure, or do something else.

func main() {
	defer resetOnPanic()
	println("START")
	for i := 0; i < 5; i++ {
		println(".")
		time.Sleep(time.Second)
	}
	panic("AAA!!!111")
}

func resetOnPanic() {
	if r := recover(); r != nil {
		println("PANIC")
		time.Sleep(time.Second)
		arm.SystemReset()
	}
}

aykevl · 2021-12-09T19:43:26Z

Rebased and fixed two bugs, hopefully the tests pass this time.

Verified, works for my use case: reset system on panic.
Board verified: Nano RP2040 Connect ("nano-rp2040" target)

Note that this PR is only for when the code explicitly calls panic. It doesn't cover runtime panics or HardFault_Handler.
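To make the distinction concrete, here is a minimal sketch of the case described as covered: a panic raised by an explicit call to the panic builtin and caught in a deferred function. `safeCall` is a hypothetical helper name chosen for illustration, not something from this PR.

```go
package main

import "fmt"

// safeCall runs fn and converts an explicit panic(...) raised inside it
// into an ordinary error value.
func safeCall(fn func()) (err error) {
	defer func() {
		// recover returns non-nil only while this goroutine is panicking.
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	err := safeCall(func() { panic("sensor not responding") })
	fmt.Println(err)
}
```

A panic raised by the runtime itself (for example a nil pointer dereference inside `fn`) is the case that is stated above as not covered by this PR.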

ysoldak · 2021-12-09T22:24:12Z

Note that this PR is only for when the code explicitly calls panic. It doesn't cover runtime panics or HardFault_Handler.

Oh, I see. Means I can’t use it yet, but thanks for the clarification!

aykevl · 2021-12-10T00:11:30Z

I've now also added a new flag: -unwind. It can be set to -unwind=none if you don't want to support recover or running deferred functions, but for supported platforms it defaults to -unwind=simple. With -unwind=none, nearly all binaries that I've tested remain at the same size, so this can be useful for reducing binary size.
In the future, I might want to add different unwind strategies. For example WebAssembly exception handling, Linux libunwind, Windows SEH, etc.

Feel free to suggest different names/values for this flag :)

niaow · 2021-12-10T00:17:43Z

compiler/compiler_test.go

@@ -56,6 +56,7 @@ func TestCompiler(t *testing.T) {
 		{"float.go", "", ""},
 		{"interface.go", "", ""},
 		{"func.go", "", "coroutines"},
+		{"defer.go", "", ""},


Can we maybe do this on a target that can actually run recover so that we know that the code gen is actually working?

ysoldak · 2021-12-10T19:19:59Z

Feel free to suggest different names/values for this flag :)

For me, as an end user who knows little to nothing about compilers and tinygo internals, -defer=simple/none would be preferable. It clearly indicates that part of the user-facing functionality is disabled.

Will defer still work at all with the "none" value, with "recover()" just being a no-op? In that case -recover=simple/none would probably be more appropriate...

aykevl · 2021-12-10T20:20:57Z

With -unwind=none, no deferred functions will run with a panic. Returning a value from recover() is simple, running deferred functions on panic is hard.

I'm not sure about -defer=none though. The defer keyword will still work with -unwind=none (as it always has), it just won't work on panic.

So in summary, before this PR and with -unwind=none:

Smaller binaries.
defer works, but only in normal cases. Not when panicking.
recover() always returns nil.

With -unwind=simple:

Slightly bigger binaries when using defer and the panic builtin.
defer runs when it is supposed to: both when a function returns and on panic.
recover() works almost according to the Go spec (there is one relatively small exception).

So in summary, the flag controls whether jumping back up the stack and running defers works when panic() is called.
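The behavior described for -unwind=simple matches what the Go spec prescribes, and can be sketched as follows. This is an illustration only; `run` is a hypothetical function name. With unwinding enabled, deferred functions run in last-in-first-out order both on a normal return and on panic, and recover() returns the panic value.

```go
package main

import "fmt"

// run registers two deferred functions, optionally panics, and returns the
// order in which the events were observed.
func run(shouldPanic bool) (trace []string) {
	defer func() {
		// Registered first, so it runs last (LIFO order).
		if r := recover(); r != nil {
			trace = append(trace, "recovered")
		}
		trace = append(trace, "defer 1")
	}()
	defer func() {
		// Registered last, so it runs first.
		trace = append(trace, "defer 2")
	}()
	trace = append(trace, "body")
	if shouldPanic {
		panic("boom")
	}
	return trace
}

func main() {
	fmt.Println(run(false))
	fmt.Println(run(true))
}
```

Based on the summary above, with -unwind=none the second call would not be expected to reach the deferred functions at all, since nothing runs them on panic.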

ysoldak · 2021-12-10T20:30:42Z

I see.

Any benefit of making defer no-op, size-wise or else?

If so, I can imagine -defer flag with following values:

-defer=none <-- defer no-op, completely disabled
-defer=simple <-- as it is now, equals -unwind=none (alternatively "nopanic")
-defer=extended <-- default? equals -unwind=simple
-defer=full <-- in some future when we support runtime panics and hard faults, if that is possible of course

aykevl · 2021-12-10T20:50:57Z

Any benefit of making defer no-op, size-wise or else?

Not much. I don't think we should do that anyway, because it diverges from the Go spec too much. If the code size increase is too much, then you shouldn't use defer.

-defer=full <-- in some future when we support runtime panics and hard faults, if that is possible of course

I would like to add support for recovering from most runtime panics, such as nil pointer dereference, slice out of bounds, etc.
I'm not so sure about hard faults. While it is technically possible to recover from them, I don't think this is something we should support. A hard fault only happens when something has gone terribly wrong, which means there is an unknown bug in the system.

My idea about the -unwind flag would be more like this:

-unwind=none: current behavior
-unwind=simple: this PR
-unwind=sjlj: (possible, probably unnecessary): use C setjmp and longjmp functions
-unwind=exception: use C++ exception mechanism, as appropriate for the target (SEH on Windows, DWARF unwinding on Linux and MacOS, WebAssembly exception handling proposal on WebAssembly, it can perhaps even be supported on MCUs using .ARM.exidx etc sections!). Probably larger code size, but is "zero-cost" (mostly) in the no-exception case.

But the only thing most people need to know is -unwind=none. The default should be chosen appropriately for the target. The -unwind=none flag can then be used

Alternative idea: add a possible value to the -panic= flag. Right now the -panic flag can have the values print (the default) and trap (don't print to reduce code size, instead trigger a SIGILL-like signal). We could add unwind so that the default for most platforms is unwind, for some print, and people can change the default to print or trap depending on how much they use panics.

ysoldak · 2021-12-10T21:03:41Z

Add -panic=reset ? :)
You know my pain.

In the microcontroller world, IoT specifically, it can be beneficial to have the possibility to make the tiny device restart on some unexpected state instead of hanging.

Even big OSes restart, sometimes, on hard faults :D
Remember the times a faulty DRAM module would reboot my PC unexpectedly, haha.

aykevl · 2021-12-10T21:24:16Z

Add -panic=reset ? :)
You know my pain.

What kind of issue were you hitting, a panic or a hard fault? Those two are similar but different.

ysoldak · 2021-12-10T21:57:57Z

What kind of issue were you hitting, a panic or a hard fault? Those two are similar but different.

I believe runtime panics nowadays, like nil pointer. Mostly from network stack, I reckon, wifinina driver, http client, all that.
These kinds of bugs are hard to catch and reproduce. Happens once a day perhaps, sometimes once in three days at a remote site (~1h drive).

I did have hard faults all over due to lack of ram while prototyping on Nano 33 IoT.
Have no such issue anymore on Nano RP2040, but sporadic nil pointers (I believe) do happen still.

aykevl · 2021-12-11T01:10:08Z

I believe runtime panics nowadays, like nil pointer. Mostly from network stack, I reckon, wifinina driver, http client, all that.
These kinds of bugs are hard to catch and reproduce.

Hmm, yeah, I see.
I think it would be a good thing to show the PC of the panic. With that, you could map that PC back to a source location and maybe get an idea where it is coming from. You can already kind of do this but it requires an attached debugger (with a breakpoint set at runtime.abort or similar), which can be more inconvenient than a serial cable.

I did have hard faults all over due to lack of ram while prototyping on Nano 33 IoT.
Have no such issue anymore on Nano RP2040, but sporadic nil pointers (I believe) do happen still.

Lack of RAM should not result in a hard fault! It should result in an out of memory panic. However, a stack overflow might not be immediately obvious (it's checked at every blocking operation, not right away) and result in weird behavior.

ysoldak · 2021-12-12T22:09:22Z

You can already kind of do this but it requires an attached debugger

Not really viable. Would need to have the debugger attached to the board for days and even then catching the error requires some luck, chances are it would be non-reproducible in the lab at all...
Out in the wild, though, any kind of disturbances may occur, heavy machinery around, not stable network out there, anything.
The correct and sound approach would be of course to try and reproduce the bug[s], but we have no resources for that just yet, still kind of in a prototype stage.
And rebooting on any unexpected state is totally OK, end devices ("sensors") are kind of dumb and don't have any state anyway, nothing to lose.
Losing the device due to panic and hanging is the worst that could happen in our case.

aykevl · 2021-12-17T11:06:56Z

Hmm, yeah maybe we should change the default from hanging on panic to rebooting. But that's a different issue :)

But what do you think of my -panic=unwind proposal? It's a bit less flexible than -unwind= but perhaps easier to understand.

ysoldak · 2021-12-17T11:45:56Z

Hmm, yeah maybe we should change the default from hanging on panic to rebooting. But that's a different issue :)

I can make a separate ticket to keep things organised.

But what do you think of my -panic=unwind proposal? It's a bit less flexible than -unwind= but perhaps easier to understand.

It's an improvement, for sure.
I can see a potential for confusion here though: both trap and print apply to all panics, while unwind would not apply to runtime panics, like nil dereference, correct? This is not directly obvious.

aykevl · 2021-12-28T12:36:33Z

I can see a potential for confusion here though: both trap and print apply to all panics, while unwind would not apply to runtime panics, like nil dereference, correct? This is not directly obvious.

Good point. At the moment, yes. But eventually, all panics (including nil pointer dereference panics) should be recoverable to match upstream Go. (But note that a HardFault is not a panic - at least not currently).
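For reference, this is what the upstream Go behavior looks like: a runtime panic such as an out-of-range index can be recovered and inspected as a runtime.Error. The sketch below shows standard Go semantics only, not what this PR implements; `index` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"runtime"
)

// index returns s[i], converting an out-of-range runtime panic into an error.
func index(s []int, i int) (v int, err error) {
	defer func() {
		r := recover()
		if r == nil {
			return
		}
		// Panics raised by the Go runtime implement runtime.Error.
		if re, ok := r.(runtime.Error); ok {
			err = fmt.Errorf("runtime panic: %w", re)
			return
		}
		panic(r) // not a runtime error: pass it on
	}()
	return s[i], nil
}

func main() {
	_, err := index([]int{1, 2, 3}, 5)
	fmt.Println(err)
}
```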

ysoldak · 2021-12-28T14:09:58Z

What's HardFault then? I'm kind of vague in terminology.
It is not a compile-time or runtime panic, but when does it happen? Have an example?

niaow · 2021-12-28T15:48:35Z

A panic is an error produced by Go code, and is sometimes recoverable.
A hard fault is an error produced by the hardware, and generally means that something has gone horribly wrong (undefined instruction, attempting to execute an invalid interrupt, etc.).

The upstream Go implementation does have runtime/debug.SetPanicOnFault to handle memory faults, but that is only to work around mmap and it does not work consistently (and I don't think we realistically could/should support it). (And that is only page faults, which aren't a type of fault that usually exists on microcontrollers.)

dkegel-fastly · 2022-01-17T18:50:19Z

I guess this is the subject of #891

ysoldak · 2022-06-01T21:00:22Z

So, how do we use this? What flag is implemented with what values? -unwind? -panic?
Shall not forget to document.

aykevl · 2022-06-02T11:11:29Z

@ysoldak there is no flag, if it is supported on a given platform it is enabled. I might add a flag in the future to control it.
I hope to get recover supported on currently unsupported architectures in separate PRs.

deadprogram · 2022-06-11T08:15:56Z

@aykevl looks like possibly legit test failure here: https://github.com/tinygo-org/tinygo/runs/6834101399?check_suite_focus=true#step:19:51

deadprogram · 2022-06-14T15:55:43Z

Any chance to look into this, @aykevl? It seems so close!

aykevl · 2022-06-14T18:53:05Z

I have reverted the suggestion from @niaow to see whether that fixes the issue. I can't see how but if it does, it at least reduces the scope a lot to search for this issue.

EDIT: it sadly does not.
I seem to remember however that it did pass before, I wonder whether anything changed since then?

deadprogram · 2022-06-15T08:28:52Z

I seem to remember however that it did pass before, I wonder whether anything changed since then?

It did pass before. Not sure what happened in the interim...

aykevl · 2022-06-15T13:12:11Z

Converted to draft while I'm trying to figure out what's going on...

aykevl · 2022-06-15T14:22:50Z

Mystery solved. The problem only happens in CI because GitHub Actions merges the branch with the dev branch before testing. Rebasing the recover3 branch on top of the dev branch locally results in the same issue locally.

Now I just need to figure out what changed recently in the dev branch to cause this issue...

aykevl · 2022-06-15T14:54:10Z

Found the offending commit: #2879
This is going to be fun...

aykevl · 2022-06-15T14:59:47Z

Interestingly #2884 also fails with a weird error in archive/zip, this could be related.

All the globals are between _etext and _end. We were scanning only between _edata and _end, which mainly consists of the .bss section. Scanning from _etext ensures that the .data section is also included. This bug didn't result in issues in CI, but did result in a bug in the recover branch: #2331. This patch fixes this bug.

aykevl · 2022-06-15T16:31:01Z

Found it! It's a very small bug with big implications: #2909

Previously we used to scan between _edata and _end. This is not correct: the .data section starts *before* _edata. Fixing this would mean changing _edata to _etext, but that didn't quite work either. It appears that there are inaccessible pages between _etext and _end on ARM. Therefore, a different solution was needed. What I've implemented is similar to Windows and MacOS: namely, finding writable segments by parsing the program header of the currently running program. It's a lot more verbose, but it should be correct on all architectures. It probably also reduces the globals to scan to those that _really_ need to be scanned. This bug didn't result in issues in CI, but did result in a bug in the recover branch: #2331. This patch fixes this bug.

llvm.AddBasicBlock should never be used. Instead, we should use the AddBasicBlock method of the current LLVM context. This didn't lead to any bugs... yet. But probably would, eventually.

For example, this commit moves the 'throw' branch of an assertion (nil check, slice index check, etc) to the end of the function while inserting the "continue" branch right after the insert location. This makes the resulting IR easier to follow. For some reason, this also reduces code size a bit on average. The TinyGo smoke tests saw a reduction of 0.22%, mainly from WebAssembly. The drivers repo saw little average change in code size (-0.01%). This commit also adds a few compiler tests for the defer keyword.

deadprogram · 2022-06-16T05:59:15Z

At last! Great work @aykevl and thank you everyone who helped get this PR completed.

Now merging.

aykevl force-pushed the recover3 branch from 3a13497 to dc0467f Compare November 29, 2021 21:01

niaow reviewed Nov 29, 2021

aykevl force-pushed the recover3 branch from dc0467f to 1e569e8 Compare December 9, 2021 19:39

aykevl force-pushed the recover3 branch from 1e569e8 to cf5d5f8 Compare December 9, 2021 23:16

niaow reviewed Dec 10, 2021

aykevl force-pushed the recover3 branch 2 times, most recently from 3cb1da9 to 95011c6 Compare December 10, 2021 14:17

dgryski mentioned this pull request Dec 12, 2021

Improving TinyGo support hexops/vecty#269

Open

deadprogram mentioned this pull request Jan 7, 2022

arm: Explicitly disable unwind tables #2482

Merged

aykevl force-pushed the recover3 branch from 95011c6 to 69727ed Compare March 3, 2022 18:19

aykevl marked this pull request as ready for review March 3, 2022 18:21

aykevl force-pushed the recover3 branch from 6775814 to abb0b46 Compare June 10, 2022 14:53

aykevl force-pushed the recover3 branch from abb0b46 to 6775814 Compare June 14, 2022 18:51

aykevl marked this pull request as draft June 15, 2022 13:11

aykevl force-pushed the recover3 branch from 0ae8a52 to 5c874fe Compare June 15, 2022 13:27

aykevl mentioned this pull request Jun 15, 2022

runtime: include .data section in globals scan #2909

Closed

aykevl force-pushed the recover3 branch from 5c874fe to 7db2c4c Compare June 15, 2022 16:39

aykevl marked this pull request as ready for review June 15, 2022 16:40

aykevl added 4 commits June 16, 2022 01:07

compiler: fix basic block context

405dd87

compiler: implement recover() built-in function

5b3785c
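For context on what this commit enables, here is a minimal sketch of standard Go `recover()` semantics. It is not taken from the PR; `safeDivide` is a hypothetical helper, and the exact panic message may differ between the main Go toolchain and TinyGo.

```go
package main

import "fmt"

// safeDivide returns a/b. If the division panics (for example on integer
// division by zero), the deferred function calls recover() to stop the
// panic and reports it as an ordinary error instead.
func safeDivide(a, b int) (result int, err error) {
	defer func() {
		// recover() returns nil when no panic is in progress.
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return a / b, nil
}

func main() {
	fmt.Println(safeDivide(10, 2))
	fmt.Println(safeDivide(1, 0))
}
```

Note that `recover()` only has an effect when called directly from a deferred function, which is why the code size cost discussed in this PR is tied to the use of `defer`.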

aykevl force-pushed the recover3 branch from 7db2c4c to 5b3785c Compare June 15, 2022 23:24

deadprogram merged commit 8d6b210 into dev Jun 16, 2022

deadprogram deleted the recover3 branch June 16, 2022 05:59

Vilsol mentioned this pull request Jun 16, 2022

Implement recover for wasm architecture #2914

Open

aykevl mentioned this pull request Jun 17, 2022

avr: add support for recover() #2915

Merged
