Add support for the recover() builtin function #2331

Merged
merged 4 commits into from
Jun 16, 2022

Conversation

aykevl
Member

@aykevl aykevl commented Nov 29, 2021

Not all architectures are supported yet. In particular, the current design doesn't support WebAssembly. Other architectures (avr, xtensa, riscv64) can be supported but are left as a TODO.

This PR does result in an increase in code size if either defer or panic is used. There is no way to turn this feature off with this PR, but I don't think we should do that for the following reasons:

  • The code size increase only happens when defer or panic is used (especially defer). If you really care about code size, you may want to avoid those.
  • Having another knob to configure the compiler is a maintenance burden. We already have quite a few (-gc, -scheduler).

To give an idea of the code size change, here is the difference in binary size before and after the last commit of this PR:

before  after   diff
 10796  10796      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/adxl345/main.go
 13708  13708      0  0.00%  tinygo build -size short -o ./build/test.hex -target=pybadge ./examples/amg88xx
  9088   9088      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/apa102/main.go
 10108  10108      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/apa102/itsybitsy-m0/main.go
  6524   6524      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/at24cx/main.go
 10516  10516      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/bh1750/main.go
  9672   9672      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/blinkm/main.go
 13772  13772      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/bmp180/main.go
 12380  12380      0  0.00%  tinygo build -size short -o ./build/test.hex -target=trinket-m0 ./examples/bmp388/main.go
  6724   6724      0  0.00%  tinygo build -size short -o ./build/test.hex -target=bluepill ./examples/ds1307/sram/main.go
  4396   4396      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/easystepper/main.go
 11096  11096      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/hcsr04/main.go
  5596   5596      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/hd44780/customchar/main.go
  5580   5580      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/hd44780/text/main.go
 11360  11360      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/hd44780i2c/main.go
 15640  15640      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/hub75/main.go
 10780  10780      0  0.00%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/ili9341/basic
 11804  11804      0  0.00%  tinygo build -size short -o ./build/test.hex -target=xiao ./examples/ili9341/basic
 29628  29628      0  0.00%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/ili9341/pyportal_boing
 10832  10832      0  0.00%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/ili9341/scroll
 11884  11884      0  0.00%  tinygo build -size short -o ./build/test.hex -target=xiao ./examples/ili9341/scroll
 12752  12752      0  0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-express ./examples/lis3dh/main.go
 15304  15304      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/lsm303agr/main.go
 12780  12780      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/lsm6ds3/main.go
 13036  13036      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mag3110/main.go
 10500  10500      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mcp3008/main.go
 10540  10540      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mma8653/main.go
 10388  10388      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mpu6050/main.go
  4444   4444      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/pcd8544/setbuffer/main.go
  4452   4452      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/pcd8544/setpixel/main.go
  2468   2468      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino ./examples/servo
  8732   8732      0  0.00%  tinygo build -size short -o ./build/test.hex -target=pybadge ./examples/shifter/main.go
  5072   5072      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/ssd1306/i2c_128x32/main.go
  5332   5332      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/ssd1306/spi_128x64/main.go
  5116   5116      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/ssd1331/main.go
  5696   5696      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/st7735/main.go
  5448   5448      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/st7789/main.go
 14160  14160      0  0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-express ./examples/thermistor/main.go
  8088   8088      0  0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-bluefruit ./examples/tone
 10592  10592      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/tm1637/main.go
 17064  17064      0  0.00%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/vl53l1x/main.go
  5828   5828      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/waveshare-epd/epd2in13/main.go
  5364   5364      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/waveshare-epd/epd2in13x/main.go
  5684   5684      0  0.00%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/waveshare-epd/epd4in2/main.go
  8692   8692      0  0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-express ./examples/ws2812
  1420   1420      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino   ./examples/ws2812
   720    720      0  0.00%  tinygo build -size short -o ./build/test.hex -target=digispark ./examples/ws2812
 24788  24788      0  0.00%  tinygo build -size short -o ./build/test.hex -target=trinket-m0 ./examples/bme280/main.go
 13528  13528      0  0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-express ./examples/microphone/main.go
 13124  13124      0  0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-express ./examples/buzzer/main.go
 15136  15136      0  0.00%  tinygo build -size short -o ./build/test.hex -target=trinket-m0 ./examples/veml6070/main.go
  8664   8664      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/l293x/simple/main.go
 10072  10072      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/l293x/speed/main.go
  8632   8632      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/l9110x/simple/main.go
 10104  10104      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/l9110x/speed/main.go
 15264  15264      0  0.00%  tinygo build -size short -o ./build/test.hex -target=circuitplay-express ./examples/lis2mdl/main.go
 10512  10512      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/max72xx/main.go
  6094   6094      0  0.00%  tinygo build -size short -o ./build/test.hex -target=arduino ./examples/keypad4x4/main.go
  8952   8952      0  0.00%  tinygo build -size short -o ./build/test.hex -target=xiao ./examples/pcf8563/clkout/
  9976   9976      0  0.00%  tinygo build -size short -o ./build/test.hex -target=feather-m0 ./examples/ina260/main.go
  8656   8656      0  0.00%  tinygo build -size short -o ./build/test.hex -target=nucleo-l432kc ./examples/aht20/main.go
  7492   7496      4  0.05%  tinygo build -size short -o ./build/test.hex -target=hifive1b ./examples/ssd1351/main.go
 21300  21332     32  0.15%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/espat/espconsole/main.go
 22460  22492     32  0.14%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/espat/esphub/main.go
 22492  22524     32  0.14%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/espat/espstation/main.go
 10912  10952     40  0.37%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mcp23017/main.go
 11356  11396     40  0.35%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mcp23017-multiple/main.go
  6312   6352     40  0.63%  tinygo build -size short -o ./build/test.hex -target=nucleo-f103rb ./examples/shiftregister/main.go
 10016  10072     56  0.56%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/touch/resistive/fourwire/main.go
 12824  12880     56  0.44%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/touch/resistive/pyportal_touchpaint/main.go
  6124   6184     60  0.98%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/microbitmatrix/main.go
 15844  15992    148  0.93%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/wifinina/webclient/main.go
 53660  54460    800  1.49%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/bmi160/main.go
 58404  59316    912  1.56%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/flash/console/spi
 56408  57336    928  1.65%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/flash/console/qspi
 61000  61976    976  1.60%  tinygo build -size short -o ./build/test.hex -target=feather-m4 ./examples/sdcard/console/
247992 249480   1488  0.60%  tinygo build -size short -o ./build/test.hex -target=pyportal ./examples/ili9341/slideshow
 20004  21596   1592  7.96%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/wifinina/udpstation/main.go
 61028  62724   1696  2.78%  tinygo build -size short -o ./build/test.hex -target=p1am-100 ./examples/p1am/main.go
 52840  54748   1908  3.61%  tinygo build -size short -o ./build/test.hex -target=feather-m0 ./examples/gps/i2c/main.go
 52560  54468   1908  3.63%  tinygo build -size short -o ./build/test.hex -target=feather-m0 ./examples/gps/uart/main.go
 15196  17328   2132 14.03%  tinygo build -size short -o ./build/test.hex -target=bluepill ./examples/ds1307/time/main.go
 29552  31752   2200  7.44%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/wifinina/ntpclient/main.go
 54996  57332   2336  4.25%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/mcp2515/main.go
 51460  53876   2416  4.69%  tinygo build -size short -o ./build/test.hex -target=arduino-nano33 ./examples/wifinina/tcpclient/main.go
 61624  64300   2676  4.34%  tinygo build -size short -o ./build/test.hex -target=xiao ./examples/pcf8563/alarm/
 60232  62924   2692  4.47%  tinygo build -size short -o ./build/test.hex -target=xiao ./examples/pcf8563/time/
 61528  64220   2692  4.38%  tinygo build -size short -o ./build/test.hex -target=xiao ./examples/pcf8563/timer/
 49948  52684   2736  5.48%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/adt7410/main.go
 52092  54828   2736  5.25%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/bmp280/main.go
 44932  47668   2736  6.09%  tinygo build -size short -o ./build/test.hex -target=microbit ./examples/sht3x/main.go
 63672  67104   3432  5.39%  tinygo build -size short -o ./build/test.hex -target=feather-m0 ./examples/dht/main.go
139744 144244   4500  3.22%  tinygo build -size short -o ./build/test.hex -target=feather-m4 ./examples/sdcard/tinyfs/
 57976  62588   4612  7.96%  tinygo build -size short -o ./build/test.hex -target=itsybitsy-m0 ./examples/ds3231/main.go
 93948  99040   5092  5.42%  tinygo build -size short -o ./build/test.hex -target=wioterminal ./examples/rtl8720dn/mqttsub/
122832 128836   6004  4.89%  tinygo build -size short -o ./build/test.hex -target=wioterminal ./examples/rtl8720dn/webserver/
124396 130624   6228  5.01%  tinygo build -size short -o ./build/test.hex -target=wioterminal ./examples/rtl8720dn/webclient/
sum: 67968 (2.72%)

What you can see here:

  • The overall code size increase is 2.72%.
  • Many examples (about two thirds) don't change in binary size at all. Probably because they don't use defer or panic.
  • Examples that increased in binary size are usually large examples, >25kB or so. It looks like they import the fmt package, which seems to use defer and recover.

// * The return value (eax, rax, r0, etc) is set to zero in the inline
// assembly but set to an unspecified non-zero value when jumping using
// a longjmp.
asmType := llvm.FunctionType(b.uintptrType, []llvm.Type{b.deferFrame.Type()}, false)
Member

Are we sure that this is safe? As is, I think the compiler is still permitted to assume that this returns once.

Member Author

@aykevl aykevl Nov 29, 2021

I have tried several alternatives, and this was the most reliable one so far.

I've tried:

  1. setjmp at the beginning of a function. LLVM is smart enough to optimize away some defer instructions later after a longjmp, so this is not usable.
  2. Using exception handling instructions together with blockaddress. Basically, act as if we're unwinding as usual with invoke and landingpad, but instead of actually unwinding, return back to the function by capturing the stack pointer at function entry and doing longjmp by jumping back to the basic block with the landingpad instruction with a captured blockaddress. Doesn't work because LLVM moves the landingpad instruction to a different basic block after which the entire system falls down.

The only other alternative is using setjmp directly, which would be much more expensive than it is already (because we need to do it before every function call that might panic).
I think this is safe because this is what LLVM says about returns_twice:

This attribute indicates that this function can return twice. The C setjmp is an example of such a function. The compiler disables some optimizations (like tail calls) in the caller of these functions.

Tail calls are not relevant to inline assembly. I'm not 100% convinced, but it appears to be mostly safe.
(Sidenote: I might investigate using the callbr instruction instead, which might be slightly less expensive by avoiding a compare and branch).

... actually, this gives me an idea. It might be possible to change this scheme to do the setjmp after every defer instruction, instead of before every call. Not entirely sure whether that's safe though.

Member

Does gollvm support defer/recover? How does it convince LLVM to do the Right Thing?

Member

It looks like gollvm uses full C++ exceptions. I don't think that is practical here.

Member Author

I might start using C++ exceptions on some platforms (Windows, Linux, MacOS) but wanted to start with something simple and mostly portable.

Member Author

@niaow I'm afraid you are correct. It looks like the remaining issue is a tail call on RISC-V, which would probably have been prevented with a returns_twice. Now thinking what the best solution to this is...

Member Author

@aykevl aykevl Dec 9, 2021

Well that was easy. Apparently it's possible to add returns_twice as a call site attribute, even to inline assembly.

        }
        asm := llvm.InlineAsm(asmType, asmString, constraints, false, false, 0)
        result := b.CreateCall(asm, []llvm.Value{b.deferFrame}, "setjmp")
+       result.AddCallSiteAttribute(-1, b.ctx.CreateEnumAttribute(llvm.AttributeKindID("returns_twice"), 0))
        isZero := b.CreateICmp(llvm.IntEQ, result, llvm.ConstInt(b.uintptrType, 0, false), "setjmp.result")
        continueBB := b.insertBasicBlock("")
        b.CreateCondBr(isZero, continueBB, b.landingpad)

This fixes the miscompilation on RISC-V.

Member

@niaow niaow Dec 10, 2021

(Sidenote: I might investigate using the callbr instruction instead, which might be slightly less expensive by avoiding a compare and branch).

It seems like that might avoid more than just a compare and branch. I think something like this could theoretically work:

callbr void asm returns_twice "", "=~{rax},..."()
            to label %fallthrough [label %indirect]

Seems a bit cursed though.

Member Author

@aykevl aykevl Dec 10, 2021

I'm not sure what the code does there? I see no assembly.
My biggest hope is that I can manage to avoid clobbering all registers for every function call, and instead can hopefully do that in the %lpad block. That will certainly improve code quality. But I'd like to do that later, first I'd like to get this PR actually working.

@ysoldak
Contributor

ysoldak commented Dec 7, 2021

Thank you for working on this.

Verified, works for my use case: reset system on panic.
Board verified: Nano RP2040 Connect ("nano-rp2040" target)

Stats for example code below

   code  rodata    data     bss |   flash     ram | package
------------------------------- | --------------- | -------
   5029     663       4    2264 |    5696    2268 | total <-- w/o change (dev branch head)
   5524     668       4    2264 |    6196    2268 | total <-- w/  change (recover3 branch head)

Example code

//go:build cortexm
// +build cortexm

package main

import (
	"device/arm"
	"time"
)

// This example shows how to reset system on panic.
// In your application you may: reset, blink an LED to indicate failure, or do something else.

func main() {
	defer resetOnPanic()
	println("START")
	for i := 0; i < 5; i++ {
		println(".")
		time.Sleep(time.Second)
	}
	panic("AAA!!!111")
}

func resetOnPanic() {
	if r := recover(); r != nil {
		println("PANIC")
		time.Sleep(time.Second)
		arm.SystemReset()
	}
}
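
The same pattern can be tried on a desktop Go toolchain. The following is a minimal sketch, not part of the PR: `resetHook` and `runGuarded` are hypothetical names, and `resetHook` stands in for a board-specific call such as arm.SystemReset() so that the program runs without hardware.

```go
package main

import "fmt"

// resetHook is a hypothetical stand-in for a board-specific reset
// (for example arm.SystemReset on Cortex-M). Here it only prints.
var resetHook = func() { fmt.Println("RESET") }

// runGuarded runs task and reports whether it panicked.
// If task panics, the panic is recovered and resetHook is called.
func runGuarded(task func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
			resetHook()
		}
	}()
	task()
	return false
}

func main() {
	fmt.Println(runGuarded(func() { panic("AAA!!!111") }))
	fmt.Println(runGuarded(func() {}))
}
```

Keeping the reset behind a function variable is only a convenience for this sketch: it lets the recovery path be exercised on a host machine before flashing a board.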

@aykevl
Member Author

aykevl commented Dec 9, 2021

Rebased and fixed two bugs, hopefully the tests pass this time.

Verified, works for my use case: reset system on panic.
Board verified: Nano RP2040 Connect ("nano-rp2040" target)

Note that this PR is only for when the code explicitly calls panic. It doesn't cover runtime panics or HardFault_Handler.
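
To make the distinction concrete, here is a small sketch (an illustration only, not code from this PR; all helper names are made up). In upstream Go both panics below are recoverable. Per the note above, at this stage only the explicit panic() case is expected to be recoverable in TinyGo.

```go
package main

import (
	"fmt"
	"runtime"
)

// recovered runs f and returns whatever recover() yields:
// nil if f returned normally, otherwise the panic value.
func recovered(f func()) (r interface{}) {
	defer func() { r = recover() }()
	f()
	return nil
}

// isRuntimeError reports whether a recovered value was raised by the Go
// runtime itself (it then implements runtime.Error).
func isRuntimeError(r interface{}) bool {
	_, ok := r.(runtime.Error)
	return ok
}

// explicitPanic panics through the panic builtin.
func explicitPanic() { panic("explicit") }

// runtimePanic triggers a panic inside the runtime: writing to a nil map.
func runtimePanic() {
	var m map[string]int
	m["x"] = 1
}

func main() {
	fmt.Println(isRuntimeError(recovered(explicitPanic)))
	fmt.Println(isRuntimeError(recovered(runtimePanic)))
}
```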

@ysoldak
Contributor

ysoldak commented Dec 9, 2021

Note that this PR is only for when the code explicitly calls panic. It doesn't cover runtime panics or HardFault_Handler.

Oh, I see. Means I can’t use it yet, but thanks for the clarification!

@aykevl
Member Author

aykevl commented Dec 10, 2021

I've now also added a new flag: -unwind. It can be set to -unwind=none if you don't want to support recover or running deferred functions on panic, but for supported platforms it defaults to -unwind=simple. With -unwind=none, nearly all binaries that I've tested remain at the same size, so this can be useful for reducing binary size.
In the future, I might want to add different unwind strategies. For example WebAssembly exception handling, Linux libunwind, Windows SEH, etc.

Feel free to suggest different names/values for this flag :)

@@ -56,6 +56,7 @@ func TestCompiler(t *testing.T) {
{"float.go", "", ""},
{"interface.go", "", ""},
{"func.go", "", "coroutines"},
{"defer.go", "", ""},
Member

Can we maybe do this on a target that can actually run recover so that we know that the code gen is actually working?

Member Author

Good idea.

@aykevl aykevl force-pushed the recover3 branch 2 times, most recently from 3cb1da9 to 95011c6 Compare December 10, 2021 14:17
@ysoldak
Contributor

ysoldak commented Dec 10, 2021

Feel free to suggest different names/values for this flag :)

For me, as an end user who knows little to nothing about compilers and tinygo internals, -defer=simple/none would be preferable. It clearly indicates that part of user-faced functionality is disabled.

Will defer work at all with the "none" value, with recover() being a no-op? In that case -recover=simple/none would probably be more appropriate...

@aykevl
Member Author

aykevl commented Dec 10, 2021

With -unwind=none, no deferred functions will run with a panic. Returning a value from recover() is simple, running deferred functions on panic is hard.

I'm not sure about -defer=none though. The defer keyword will still work with -unwind=none (as it always has), it just won't run on panic.

So in summary, before this PR and with -unwind=none:

  • Smaller binaries.
  • defer works, but only in normal cases. Not when panicking.
  • recover() always returns nil.

With -unwind=simple:

  • Slightly bigger binaries when using defer and the panic builtin.
  • defer runs when it is supposed to: both when a function returns and on panic.
  • recover() works almost according to the Go spec (there is one relatively small exception).

So in summary, the flag controls whether jumping back up the stack and running defers works when panic() is called.
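
As an illustration of that difference (a sketch with a hypothetical `trace` helper, not code from this PR): under upstream Go semantics, which -unwind=simple aims to match, deferred calls run in the same last-in, first-out order whether the function returns normally or panics. With the behavior described for -unwind=none, the panicking call would be expected to stop the program without running them.

```go
package main

import "fmt"

// trace records the order in which the function body and its deferred
// calls run. When doPanic is true the body panics; the outer deferred
// function recovers so that the caller can still inspect the log.
func trace(doPanic bool) (log []string) {
	defer func() {
		recover() // returns nil when there is no panic in flight
		log = append(log, "outer defer")
	}()
	defer func() { log = append(log, "inner defer") }()

	log = append(log, "body")
	if doPanic {
		panic("boom")
	}
	return log
}

func main() {
	fmt.Println(trace(false))
	fmt.Println(trace(true))
}
```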

@ysoldak
Contributor

ysoldak commented Dec 10, 2021

I see.

Any benefit of making defer no-op, size-wise or else?

If so, I can imagine -defer flag with following values:

-defer=none <-- defer no-op, completely disabled
-defer=simple <-- as it is now, equals -unwind=none (alternatively "nopanic")
-defer=extended <-- default? equals -unwind=simple
-defer=full <-- in some future when we support runtime panics and hard faults, if that is possible of course

@aykevl
Member Author

aykevl commented Dec 10, 2021

Any benefit of making defer no-op, size-wise or else?

Not much. I don't think we should do that anyway, because it diverges from the Go spec too much. If the code size increase is too much, then you shouldn't use defer.

-defer=full <-- in some future when we support runtime panics and hard faults, if that is possible of course

I would like to add support for recovering from most runtime panics, such as nil pointer dereference, slice out of bounds, etc.
I'm not so sure about hard faults. While it is technically possible to recover from them, I don't think this is something we should support. A hard fault only happens when something has gone terribly wrong, which means there is an unknown bug in the system.

My idea about the -unwind flag would be more like this:

  • -unwind=none: current behavior
  • -unwind=simple: this PR
  • -unwind=sjlj: (possible, probably unnecessary): use C setjmp and longjmp functions
  • -unwind=exception: use C++ exception mechanism, as appropriate for the target (SEH on Windows, DWARF unwinding on Linux and MacOS, WebAssembly exception handling proposal on WebAssembly, it can perhaps even be supported on MCUs using .ARM.exidx etc sections!). Probably larger code size, but is "zero-cost" (mostly) in the no-exception case.

But the only thing most people need to know is -unwind=none. The default should be chosen appropriately for the target. The -unwind=none flag can then be used

Alternative idea: add a possible value to the -panic= flag. Right now the -panic flag can have the values print (the default) and trap (don't print to reduce code size, instead trigger a SIGILL like signal). We could add unwind so that the default for most platforms is unwind, for some print, and people can change the default to print or trap depending on how much they use panics.

@ysoldak
Contributor

ysoldak commented Dec 10, 2021

Add -panic=reset ? :)
You know my pain.

In the microcontroller world, IoT specifically, it can be beneficial to be able to make the tiny device restart on some unexpected state instead of hanging.

Even big OSes restart, sometimes, on hard faults :D
Remember the times a faulty DRAM module would reboot my PC unexpectedly, haha.

@aykevl
Member Author

aykevl commented Dec 10, 2021

Add -panic=reset ? :)
You know my pain.

What kind of issue were you hitting, a panic or a hard fault? Those two are similar but different.

@ysoldak
Contributor

ysoldak commented Dec 10, 2021

What kind of issue were you hitting, a panic or a hard fault? Those two are similar but different.

I believe runtime panics nowadays, like nil pointer. Mostly from the network stack, I reckon: wifinina driver, http client, all that.
These kinds of bugs are hard to catch and reproduce. They happen once a day perhaps, sometimes once in three days, at a remote site (~1h drive).

I did have hard faults all over due to lack of ram while prototyping on Nano 33 IoT.
Have no such issue anymore on Nano RP2040, but sporadic nil pointers (I believe) do happen still.

@aykevl
Member Author

aykevl commented Dec 11, 2021

I believe runtime panics nowadays, like nil pointer. Mostly from the network stack, I reckon: wifinina driver, http client, all that.
These kinds of bugs are hard to catch and reproduce.

Hmm, yeah, I see.
I think it would be a good thing to show the PC of the panic. With that, you could map that PC back to a source location and maybe get an idea where it is coming from. You can already kind of do this but it requires an attached debugger (with a breakpoint set at runtime.abort or similar), which can be more inconvenient than a serial cable.

I did have hard faults all over due to lack of ram while prototyping on Nano 33 IoT.
Have no such issue anymore on Nano RP2040, but sporadic nil pointers (I believe) do happen still.

Lack of RAM should not result in a hard fault! It should result in an out of memory panic. However, a stack overflow might not be immediately obvious (it's checked at every blocking operation, not right away) and result in weird behavior.

@ysoldak
Contributor

ysoldak commented Dec 12, 2021

You can already kind of do this but it requires an attached debugger

Not really viable. We would need to have the debugger attached to the board for days, and even then catching the error requires some luck; chances are it would not be reproducible in the lab at all...
Out in the wild, though, any kind of disturbances may occur, heavy machinery around, not stable network out there, anything.
The correct and sound approach would be of course to try and reproduce the bug[s], but we have no resources for that just yet, still kind of in a prototype stage.
And rebooting on any unexpected state is totally OK, end devices ("sensors") are kind of dumb and don't have any state anyway, nothing to lose.
Losing the device due to panic and hanging is the worst that could happen in our case.

@aykevl
Member Author

aykevl commented Dec 17, 2021

Hmm, yeah maybe we should change the default from hanging on panic to rebooting. But that's a different issue :)

But what do you think of my -panic=unwind proposal? It's a bit less flexible than -unwind= but perhaps easier to understand.

@ysoldak
Contributor

ysoldak commented Dec 17, 2021

Hmm, yeah maybe we should change the default from hanging on panic to rebooting. But that's a different issue :)

I can make a separate ticket to keep things organised.

But what do you think of my -panic=unwind proposal? It's a bit less flexible than -unwind= but perhaps easier to understand.

It's an improvement, for sure.
I can see a potential for confusion here though: both trap and print apply to all panics, while unwind would not apply to runtime panics, like nil dereference, correct? This is not directly obvious.

@aykevl
Member Author

aykevl commented Dec 28, 2021

I can see a potential for confusion here though: both trap and print apply to all panics, while unwind would not apply to runtime panics, like nil dereference, correct? This is not directly obvious.

Good point. At the moment, yes. But eventually, all panics (including nil pointer dereference panics) should be recoverable to match upstream Go. (But note that a HardFault is not a panic - at least not currently).

@ysoldak
Contributor

ysoldak commented Dec 28, 2021

What's HardFault then? I'm kind of vague in terminology.
It is not a compile-time or runtime panic, but when does it happen? Have an example?

@niaow
Member

niaow commented Dec 28, 2021

A panic is an error produced by Go code, and is sometimes recoverable.
A hard fault is an error produced by the hardware, and generally means that something has gone horribly wrong (undefined instruction, attempting to execute an invalid interrupt, etc.).

The upstream Go implementation does have runtime/debug.SetPanicOnFault to handle memory faults, but that exists only to work around mmap and does not work consistently (and I don't think we realistically could/should support it). It also only covers page faults, which aren't a type of fault that usually exists on microcontrollers.

@dkegel-fastly
Contributor

I guess this is the subject of #891

@aykevl aykevl marked this pull request as ready for review March 3, 2022 18:21
@ysoldak
Contributor

ysoldak commented Jun 1, 2022

So, how do we use this? What flag is implemented with what values? -unwind? -panic?
Shall not forget to document.

@aykevl
Member Author

aykevl commented Jun 2, 2022

@ysoldak there is no flag, if it is supported on a given platform it is enabled. I might add a flag in the future to control it.
I hope to get recover supported on currently unsupported architectures in separate PRs.

@deadprogram
Member

Any chance to look into this, @aykevl? Seems like it is so close!

@aykevl
Member Author

aykevl commented Jun 14, 2022

I have reverted the suggestion from @niaow to see whether that fixes the issue. I can't see how but if it does, it at least reduces the scope a lot to search for this issue.

EDIT: it sadly does not.
I seem to remember however that it did pass before, I wonder whether anything changed since then?

@deadprogram
Member

I seem to remember however that it did pass before, I wonder whether anything changed since then?

It did pass before. Not sure what happened in the interim...

@aykevl aykevl marked this pull request as draft June 15, 2022 13:11
@aykevl
Member Author

aykevl commented Jun 15, 2022

Converted to draft while I'm trying to figure out what's going on...

@aykevl
Member Author

aykevl commented Jun 15, 2022

Mystery solved. The problem only happens in CI because GitHub Actions merges the branch with the dev branch before testing. Rebasing the recover3 branch on top of the dev branch reproduces the same issue locally.

Now I just need to figure out what changed recently in the dev branch to cause this issue...

@aykevl
Member Author

aykevl commented Jun 15, 2022

Found the offending commit: #2879
This is going to be fun...

@aykevl
Member Author

aykevl commented Jun 15, 2022

Interestingly, #2884 also fails with a weird error in archive/zip. This could be related.

aykevl added a commit that referenced this pull request Jun 15, 2022
All the globals are between _etext and _end. We were scanning only
between _edata and _end, which mainly consists of the .bss section.
Scanning from _etext ensures that the .data section is also included.

This bug didn't result in issues in CI, but did result in a bug in the
recover branch: #2331. This patch fixes this bug.
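
As a rough illustration of why the old scan range missed globals, here is a small sketch that models the linker symbols as plain numbers. The addresses, the `section` type and the `sectionsScanned` helper are all invented for this example. It assumes the simplified layout described in the commit message, with .data between _etext and _edata and .bss between _edata and _end.

```go
package main

import "fmt"

// section is a half-open address range [start, end) with a name.
type section struct {
	name       string
	start, end uintptr
}

// Made-up addresses standing in for the linker symbols _etext, _edata and _end.
const (
	etext uintptr = 0x2000
	edata uintptr = 0x3000
	end   uintptr = 0x5000
)

// layout is an assumed, simplified memory layout: .data followed by .bss.
var layout = []section{
	{".data", etext, edata},
	{".bss", edata, end},
}

// sectionsScanned returns the names of the sections that lie entirely
// inside the scanned range [lo, hi).
func sectionsScanned(lo, hi uintptr, sections []section) []string {
	var names []string
	for _, s := range sections {
		if s.start >= lo && s.end <= hi {
			names = append(names, s.name)
		}
	}
	return names
}

func main() {
	fmt.Println("scan _edata.._end:", sectionsScanned(edata, end, layout))
	fmt.Println("scan _etext.._end:", sectionsScanned(etext, end, layout))
}
```

With these assumptions, initialized globals in .data are only covered once the scan starts at _etext, so pointers stored there are no longer invisible to the scan.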
@aykevl
Member Author

aykevl commented Jun 15, 2022

Found it! It's a very small bug with big implications: #2909

@aykevl aykevl marked this pull request as ready for review June 15, 2022 16:40
Previously we used to scan between _edata and _end. This is not correct:
the .data section starts *before* _edata.

Fixing this would mean changing _edata to _etext, but that didn't quite
work either. It appears that there are inaccessible pages between _etext
and _end on ARM. Therefore, a different solution was needed.

What I've implemented is similar to Windows and MacOS: namely, finding
writable segments by parsing the program header of the currently running
program. It's a lot more verbose, but it should be correct on all
architectures. It probably also reduces the globals to scan to those
that _really_ need to be scanned.

This bug didn't result in issues in CI, but did result in a bug in the
recover branch: #2331. This patch fixes this bug.
llvm.AddBasicBlock should never be used. Instead, we should use the
AddBasicBlock method of the current LLVM context.

This didn't lead to any bugs... yet. But probably would, eventually.
For example, this commit moves the 'throw' branch of an assertion (nil
check, slice index check, etc) to the end of the function while
inserting the "continue" branch right after the insert location. This
makes the resulting IR easier to follow.

For some reason, this also reduces code size a bit on average. The
TinyGo smoke tests saw a reduction of 0.22%, mainly from WebAssembly.
The drivers repo saw little average change in code size (-0.01%).

This commit also adds a few compiler tests for the defer keyword.
@deadprogram
Member

At last! Great work @aykevl and thank you everyone who helped get this PR completed.

Now merging.

@deadprogram deadprogram merged commit 8d6b210 into dev Jun 16, 2022
@deadprogram deadprogram deleted the recover3 branch June 16, 2022 05:59
deadprogram pushed a commit that referenced this pull request Jun 16, 2022