Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Stack overflow silently corrupts data and vice versa #255

Open
zetanumbers opened this issue Dec 28, 2021 · 22 comments
Open

[BUG] Stack overflow silently corrupts data and vice versa #255

zetanumbers opened this issue Dec 28, 2021 · 22 comments

Comments

@zetanumbers
Copy link
Contributor

zetanumbers commented Dec 28, 2021

WARNING: this bug is relevant for every language!

To detect stack overflow rustc places stack before anything else (see explanation).
As you may know, stack grows backwards. Therefore to detect stack overflow it is basically placed at the top of the module's memory, and if stack pointer underflows and then is dereferenced, it throws an exception. Because of the memory map being at the top, it is not possible to detect stack overflow and it allows for stack overflow to silently corrupt some data on its way or vice versa.

See solutions: #255 (comment)

I want to fix this bug, but we all need to agree on a solution.

@joshgoebel
Copy link
Contributor

I missed this whole conversation, but we pass in the stack size at compile time, do we not? (we do for other languages)... can't that just be compared against the stack pointer to find out if it's overflows? (not as nice as a 0 comparison for sure, but still)...

@zetanumbers
Copy link
Contributor Author

zetanumbers commented Dec 28, 2021

I missed this whole conversation, but we pass in the stack size at compile time, do we not? (we do for other languages)... can't that just be compared against the stack pointer to find out if it's overflows? (not as nice as a 0 comparison for sure, but still)...

Try to convince the compiler to insert these checks. Otherwise WASM-4 uses only one page for the whole RAM, so no fancy paging tricks there (if they even exist).

EDIT: also there's no reliable way i think to get stack pointer value from the runtime.

@joshgoebel
Copy link
Contributor

Couldn't you also do this with a sentinel tag at the end of the stack area? And just detect if it's overwritten each frame (it should never be touched)? You'd have to add this code manually, but if Rust isn't going to help us...

@zetanumbers
Copy link
Contributor Author

Couldn't you also do this with a sentinel tag at the end of the stack area? And just detect if it's overwritten each frame (it should never be touched)? You'd have to add this code manually, but if Rust isn't going to help us...

What if it is overwritten by the same value? Also by the end of the update call, the stack would shrink back.

@zetanumbers
Copy link
Contributor Author

The only way to fix this bug in backwards is to add ability to move memory map somewhere else.

We could move these memory maps to the end of the RAM, but it would break ABI, which may or may not be ok.

@zetanumbers
Copy link
Contributor Author

zetanumbers commented Dec 28, 2021

Proposed solutions

Export memMap global

Core idea

Export memMap (name subject to change) global of I32 type from a cartridge. It would represent a pointer to the statically allocated memory map by the compiler/linker.

Pros

  • Backwards compatible;
  • Mostly invisible to the user, so hard to mess up;
  • Can improve implementation's quality over time;
  • Memory map is valid even before start callback call;
  • Memory map position is a build detail, like stack position and size.

Cons

  • Hard to implement;
  • Old unsupported runtimes do not fail by itself (ABI versioning can help);
  • Implicit.

Implementation details

As far as i am aware, not every language supports generating a wasm export global. So we have to use different approach.

Almost every currently supported language uses WASI SDK or wasm-ld. I believe for these languages it is possible to link an object file. Wasm object files are files with LLVM IR bitcode. My idea is to write a source for object file in LLVM language, which would statically allocate memory map and export its base pointer as a wasm export global, then compile it using llvm-as or maybe some llvm library?

Ideally instead of embedding or compiling it in w4 tool, we can create WASM4_SDK instead of relying on WASI_SDK.

AssemblyScript is an exception as it uses Binaryen IR. Workaround is to explicitly declare memory map in the entry file of the template like this:

// Do not delete this
export const memMap = w4.MEM_MAP;

Import setMemMap function

Core idea

Add setMemMap function to the WASM-4 API, which changes memory map base pointer on the call.

Pros

  • Backwards compatible;
  • Easy to implement;
  • Old unsupported runtimes fail, due to undefined import function;
  • Explicit.

Cons

  • Mixes runtime and build:
    • For example in c, stack position and size in the Makefile assumes memory map is moved, so if user deletes the setMemMap call, cartridge corrupts pretty fast;
    • // Do not delete comments could help?
  • Runtime should assume the memory map base pointer is unknown before start callback call. This is a recommendation.

Implementation details

setMemMap functionality is similar to CR3 register on x86. Templates would call it in the start function.

start return type

Core idea

Handle start function returning an I32

Pros

  • Backwards compatible;
  • Easy to implement;
  • Return type for the template's start function may be a pointer or some other type with a same layout, which may help user to not mess things up by deleting a return value.
  • Explicit.

Cons

  • Some old unsupported runtimes may ignore start callback return value;
  • Runtime should assume the memory map base pointer is unknown before start callback returns. This is a recommendation;
  • start callback may have different function signatures.

Implementation details

Similar to setMemMap function.

Move memory map after stack and before global base

Core idea

Memory map base is now after the stack and before global base.

Pros

  • Easy to implement;
  • Simple.

Cons

  • NOT backwards compatible;
  • Stack size has to be the same everywhere;
  • Old unsupported runtimes do not fail by itself (ABI versioning can help);
  • May be impossible for some languages;
  • This solution assumes anything other than the stack comes after --global-base.

Implementation details

Stack comes first in RAM (--stack-first). Then it is memory map. And then we align global data with --global-base.

@joshgoebel
Copy link
Contributor

joshgoebel commented Dec 28, 2021

What if it is overwritten by the same value?

VERY, VERY unlikely, you'd use a sequence of bytes... 8 or 16 should be safe enough and make this as unlikely a monkey typing shakespeare... and this could be done easily without ANY special runtime support from Rust at all. Just a tiny call added to the end of update...

Also by the end of the update call, the stack would shrink back.

Doesn't matter, the sentinel value would have already been destroyed and that's what we're detecting, not the size of the stack at the moment, but rather the RESULTS of it having overflowed just a bit ago...

I'm not inventing this technique, it's a thing that's real and been used for exactly this type of thing in the past - sorry if I don't have a name for the concept off the top of my head.

@joshgoebel
Copy link
Contributor

joshgoebel commented Dec 28, 2021

Move memory map to the end of the memory

Please explain exactly what you are moving when you say "memory map"... do you mean the entire documented registers and VRAM, etc? That sounds like a no go just to make one language happy that's treating their start differently than everyone else.

@zetanumbers
Copy link
Contributor Author

Please explain exactly what you are moving when you say "memory map"... do you mean the entire documented registers and VRAM, etc?

The whole thing: https://wasm4.org/docs/reference/memory

@joshgoebel
Copy link
Contributor

IMHO: This is simply something our "hardware" doesn't allow because of our hard-wired RAM layout. I personally don't think we should change WASM-4 overall (or add complexity) to accommodate a single compiler option from a single language. (even IF it is an agreeably useful feature)

If it was to be considered at all it should be only after having exhausted all simpler options (that are already possible, and do not requires any changes to WASM-4).

As I pointed out there are already simple ways to do this easily enough (just like you'd do it in any of the other languages we support)... just nothing you get "for free"... of course it's not truly "for free" at all if it requires a lot of changes to the WASM-4 "hardware" just to support it...

@zetanumbers
Copy link
Contributor Author

zetanumbers commented Dec 28, 2021

@joshgoebel your fix is still bad. It's not always reliable and detects stack overflows only at the end of the update call. It makes harder to trace down a bug, because it runs until update call finishes and does not abort on the exact moment stack overflow happens. It may even enter a loop for eternity.

@zetanumbers
Copy link
Contributor Author

zetanumbers commented Dec 28, 2021

a single compiler option from a single language

@joshgoebel I have added a warning in the top comment for you.

@joshgoebel
Copy link
Contributor

It may even enter a loop eternity.

How?

WARNING: this bug is relevant for every language!

True, but the fix is currently only relevant for Rust as I understand it - correct me if I'm mistaken. ?

your fix is still bad.

It was put forth as a suggestion that does not require changing the entire WASM-4 architecture. So you have to judge in inside of that context, not in an absolute sense as I think you are. :)

@joshgoebel
Copy link
Contributor

joshgoebel commented Dec 28, 2021

True, but the fix is currently only relevant for Rust as I understand it - correct me if I'm mistaken. ?

Ok, so it seems the idea is that stack pointer would underflow (well it's likely a u32, so it'd overflow) and roll around to MAX_INT... at which case accessing the stack with that as a pointer would trigger a memory out of bounds runtime error (which one could debug)... that was not clear to me earlier, but it is now. It'd be "free" for any language that placed the stack in LOW ram.

So, my next question: which languages can even move the stack to low RAM (at compile time) other than Rust?

@zetanumbers
Copy link
Contributor Author

It may even enter a loop eternity.

How?

Something could override a return pointer.

your fix is still bad.

It was put forth as a suggestion that does not require changing the entire WASM-4 architecture. So you have to judge in inside of that context, not in an absolute sense as I think you are. :)

This was only my estimate of your solution.

@zetanumbers
Copy link
Contributor Author

which languages can even move the stack to low RAM (at compile time) other than Rust?

If you can pass arguments to the wasm-ld, you're good. Which is most languages i bet.

@joshgoebel
Copy link
Contributor

joshgoebel commented Dec 28, 2021

Something could override a return pointer.

The actual function call stack is in the WASM VM, not the memory... no? Though I suppose technically with indirect calls this might be possible if a function pointer was corrupted on the stack? So perhaps that's what you're referring to or I'm missing some nuance of WASM I don't grok yet.

If you can pass arguments to the wasm-ld, you're good. Which is most languages i bet.

This assumes someone has a current LLVM installed...? I'm using Zig (from website) and C/C++ (from Xcode tools) and I don't have wasm-ld (at least it's not in my path?)... though I suppose we could always decide to make that a requirement for building ALL languages... if it was decided to be that much of a win and people would just have to install it?

@zetanumbers
Copy link
Contributor Author

The actual function call stack is in the WASM VM, not the memory... no? Though I suppose technically with indirect calls this might be possible if a function pointer was corrupted on the stack? So perhaps that's what you're referring to or I'm missing some nuance of WASM I don't grok yet.

I invite you to read the WebAssembly specification. Small hint: wasm compiles down to assembly with runtime checks.

This assumes someone has a current LLVM installed...? I'm using Zig (from website) and C/C++ (from Xcode tools) and I don't have wasm-ld (at least it's not in my path?)... though I suppose we could always decide to make that a requirement for building ALL languages... if it was decided to be that much of a win and people would just have to install it?

I am not suggesting to install wasm-ld. If your compiler uses LLVM and compiles to wasm, then it is already there. We just need to tweak a couple of options passed to the linker. Zig, C, C++, Rust are all good i checked. If some language doesn't support it, well then, we could create an issue like this, otherwise this bug will persist there no mater what we want.

@joshgoebel
Copy link
Contributor

joshgoebel commented Dec 28, 2021

I'm not entirely unfamiliar... but the reason WASM has opcodes like local.get is to retrieve things from the WASM [outside] stack (I presume locals are on a stack or stack like data structure)... that's not the stack in memory... I'll refer to them as outside and inside stack as I don't know if there are better terms...

From MDN:

(func (param i32) (param f32) (local f64)
  local.get 0
  local.get 1
  local.get 2)

The instruction local.get 0 would get the i32 parameter, local.get 1 would get the f32 parameter, and local.get 2 would get the f64 local.

Those parameters are passed on the stack on the WASM side (outside stack), not inside the VM memory (from what I'm reading). If they were using the stack in RAM (inside stack) then you'd see code working with global 0 (the stack pointer in all the disassemblies I've compiled), something like:

    global.get 0
    i32.const 128
    i32.sub
    global.set 0

Except you'd have a load, etc... Often you'll see a local being used to hold the stack frame as well. In a real computer when subroutine is called typically the PC (program counter) will be pushed on the stack and then the RET (return) opcode would pop it... From what I've seen in WASM that's all dealt with on the WASM side (what I've been calling the outside stack)...

In my experience so far with Zig and WASM the inside stack is mostly being used for local memory allocation of large variables inside functions (like strings, large arrays, etc)... things that don't map super well to locals.

@zetanumbers
Copy link
Contributor Author

@joshgoebel let's not focus on it. It's not relevant. The bug is pretty bad anyway as it triggers UB.

@aduros
Copy link
Owner

aduros commented Jan 9, 2022

I'm not too crazy about any of the proposed solutions. I think having WASM-4's system memory at a variable address would introduce too much complexity. The proposal about moving the WASM-4 system memory after the stack but before the global base is interesting but not really feasible at least for LLVM-based languages (the --global-base option is ignored if --stack-first is used).

At the moment this is affecting primarily Rust because the linker there uses --stack-first. I think we should:

  1. Use --stack-first on all other languages too. It's probably better that a stack overflow corrupts the framebuffer rather than static program memory.
  2. Increase the stack-size to 8 KB + the 6560 bytes of WASM-4 system memory (since the stack envelopes WASM-4 memory when using --stack-first)
  3. Sentinel: Have an option to make the devtools watch around byte 6560 and if non-zero, raise an error. This isn't perfect for the reasons you mentioned but it would at least tell the user that an error has occured and they should either increase the stack size or change their program.

@zetanumbers
Copy link
Contributor Author

Yeah, these steps are do not break anything, so they're ok to implement. I wouldn't say relocatable memory map adds any complexity, compared to undefined behaviour on a stack overflow, but we could go for the hard solution. And one last thing...

the --global-base option is ignored if --stack-first is used

It's not. It just moves data segment, not the stack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants