Delegate address space allocation to the VM #306
Thanks for writing this up. I think you're asking questions we've tried to address before, but haven't resolved yet. I'll ask some questions and point at previous discussions to make sure we're on the same page. I'm not shooing you away! Just trying to speak the same language :-)
Could you clarify what you mean? Is that the OS process's virtual address space (shared with the DOM and JS engine), or is that the view the wasm module has of its own address space (effectively an extra layer of virtual addressing)?
#53 and linked issues may be relevant.
#227 and linked issues may be relevant.
Amen. |
I was referring to how you can give an ELF or PE module a preferred base address. It's useful for native code to avoid relocations and maximize page sharing between processes, but only applies to WebAssembly data segments. IMO those should be loaded at random addresses anyway, so there's no escaping the relocations.
I think the mmap and munmap stuff already proposed is good, although given that they no longer correspond to the overloaded semantics of POSIX mmap, they should perhaps be called something more specific, like commit_pages or decommit_pages. In comparison to your proposal in the data segment linking thread (#302 (comment)), my proposal really only differs as the title suggests: by giving the VM the exclusive privilege of address space allocation, rather than allowing a toolchain/module to provide their own page allocator. The rest of the differences follow from that:
NaCl certainly proves a separate OS process is useful for efficient memory isolation. It looks like NaCl talks to the browser through asynchronous messages. Is that acceptable for WebAssembly? It would be possible to provide synchronous control flow between WASM and JS even if the WASM was running in a separate process, but IMO it would be preferable to provide an optional library for emulating that on top of a low-level interface that's fundamentally async. |
For browser scenarios, synchronous re-entrant calls are a requirement, i.e. JS -> wasm -> JS -> wasm. You could do this via cross-process remoting, but it's questionable whether it would be robust or performant. It's possible that over time the importance of those sync calls will be diminished because wasm will be able to access key APIs directly. In the MVP, though, you'll be bouncing out to JS for basically everything. In-process is also a (near-)requirement for direct DOM interaction, and that seems like something that is wanted for various use cases. |
I think we have a small disconnect: we're designing WebAssembly with the assumption that it doesn't control the entire process' address space. It shares it with other code, and only gets a small contiguous section of virtual memory. The wasm program doesn't know what that virtual address is, it sees its base as zero, and knows its extent, but cannot access other addresses in the same process because it's highly untrusted and therefore needs to be sandboxed. On 32-bits we further want to avoid exhausting virtual address space in the process, or causing excessive fragmentation.
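As a rough illustration of the sandboxing model described above, the following Python sketch shows how an embedder might translate a zero-based wasm address into a host address with a bounds check. The class name `SandboxedMemory`, its fields, and the example base address are hypothetical; nothing here is taken from an actual VM.

```python
class SandboxedMemory:
    """Toy model of a wasm linear memory embedded in a larger host process.

    Assumption: the host picks `host_base`; the wasm program never sees it.
    """

    def __init__(self, host_base: int, size: int):
        self.host_base = host_base  # hidden from the wasm program
        self.size = size            # extent the wasm program is told about

    def to_host_address(self, wasm_addr: int, access_size: int = 1) -> int:
        # The wasm program addresses memory starting at zero. Any access that
        # is not fully inside [0, size) is rejected, so the program cannot
        # reach other memory in the same host process.
        if wasm_addr < 0 or access_size < 1 or wasm_addr + access_size > self.size:
            raise MemoryError("out-of-bounds wasm memory access")
        return self.host_base + wasm_addr


# Usage: a 64 KiB linear memory placed at a made-up host address.
mem = SandboxedMemory(host_base=0x7F0000000000, size=64 * 1024)
host_addr = mem.to_host_address(16, access_size=4)
```

Because the base and size never change after construction, a native implementation could bake both into the generated access code, which is one reason the contiguity requirement is attractive.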
I think we agree on this: we don't want to force physical reservation if we can avoid it, we only force virtual address space allocation. wasm doesn't necessarily mandate this, but doesn't prevent VMs from doing so. We do mandate that memory be zero initialized, but that can be done lazily on commit.
I think we agree here too?
That's where we disagree, but that's because we can't waste virtual address space in 32-bit processes.
In general we're trying to use ideas from the extensible web manifesto: provide the lowest-level capabilities, build tooling on top of it, but let developers do something else that we didn't expect. In this case it'll be nice for the tooling to do ASLR by default, but some developers may want something else, or we may simply get it wrong! Some applications will want to get clever with memory allocation locations (e.g. asan). I'm hoping my explanations of what we assume make sense. Maybe we're assuming the wrong things :-) |
@AndrewScheidecker Just to add on to what @jfbastien already said, once one agrees on the contiguity requirement, then I think what you're proposing is equivalent to:
You do make the good point that I'm often using "module" in a fuzzy way that blurs the distinction between static code and a loaded instance. When I'm careful, I try to say "module instance" :) (FWIW, ES6 has Module Records and Module Environment Records.) Since we've had this confusion before, perhaps it's worth saying "process" instead of "module instance" and/or adding a clarifying section to Modules.md. |
I agree with the principle that the design should allow an implementation to embed a WASM address space in the browser process's address space. That's obviously necessary for the polyfilled MVP at least. I do want to make sure that it's practical for a browser implementation to execute a WASM process in a separate OS process, but that's not dependent on this issue.
I will concede that the peak address space hint needs to be part of WASM itself, rather than being demoted to a polyfill parameter as I suggested. I was assuming most browsers would want to put WASM processes in a separate OS process, but that looks impractical with the synchronous JS interop.
I think this makes sense, but it's a question of what's practically extensible. If you require the loader to ask the WASM process where to put a data segment, then it's fine to let the process do ASLR or whatever it wants to do. Making the loader call into the process is a huge can of worms (see DllMain), so I think that's an argument against doing it that way.
To be explicit about the terms I use below:
I think it makes sense for the WASM process to declare its peak address space up front for the benefit of implementations where address space is a precious resource. Such implementations would then be able to OS-reserve a contiguous block of pages for that peak address space to ensure the WASM address space base and size are immutable. For the polyfill, the OS-reservation corresponds to a statically sized ArrayBuffer, and the page fault semantics could be ignored. The platform would also WASM-reserve+commit some pages for the data segments of modules loaded to start the process, but otherwise when control is transferred to the WebAssembly process it shouldn't assume any more committed pages than that. To start making dynamic allocations, it must call mmap to commit additional pages. So the module doesn't need to declare a minimum memory size; that's implicit in the size of the .data and .bss sections.
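To make the reserve/commit distinction concrete, here is a minimal Python sketch of the model described in this comment. It assumes a 4 KiB page size and a platform that has already chosen addresses for the startup data segments. The names `AddressSpace`, `PageFault`, `load`, and `store` are invented for illustration; `commit_pages` and `decommit_pages` follow the naming suggested earlier in this thread.

```python
PAGE_SIZE = 4096  # assumed page size, not specified in the discussion


class PageFault(Exception):
    """Raised when the process touches a page it has not committed."""


class AddressSpace:
    def __init__(self, peak_bytes, data_segments=()):
        # Reserve the declared peak up front so base and size never change.
        # The bytearray stands in for an OS reservation (or, in the polyfill,
        # a statically sized ArrayBuffer).
        self.num_pages = -(-peak_bytes // PAGE_SIZE)  # round up to whole pages
        self.memory = bytearray(self.num_pages * PAGE_SIZE)
        self.committed = set()
        # Only the pages backing startup data segments begin committed.
        for address, data in data_segments:
            if not data:
                continue
            first = address // PAGE_SIZE
            last = (address + len(data) - 1) // PAGE_SIZE
            self.commit_pages(first, last - first + 1)
            self.memory[address:address + len(data)] = data

    def _check_range(self, first_page, count):
        if first_page < 0 or count < 0 or first_page + count > self.num_pages:
            raise ValueError("page range lies outside the reserved address space")

    def commit_pages(self, first_page, count):
        self._check_range(first_page, count)
        for page in range(first_page, first_page + count):
            if page not in self.committed:
                # Newly committed pages read as zero.
                start = page * PAGE_SIZE
                self.memory[start:start + PAGE_SIZE] = bytes(PAGE_SIZE)
                self.committed.add(page)

    def decommit_pages(self, first_page, count):
        self._check_range(first_page, count)
        for page in range(first_page, first_page + count):
            self.committed.discard(page)

    def _check_access(self, address, size):
        if address < 0 or size < 1 or address + size > len(self.memory):
            raise PageFault("access outside the reserved address space")
        for page in range(address // PAGE_SIZE, (address + size - 1) // PAGE_SIZE + 1):
            if page not in self.committed:
                raise PageFault(f"page {page} is not committed")

    def load(self, address, size):
        self._check_access(address, size)
        return bytes(self.memory[address:address + size])

    def store(self, address, data):
        self._check_access(address, len(data))
        self.memory[address:address + len(data)] = data


# Usage: reserve four pages, with one data segment placed in page 1.
space = AddressSpace(4 * PAGE_SIZE, [(PAGE_SIZE, b"hello")])
space.commit_pages(2, 1)            # the process asks for one more page
space.store(2 * PAGE_SIZE, b"abcd")
```

A polyfill could skip the committed-page check entirely, which corresponds to ignoring the page fault semantics as described above.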
By library, do you mean something compiled into the WASM process, or something defined by the WASM platform? By "delegate address space allocation to the VM" I mean that the set of WASM-reserved pages is managed by the WASM platform rather than the WASM process. Here I believe you're using commit to mean "has a physical page associated", in contrast to how I've been using it. I've been using it to include virtual pages backed by not only physical pages but also pagefile or lazily initialized zeroes; anything that is committed as far as a user-space process is concerned. I'm proposing that the WASM platform be allowed (but not required for at least the polyfill) to start your process out with most of the address space uncommitted in the sense that accessing it will result in an unhandled page fault.
I'm not worried about reducing the amount of physical memory used, but rather making sure this will all work with dynamic linking, or asan as @jfbastien mentioned. Making the address space PROT_NONE by default matters only insofar as an implementation should be allowed to do it, and WASM processes should go through a platform-level mmap to commit pages instead of expecting to be able to read and write anywhere in their address space. |
We're pretty worried about this because mobile platforms don't have that much memory to spare. These platforms otherwise have to resort to an OOM killer if they don't expose an API where developers can relinquish memory. |
Yes, as @jfbastien explained.
That is already the case.
That is already the case, assuming the module declares a small initial heap size. |
There is now a lot of consensus around the current model of a contiguous address space, |
Forked from the discussion on dynamically linked data segments (#302):
I think what @jfbastien said is about as good as you can do within the current memory model, but to me it seems inevitable that wasm must abandon the idea that a module has absolute control over its address space. I think a model closer to OS processes can be efficiently polyfilled and will more easily accommodate dynamic linking.
Some things we can take from OSes:
Some things we shouldn't take from OSes:
If we applied these ideas to WebAssembly:
This could all be implemented efficiently in the polyfill as a page allocator for the asm.js linear memory. The polyfill could also require the application to provide an initial size for linear memory, a hint that would no longer be necessary for the wasm module itself.
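A minimal sketch of such a polyfill page allocator, written in Python for brevity rather than asm.js. The class name, the first-fit strategy, and the 4 KiB page size are all assumptions made for illustration. The point it tries to show is that the caller passes only a page count and the platform chooses the address.

```python
PAGE_SIZE = 4096  # assumed page size


class PolyfillPageAllocator:
    """First-fit page allocator over a fixed-size linear memory (hypothetical)."""

    def __init__(self, initial_size_bytes: int):
        if initial_size_bytes <= 0 or initial_size_bytes % PAGE_SIZE:
            raise ValueError("size must be a positive multiple of PAGE_SIZE")
        # The bytearray stands in for the asm.js heap ArrayBuffer.
        self.heap = bytearray(initial_size_bytes)
        self.num_pages = initial_size_bytes // PAGE_SIZE
        self.in_use = [False] * self.num_pages

    def mmap(self, num_pages: int) -> int:
        """Commit `num_pages` contiguous pages; the allocator picks the address."""
        if num_pages < 1:
            raise ValueError("num_pages must be at least 1")
        run_start, run_len = 0, 0
        for page in range(self.num_pages):
            if self.in_use[page]:
                run_len = 0
                continue
            if run_len == 0:
                run_start = page
            run_len += 1
            if run_len == num_pages:
                for p in range(run_start, run_start + num_pages):
                    self.in_use[p] = True
                start = run_start * PAGE_SIZE
                size = num_pages * PAGE_SIZE
                self.heap[start:start + size] = bytes(size)  # pages read as zero
                return start
        raise MemoryError("no contiguous run of free pages is large enough")

    def munmap(self, address: int, num_pages: int) -> None:
        """Return pages to the allocator so later mmap calls can reuse them."""
        if address < 0 or address % PAGE_SIZE or num_pages < 1:
            raise ValueError("address must be page-aligned and num_pages positive")
        first = address // PAGE_SIZE
        if first + num_pages > self.num_pages:
            raise ValueError("range lies outside linear memory")
        for p in range(first, first + num_pages):
            self.in_use[p] = False


# Usage: a four-page heap whose size plays the role of the polyfill's hint.
allocator = PolyfillPageAllocator(4 * PAGE_SIZE)
a = allocator.mmap(2)
b = allocator.mmap(1)
```

First-fit is chosen only because it is short to write; a platform that wanted ASLR-like behaviour could pick randomly among the free runs instead without changing the interface.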
I expect native implementations would reserve a fixed range of addresses for a wasm process, and generate memory access code using an immutable base and bounds check. However you do it, supporting a true 64-bit address space would likely require a separate OS process for each wasm process, which has its own implications for things like APIs to talk to the browser.
Thoughts?