Wasm needs a better memory management story #1397
Thanks @juj, this is a great write-up! I just wanted to add a supplementary comment, but I hope someone else can chime in with a more holistic perspective (I did read the whole thing, I'm just not qualified to respond to most of it):
IIUC, this is already permitted by the specification, since even when setting a maximum size it is permitted for |
This is a really interesting read, thank you. I may not be particularly qualified to comment on this, but my outsider's perspective is that wasm as it stands today assumes, and prohibits deviations from, a simulated physical memory model. It should continue requiring only such a model, but allow for "full" virtual memory capabilities (with the possible exception of such pains as mapping thread-local storage into the shared address space). This should happen in the wasm spec, rather than simply stating that all memory issues are implementation-dependent. That is because while the virtual memory model does offer near-endless possibilities, most of them can be accessed through standardized and extensible interfaces which would not be beyond the scope of such a specification. We're talking about a small number of POSIX system calls, and having reasonable fallbacks for them (such as copying rather than remapping memory). In other words, I think this is a case where the benefits of going for a general solution outweigh the burden of implementing a few ENOSYS wrappers on low-end implementations. The initial model was way too limited, and replacing it by one that's still quite limited seems like a bad idea to me. |
The lack of access to virtual memory, and of any distinction between committed and reserved memory, is one of the reasons why, for the WASM target, Lumen (our AoT, single-binary Erlang runtime/compiler) needs a different memory allocator from one closer to how the BEAM VM for Erlang manages memory. @bitwalker can go into more detail about the changes. |
Hi @juj! There's a lot to address in your comment, but just to focus on the subsection "Why Wasm requires developers to know the needed memory size at compile time", w.r.t this bullet:
Maybe I'm misunderstanding the problem or the current implementation strategies in Chrome/Safari, but the intention of having a separate Would that address this part of the problem? If so, perhaps we could ask the Chrome/Safari engineers if this matches their current implementation. |
@lukewagner one aspect of the problem mentioned in that same subsection is that, at least on V8, that approach leads to This ties into the point made in idea (1) towards the end of the post, that the optimal strategy for picking EDIT: if the Firefox implementation is aggressive in reserving as much memory/address space as it can, does it ever try to release any if it's not grown into after some amount of time (in line with my comment)? One other aspect of the OP is that Wasm programs making large reservations can cause problems for mobile devices. |
JavaScriptCore only reserves the requested initial. That said, currently JSC's wasm only ships on 64-bit so VA space is less of an issue. Although, we do put WASM into a large "caged" VA space so they could run out of VA there, but they're much more likely to get killed by the OS before that. If we ever shipped on 32-bit we would certainly have the same issue as V8.
My assumption is that FF is |
On the surface, it looks like the biggest pain point is inability to release memory. #1396 describes a workaround - reinstantiate while preserving compiled module, but that requires a high degree of compartmentalization and might not be feasible for some apps. Shrinking memory within existing model isn't trivial. While we can grow memory by adding more pages at the end of the address space, if we do the same for shrinking it we would require defragmentation (to ensure those are actually empty), which means that a simple I am not sure using memory buffer for anything else would open the door for security vulnerabilities (probably not in an obvious way, but probably would require a bit of hardening), but more importantly any solution would require new instructions. I think we need a memory buffer management tied to primitives accessible from host memory management routines, which is probably close to approach 2. There is a multi-memory proposal, maybe it would be possible to map large allocations to new memories which would get GC'd once unreferenced. |
How close is this to adding a GC'd reference type representing a first-class byte buffer, with operations analogous to Related, there is a JS proposal for a ResizableArrayBuffer, which if implemented successfully could have implications for the viability of |
@conrad-watt Oops, I had missed that comment, sorry. Just to give a bit more historical background: half the motivation for adding
FF clamps the max reservation size to 1gb which, in practice, seems to leave enough room for the other allocations, although I could imagine also choosing a somewhat lower clamp. It's hard to design a heuristic that knows when you've seen the "last" |
On the separate topic of shrinking: do people actually want a |
To concretely help gauge the differences in browsers on this behavior, I wrote a mobile friendly interactive memory allocation test page, available at http://clb.confined.space/wasm_grow.html (self contained HTML you can download, or run live) Here is what I see: Huawei P10 Plus (6GB of RAM) + Android 8.0.0 + Chrome 88.0.4324.152
Huawei P10 Plus (6GB of RAM) + Android 8.0.0 + Firefox 85.1.3
iPhone Xs + iOS Safari 13.3.1 (apologies for not testing on a newer iOS Safari, but the iOS update is failing on this phone, and I do not have another one to test with. I hope this data is still relevant)
Summary
The aforementioned issues pop up in different forms in the tests:
Being in danger of suffocating browser native address space issues would not show up in this test, mainly because this test does not call out to any memory intensive browser APIs (XHR/Fetch/WebGL/WebAudio) that might risk exhausting memory. It is hard to say how prevalent such issues are on 32-bit Chrome. Firefox had an excellent memory allocation success in this test. Testing some of this behavior is extremely fuzzy, for two main reasons:
|
Hi Luke! I recall this thread of conversation well, as I was also working with that partner collaboration. It did indeed help 32-bit Firefox to a great extent based on their telemetry. In the test scheme above, Firefox on Android performs well, and is able to allocate large heaps. (not sure if it is a 64-bit process already on Android?) Though in the test scheme above, it looks like no browser performed any different when gratuitous Even with that recollection, this current behavior we have been seeing with
We have received reports of some odd behavior (maybe due to this?) in Safari, where people report that when they have an "old browser process" (long-running process/lots of tabs open?), they may fail to launch Unity pages due to OOMs or page reloads, but killing the Safari process and reopening it will help a Unity game launch again. It has been very difficult to raise a bug report about this, since producing an "old browser process" in QA is quite a fuzzy and nonrepeatable procedure. (In fact, we do occasionally get similar reports in Firefox and Chrome too, but not quite as often as with Safari.) In the above test, though, this "shrinking" of available memory was reproduced: the first page load got 768MB of Wasm heap, the first page reload 544MB, and the second page reload was down to 512MB. Opened [WebKit 222097] about this.
I tend to agree, since if there is a way to release memory, then it would probably follow that the initial commit vs reserve semantics would need to become well defined across implementations. One could then release all the memory that was initially committed (if it happened to cause a commit). The memory allocation problems in the test results from my post above could presumably be dealt with as implementation-specific bugs (Chrome being 32-bit, not getting graceful JS OOM throws on large alloc failures, etc).
On its own, a .shrink() would not be enough. Indeed it would be an opportunistic behavior where an emmalloc/dlmalloc impl could only .shrink() when the freed allocations occurred at the top of the heap, which may not be the case for many applications (and needs the programmer to be memory fragmentation aware). Although in some apps, this could "trivially" be the case when they do large transitions in application lifetime (user closes edited document, player exits a game level back to main menu), where user navigation flow has been able to guarantee this kind of stacking allocation order. The intent with .shrink() was that maybe it could help give some address space back to a 32-bit browser, i.e. let wasm apps run at all times with the smallest heap size that they need to fit all their own memory into. Then the browser would know also at runtime how much of that gratuitously reserved One pragmatic thing that such a .shrink() would certainly help if nothing else, are the large number of bug reports people produce about Wasm apps consuming large amounts of memory, or having a memory leak when they enter and exit a full game scene in Unity. What people are doing is they look at their Chrome/Firefox/Safari DevTools Memory tab, and see the effects of .grow() when they enter a scene, but when they exit back and the scene is unloaded and memory cleared, they can not observe any shrink in the wasm heap in DevTools, leading them to think that a memory leak must have occurred. In other words, browser DevTooling is unable to account for the actually used memory in Wasm. I wonder what would happen on desktop when wasm64 becomes a thing. If a wasm64 app performs a huge/maximum address space reservation, could such operation cause a 64-bit browser to be address space constrained on the native side? Could a wasm64 app be desired to be able to .shrink() the address space back to the browser? 
Or maybe wasm64 will still not allow an app to reserve the full 64-bit address space, but a much more modest fraction of it, so that browser still will have plenty for its own. Btw, after seeing https://github.com/bytecodealliance/wasm-micro-runtime earlier my first thought was to wonder how they deal with the lack of .shrink() in extremely memory constrained systems that may not have a concept of virtual address space at all(?).
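To make the "only from the top of the heap" limitation of such a .shrink() concrete, here is a minimal sketch. It assumes a hypothetical allocator that can list its live blocks; `LiveBlock` and `releasablePages` are names made up for illustration and are not part of emmalloc, dlmalloc, or any Wasm API.

```typescript
const WASM_PAGE_SIZE = 64 * 1024; // one wasm page is 64 KiB

// A live allocation inside linear memory (hypothetical allocator bookkeeping).
interface LiveBlock {
  offset: number;
  size: number;
}

// Number of whole pages above the highest live byte. A top-only shrink
// could release these pages and nothing else.
function releasablePages(heapPages: number, live: LiveBlock[]): number {
  let highestEnd = 0;
  for (const block of live) {
    highestEnd = Math.max(highestEnd, block.offset + block.size);
  }
  const pagesInUse = Math.ceil(highestEnd / WASM_PAGE_SIZE);
  return Math.max(0, heapPages - pagesInUse);
}

// 16-page heap with one small block at the bottom: 15 pages could be returned.
const compactHeap = releasablePages(16, [{ offset: 0, size: 1024 }]);

// Same heap, but a single 16-byte block survives in the last page:
// nothing can be returned, even though almost all of the heap is free.
const fragmentedHeap = releasablePages(16, [
  { offset: 0, size: 1024 },
  { offset: 15 * WASM_PAGE_SIZE, size: 16 },
]);
```

The second case is the fragmentation hazard discussed above: one long-lived allocation near the top pins the whole heap.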
This would certainly be the main remedy that I can think of. Unlike .shrink(), that would allow all apps to benefit, and would help with the Fast App Switching problems. Orthogonally to all of this, even already today without any spec changes, I wonder if the current browser DevTools implementations could be improved to detect and display how much of the wasm heap is actually committed vs just reserved? Currently all browsers will show a huge opaque block of Memory for the Wasm allocation. It would be nice to have DevTools display a "committed size, reserved size, % committed" type of visuals, where one could then see how much memory their application is impacting in practice. This would help developers better understand the behavior they are getting when they apply browser-specific workarounds to WebAssembly.Memory() allocation patterns. Also, when writing Emscripten's emmalloc I have wondered whether the memory region marking strategy can cause excess page commits for unused memory pages that applications may never use, so it would be great to see how that behaves in practice. |
Somewhat related: discussion on "Support for reserving address space" in Memory64: WebAssembly/memory64#4 I generally would be very much in support of adding features related to mmap / reservation / shrinking / probing etc. to Wasm. Besides needing them for memory constrained devices, we will also need these for the opposite: programs wishing to manage large amounts of address space. |
@juj It seems like, if browsers did implement the FF Jukka, would a good deal of your needs be addressed if:
? |
@conrad-watt ideally very close, I just wasn't sure how this would work in the existing memory model. Sorry, I have not been following GC proposal close enough, can an object like this be accessed as part of linear memory? |
The "simplest" version of (my interpretation of) this idea would be to make such buffers like any other GC object. That is, each buffer would have an entirely disjoint address space from any other (enforced by bounds checking), they'd have their own family of load/store operations, and would be stored (by reference) in a table, or as a field of another GC object, rather than in linear memory. I wasn't sure if this was what you had in mind, or if the idea was to tie more closely to the existing linear memory (by having a host procedure to manage chunks of linear memory that are still manually accessible through regular load/store?). |
That would certainly be expected to fix the Chrome and Safari issues that allocating a large initial is better than growing from a small initial. That would also be expected to fix the Fast App Switching problem. I am not sure if after those we will still have stability issues on 32-bit browsers, caused by a Wasm page reserving a large 2GB part of the process address space, leaving the browser with <=2GB left for its own use. Currently nothing stops a browser from gnawing back from top end of that address space if JS side does large XHRs or memory intensive WebGL operations, but if the page happened to temporarily have done a huge 2GB alloc (to grow() to consume the whole heap) but then freed all of it, that address space would then permanently be off limits for the browser to chip into. Would a .shrink() operation to enable address space stealing be too contrived to implement? One particular detail about .discard() is the behavior that should happen when an app attempts to touch the memory to commit it again, but there is not enough memory available to commit. Regular JS ArrayBuffer allocations and Wasm .grow()s are "blocky" in that if I want to allocate e.g. 1GB, the allocation is monolithic in that 1GB, and if that fails, I should be able to gracefully get a JS exception/trap out of it, and decide to do something else. This is super-important for stability. But touching memory to commit it will not be blocky, but will roll in one page at a time, so one might get 900MB of that 1GB reserve committed, and then run into a page that finally exhausts the available physical memory. We would not want the browser to silently reload the page like current Firefox/Safari/Chrome behavior on OOM can be. But instead, one would prefer to have a way to gracefully manage the page commit failure, and avoid the 1GB allocation altogether (and probably uncommit that 900MB from before to avoid browser small OOMing itself right after). 
So the exact semantics of what should happen when a page commit fails on memory store are important. (also what should happen when one attempts to load memory from an uncommitted page?) Maybe in addition to memory store implicitly committing a page, there could be a dedicated instruction Would the commit vs reserve page size be fixed (to the same 64K of the wasm page size?), or variable size depending on the underlying architecture? If variable size, can there be an instruction to query this size? Finally, would it make sense to give applications an instruction to programmatically query a) if a given address (range?) is committed or not, and b) ask the number of committed pages total in wasm memory? Those would be nice to help implement debugging and profiling support to applications and allocators. |
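For comparison, this is roughly what the "blocky", recoverable failure mode looks like today with .grow() from the JS side. It is a minimal sketch: `tryGrow` is a helper name invented here, and the tiny `maximum` is only there to force a failure deterministically. A real out-of-memory failure depends on the engine and device.

```typescript
// Try to grow a wasm memory by `deltaPages`; report failure instead of throwing.
function tryGrow(memory: WebAssembly.Memory, deltaPages: number): boolean {
  try {
    memory.grow(deltaPages);
    return true;
  } catch (e) {
    // A grow that cannot be satisfied surfaces as a RangeError,
    // so the application gets a chance to do something else.
    if (e instanceof RangeError) return false;
    throw e;
  }
}

// A memory capped at 2 pages so the second grow is guaranteed to fail.
const boundedMemory = new WebAssembly.Memory({ initial: 1, maximum: 2 });
const firstGrowOk = tryGrow(boundedMemory, 1);  // 1 -> 2 pages, within maximum
const secondGrowOk = tryGrow(boundedMemory, 1); // would be 3 pages, rejected
```

The failed grow leaves the memory at its previous size, which is the all-or-nothing property that implicit page-by-page commits would not have.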
For these issues, I'd like to re-highlight my earlier comment (second half) about (1) assumed engine clamping of the
That's a great point. More generally, from talking about this with @lars-t-hansen today, it seems like, on systems where a random (As a side note on terminology, and I'm not sure if I'm correct here, so happy to have corrections, but, IIUC: "committed" means neither "virtual address space allocated" nor "RAM pages allocated to page table entries"; rather, it means "you can access this region without SIGSEGV, but it might not be backed by RAM, so you may have a kernel trap on access that may OOM-kill you". Given this, it seems like "committed" doesn't imply the desired property of "not crashing at random Returning to the hypothetical |
I must admit that I am not familiar with the Linux/Unix parlance of these terms, but I hope my use of "reserved" vs "committed" in earlier messages follows the correct semantics that Windows uses them with (https://docs.microsoft.com/en-us/previous-versions/ms810627(v=msdn.10) ).
In the absence of a It might be brittle if the reservation stealing would only work if the wasm app was still pristine, but not if it had earlier temporarily used a lot of memory. Or maybe .shrink() is not needed and such reservation stealing would also work on unpopulated pages at the top end of the heap, where the browser could take those away in low mem scenarios even if app had .grow()n to them but later discarded the pages; and forbid the wasm app from populating any of the high pages if the browser needed to use them to avoid OOMing? It is true that such .shrink() only from the top end type may require developers to pay extra attention to fragmentation, but I do see that as being better option, compared to the possible problems that might arise if wasm apps that have temporarily used a lot of memory can make the browser more prone to OOMing.
This sounds very good. That would help apps decide to do something else on large OOMs without risking of populating up to last available page in the browser and then failing. What would the semantics of memory loads and stores in general be like to unpopulated pages, when there is plenty of memory available? Would each touch of a page implicitly populate under the hood? Or would it trap? It feels like either behavior could be useful, not sure which way to lean on this. Also, would it make sense to have an instruction to switch a page to be read-write vs read-only vs noaccess? Those could be interesting to help debugging and error catching. |
This is where it's important to distinguish "reserved-by-maximum vmem" from "memory accessible to wasm via
Although you could imagine a trapping semantics being useful for catching bugs, this would place a major requirement on wasm engines to use signal-handler tricks to avoid costly per-memory-access checks (which not all engines can do now or in the future). That's why I proposed above that
Definitely agreed that these would be valuable, but the same caveat applies that implementing this feature without the benefit of memory-protection+signal-handlers would be pretty expensive. At least, that's what has held us back so far; maybe we should revisit this at some point. There's also challenging questions in the JS API for how to handle typed array views that overlap read-only or inaccessible regions. |
I do understand that, but that is not the scenario when I am concerned about browser not being able to release it. If the wasm page temporarily uses all of the reserved max memory, i.e. .grow()s to take over it, but then later releases most of it, then the memory region would again be unpopulated, but currently the browser cannot recognize it, and cannot claim any of it to its own use. This temp large .grow() is what I am concerned about, since that cannot be undone unless a .shrink() would be supported.
That does make sense.
Gotcha - memory protection is something that I don't see critical at all for solving mobile memory problems, so that can certainly be left out. Was just rather curious whether that would have come practically "for free" on the side. |
Ah, that's a different case than I was replying to. For the For the general case, I think The reason I push back on
Yep! |
Based on the memory management in V8 for array buffers, especially shared buffers, it would be quite difficult to support |
Yes, indeed. I'll repeat the rationale:
In some applications it is easily the case that on "grand scale" the allocations have good stack-like characteristics when one transitions between e.g. main menu and the game levels. Applications may need to develop custom memory pools to manage this kind of behavior, but that is not much different from wasm today. Already in the absence of .shrink(), Wasm developers need to be mindful about memory fragmentation, so introducing a .shrink() would not change that fact.
This is perhaps a bit too simplistic model. If we look at app loading flow, then under current .grow() only model, it will actually be the "second document load" (document/game level/asset/...) that will cause the most simultaneously consumed address space pressure, e.g.:
Outside the loading process, applications can also have large persistent JS side memory allocations long after the wasm heap has been .grow()n to its maximum size. E.g. when
but the size of the needed JS side memory can vary between documents/game levels, so when one level might need more of Wasm memory, another level might need more of JS memory. E.g. in Unity game specifically, if there is programmatically heavy computation (pathfinding, AI, noise, skinning, some other game C# computation) in one level, that could amplify a lot of wasm .grow()s to occur. If there is a lot of audio, or data marshalling, or cutscene videos, then there will be a lot of JS memory usage. These Wasm vs JS side maximums will not necessarily happen at the same time, but without a .shrink() operation, one cannot do anything to combat this (and should probably pretend as if these maximums did occur simultaneously). With wasm32 at least 64-bit browsers will be immune to this, so this will be a 32-bit browser only concern. Not sure what will happen with wasm64.
It would certainly not be a 100% cure, since a wasm application that was not fragmentation aware would not be able to benefit. Though if one is developing a wasm page with large data sets, unfortunately one will already need to be fragmentation aware, there is no escaping that with or without .shrink(). I do appreciate the trouble with shared memories. |
Thanks for the info @juj. The point I'm trying to dig into, though, is: even though apps may have this stack-like "grand scale" behavior you mention, that doesn't ensure that
Your loading scenario makes sense, but a slight variation shows how having the browser release vmem dynamically could be equally problematic: imagine step 5 shrinks (and releases vmem), but then step 6 tries to perform a large new wasm allocation which now fails due to fragmentation. This seems like a difficult tension to resolve in general. What I can imagine being a more reliable way to avoid this kind of thrashing between wasm and JS memory is to avoid pulling large allocations directly into linear memory all at once, and instead stream bounded-size chunks (from a backing Blob or ArrayBuffer) into wasm memory on demand. (I know that's not always possible, though.)
On a side note, IIUC, on both Chrome and Firefox, Blobs are not kept in memory. Thus, I think you can "stream" a Blob by |
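As an illustration of the bounded-chunk idea, here is a minimal sketch that pushes a source buffer through a fixed-size staging window in linear memory. `streamIntoMemory`, the staging offset, and the `consume` callback are all assumptions made for the example. In a real app the callback would call into a wasm export, and the source could come from Blob slices rather than an in-memory array.

```typescript
// Copy `source` through a fixed-size staging window in linear memory, one chunk
// at a time, calling `consume` after each chunk. Peak extra wasm memory stays at
// `chunkSize` bytes regardless of how large `source` is.
function streamIntoMemory(
  source: Uint8Array,
  heap: Uint8Array,
  stagingOffset: number,
  chunkSize: number,
  consume: (bytesInWindow: number) => void
): number {
  let chunks = 0;
  for (let pos = 0; pos < source.length; pos += chunkSize) {
    const chunk = source.subarray(pos, Math.min(pos + chunkSize, source.length));
    heap.set(chunk, stagingOffset); // overwrite the same window each time
    consume(chunk.length);
    chunks++;
  }
  return chunks;
}

// Usage: stream 10 bytes through a 4-byte window and sum them as they arrive.
const streamHeap = new Uint8Array(new WebAssembly.Memory({ initial: 1 }).buffer);
const streamSource = Uint8Array.from([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
const streamWindowOffset = 1024;
let streamTotal = 0;
const streamChunks = streamIntoMemory(streamSource, streamHeap, streamWindowOffset, 4, (n) => {
  for (let i = 0; i < n; i++) streamTotal += streamHeap[streamWindowOffset + i];
});
```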
@conrad-watt sorry for taking this long to reply :)
That should work as long as we can present the allocated objects as something memory-like to the consumers in the module. Do we have enough support for this in the standard or near-future proposals?
This is what I originally thought, since that is closer to how accessing memory works in the native world, though after giving it a little more thought I am not sure this would be easy to support within existing model of linear memory. |
@conrad-watt's approach can be extended to support POSIX stack emulation - instead of incrementing a global "stack base" symbol on entry and decrementing it on exit, a function can request an object which would be GC'd after it exits. This would free up linear memory and prevent stack walking. |
Not in any language that can take the address of stack variables, I think. |
Not necessarily: if some form of reference would be considered an address, it would work; also, this issue would apply to heap objects too. I am not yet sure how this would work, though. My speculation is that via some combination of interface types and GC we can get an object which can be represented as a bag of bytes, and then do something that resembles memory operations on it. |
The kind of fragmentation you're describing makes it sound like you're thinking of general-purpose allocators like
I think the idea with most of these is that if you need some info across frames/requests, you just store it globally; but separate arenas with different lifetimes are also a thing, typically called region-based memory management. Obviously these are indeed big architectural decisions with app-wide implications, but I don't think they're unusual at all in practice, especially for games, which are a major use case for Wasm IIUC. I apologize if you already know all this; most of this ticket is over my head, and I am not a game developer either. |
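For readers unfamiliar with the pattern, a minimal sketch of a bump ("arena") allocator over a byte range of linear memory follows. The `Arena` class and its 8-byte alignment are illustrative choices, not taken from any particular engine or toolchain.

```typescript
// A minimal bump ("arena") allocator over a fixed byte range of linear memory.
// Individual frees do not exist; everything is released at once by reset().
class Arena {
  private readonly base: number;
  private readonly capacity: number;
  private next: number;

  constructor(base: number, capacity: number) {
    this.base = base;
    this.capacity = capacity;
    this.next = base;
  }

  // Returns the offset of a new block aligned to 8 bytes, or -1 if it does not fit.
  alloc(size: number): number {
    const start = (this.next + 7) & ~7;
    if (start + size > this.base + this.capacity) return -1;
    this.next = start + size;
    return start;
  }

  // Frees every allocation made since construction or the last reset.
  reset(): void {
    this.next = this.base;
  }

  // Bytes consumed so far, including alignment padding.
  used(): number {
    return this.next - this.base;
  }
}

// Usage: a per-level arena occupying 256 bytes starting at offset 1024.
const levelArena = new Arena(1024, 256);
const blockA = levelArena.alloc(10);     // starts at the arena base
const blockB = levelArena.alloc(20);     // next 8-byte aligned offset
const tooLarge = levelArena.alloc(300);  // does not fit in the arena
const usedBeforeReset = levelArena.used();
levelArena.reset();                      // e.g. when the level is unloaded
const usedAfterReset = levelArena.used();
```

Because an arena is emptied as a unit, a region placed at the top of the heap is the kind of allocation pattern that would leave a contiguous free tail behind it.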
I think What's the problem with this approach? |
Problem with just shrinking is that free pages might not be at the end. Though I am not sure that is a good enough reason to not introduce it: for usage patterns where that would work it would provide a relief, while the rest would stay unchanged. |
Others have approached an instruction which semantically zeroes memory pages but also hints that they will not be needed soon, so that the underlying implementation can do the equivalent of (edit: read up the thread a bit, I think the suggestions cover |
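Only the observable half of such an instruction (the pages read as zero afterwards) can be emulated from JS today. The sketch below shows that half, under the assumption that only whole 64 KiB pages inside the requested range are affected. `zeroWholePages` is a made-up helper, and it carries no hint to the OS, so it does not actually release any memory.

```typescript
const DISCARD_PAGE_SIZE = 64 * 1024; // wasm page size

// Zero every whole page that lies entirely inside [offset, offset + length).
// Partial pages at either end are left untouched. Returns the number of pages zeroed.
function zeroWholePages(heap: Uint8Array, offset: number, length: number): number {
  const firstPage = Math.ceil(offset / DISCARD_PAGE_SIZE);
  const endPage = Math.floor((offset + length) / DISCARD_PAGE_SIZE);
  if (endPage <= firstPage) return 0;
  heap.fill(0, firstPage * DISCARD_PAGE_SIZE, endPage * DISCARD_PAGE_SIZE);
  return endPage - firstPage;
}

// Usage: a 4-page heap filled with 0xFF; the range starts 100 bytes into page 0
// and spans two pages' worth of bytes, so only page 1 is fully covered.
const discardHeap = new Uint8Array(4 * DISCARD_PAGE_SIZE).fill(0xff);
const pagesZeroed = zeroWholePages(discardHeap, 100, 2 * DISCARD_PAGE_SIZE);
```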
It seems that this whole "feature" of a shrink method possibly not helping much is a product of two things:
As a dev I can probably fix this for myself by employing some techniques for compacting my data and avoiding fragmentation: on the simplest end of the spectrum I'd prevent long-living objects, on the most complex I'd develop my own garbage collection library. I guess that "paging" of the memory poses a security concern or it can't be done for all OSes? |
Hey, we are getting towards a 2 year anniversary of this conversation thread - I am wondering if there might have been updated progress or revised thoughts in the WebAssembly group on this topic? On Unity's side, we are getting a growing number of issue reports about running out of memory on mobile devices, and about Unity content behaving poorly with respect to trying to avoid application switching eviction behavior. More Unity Wasm developers are trying their hand at targeting mobile, and game developers overwhelmingly report that the mobile space is where gaming dominates. At the moment we are in a hard position to officially call "Mobile WebGL" a supported platform at Unity, due to the memory challenges that mobile Wasm content faces. Most recently, as of yesterday, we have started getting reports about Unity Wasm content running out of memory on mobile devices in the NASA JPL Artemis moon rocket tracking application: https://www.nasa.gov/specials/trackartemis/ that has been developed with Unity. (Those reports have been anecdotal in that we haven't been able to verify them in action, but it did remind me to chime in on this issue) @dtig opened the discussion thread #1439 for the proposal https://github.com/dtig/memory-control . There the operation Again I want to echo that I would be eager to help test an implementation against Emscripten dlmalloc/emmalloc and Unity Wasm content to provide real-world feedback on how well the feature would work in practice, if/when a browser+LLVM tooling implementation prototype becomes available. |
@juj The proposal did go on a hiatus for some time for bandwidth reasons, and to figure out how to make |
@juj SpiderMonkey now also has a prototype of a |
Hey, this is absolutely amazing news! Made a note to look into experimenting with this, and see how it plays out. |
I've now created a branch of Emscripten that adds From a super-quick test, it is working out as expected in Firefox Nightly. I'll look to do more comprehensive integrated testing as the next steps. |
Hi all,
after a video call with Google last week, I was encouraged to start a conversation here about issues we at Unity have with Wasm memory allocation.
The short summary is that currently Wasm has grave limitations that make many applications infeasible to be reliably deployed on mobile browsers. Here I stress the word reliably, since things may work on some devices for some % of users you deploy to, depending on how much memory your wasm page needs, but as your application's memory needs grow, the % of users you are able to deploy to can dramatically fall.
These issues already occur when the Wasm page uses only a fraction of total RAM of the device. (e.g. at 300MB-500MB)
These issues have been raised as browser issues, but the underlying theme is recognizing that the wasm spec is not robust enough for mobile deployment to customers.
These troubles stem from the following limitations:
So basically Wasm memory story is "you can only grab more memory, with no guarantee if the memory you got is a reserve or a commit".
These are not particularly newly recognized issues: the memory model has been the same since MVP, and we have been dealing with these ever since the early asm.js days. But now that applications are becoming more complex, developers' expectations about what types of applications they want to deploy on which devices are growing, and developers are actually aiming to ship to paying customers, where reliability needs to be near 100%, we are seeing hard ceilings on this issue in the wild.
Note that listing the limitations above does not imply that the fix would be for the wasm spec to somehow add support for all of these; it is to set the stage that these are the limitations that exist, since their combination is what causes headaches for developers.
The way that Wasm VM implementations seem to tackle these issues is to try to be smart/automatic under the hood about reserve vs commit behavior, and esp. around shared vs non-shared memory. However it is still the application developer's responsibility to concretely navigate the app in the low-memory landscape, and this leads to developers needing to "decipher" the VM's behavior patterns around commit vs reserve outside the spec. For an example of the vendor-specific suggestions that this leads to, see https://bugs.chromium.org/p/chromium/issues/detail?id=1175564#c7 .
On desktop, the Wasm spec memory issues have so far fallen in the "awkward" category at most, because i) all OSes and browsers have completed migration to 64-bit already, ii) desktops can afford large 16GB+ RAM sizes (and RAM sizes are expandable on many desktops), and iii) desktops have large disk sizes for the OS to swap pages out to, so even large numbers of committed pages may not be the end of the world (just "awkward") esp. if they go unused for most parts.
On mobile, none of that is true.
Note that the wasm memory64 proposal does not relate to or solve this problem. That proposal is about letting applications use more than 4GB of memory, but this issue is about Wasm applications not being able to safely manage much smaller amounts of memory on mobile devices. (The opposite is probably true: attempting to deploy wasm64 on mobile devices would cause even more issues.)
Currently allocating more than ~300MB of memory is not reliable on Chrome on Android without resorting to Chrome-specific workarounds, nor in Safari on iOS. As per the suggestions in the Chromium thread, applications should either know up front at compile time how much memory they will need, or gratuitously reserve everything that they can. Neither of these suggestions is viable.
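To make the "reserve everything you can" suggestion concrete, here is a rough sketch of what such probing could look like from JavaScript/TypeScript. The helper name `reserveLargestMemory`, the step size and the starting sizes are assumptions made up for illustration; they are not taken from the Chromium thread. It only relies on the `WebAssembly.Memory` constructor throwing a `RangeError` when the engine cannot satisfy the request.

```typescript
const WASM_PAGE_BYTES = 64 * 1024; // one Wasm page is 64 KiB

interface ProbeResult {
  memory: WebAssembly.Memory;
  maximumPages: number;
}

// Hypothetical helper: try the largest maximum first and step down until the
// engine accepts the reservation. Returns null if every attempt fails.
function reserveLargestMemory(
  initialPages: number,
  upperBoundPages: number,
  stepPages: number
): ProbeResult | null {
  if (stepPages <= 0) throw new RangeError("stepPages must be positive");
  for (let maximumPages = upperBoundPages; maximumPages >= initialPages; maximumPages -= stepPages) {
    try {
      const memory = new WebAssembly.Memory({ initial: initialPages, maximum: maximumPages });
      return { memory, maximumPages };
    } catch (e) {
      // A RangeError means the engine refused this size; retry with a smaller maximum.
      if (!(e instanceof RangeError)) throw e;
    }
  }
  return null;
}

// Assumed numbers: 16 MiB initial (256 pages), probing down from the 4 GiB
// limit of a 32-bit memory (65536 pages) in 64 MiB steps (1024 pages).
const probe = reserveLargestMemory(256, 65536, 1024);
if (probe !== null) {
  const reservedMiB = (probe.maximumPages * WASM_PAGE_BYTES) / (1024 * 1024);
  console.log(`engine accepted a maximum of ${reservedMiB} MiB`);
}
```

Note that a successful probe only says the engine accepted the reservation at that moment. It says nothing about whether a later .grow() up to that maximum will succeed, which is part of why this approach is unsatisfying.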
Why Wasm requires developers to know the needed memory size at compile time
The Wasm spec says that one can conveniently set initial memory size to what they need to launch, and then grow more when the situation demands it. Setting maximum is optional, to allow for unbounded growth. On paper this suggests that developers might not need to know how much they need at compile time.
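As a minimal sketch of the on-paper model, the following uses the standard `WebAssembly.Memory` JS API. The concrete sizes (16 MiB initial, 64 MiB maximum) and the helper `pagesFromMiB` are arbitrary choices for illustration.

```typescript
const PAGE_BYTES = 64 * 1024; // one Wasm page is 64 KiB

// Illustrative helper: convert MiB to a number of Wasm pages.
function pagesFromMiB(mebibytes: number): number {
  return (mebibytes * 1024 * 1024) / PAGE_BYTES;
}

// Start with 16 MiB and allow growth up to 64 MiB. The maximum is optional.
const memory = new WebAssembly.Memory({
  initial: pagesFromMiB(16),
  maximum: pagesFromMiB(64),
});
const sizeBefore = memory.buffer.byteLength;

// grow() takes a page delta and returns the previous size in pages.
// The old ArrayBuffer is detached, so memory.buffer must be re-read afterwards.
const previousPages = memory.grow(pagesFromMiB(16));
const sizeAfter = memory.buffer.byteLength;

// Growing past the declared maximum throws a RangeError.
let growPastMaximumFailed = false;
try {
  memory.grow(pagesFromMiB(64));
} catch (e) {
  growPastMaximumFailed = e instanceof RangeError;
}

console.log({ sizeBefore, previousPages, sizeAfter, growPastMaximumFailed });
```

Nothing in this API tells the application whether the pages between the current size and the maximum are reserved, committed, or neither; that is left to the engine.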
Reality is quite different, for the following reasons:
In practice, especially on memory-constrained devices, the current spec requires developers to somehow "just know" how much memory will be needed.
Why expecting developers to set memory size at compile time is not feasible
With respect to memory usage patterns, there are generally three types of apps/app workloads:
App developers cannot know the wasm memory size of apps of the first type. To accommodate everyone's working set size, they must generally reserve everything they can, and this has problems:
Developers of apps of type 2) share many of the problems that apps of type 1) have. One might argue that they should be expected to find the maximum size needed throughout the app's lifetime and allocate that, but finding that limit can be hard work, and you may not be able to do it with 100% certainty.
Developers of apps of type 3) might reasonably be expected to choose the right amount and be happy with it. Initially it sounds like developers who have an app of type 3 can profile their apps to come up with a suitable initial memory size and never grow. However, this has issues:
Android app switching is a major Wasm usability pain
The documentation at https://developer.android.com/topic/performance/memory-overview at the very bottom of the page states:
It is a common game development QA test to perform "fast app switching" testing, which can kill game UX and player interest if it does not work. For example, if a user is playing a game and then gets a WhatsApp message, they will quickly switch over to WhatsApp, type in a message, and then switch back to the game, and expect the game to still be running. Or they switch over to email, or Instagram, or whatever you have, and come back a few minutes later.
The less memory your application is consuming, the better chances you have that the page will not need to reload. With native applications this prompts the developer to push their memory usage down as much as possible when they are switched out. Mobile devices do not swap memory out to disk (at least not the way desktops do), but they will kill background apps if they run out of memory.
For wasm apps running in a browser, this means that if an app has an extra gigabyte in its Wasm heap going unused because it cannot release it back to the OS, the browser becomes a prime target for being killed, and when you task switch back to the app page, the page will reload from scratch, killing fast switching.
Safari even kills you in the foreground if you allocate too much - but you have no way of knowing how much "too much" is.
Some applications need address space, not memory
Natively compiled wasm applications behave very similarly to native applications. A native application often needs to reserve a lot of address space in order to get access to a chunk of linearly consecutive memory (when no linear block can be found within existing memory allocations). Wasm applications sometimes need that too. Currently the only way to do that is to .grow() by a large amount. This means that whatever smaller bits of fragmented memory a wasm app has can go unused, but still be committed in memory. This causes wasm apps to use more committed memory than their native counterparts.
The amount of this overhead depends on the amount of fragmentation that the wasm app causes. Most native applications have not needed to care about this for ages, but for wasm, this can all of a sudden be a huge issue. Note that the memory64 proposal again does not resolve this, because it does not bring virtual memory to wasm - it just changes the ISA to accept 64-bit addresses (to the best of my knowledge).
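A small worked example of that overhead, as a sketch: the function `planContiguousAllocation` and all the numbers below are hypothetical, and it assumes a simple allocator in which no free block sits at the very end of the heap (so a failed request has to be satisfied entirely from newly grown pages).

```typescript
const PAGE = 64 * 1024; // one Wasm page is 64 KiB
const MiB = 1024 * 1024;

interface GrowPlan {
  growBytes: number;          // how much the heap must .grow() by
  committedAfter: number;     // total heap size after the allocation
  unusedButCommitted: number; // free bytes that stay committed in the heap
}

// Hypothetical model: freeBlocks lists the sizes of the free fragments in the heap.
function planContiguousAllocation(
  heapBytes: number,
  freeBlocks: number[],
  requestBytes: number
): GrowPlan {
  const largestFree = freeBlocks.reduce((a, b) => Math.max(a, b), 0);
  const totalFree = freeBlocks.reduce((a, b) => a + b, 0);
  if (largestFree >= requestBytes) {
    // Some existing fragment is large enough; no growth is needed.
    return { growBytes: 0, committedAfter: heapBytes, unusedButCommitted: totalFree - requestBytes };
  }
  // No fragment fits, so the heap grows by the request rounded up to whole pages.
  const growBytes = Math.ceil(requestBytes / PAGE) * PAGE;
  return {
    growBytes,
    committedAfter: heapBytes + growBytes,
    unusedButCommitted: totalFree + growBytes - requestBytes,
  };
}

// A 256 MiB heap with 100 MiB free in three fragments cannot serve a 64 MiB request.
const plan = planContiguousAllocation(256 * MiB, [40 * MiB, 30 * MiB, 30 * MiB], 64 * MiB);
console.log(plan.growBytes / MiB, plan.committedAfter / MiB, plan.unusedButCommitted / MiB);
```

In this made-up scenario the heap ends up at 320 MiB with 100 MiB of it free but still committed, whereas a native process could have reserved fresh address space for the 64 MiB block and returned the fragmented pages to the OS.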
Summarising the problems
Reiterating, the main problems that we currently see:
What can be done about the problem?
In a recent video call with ARM, we discussed the (lack of) adoption of Unity3D on Wasm on ARM mobile devices, and the short summary is that these memory issues are a hard wall for feasibility of Unity3D on Wasm on Android. There have been existing conversations in #1396 and #1300 about how to shrink memory, but no concrete progress.
On the concrete bugs front, if Chrome eventually migrates to a 64-bit process on Android, it can help Wasm applications larger than 300MB to work on Chrome. (However, an issue here may be that manufacturers are still releasing 32-bit-only Android hardware in 2020, because of old inventory stock or some other reason - we have no idea.) If Safari fixes their eager page kill behavior, maybe it will help developers gauge the max limits on iPhones. But those will not help with the problem that a committed memory page is still a committed memory page, and a mobile device does have to carry it around somewhere.
Besides that, here are some ideas:
Would it be possible to make the commit vs reserve behavior explicit for Wasm? Maybe as a browser coordinated extension if not for the core spec? This would give guarantees to application developers as to what the best practices for initial vs maximum vs grow semantics should be. The current situation, where one browser vendor recommends probing the max amount of memory that can be reserved while another browser vendor expects that apps allocate only the minimum needed amount or be killed if they exceed that, strongly suggests that the spec is missing something to connect the expectations together.
Would it be possible to add support for unmapping memory pages from Wasm? Then e.g. Emscripten could implement unmapping of memory pages into its dlmalloc() and emmalloc() implementations, fixing memory commit issues, and the related Safari "high memory consumption" process killing, and Android task switch killing troubles?
Would it be possible to somehow make a softer version of the WebAssembly.Memory maximum field? If an app allocates Memory with maximum=4gb, which risks the rest of the browser/JS losing its address space (in 32-bit contexts), then maybe the browser could start reclaiming the highest parts of that reserved address space for its own purposes if the wasm app hasn't .grow()n that memory into its own use yet?
Then if one allocated a Memory with maximum probed to as much as it can go, but then allocated a large regular ArrayBuffer, maybe the browser could just steal some of that maximum back, if the Wasm app hasn't .grow()n into it? Likewise, if there was a .shrink() operation that an app could make use of, then maybe paired with this kind of address space stealing logic, the Wasm app and the rest of the browser could coordinate to "trade" address space, depending on how much of it was actually committed in the wasm heap, vs not actually used.
I hope the impressions here will not be a "this should be left to implementation details", since when I raised these concerns as a browser implementation bug, the message was that maybe the wasm spec should address this. And currently browsers are certainly not providing common enough implementations to enable developers to succeed with Wasm on mobile devices.
Thanks if you read all the way to the end on the long post!