Unbounded memory allocations #34
I'm critical about this, for two reasons I think. If you disagree, please feel free to provide some insights into how you think the two issues should be addressed.

Rust and its "convenient" heap data structures

If you look at C, you are required to calculate the size of anything you allocate manually, and then pass that size to malloc. Rust offers very convenient wrappers around malloc. It calculates the sizes for you! The proposed way to address this issue is to re-do the calculation and then use those Rust-provided wrappers only if the calculation resulted in an okay amount of data. That's a bad idea IMO because it means that I need to re-do all the stuff that Rust is supposed to do for me already. I'd be thrown back to the convenience levels of C! I think that a good way to fix this issue is to have the ability to set custom, fallible allocators. Then I could just set any limit I want. I wonder whether there are any crates that have support either for custom and fallible allocators, or for memory limits. Another problem is that allocation is just so easy to invoke... basically it can be as close as a single method call.

The challenge of a "sane default"
I'm not really sure whether lewton is a good place for choosing such values. I mean, just look at the simplest example I can think of: Vorbis metadata. It is encoded in a format where first a 32-bit number n gives the length, and then the next n bytes contain the string. What would be a sane maximum length for the string? I could of course say, okay, 640k oughta be enough, but maybe someone has put something really, really long into the metadata? IDK.
Sadly, memory allocators are global for the compiled binary, so the limit would also be global. Even if lewton is prepared to deal with failing allocations, client code cannot be expected to be able to recover from them, so the binary would crash on a failed allocation anyway. I don't think it is a reasonable solution.
This is not how C libraries actually operate. They do not impose limits on memory specifically, because that tends to be very confusing for users: it gets very hard to answer the question "will this file decode with the default limits or not?". For example, libpng limits the image size to 1,000,000 by 1,000,000 pixels by default, or roughly 2^40 pixels. This results in most real-world images still decoding fine, but peak memory allocation is limited to 2^44 bytes per image in the worst case. That's a lot, but nowhere near enough to exhaust a 64-bit address space. If the API user expects to deal with images that large - which, by the way, would require 16 terabytes of RAM and is therefore completely impractical for years to come - they can override the default limit.
I'd actually allow full-length metadata (a 32-bit length field means up to 4 GiB per chunk) and limit the number of chunks to 16384 or some such. This gives a 2^46 byte maximum allocation size. Or an even lower number of chunks, because recording 16,000 of those in a single file does not make sense in practice, and if you've put that many of them in a single file you're probably trying to DoS something. And if the user has a really weird use case where they need to process such files, they can simply raise the limit. The builder pattern in Rust makes that a breeze.

But if you don't want to limit the amount of data that can be decoded in any way, but also don't want to allow allocating terabytes of RAM up front, how about preallocating RAM up to a certain sane default, and only allocating more if that much data actually gets written? This solves the problem of address space exhaustion with crafted files, it still allows decoding huge files if that's what you've actually got (well, probably not terabytes), and it's rusty: Rust's vectors provide a very convenient way to do exactly that.
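A minimal sketch of that "preallocate up to a sane cap" idea; the names `PREALLOC_CAP` and `read_comment` are illustrative, not lewton's actual code:

```rust
use std::io::Read;

// Reserve at most this much up front per comment (illustrative value).
const PREALLOC_CAP: usize = 1 << 20; // 1 MiB

fn read_comment<R: Read>(reader: &mut R, declared_len: usize) -> std::io::Result<Vec<u8>> {
    // Only reserve up to the cap: a crafted length field alone can no longer
    // force a huge allocation without the attacker supplying the bytes.
    let mut buf = Vec::with_capacity(declared_len.min(PREALLOC_CAP));
    // The Vec grows on demand if the file really does contain that much data.
    reader.take(declared_len as u64).read_to_end(&mut buf)?;
    Ok(buf)
}
```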
Recently, @raphlinus gave an interesting talk about a bunch of things, including low-latency audio programming. He mentioned the idea that in order to check that code indeed avoids allocations, one can install a global allocator that panics/aborts if an allocation happens. Something like this (using the global allocator) could be used by lewton as well to ensure that there are no/few allocations. There won't be a static guarantee, but there will be a dynamic one.
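For illustration, a minimal sketch of that idea using std's `#[global_allocator]` hook; the names (`CheckedAlloc`, `FORBID_ALLOC`) are made up here, not taken from the talk:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, Ordering};

// When set, any heap allocation aborts the process, turning "this code path
// must not allocate" into a dynamically checked property.
static FORBID_ALLOC: AtomicBool = AtomicBool::new(false);

struct CheckedAlloc;

unsafe impl GlobalAlloc for CheckedAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if FORBID_ALLOC.load(Ordering::Relaxed) {
            // Abort instead of panicking: unwinding might itself allocate.
            std::process::abort();
        }
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CheckedAlloc = CheckedAlloc;

// In a test: set FORBID_ALLOC to true around the code that must not allocate,
// then reset it afterwards.
```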
I wrote a crate (balloc) and pushed a new branch that uses it to address #43. I think this is the way forward.
I'm a bit reluctant though, as:
Also, do I understand correctly that Vorbis comments, for which balloc is currently used, are not compressed in any way? In that case you only have to limit the amount of memory you're willing to preallocate without actually writing to it, and just let vectors automatically increase capacity to accommodate the data once real writes start happening. This way you'd need at least a 512 MB file to allocate 1 GB of memory (assuming the preallocation threshold is under 1 GB). Best of all, this would avoid the API break altogether.
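For reference, a tiny standalone demo of the Vec growth behaviour this relies on (the exact growth factor is an implementation detail of std, but current implementations roughly double capacity):

```rust
fn main() {
    let mut v: Vec<u8> = Vec::new();
    let mut last_cap = 0;
    for i in 0..1_000_000u32 {
        v.push(i as u8);
        if v.capacity() != last_cap {
            last_cap = v.capacity();
            // Each capacity jump only happens after the previous capacity was
            // actually filled, so reaching capacity N requires writing more
            // than N/2 real bytes first.
            println!("len = {:>7}, capacity = {:>7}", v.len(), v.capacity());
        }
    }
}
```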
In order to become allocation-free, an API change would be needed, yeah, but my goal with #40 is only to reduce the allocations as much as possible.
Yes, the limiting is currently only done for the comments. I'm not done with it. There is minor scope overlap, as allocations should either be removed or ported to balloc. The idea of balloc is that only a few allocations happen, so the extra checking has only a negligible performance impact.
You mean something like …? As for the API breakage: any function that needs to allocate requires the ability to have limits passed down to it, which is basically the entire public API. Either I create a duplicate for each function, or I break the API and just create duplicates for the most important ones. I prefer the latter.
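To make the balloc idea above concrete, here is a rough sketch of a limit-tracking allocation helper of that general shape (illustrative only; this is not balloc's actual API):

```rust
use std::mem::size_of;

/// Error returned when an allocation request would exceed the budget.
#[derive(Debug)]
pub struct LimitExceeded;

/// Tracks how many bytes a decoding session is still allowed to allocate.
pub struct AllocBudget {
    remaining: usize,
}

impl AllocBudget {
    pub fn new(limit: usize) -> Self {
        AllocBudget { remaining: limit }
    }

    /// Charge `len` elements of `T` against the budget and hand back a Vec
    /// with that capacity, or fail without allocating anything.
    pub fn alloc_vec<T>(&mut self, len: usize) -> Result<Vec<T>, LimitExceeded> {
        let bytes = len.checked_mul(size_of::<T>()).ok_or(LimitExceeded)?;
        if bytes > self.remaining {
            return Err(LimitExceeded);
        }
        self.remaining -= bytes;
        Ok(Vec::with_capacity(len))
    }
}
```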
The current changes on master are all backwards-compatible. I'll make a final release of the …
I mean something like this, overflow checks omitted for brevity:

    let comment = if total_comments_len + comment_len < limit {
        total_comments_len += comment_len;
        Vec::with_capacity(comment_len)
    } else {
        // Over the preallocation budget: start empty and let the Vec
        // grow only as real data comes in.
        Vec::new()
    };

For uncompressed content you would not be able to make the program allocate N bytes without actually supplying at least (N/2)+1 bytes of data. This could still be attacked by streaming a very long file over the network, but there would no longer be any 15 kB files allocating hundreds of terabytes of RAM.
@Shnatsel yeah, that's the approach that you'd use in languages like C, but as I wrote above I'd prefer if the checks happen one abstraction layer below.
There is now support for custom allocators in Vec on nightly: rust-lang/rust#78461. This should make keeping track of allocations a great deal easier.
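As a rough usage sketch only (the allocator_api surface is nightly-only and may still change):

```rust
#![feature(allocator_api)] // nightly only

use std::alloc::System;

fn main() {
    // A collection can now be tied to a specific allocator instead of the
    // global one, which is where a limit-enforcing allocator could plug in.
    let mut v: Vec<u8, System> = Vec::new_in(System);
    v.extend_from_slice(b"comment data");
    println!("{} bytes stored", v.len());
}
```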
@Shnatsel if you hover over the PR's description you'll see me being the first guy who added the 🎉 emoji reaction :). Very glad it's finally happening.
When given a crafted input, lewton attempts to allocate enormous amounts of memory. This presents several issues:
Most decoding libraries face this issue at some point. This is usually solved by limiting the amount of allocated memory to some sane default, and letting people override it if they're really dealing with enormous amounts of data. In Rust we can easily allow the API user to override these limits via the builder pattern.
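For illustration, one possible shape of such a builder, with made-up names rather than an existing lewton API:

```rust
/// Hypothetical decoder limits (names are illustrative, not lewton's API).
#[derive(Clone, Copy, Debug)]
pub struct Limits {
    pub max_comments: usize,
    pub max_comment_len: usize,
}

impl Default for Limits {
    fn default() -> Self {
        // Sane defaults that real-world files stay well under.
        Limits {
            max_comments: 16_384,
            max_comment_len: 1 << 24, // 16 MiB per comment
        }
    }
}

impl Limits {
    /// Raise the per-comment length limit for unusual inputs.
    pub fn max_comment_len(mut self, len: usize) -> Self {
        self.max_comment_len = len;
        self
    }
}

// Usage: a caller with unusually large metadata opts in explicitly.
// let limits = Limits::default().max_comment_len(1 << 30);
```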
See https://libpng.sourceforge.io/decompression_bombs.html for more info on how a similar issue was solved in libpng. See also the Limits struct from the flif crate.
We have briefly discussed this in #32, but I'm now filing it as a separate issue because it is now a blocker for me. I'm seeing intermittent crashes under the fuzzer that I cannot reproduce, and the inability to use sanitizers prevents me from pinpointing them.