-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix nested returns. #16081
Fix nested returns. #16081
Conversation
Nice fix. |
I've been running this commit locally and it appears that it actually does break Tcp connections. The simple program
outputs Here is an strace: https://gist.github.com/apoelstra/ea7f2e8f0cddb14ccfc1 I'll recompile without the patch and post the difference. |
Here is an isolated test case that demonstrates the issue: // 16 bytes
struct RawData {
pub data: [u8, ..16]
}
// 16 bytes
struct StructuredData {
pub data: u16,
pub _align: i64,
}
fn addr_to_sockaddr() -> StructuredData {
let mut storage = StructuredData { data: 0, _align: 0 };
unsafe {
// Try to set the fields by directly setting memory
let raw = &mut storage as *mut _ as *mut RawData;
for i in range(0, 16) {
(*raw).data[i] = i as u8;
}
}
// Then return it as a struct which is largely padding
return storage;
}
fn main() {
// Get it back
let addr = addr_to_sockaddr();
// Convert to raw data
unsafe {
let addrp = &addr as *const _ as *const RawData;
// The padding is now zeroes!
println!("raw data {}", (*addrp).data.as_slice());
}
} This should print the numbers 0 through 15, but instead the numbers 2-7 (which correspond to bytes which were padding in the This is a problem because this sort of cast lunacy is used all over the place in the glibc/BSD sockets code, and libnative wraps this pretty faithfully. This causes the test failures since the Edit: replaced the test code to eliminate all the socket-related cruft. |
Modifying |
FWIW the struct probably needs |
@kballard I don't think undefined struct layout has been implemented yet. |
Oh sure, I'm not saying that's the cause here, just pointing out that it needs to be done. The RFC has been accepted so it's technically incorrect today even though the compiler support isn't written yet. |
Here's a neat observation: #include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef unsigned char byte;
// 16 bytes
struct RawData {
byte data[16];
};
// 16 bytes
struct StructuredData {
short data;
long _align;
};
struct StructuredData addr_to_sockaddr() {
struct StructuredData storage = { 0 };
// Try to set the fields by directly setting memory
struct RawData *raw = (struct RawData *) &storage;
memset(raw, 99, 16);
// Then return it as a struct which is largely padding
return storage;
}
int main(void) {
// Get it back
struct StructuredData addr = addr_to_sockaddr();
// Convert to raw data
struct RawData *addrp = (struct RawData *) &addr;
int i;
for(i = 0; i < 16; ++i) {
// The padding is now zeroes!
printf("%d, ", addrp->data[i]);
}
puts("");
} when compiled with gcc or clang, this exhibits exactly the same "bug". The difference is that it's non-idiomatic to return structures by value in C, so this is not a problem I've ever encountered (even when writing low-level socket code). IIRC for this sort of stuff out-pointers are the usual C idiom. I noticed this when trying to chase down the cause of the "no-padding copies" and finding myself surrounded by the faceless stares of LLVM intrinsics...so this gives weight to the "change the TCP code" strategy for fixing this :/. |
An idea: every struct for which we do this kind of thing should have all padding explicitly named, to ensure that LLVM load instructions will copy them over. They should be |
The `repr_c_implicit_padding` lint checks structures marked a u8 followed by a u64, which would require 7 bytes of padding to get the u64 aligned correctly. The reason for this is that in the BSD socket code, there is some shifty casting between structs which results in data being stored in padding bytes. Since the LLVM Load instruction does not copy padding, this can result in data loss. (This is also a problem in C, but since structs are rarely copied by value, instead using pointers and memcpy(), it is not as serious a problem.) We therefore need explicit padding for the socket structures, and this lint will make sure we get it. This is necessary because the recent pull rust-lang#16081 which is a fix for rust-lang#15763, changes the return value semantics to involve a Load. Without padding fixes, this breaks the Tcp code.
Fixed in #16208 --- once we merge that PR this one should pass. |
@apoelstra Why do we need to be explicit about padding? Actually caring about the contents of the padding seems like a weird exception, rather than the normal behavior for C datatypes. |
@kballard because within the socket code, struct pointers are casted into each other in such a careless way that data winds up in padding fields, depending which pointer type is used to access a region of memory. This is definitely a "weird exceptional" case, but sadly I doubt it is unheard of in other pointer-cast-polymorphism C code, hence my solution of adding a lint for it. |
This is needed for some GC stuff in Servo.
…t use an out pointer.
This change unfortunately caused a regression in stack usage (#19684), due to the significant increase in alloca's for code that wouldn't have encountered the problem this change fixes. From the looks of it, "nested returns" aren't the problem. The problem is really that we eagerly assume that the return expression we see will actually be executed, which isn't the case. |
rust-lang#16081 fixed an issue where a nested return statement would cause incorrect behaviour due to the inner return writing over the return stack slot that had already been written too. However, the check was very broad and picked many cases that wouldn't ever be affected by this issue. As a result, the number of allocas increased dramatically and therefore stack-size increased. LLVM is not able to remove all of the extraneous allocas. Any code that had multiple return values in a compound expression at the end of a function (including loops) would be hit by the issue. The check now uses a control-flow graph to only consider the case when the inner return is executed conditionally. By itself, this narrowed definition causes rust-lang#15763 to return, so the control-flow graph is also used to avoid passing the return slot as a destination when the result won't be used. This change allows the stack-size of the main rustc task to be reduced to 8MB from 32MB.
Remove the return_address intrinsic. This intrinsic to get the return pointer was introduced in #16248 / #16081 by @pcwalton for Servo. However, as explained in #34227, it's impossible to ensure it's used correctly, and it broke with `-Zorbit`. Servo's usage is being replaced in servo/servo#11872, and I expect nobody else to have abused it. But I've also started a crater run, just in case this is a `[breaking-change]` for anyone else.
Don't trim trailing whitespace from doc comments Don't trim trailing whitespace from doc comments as multiple trailing spaces indicates a hard line break in Markdown. I'd have liked to add a unit test for `docs_from_attrs`, but couldn't find a reasonable way to get an `&Attrs` object for use in the test. Fixes rust-lang#15877.
Fixes #15763