Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix nested returns. #16081

Merged
merged 6 commits into from
Aug 12, 2014
Merged

Fix nested returns. #16081

merged 6 commits into from
Aug 12, 2014

Conversation

luqmana
Copy link
Member

@luqmana luqmana commented Jul 29, 2014

Fixes #15763

@brson
Copy link
Contributor

brson commented Jul 29, 2014

Nice fix.

bors added a commit that referenced this pull request Aug 1, 2014
@apoelstra
Copy link
Contributor

I've been running this commit locally and it appears that it actually does break Tcp connections. The simple program

use std::io::net::tcp;

fn main() {
  println!("{}", tcp::TcpStream::connect("google.ca", 80).err());
}

outputs Some(Network is unreachable).

Here is an strace: https://gist.github.com/apoelstra/ea7f2e8f0cddb14ccfc1

I'll recompile without the patch and post the difference.
Edit: without the patch it succeeds: https://gist.github.com/apoelstra/b505c712acfbe75037ba

@apoelstra
Copy link
Contributor

Here is an isolated test case that demonstrates the issue:

// 16 bytes
struct RawData {
    pub data: [u8, ..16]
}

// 16 bytes
struct StructuredData {
    pub data: u16,
    pub _align: i64,
}

fn addr_to_sockaddr() -> StructuredData {
    let mut storage = StructuredData { data: 0, _align: 0 };
    unsafe {
        // Try to set the fields by directly setting memory
        let raw = &mut storage as *mut _ as *mut RawData;
        for i in range(0, 16) {
            (*raw).data[i] = i as u8;
        }
    }
    // Then return it as a struct which is largely padding
    return storage;
}

fn main() {
    // Get it back
    let addr = addr_to_sockaddr();
    // Convert to raw data
    unsafe {
        let addrp = &addr as *const _ as *const RawData;
        // The padding is now zeroes!
        println!("raw data {}", (*addrp).data.as_slice());
    }
}

This should print the numbers 0 through 15, but instead the numbers 2-7 (which correspond to bytes which were padding in the StructuredData struct) are zeroed out. Before this patch, all 16 bytes were returned successfully.

This is a problem because this sort of cast lunacy is used all over the place in the glibc/BSD sockets code, and libnative wraps this pretty faithfully. This causes the test failures since the sockaddr, sockaddr_in, sockaddr_storage structs are casted and created with this type of code, and once that is fixed (by avoiding casting), there are later failures which cause packet checksums to be wrong.

Edit: replaced the test code to eliminate all the socket-related cruft.

@apoelstra
Copy link
Contributor

Modifying StructuredData to have three more u16's (so that no padding is needed to get the following i64 to be aligned correctly) fixes the problem. So the bug is definitely a change in how struct padding of return values is (not) handled.

@lilyball
Copy link
Contributor

lilyball commented Aug 2, 2014

FWIW the struct probably needs #[repr(C)] if it's depending on the actual layout of the fields.

@sfackler
Copy link
Member

sfackler commented Aug 2, 2014

@kballard I don't think undefined struct layout has been implemented yet.

@lilyball
Copy link
Contributor

lilyball commented Aug 2, 2014

Oh sure, I'm not saying that's the cause here, just pointing out that it needs to be done. The RFC has been accepted so it's technically incorrect today even though the compiler support isn't written yet.

@apoelstra
Copy link
Contributor

Here's a neat observation:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned char byte;

// 16 bytes
struct RawData {
    byte data[16];
};

// 16 bytes
struct StructuredData {
    short data;
    long _align;
};

struct StructuredData addr_to_sockaddr() {
    struct StructuredData storage = { 0 };

    // Try to set the fields by directly setting memory
    struct RawData *raw = (struct RawData *) &storage;
    memset(raw, 99, 16);

    // Then return it as a struct which is largely padding
    return storage;
}

int main(void) {
    // Get it back
    struct StructuredData addr = addr_to_sockaddr();

    // Convert to raw data
    struct RawData *addrp = (struct RawData *) &addr;

    int i;
    for(i = 0; i < 16; ++i) {
      // The padding is now zeroes!
      printf("%d, ", addrp->data[i]);
    }
    puts("");
}

when compiled with gcc or clang, this exhibits exactly the same "bug". The difference is that it's non-idiomatic to return structures by value in C, so this is not a problem I've ever encountered (even when writing low-level socket code). IIRC for this sort of stuff out-pointers are the usual C idiom.

I noticed this when trying to chase down the cause of the "no-padding copies" and finding myself surrounded by the faceless stares of LLVM intrinsics...so this gives weight to the "change the TCP code" strategy for fixing this :/.

@apoelstra
Copy link
Contributor

An idea: every struct for which we do this kind of thing should have all padding explicitly named, to ensure that LLVM load instructions will copy them over. They should be repr(C) and we should have a new lint repr_c_explicit_padding set on everything(?) in liblibc. This lint would verify that structs marked repr(C) do not have any implicit padding in them.

apoelstra added a commit to apoelstra/rust that referenced this pull request Aug 3, 2014
The `repr_c_implicit_padding` lint checks structures marked
a u8 followed by a u64, which would require 7 bytes of padding
to get the u64 aligned correctly.

The reason for this is that in the BSD socket code, there is
some shifty casting between structs which results in data being
stored in padding bytes. Since the LLVM Load instruction does
not copy padding, this can result in data loss. (This is also a
problem in C, but since structs are rarely copied by value,
instead using pointers and memcpy(), it is not as serious a
problem.) We therefore need explicit padding for the socket
structures, and this lint will make sure we get it.

This is necessary because the recent pull

  rust-lang#16081

which is a fix for rust-lang#15763, changes the return value semantics
to involve a Load. Without padding fixes, this breaks the Tcp
code.
@apoelstra
Copy link
Contributor

Fixed in #16208 --- once we merge that PR this one should pass.

@lilyball
Copy link
Contributor

lilyball commented Aug 4, 2014

@apoelstra Why do we need to be explicit about padding? Actually caring about the contents of the padding seems like a weird exception, rather than the normal behavior for C datatypes.

@apoelstra
Copy link
Contributor

@kballard because within the socket code, struct pointers are casted into each other in such a careless way that data winds up in padding fields, depending which pointer type is used to access a region of memory.

This is definitely a "weird exceptional" case, but sadly I doubt it is unheard of in other pointer-cast-polymorphism C code, hence my solution of adding a lint for it.

bors added a commit that referenced this pull request Aug 12, 2014
@bors bors closed this Aug 12, 2014
@bors bors merged commit 71e19d5 into rust-lang:master Aug 12, 2014
@luqmana luqmana deleted the nr branch August 12, 2014 21:03
@Aatch
Copy link
Contributor

Aatch commented Dec 15, 2014

This change unfortunately caused a regression in stack usage (#19684), due to the significant increase in alloca's for code that wouldn't have encountered the problem this change fixes.

From the looks of it, "nested returns" aren't the problem. The problem is really that we eagerly assume that the return expression we see will actually be executed, which isn't the case.

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Dec 21, 2014
rust-lang#16081 fixed an issue where a nested return statement would cause incorrect behaviour due to the inner return writing over the return stack slot that had already been written too. However, the check was very broad and picked many cases that wouldn't ever be affected by this issue.

As a result, the number of allocas increased dramatically and therefore stack-size increased. LLVM is not able to remove all of the extraneous allocas. Any code that had multiple return values in a compound expression at the end of a function (including loops) would be hit by the issue.

The check now uses a control-flow graph to only consider the case when the inner return is executed conditionally. By itself, this narrowed definition causes rust-lang#15763 to return, so the control-flow graph is also used to avoid passing the return slot as a destination when the result won't be used.

This change allows the stack-size of the main rustc task to be reduced to 8MB from 32MB.
bors added a commit that referenced this pull request Jun 27, 2016
Remove the return_address intrinsic.

This intrinsic to get the return pointer was introduced in #16248 / #16081 by @pcwalton for Servo.
However, as explained in #34227, it's impossible to ensure it's used correctly, and it broke with `-Zorbit`.

Servo's usage is being replaced in servo/servo#11872, and I expect nobody else to have abused it.
But I've also started a crater run, just in case this is a `[breaking-change]` for anyone else.
lnicola pushed a commit to lnicola/rust that referenced this pull request Jan 3, 2024
Don't trim trailing whitespace from doc comments

Don't trim trailing whitespace from doc comments as multiple trailing spaces indicates a hard line break in Markdown.

I'd have liked to add a unit test for `docs_from_attrs`, but couldn't find a reasonable way to get an `&Attrs` object for use in the test.

Fixes rust-lang#15877.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nested return expressions can cause drop to run on incorrect values.
8 participants