Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

start work on a new implementation of TLS #17583

Closed
wants to merge 1 commit into from
Closed

start work on a new implementation of TLS #17583

wants to merge 1 commit into from

Conversation

thestinger
Copy link
Contributor

This begins the work towards a new TLS implementation. It will be useful
to have the initial work in-tree for test coverage across various
platforms and to allow for collaborative work on the API. The current
fast implementation using #[thread_local] is expected to work on
Linux, Windows (with #17563), OS X, FreeBSD and the current Android
version (but not the bot).

Coming up with a good API will need to wait until there's a full
implementation with destructor support (#17572) and a fallback path
(#17579) for platforms like iOS and old versions of Android. A set of 6
macros is probably not the ideal API... but it's an obvious way of
covering all of the use cases with minimal overhead.

Closes #17569

@rust-highfive
Copy link
Collaborator

warning Warning warning

  • These commits modify unsafe code. Please review it carefully!

@thestinger
Copy link
Contributor Author

cc @aturon, @alexcrichton, @brson

@thestinger
Copy link
Contributor Author

I do have a working implementation of destructors for recent Linux and OS X platforms, but I've left it out for now because I haven't written a slow fallback implementation or a way to deal with differences between platform versions via cfg. It looks like this at the moment:

            #[cfg(target_os = "macos")]
            extern {
                fn _tlv_atexit(dtor: unsafe extern "C" fn(ptr: *mut c_void), ptr: *mut c_void);
            }

            #[cfg(target_os = "linux")]
            extern {
                static mut __dso_handle: i8;
                fn __cxa_thread_atexit_impl(dtor: unsafe extern "C" fn(ptr: *mut c_void), ptr: *mut c_void,
                                            dso_symbol: *mut i8);
            }

            #[thread_local]
            pub static mut PTR: *mut $t = 0 as *mut $t;

            unsafe extern "C" fn destructor(ptr: *mut c_void) {
                ::std::ptr::read(ptr as *const $t);
            }

            #[inline(always)]
            #[cfg(not(target_os = "macos"), not(target_os = "linux"))]
            unsafe fn register_destructor() {
                ::std::intrinsics::abort(); // TODO: not yet implemented
            }

            #[cfg(target_os = "linux")]
            unsafe fn register_destructor() {
                __cxa_thread_atexit_impl(destructor, PTR as *mut c_void, &mut __dso_handle);
            }

            #[cfg(target_os = "macos")]
            unsafe fn register_destructor() {
                _tlv_atexit(destructor, PTR as *mut c_void);
            }

            #[inline(always)]
            fn init() {
                unsafe {
                    if PTR.is_null() {
                        PTR = ::std::rt::heap::allocate(::std::mem::size_of::<$t>(),
                                                          ::std::mem::align_of::<$t>()) as *mut $t;
                        *PTR = $init;
                        if ::std::intrinsics::needs_drop::<$t>() {
                            register_destructor();
                        }
                    }
                }
            }

The memory allocation is necessary due to lack of support for uninitialized mutable global variables and lack of a way to have global variables for types with destructors. The compiler wouldn't need to actually call the destructor, it would just need to ignore it. I think allowing that with a lint warning about it would make sense. Most types with destructors don't provide a constant initializer anyway.

@pcwalton
Copy link
Contributor

r? @alexcrichton

@alexcrichton
Copy link
Member

I'll take a closer look at this tomorrow when I have some more time, but as a high-level comment I'm a little wary about including bits in the standard distribution which don't really work across all platforms. We don't have much of a precedent for this beyond liblibc and some internal details of libnative, and I'd personally like to see the fallback implementation for platforms like android fleshed out before landing (but can certainly discuss this beforehand).

Another thing that I'm wary of is that the destructors for these values aren't ever run, and it's not super clear to me where they would be run. Opening an issue (#17572) about this is a great way to track it, but it's not immediately clear to me whether the proposed methods in the issue are feasible ways forward. I'd want to investigate scenarios like printing, failing, blocking, etc in destructors of TLS values in order to mark this api as safe.

At the last work week we also talked about a TLS implementation which did not take ownership of the value, but rather it was a sort of scope-based API where a value was only inserted into TLS for the scope of the lifetime of the object. That has a convenient side effect of not needing global initialization and destruction, which I found quite convenient! I'm not sure we took fantastic notes, but you can see what we did take.

As a final point, the distribution continues to officially support libgreen for the time being, and we want to consider other particular threading models when looking forward into the future. Currently these apis are not safe when used with libgreen, and it's unclear how they can be safe with any threading model other than 1:1 threads. This is largely just a point of whether these apis should be marked unsafe or not, but it is certainly something to consider.

Anyway, I'll review in detail tomorrow!

@thestinger
Copy link
Contributor Author

I'll take a closer look at this tomorrow when I have some more time, but as a high-level comment I'm a little wary about including bits in the standard distribution which don't really work across all platforms. We don't have much of a precedent for this beyond liblibc and some internal details of libnative, and I'd personally like to see the fallback implementation for platforms like android fleshed out before landing (but can certainly discuss this beforehand).

Android does actually support static TLS and I've verified that C++11 thread_local works on the tablet I have access to (Android 4.4.4). 64-bit iOS supports it too, so the problem is going to fade away. Since it's part of C11 and C++11, it can be expected that all future hosted platforms provide it.

I'll write the slow path for platforms without static TLS support (iOS, old Android versions) after the basics are in-tree and the API has been bikeshedded. There's nothing preventing an implementation of the current API on top of dynamic TLS, but the code is going to depend heavily on the API that's exposed. It's #[experimental] code so there is no backwards compatibility risk. There's no point in writing a bunch of low-level code for niche platforms that's likely going to need a complete rewrite. It will hold back improvements to the API.

Another thing that I'm wary of is that the destructors for these values aren't ever run, and it's not super clear to me where they would be run. Opening an issue (#17572) about this is a great way to track it, but it's not immediately clear to me whether the proposed methods in the issue are feasible ways forward. I'd want to investigate scenarios like printing, failing, blocking, etc in destructors of TLS values in order to mark this api as safe.

Types with destructors are not permitted in static mut variables so the API is entirely safe. C++11 implements static destruction for global and thread local variables in the compiler. Rust's compiler does not, so it would need to be done with separate macros using a dynamic allocation to hold the data and making use of dynamic TLS or lighter platform specific mechanisms (mentioned in the issue). I will implement destructor support in a follow-up pull request and it won't be hard to demonstrate that it is safe. As the issue states, support for TLS variables with destructors isn't implemented. It is not broken, although Rust's old local_data API is broken - it doesn't have memory safe destruction and there are other holes.

At the last work week we also talked about a TLS implementation which did not take ownership of the value, but rather it was a sort of scope-based API where a value was only inserted into TLS for the scope of the lifetime of the object. That has a convenient side effect of not needing global initialization and destruction, which I found quite convenient! I'm not sure we took fantastic notes, but you can see what we did take.

That may or may not be useful but it's not a replacement for this feature. Static thread-local storage is a hard requirement of a fast general purpose allocator and other library code reliant on thread caches for performance. A scoped implementation is an entirely different feature.

As a final point, the distribution continues to officially support libgreen for the time being, and we want to consider other particular threading models when looking forward into the future. Currently these apis are not safe when used with libgreen, and it's unclear how they can be safe with any threading model other than 1:1 threads. This is largely just a point of whether these apis should be marked unsafe or not, but it is certainly something to consider.

Setting up registers correctly on context switches and implementing dynamic TLS support is the responsibility of the green threading library. Not setting up the thread-local storage register is not the only problem with how libgreen implements context switches. It's missing support for other registers too but we still enable passes like auto-vectorization and allow work on SIMD. It's entirely possible for it to handle this stuff just like the C standard library does for native threads. To quote RFC 62, which was accepted:

Finally, a guiding principle for the above work is uncompromising support for native system APIs, in terms of both functionality and performance. For example, it must be possible to use thread-local storage without significant overhead, which is very much not the case today. Any abstractions to support M:N threading models -- including the now-external libgreen package -- must respect this constraint.

It's no longer the responsibility of the standard library to hold back progress for libgreen. The memory unsafety is caused by fixable implementation issues in that library, not this one. Even if it wasn't possible for these to co-exist (it is), the RFC permits moving forwards regardless.

#![feature(phase, macro_rules)]

#[phase(plugin, link)]
extern crate core;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true that absolutely nothing is needed from a system libc to make this crate work? I would have expected LLVM to inject some silent dependencies, but that's pretty awesome if it works standalone!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on a function provided by the linker in a dynamic library. In a static executable there are no external dependencies. Due to undefined behaviour in the standard library, Rust currently doesn't tell LLVM that it's not building a library so the linker call overhead isn't optimized out without -C dynamic-no-pic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback code for iOS / Android will add a dependency on libc though.

@alexcrichton
Copy link
Member

If the intention of this is to be the lowest-level cross-platform implementation of a TLS value, then it may be worth having an unsafe primitive from which to build the other primitives on top of. It may not work out, but we've benefited in the past from stripping away all layers of abstraction (even Cell and RefCell) and exposing the underlying unsafe internals for manipulation. It may also help reduce duplication between these macros, provide an easy entrypoint for a non-#[thread_local] implementation, etc.

What was the reasoning behind moving this to a separate crate? Was it because of the macros-in-prelude issues with the standard library?

From an API point of view, it would be nice to unify all these macro invocations and rationalize them with the existing statics. I would personally think that declaring a TLS static would be very similar to the declaration of a normal static (or static mut). Considering rust-lang/rfcs#246:

// Thread-local static, not too useful because it can't be modified
tls!(static FOO: uint = 3);

// Thread-local unsafe cell, useful because it can be modified!
tls!(static FOO: UnsafeCell<uint> = ...);

// mutable, but still unsafe because borrows can be sent across threads
tls!(static FOO: uint = 3);

That's kinda just a sketch, it's more of conceptually what I might expect out of the TLS implementation from the standard distribution. I'm not sure how far along that tangent this can go, but it's also kinda just along the lines of an absolute bare-bones implementation which imposes 0 overhead, but may be unsafe still. From that perhaps we could build up a tls_cell/tls_refcell macro?

Just some ideas, I'm curious what you think about them. Also, in general I don't really want to consider the standard distribution to be a ground where "anything goes if it's experimental". We're trying to cut down in scope for a 1.0 release, and most things currently marked as experimental are either "not triaged" or "triaged, and we'd really like to find a replacement, but we just can't remove this". I suspect that a TLS implementation is in the category of "we'd really like to have this", but as long as it's backwards compatible to add it's not super urgent that it land now.

@Valloric
Copy link
Contributor

You may want to reword the PR title to make it clear that this is about Thread Local Storage, not Transport Layer Security. It had me going for a minute.

@thestinger
Copy link
Contributor Author

What was the reasoning behind moving this to a separate crate? Was it because of the macros-in-prelude issues with the standard library?

Yes, it's to work around not being able to mark macros as experimental. Instead, the crate as a whole can be marked experimental and then deprecated with the functionality moved to the standard library when it's finished.

@thestinger
Copy link
Contributor Author

That's kinda just a sketch, it's more of conceptually what I might expect out of the TLS implementation from the standard distribution. I'm not sure how far along that tangent this can go, but it's also kinda just along the lines of an absolute bare-bones implementation which imposes 0 overhead, but may be unsafe still. From that perhaps we could build up a tls_cell/tls_refcell macro?

It would make sense to factor out the portability shim for iOS / old Android like that. Since I hadn't started on it I didn't bother trying. The details of the portable implementation won't impact the higher-level safe wrappers but it would play into the design of the unsafe API.

Just some ideas, I'm curious what you think about them. Also, in general I don't really want to consider the standard distribution to be a ground where "anything goes if it's experimental". We're trying to cut down in scope for a 1.0 release, and most things currently marked as experimental are either "not triaged" or "triaged, and we'd really like to find a replacement, but we just can't remove this". I suspect that a TLS implementation is in the category of "we'd really like to have this", but as long as it's backwards compatible to add it's not super urgent that it land now.

Rust is going to need thread-local storage, and developing it in-tree means the tests will be run to prevent regressions and it won't bitrot. It also makes it sane for more than one person to put work into it because it won't be getting rebased over and over.

@thestinger
Copy link
Contributor Author

From an API point of view, it would be nice to unify all these macro invocations and rationalize them with the existing statics. I would personally think that declaring a TLS static would be very similar to the declaration of a normal static (or static mut). Considering rust-lang/rfcs#246:

If having static X: RefCell<TypeWithDestructor> where you are allowed to store a type with a destructor is actually possible, then it could work. It can't be implemented that way right now though. The low-level shim will need to support implementing thread-local destructors on top, and ideally with minimal overhead like C++11 (no indirection via raw pointers).

@alexcrichton
Copy link
Member

With C++, you also have global initialization and destruction of statics, and I don't believe that we don't expose this for technical reasons, but moreso safety/engineering/etc reasons. I view this as similar to TLS statics in that where it is definitely possible, we may choose to not expose it at this time (to keep the two in sync).

This begins the work towards a new TLS implementation. It will be useful
to have the initial work in-tree for test coverage across various
platforms and to allow for collaborative work on the API. The current
fast implementation using `#[thread_local]` is expected to work on
Linux, Windows (with #17563), OS X, FreeBSD and the current Android
version (but not the bot).

Coming up with a good API will need to wait until there's a full
implementation with destructor support (#17572) and a fallback path
(#17579) for platforms like iOS and old versions of Android. A set of 6
macros is probably not the ideal API... but it's an obvious way of
covering all of the use cases with minimal overhead.

Closes #17569
@thestinger
Copy link
Contributor Author

@alexcrichton: There are no safety or engineering reasons to avoid exposing destructors for TLS. It would make the API crippled relative to C++. It would be unusable for something like a general purpose allocator without a way to clean up the data. I don't see why Rust should settle for a lower quality TLS implementation than C++11.

@thestinger
Copy link
Contributor Author

Current Android versions and 64-bit iOS (iPhone 5S / iOS7 and later) support TLS so I don't consider the lack of a fallback to be a pressing issue. The only relevant versions of iOS for Rust will be 64-bit, so the only loss would be not working within legacy 32-bit software targeting a 64-bit OS.

@thestinger
Copy link
Contributor Author

I fixed the one soundness issue involving lack of unsafe hygiene. It's already a useful, safe API and I plan on continuing to expand / improve it once it's in-tree. Any further work on it is blocked on having an assurance that it won't be rejected or it being in-tree so I can iterate via new pull requests. I don't enjoy waiting weeks to progress anywhere on any non-trivial work.

@vhbit
Copy link
Contributor

vhbit commented Oct 8, 2014

Currently the only supported iOS version is 32 bit and I think at
least till the next year it MUST be supported as 4S is still widely
used

But since 32bit is fading away - I can suggest unsafe option, we had a
discussion about it with Alex before merging - there is a faster TLS,
undocumented method extracted from libc sources, definitely unsafe,
but considering timeframe of 32bit OS - it might be worth trying.

@thestinger
Copy link
Contributor Author

I could implement this API for it, but I have no way to test it because there is no Rust bot. Someone with access to iOS can do it after this lands. In-tree development means people can collaborate. It seems that privilege is reserved for people who work for Mozilla though.

@vhbit
Copy link
Contributor

vhbit commented Oct 8, 2014

I can test it on iOS 32 and I can do it out of tree - as I maintain iOS on
working state for internal projects and build it anyway.

@eddyb
Copy link
Member

eddyb commented Oct 11, 2014

@alexcrichton @aturon Is there anything blocking this, or can it be merged as a first step?

@alexcrichton
Copy link
Member

@eddyb, I'd like to point out again that as part of library stabilization we're trying to cut down on the surface area of the standard distribution. I, however, also believe that everyone is in alignment that we'd like to have a nice TLS implementation.

I would also like to point out that there is nothing stopping this from being a cargo package which could be quickly iterated on. The benefit of being a cargo package is also that quicker iteration is allowed (bors isn't exactly speedy), versioning is independent from Rust itself, and other implementations can have more interoperability. The downsides, of course, of being a cargo package are that it is not updated automatically when breaking changes are made and discoverability is currently difficult. The breaking changes part will alleviate as Rust becomes more stable, and the discoverability will alleviate once we have a central Registry (working quite hard on it right now!).

I'm not saying that this shouldn't be in the standard distribution, I'd just like to point out that this implementation is not dead in the water if we don't merge it at this time.


Specifically speaking, I don't think any of my concerns have been addressed, I'll reiterate them below for clarity. I hope this answers your question about what is blocking this PR.

  • Leveraging macros for generating the entire implementation

    In general I'm not a huge fan of having a huge amount of code generated by macros, it's got downsides like being difficult to document, exposing implementation details, various hygiene issues, surprising access patterns, lack of macro import/export, etc.

  • Incorrect casting

    Due to the compiler being able to re-order struct fields, I don't think that this is a valid cast.

  • Inconsistency with existing statics today

    From an API point of view, it would be nice to unify all these macro invocations and rationalize them with the existing statics.

There are also some broader concerns which I was going to bring up once some of the above concerns were addressed:

  • There is no documentation as part of this PR
  • This is currently not cross-platform to all of Rust's supported platforms

Both of these points, while somewhat minor, are important in terms of being integrated with the standard distribution. There are also precisely where a cargo package may also help because arbitrary cargo packages don't have the same restriction requirements as the standard library.


As-is, I would be uneasy to merge this mostly because of the inconsistency with existing statics today. This is duplicating exactly how Cell and RefCell work, but in statics. In the past when we've had duplicate functionality we have always found the need to consolidate it (such as MutexArc and Arc, for example). This seems like a pretty clear location where the storage (TLS) should be separated from the type itself (Cell/RefCell).

If there are technical problems blocking this goal, then those seem like challenges to overcome rather than to start officially supporting an API we would just wish to change later. Note that this is also precisely where a Cargo package helps because the package could have an entirely separate interface and be slowly deprecated over time if language changes elsewhere enable a better API.

That may sound all quite vague, so I would like to give an outline of an api which I think would address my concerns:

use std::cell::UnsafeCell;

// Genric structure representing a slot in TLS, its definition changes
// per-platform.
//
// Note that the inner field is private, and this would require some form of
// macro hygiene to allow the macros below to initialize the fields without
// allowing access to them. This is already highly desired for other structures
// like, for example, UnsafeCell, Cell, and RefCell.
pub struct Tls<T> { inner: T }

// If a platform didn't support #[thread_local], the definition would look more
// like:
pub struct Tls<T> {
    init: T,         // bit pattern to initialize thread statics with
    key: AtomicUint, // init 0, lazily initialized
}

// These would be appropriately modified for platfors which didn't support
// #[thread_local] as just normal `static`/`static mut` globals.
macro_rules! tls(
    (static $name:ident: $t:ty = $init:expr) => (
        #[thread_local]
        static $name: Tls<$t> = Tls { inner: $init };
    );
    (static mut $name:ident: $t:ty = $init:expr) => (
        #[thread_local]
        static mut $name: Tls<$t> = Tls { inner: $init };
    );
)
macro_rules! cell(($e:expr) => (Cell { inner: UnsafeCell { value: $e } }))
macro_rules! refcell(($e:expr) => (...))

impl<T> Tls<T> {
    pub fn get(&'static self) -> TlsRef<T> {
        // On platforms which don't support #[thread_local], this would perform
        // lazy initialization of the corresponding OS-based TLS key.
        TlsRef { inner: &self.inner }
    }

    // This is a safe function, it should be unsafe to get a mutable reference
    // in the first place.
    pub fn get_mut(&'static mut self) -> TlsRefMut<T> {
        TlsRefMut { inner: &self.inner }
    }
}

pub struct TlsRef<T> { inner: &'static T }
pub struct TlsRefMut<T> { inner: &'static mut T }

impl<T> Deref<T> for TlsRef<T> {
    fn deref<'a>(&'a self) -> &'a T { self.inner }
}
impl<T> Deref<T> for TlsRefMut<T> {
    fn deref<'a>(&'a self) -> &'a T { &*self.inner }
}
impl<T> DerefMut<T> for TlsRefMut<T> {
    fn deref_mut<'a>(&'a mut self) -> &'a mut T { &mut *self.inner }
}

// Note that this is stored in a `static`, which normally requires `Sync`. The
// compiler would understand that a `#[thread_local]` static does not require
// `Sync`.
tls!(static FOO: Cell<uint> = cell!(1));
tls!(static BAR: RefCell<int> = refcell!(2));

fn main() {
    let foo = FOO.get();
    assert_eq!(foo.get(), 1);
    foo.set(1);

    let mut bar = BAR.get();
    assert_eq!(2, *bar.borrow());
    *bar.borrow_mut() = 3;
}

This addresses these concerns:

  • The interface is easily documentable. The documentation is placed on the Tls struct as well as the get method, and then there are examples showing how the Tls struct may only be constructed via the macro. Also, bounds can be placed on T to ensure that it's Copy or doesn't have destructors, etc.
  • There is no longer a macro generating a wad of code in a different namespace. All expressions are evaluated in the namespace they're declared in.
  • This is composable with existing types. For example you can have a tls UnsafeCell, or you can have a totally unsafe static mut with tls. You're not forced into Cell or RefCell if you don't want to. The purpose of Tls is to provide a bare bones abstraction over the platform's best TLS solution, be that #[thread_local], pthreads, or TlsGetValue.
  • This continues (as does the current design) to provide a way to fall back to an OS-based TLS implementation.
  • The current story for stability of macros at 1.0 is somewhat in flux, and a type/struct-based solution (instead of a 100% macro-based solution) is more likely to be available than the macro-based version.

There are a number of tweaks which would have to be made to the compiler to make this work, however:

  • Privacy hygiene with respect to static initialization would have to be implemented. This is wanted regardless for UnsafeCell<T>, and it would provide a nice way to provide static constructors of generic items. For example there could be a static constructor of Mutex<T> through a macro. In general this is solve much better through CTFE (or something similar), but we may be able to settle for a macro-based solution for now.
  • Safe access to a non-Sync #[thread_local] static would have to be allowed, this is fairly trivial to do.

And, of course, there are a number of cons to this solution:

  • This does not allow dynamic initialization or construction. Note, however, that this is consistent with the rest of Rust globals today. It is certainly an extension that can be added in the future, but I think we should add it to both static and thread local globals at the same time. Singling one out as special seems odd.

In general I would like to reiterate that no one doesn't want a good TLS interface. The practice which we've been encouraging for new functionality in the standard library is for it to be developed out of tree, and then migrate it in-tree if possible. This helps us promote faster iteration of libraries such as this along with reducing the surface area of the standard library that needs to be stable for 1.0 (which is quite soon!).


Also, @thestinger, after re-reading many of the comments in this thread I've noticed that you've added to them in many cases. Would you be ok pinging the issue when you update a comment? Github doesn't send out any notifications, so I won't know to check back on this thread if you update a past comment. I was, for example, completely unaware that you pasted bits and pieces about construction/destruction.

@wycats
Copy link
Contributor

wycats commented Oct 11, 2014

@thestinger docs and support for all supported platforms seem like blockers. Any reason not to iterate on this as a Cargo package?

You mentioned mandatory compiler support. Can you say more about that? Maybe you can get what you need on that front without having to do all the design and iteration on master.

@thestinger
Copy link
Contributor Author

@wycats: The current TLS implementation in the repository needs to be replaced or removed. It's used throughout the implementation despite being incredibly slow and bloated. This does cover all officially supported platforms already - it works on Windows, OS X, Linux, and Android. It does not work on 32-bit iOS but that has never been officially supported / tested.

@wycats
Copy link
Contributor

wycats commented Oct 11, 2014

@thestinger did you test it on Linux 2.6.18 and glibc 2.5 (the oldest supported platforms)?

@wycats
Copy link
Contributor

wycats commented Oct 11, 2014

@thestinger also, it would be really helpful (at least to me) if you responded to @alexcrichton's specific questions. He did a pretty thorough review and your reply is pretty dismissive of what seems like a good-faith effort to have a discussion about your patch.

@thestinger
Copy link
Contributor Author

@wycats: Can you please quote this reply you're calling "dismissive"?

@wycats
Copy link
Contributor

wycats commented Oct 11, 2014

@thestinger maybe "dismissive" means something stronger to you than I meant.

What I meant was that @alexcrichton wrote a long, thoughtful and detailed review of your patch, and your response reiterated what everyone already knows: that the current implementation is not very good and needs to be replaced.

I would honestly really appreciate responses to his specific review.

@thestinger
Copy link
Contributor Author

@wycats: You haven't given me time to respond. I opened this thread and was going to respond to @alexcrichton until you started throwing accusations around.

@alexcrichton
Copy link
Member

Sadly our auto/try bots are not CentOS 5.10 (the old linux we test on), just the snapshot builders are. I have manually verified, however, that #[thread_local] at least passes a smoke test on CentOS 5.10 on both 32/64 bit. That should address @wycats's concern about 2.6.18

@thestinger I know we've had trouble with #[thread_local] in the past on windows, so I tested this locally with the most recently night. A smoke test (thread local static mut, just prints it), ran fine on 64-bit windows, but it segfaulted on 32-bit windows. Did you verify both architectures? If so, perhaps my install of msys/msys2 is broken? If we need to tweak our mingw/msys requirements that's probably fine, but we just need to be prepared to do so ahead of time.

I'd also like to point out that std::local_data is #[experimental] today, and it's highly unlikely that that will ever change (other than to become #[deprecated]). The continued usage of it is purely historical, and one of the current plans about the release channel of rust is to make it impossible to reach #[experimental] code, which would make it impossible to use std::local_data in stable code. Note, however, that these plans are developing over time.

@wycats
Copy link
Contributor

wycats commented Oct 11, 2014

@thestinger apologies if something I said looked accusatory. I was just making an observation that @alexcrichton wrote a fairly long review of the patch, and your reply to his comments seemed curt. It sounds like you didn't mean it that way, though!

Looking forward to a fuller response. Like you, I would like to see a better TLS implementation in libstd.

@alexcrichton
Copy link
Member

I also just tested this out on the android toolchain that I had lying around, it's the same smoke test of thread local static mut followed by a print, and I got this error:

$ ./x86_64-unknown-linux-gnu/stage2/bin/rustc foo.rs --target arm-linux-androideabi -C linker=$HOME/code/android/ndk_standalone/bin/arm-linux-androideabi-gcc 
error: linking with `/home/alex/code/android/ndk_standalone/bin/arm-linux-androideabi-gcc` failed: exit code: 1
note: /home/alex/code/android/ndk_standalone/bin/arm-linux-androideabi-gcc '-marm' '-L' '/home/alex/code/rust2/x86_64-unknown-linux-gnu/stage2/lib/rustlib/arm-linux-androideabi/lib' '-o' 'foo' 'foo.o' '-Wl,--whole-archive' '-lmorestack' '-Wl,--no-whole-archive' '-nodefaultlibs' '-fno-lto' '-Wl,--gc-sections' '-Wl,--allow-multiple-definition' '/home/alex/code/rust2/x86_64-unknown-linux-gnu/stage2/lib/rustlib/arm-linux-androideabi/lib/libnative-4e7c5e5c.rlib' '/home/alex/code/rust2/x86_64-unknown-linux-gnu/stage2/lib/rustlib/arm-linux-androideabi/lib/libstd-4e7c5e5c.rlib' '/home/alex/code/rust2/x86_64-unknown-linux-gnu/stage2/lib/rustlib/arm-linux-androideabi/lib/libsync-4e7c5e5c.rlib' '/home/alex/code/rust2/x86_64-unknown-linux-gnu/stage2/lib/rustlib/arm-linux-androideabi/lib/librustrt-4e7c5e5c.rlib' '/home/alex/code/rust2/x86_64-unknown-linux-gnu/stage2/lib/rustlib/arm-linux-androideabi/lib/libcollections-4e7c5e5c.rlib' '/home/alex/code/rust2/x86_64-unknown-linux-gnu/stage2/lib/rustlib/arm-linux-androideabi/lib/librand-4e7c5e5c.rlib' '/home/alex/code/rust2/x86_64-unknown-linux-gnu/stage2/lib/rustlib/arm-linux-androideabi/lib/liballoc-4e7c5e5c.rlib' '/home/alex/code/rust2/x86_64-unknown-linux-gnu/stage2/lib/rustlib/arm-linux-androideabi/lib/libunicode-4e7c5e5c.rlib' '/home/alex/code/rust2/x86_64-unknown-linux-gnu/stage2/lib/rustlib/arm-linux-androideabi/lib/liblibc-4e7c5e5c.rlib' '/home/alex/code/rust2/x86_64-unknown-linux-gnu/stage2/lib/rustlib/arm-linux-androideabi/lib/libcore-4e7c5e5c.rlib' '-L' '/home/alex/code/rust2/.rust' '-L' '/home/alex/code/rust2' '-Wl,--whole-archive' '-Wl,-Bstatic' '-Wl,--no-whole-archive' '-Wl,-Bdynamic' '-ldl' '-llog' '-lgcc' '-lc' '-lm' '-lcompiler-rt'
note: /home/alex/code/android/ndk_standalone/bin/../lib/gcc/arm-linux-androideabi/4.6/../../../../arm-linux-androideabi/bin/ld: foo.o: in function main::h74429ed16fd18446iaa:foo.0.rs(.text._ZN4main20h74429ed16fd18446iaaE+0x64): error: undefined reference to '__tls_get_addr'
collect2: ld returned 1 exit status

error: aborting due to previous error

The key part of the error I think being: undefined reference to '__tls_get_addr'. This seems quite similar to #10842! We could take two courses of action here:

  1. Raise the minimum required android version. This would require figuring out precisely what version of android enabled #[thread_local] globals and making the decision from there (I know very little about android).
  2. Implement the fallback path

I suspect the sub-question in in route 1 will give us more information and will help guide this decision. @thestinger, would you be ok figuring out what the minimum version of android is to implement TLS in this fashion?

@thestinger
Copy link
Contributor Author

@wycats: I didn't reply to his comments, I was replying to you.

@thestinger
Copy link
Contributor Author

I'd like to point out again that as part of library stabilization we're trying to cut down on the surface area of the standard distribution. I, however, also believe that everyone is in alignment that we'd like to have a nice TLS implementation.

It's far smaller than the surface area of the existing TLS implementation and will be able to replace it. It needs support for destructors to replace it and that's a bit tricky to do correctly - the existing implementation does it unsoundly. It deserves to be landed separately, and then there will be platform-specific optimizations to leverage features like _tlv_atexit on OS X that were added for C++11 support. This will certainly require compiler support because it's not currently possible to detect these features.

@thestinger
Copy link
Contributor Author

I would also like to point out that there is nothing stopping this from being a cargo package which could be quickly iterated on. The benefit of being a cargo package is also that quicker iteration is allowed (bors isn't exactly speedy), versioning is independent from Rust itself, and other implementations can have more interoperability. The downsides, of course, of being a cargo package are that it is not updated automatically when breaking changes are made and discoverability is currently difficult. The breaking changes part will alleviate as Rust becomes more stable, and the discoverability will alleviate once we have a central Registry (working quite hard on it right now!).

TLS is used within the standard libraries (task-local RNG) and compiler. A replacement of the old TLS implementation needs to be done in-tree.

@thestinger
Copy link
Contributor Author

I'm not saying that this shouldn't be in the standard distribution, I'd just like to point out that this implementation is not dead in the water if we don't merge it at this time.

It's already in the standard distribution as local_data_key!, local_ptr and more. It's a replacement of the old implementation as part of removing the dynamic runtime and fixing unsoundness / performance issues, not a new feature.

@thestinger
Copy link
Contributor Author

Specifically speaking, I don't think any of my concerns have been addressed, I'll reiterate them below for clarity. I hope this answers your question about what is blocking this PR.

I did address your concerns. Perhaps you weren't satisfied with the replies, but I'm not convinced that these are serious / blocking issues.

@thestinger
Copy link
Contributor Author

Leveraging macros for generating the entire implementation

There isn't a proposed alternative with the same flexibility and performance. Using macros makes it possible to sanely support destructors and implement a fallback path. It's an #[experimental] API meant to be a step towards replacing the old TLS implementation, not a finalized / stable API based on lessons learned during implementation and usage.

@thestinger
Copy link
Contributor Author

Incorrect casting

I guess you weren't satisfied with my answer? As I pointed out, this is not incorrect with the current implementation of the compiler. It would not be incorrect if fields were reordered based on size and aligned. It's a workaround that's used throughout the standard libraries already and I think it would be unreasonable to change the compiler implementation in a way that would prevent working around issues like the inability to statically initialize a RefCell.

@thestinger
Copy link
Contributor Author

Inconsistency with existing statics today

The ultimate solution is a fully working #[thread_local]. It's provided by C, C++, D and other systems languages that Rust is competing with. It's a stopgap until that is possible (if it ever is) and could be reimplemented on top of it. You've proposed possible improvements but you need to take the destructor handling and other issues like efficiency into account. It's not possible to do things like creating an uninitialized global variable or statically initializing types from other modules with private fields so workarounds are necessary.

@thestinger
Copy link
Contributor Author

If there are technical problems blocking this goal, then those seem like challenges to overcome rather than to start officially supporting an API we would just wish to change later. Note that this is also precisely where a Cargo package helps because the package could have an entirely separate interface and be slowly deprecated over time if language changes elsewhere enable a better API.

There is already a TLS implementation in-tree and it needs to be replaced. It would be better to stabilize this API than the one we already have that's unsound and incredibly slow. However, this is explicitly marked as #[experimental] and I have stated that it is not the finalized API but rather a base to build on in the subsequent pull requests I already have lined up. I have destructor support and other improvements ready but it's too difficult to land it all at once.

@thestinger
Copy link
Contributor Author

This does not allow dynamic initialization or construction. Note, however, that this is consistent with the rest of Rust globals today. It is certainly an extension that can be added in the future, but I think we should add it to both static and thread local globals at the same time. Singling one out as special seems odd.

TLS needs support for dynamic initialization and destruction to be usable. It is completely unrelated to those features in globals because the implementation and semantics are much different. There are no safety or ordering issues with it. It is required to replace the current TLS implementation which already supports those features. Why are my pull requests singled out for the application of all of these strict double standards?

@thestinger
Copy link
Contributor Author

In general I would like to reiterate that no one doesn't want a good TLS interface. The practice which we've been encouraging for new functionality in the standard library is for it to be developed out of tree, and then migrate it in-tree if possible. This helps us promote faster iteration of libraries such as this along with reducing the surface area of the standard library that needs to be stable for 1.0 (which is quite soon!).

TLS is already a feature that's in-tree and replacing the implementation was accepted in an RFC. I went ahead with a simplistic initial implementation to iterate on in-tree because I had already put substantial thought and effort into it. The surface area claim doesn't make any sense because it's going to replace a far larger unsound TLS implementation.

The API is marked #[experimental] and I've explicitly stated that there are improvements I want to make to this after it lands. It's the base I am going to build on in the follow-up commits I had lined up to submit after this landed (destructor support, portability to old Android).

Most possible improvements are blocked on fixing compiler bugs and compiler / language limitations such as the lack of a way to make an uninitialized global variable, inability to store a type with a destructor directly in a static mut (calling the destructor is unnecessary, just allowing Option<T> initialized to None where T has a destructor) and various other issues.

@thestinger
Copy link
Contributor Author

@thestinger I know we've had trouble with #[thread_local] in the past on windows, so I tested this locally with the most recently night. A smoke test (thread local static mut, just prints it), ran fine on 64-bit windows, but it segfaulted on 32-bit windows. Did you verify both architectures? If so, perhaps my install of msys/msys2 is broken? If we need to tweak our mingw/msys requirements that's probably fine, but we just need to be prepared to do so ahead of time.

I identified the problem on Windows and fixed it in 6bb648f. The problem was just that the MinGW-w64 linker's implementation of ASLR is thoroughly broken. It seems to work fine on 32-bit too...

@thestinger
Copy link
Contributor Author

@alexcrichton: I don't know which version of Android started having working TLS. I do know that the __tls_get_addr problem would be solved for TLS in executables and static libraries by telling LLVM when we are building an executable. It is not currently possible because the existing TLS implementation and libgreen depend on undefined behaviour that's broken by the optimizations LLVM uses when it's informed of that. If the existing TLS implementation was removed, my pull request changing that could land and you wouldn't get that error.

@thestinger
Copy link
Contributor Author

@alexcrichton: __tls_get_addr is now limited dynamically linked Rust libraries. We could easily provide an implementation on top of POSIX TLS in that case for old versions of Android and legacy 32-bit iOS. AFAIK, #[thread_local] with static linking will now work even on old versions of Android.

@thestinger thestinger closed this Oct 13, 2014
@thestinger thestinger reopened this Oct 13, 2014
@thestinger
Copy link
Contributor Author

Closing for now. I've filed #18004 about this and will submit the same changes in a different order than I planned.

@thestinger thestinger closed this Oct 13, 2014
@thestinger thestinger deleted the tls branch October 13, 2014 14:32
@alexcrichton
Copy link
Member

It's far smaller than the surface area of the existing TLS implementation and will be able to replace it.

I'd love to be able to replace the current std::local_data! I wouldn't exactly classify the API as large though, it's largely just a macro and two functions (replace/get).

It needs support for destructors to replace it and that's a bit tricky to do correctly.

I agree! We may be able to get by without implementing destructor support, however, by using the scoped-approach that we laid out in the work week. The "unsafe building block" could serve as the foundation for that strategy of TLS. The compiler is essentially entirely scoped, and the other major use case I know about is the task-local RNG which could probably be implemented specially.

the existing implementation does it unsoundly

That's not good! Can you open a bug on this?

There isn't a proposed alternative with the same flexibility and performance.

Can you explain to me why you didn't think my example would provide either the flexibility or performance?

I guess you weren't satisfied with my answer? As I pointed out, this is not incorrect with the current implementation of the compiler.

I didn't really see "it's not a worry" as an answer to the fact that struct layout is undefined. We are reserving the right to reorder struct fields at will, not only for performance but perhaps randomly for security. Today a Vec is also three words, but we're also not encouraging transmuting from a Vec to three words and back. It was tough for me to see how we could not rely the field orderings of the two structs being the same, so I was worried about coming to rely on this fact and being unable to pursue various optimizations in the future with struct field orderings.

You've proposed possible improvements but you need to take the destructor handling and other issues like efficiency into account

Can you explain a little more about how a bare-bones layer over #[thread_local] and os-based TLS wouldn't provide the ability to support constructors/destructors in the future? I would expect this bare-bones layer to be present in basically any cross-platform implementation, so putting it at the base of the stack seems natural to me.

Could you also explain how the example I gave was less efficient than the module-expansions you proposed here? I was under the impression that it's just a few pointers lying around and would optimize away completely.

However, this is explicitly marked as #[experimental] and I have stated that it is not the finalized API but rather a base to build on in the subsequent pull requests I already have lined up

I'd like to note that while I personally do not feel that #[experimental] is the threshold for accepting code to std, others may feel differently. I'm just expressing my own personal opinion, I don't speak for everyone as a whole.

TLS needs support for dynamic initialization and destruction to be usable.

Remember though that I'm not proposing an end-all be-all interface, this was just the bare bones interface to the rest of the TLS subsystem, it's likely that other abstractions, such as those with dynamic initialization/destruction are built on top.

It is required to replace the current TLS implementation which already supports those features.

I do agree that to be a drop-in replacement it needs to support these features, but I think we may be able to get by with some unsafe code perhaps in the meantime and scoped TLS (which doesn't require dynamics). Long term, we definitely need to support this though!

Why are my pull requests singled out for the application of all of these strict double standards?

I'm sorry you feel this way, and I apologize if any of my comments have led you to believe this. It is certainly not what I intended to convey. I may be a little more nitpickity at reviews, but I do not at all intend to single you out and apply a double standard.

It's the base I am going to build on in the follow-up commits I had lined up to submit after this landed (destructor support, portability to old Android).

Currently almost everything we provide in the standard distribution, apart from liblibc, is cross-platform, so I would consider the portability fallback to be required for a first-pass implementation.

inability to store a type with a destructor directly in a static mut (calling the destructor is unnecessary, just allowing Option initialized to None where T has a destructor) and various other issues.

Could you elaborate some more on these issues? Currently this code compiles:

static FOO: Option<Box<int>> = None;

fn main() {}

I identified the problem on Windows and fixed it in 6bb648f. The problem was just that the MinGW-w64 linker's implementation of ASLR is thoroughly broken. It seems to work fine on 32-bit too...

I think Saturday's nightly (the one I tested) had that commit. Have you tested recently on 32-bit?

Closing for now. I've filed #18004 about this and will submit the same changes in a different order than I planned.

Ok, I'm also fine if you want to leave this open! I wanted to talk more with others about this today and get their thoughts on this approach.

@thestinger
Copy link
Contributor Author

@alexcrichton: I'll start with an unsafe API providing a portable fallback for #[thread_local].

@joakim-noah
Copy link

Android does actually support static TLS and I've verified that C++11 thread_local works on the tablet I have access to (Android 4.4.4).

Hello, I was curious about the state of rust on Android and I stumbled across this thread. Since D was mentioned earlier, I've been working on getting D on Android, hence my interest. :)

@thestinger, I don't believe Android supports native TLS for apps even today. They're still storing the pointers for bionic's pthread TLS implementation in native TLS, so that's going to write over any data that apps put in there. It is true that purely static TLS without any relocations will sometimes randomly work, which is why it might have worked for you when doing small tests with C++, but when I went over 4K in TLS data on Android/x86 it started to break. That's because they did not take native TLS support for C/C++ out of their lightly patched gcc/clang/binutils/gold native toolchain in the NDK, but it's unsupported by bionic and the dynamic linker so native TLS won't work beyond such random corner cases. I had to wrap pthread_getspecific/setspecific to get TLS to work for D, sounds like you have a similar fallback in mind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add a new implementation of thread-local storage
9 participants