start work on a new implementation of TLS #17583
cc @aturon, @alexcrichton, @brson |
I do have a working implementation of destructors for recent Linux and OS X platforms, but I've left it out for now because I haven't written a slow fallback implementation or a way to deal with differences between platform versions via
#[cfg(target_os = "macos")]
extern {
    fn _tlv_atexit(dtor: unsafe extern "C" fn(ptr: *mut c_void), ptr: *mut c_void);
}

#[cfg(target_os = "linux")]
extern {
    static mut __dso_handle: i8;
    fn __cxa_thread_atexit_impl(dtor: unsafe extern "C" fn(ptr: *mut c_void), ptr: *mut c_void,
                                dso_symbol: *mut i8);
}

#[thread_local]
pub static mut PTR: *mut $t = 0 as *mut $t;

unsafe extern "C" fn destructor(ptr: *mut c_void) {
    ::std::ptr::read(ptr as *const $t);
}

#[inline(always)]
#[cfg(not(target_os = "macos"), not(target_os = "linux"))]
unsafe fn register_destructor() {
    ::std::intrinsics::abort(); // TODO: not yet implemented
}

#[cfg(target_os = "linux")]
unsafe fn register_destructor() {
    __cxa_thread_atexit_impl(destructor, PTR as *mut c_void, &mut __dso_handle);
}

#[cfg(target_os = "macos")]
unsafe fn register_destructor() {
    _tlv_atexit(destructor, PTR as *mut c_void);
}

#[inline(always)]
fn init() {
    unsafe {
        if PTR.is_null() {
            PTR = ::std::rt::heap::allocate(::std::mem::size_of::<$t>(),
                                            ::std::mem::align_of::<$t>()) as *mut $t;
            *PTR = $init;
            if ::std::intrinsics::needs_drop::<$t>() {
                register_destructor();
            }
        }
    }
}

The memory allocation is necessary due to lack of support for uninitialized mutable global variables and lack of a way to have global variables for types with destructors. The compiler wouldn't need to actually call the destructor, it would just need to ignore it. I think allowing that with a lint warning about it would make sense. Most types with destructors don't provide a constant initializer anyway.
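For readers comparing this with present-day Rust, the behaviour discussed above (lazy per-thread initialization plus a destructor that runs at thread exit) can be sketched with the standard library's `thread_local!` macro. This is not the API proposed in this PR. The `Cache` type, the `DROPPED` counter and `run_worker` are hypothetical names used only for illustration, and the sketch assumes a mainstream platform where thread-local destructors run for spawned threads before `join` returns.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts how many per-thread caches have been destroyed (illustrative only).
static DROPPED: AtomicUsize = AtomicUsize::new(0);

// Hypothetical per-thread cache whose destructor should run at thread exit.
struct Cache {
    items: Vec<u32>,
}

impl Drop for Cache {
    fn drop(&mut self) {
        DROPPED.fetch_add(1, Ordering::SeqCst);
    }
}

thread_local! {
    // Initialized lazily on first access in each thread.
    static CACHE: RefCell<Cache> = RefCell::new(Cache { items: Vec::new() });
}

// Spawns a thread that touches its cache, joins it, and returns how many
// cache destructors ran during that time.
fn run_worker() -> usize {
    let before = DROPPED.load(Ordering::SeqCst);
    thread::spawn(|| {
        CACHE.with(|c| c.borrow_mut().items.push(1));
    })
    .join()
    .unwrap();
    DROPPED.load(Ordering::SeqCst) - before
}
```

A thread that never touches `CACHE` never constructs it, so no destructor is registered for that thread.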
I'll take a closer look at this tomorrow when I have some more time, but as a high-level comment I'm a little wary about including bits in the standard distribution which don't really work across all platforms. We don't have much of a precedent for this beyond

Another thing that I'm wary of is that the destructors for these values aren't ever run, and it's not super clear to me where they would be run. Opening an issue (#17572) about this is a great way to track it, but it's not immediately clear to me whether the proposed methods in the issue are feasible ways forward. I'd want to investigate scenarios like printing, failing, blocking, etc in destructors of TLS values in order to mark this api as safe.

At the last work week we also talked about a TLS implementation which did not take ownership of the value, but rather it was a sort of scope-based API where a value was only inserted into TLS for the scope of the lifetime of the object. That has a convenient side effect of not needing global initialization and destruction, which I found quite convenient! I'm not sure we took fantastic notes, but you can see what we did take.

As a final point, the distribution continues to officially support libgreen for the time being, and we want to consider other particular threading models when looking forward into the future. Currently these apis are not safe when used with libgreen, and it's unclear how they can be safe with any threading model other than 1:1 threads. This is largely just a point of whether these apis should be marked

Anyway, I'll review in detail tomorrow!
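The scope-based idea mentioned above can be sketched roughly as follows, written in present-day Rust rather than the 2014 dialect used in this thread. This is only an illustration of the concept: `CURRENT`, `Restore`, `with_value` and `current` are hypothetical names, and the sketch copies a small `u32` into the slot instead of borrowing an arbitrary value, which is a simplification of what was discussed.

```rust
use std::cell::Cell;

thread_local! {
    // Holds a value only while some `with_value` call is active on this thread.
    static CURRENT: Cell<Option<u32>> = Cell::new(None);
}

// Restores the previous slot contents when the scope ends, even on panic.
struct Restore(Option<u32>);

impl Drop for Restore {
    fn drop(&mut self) {
        CURRENT.with(|c| c.set(self.0));
    }
}

// Hypothetical scoped API: `value` is visible through `current()` only
// while the closure `f` is running.
fn with_value<R>(value: u32, f: impl FnOnce() -> R) -> R {
    let previous = CURRENT.with(|c| c.replace(Some(value)));
    let _restore = Restore(previous);
    f()
}

// Reads the value installed by the innermost active `with_value`, if any.
fn current() -> Option<u32> {
    CURRENT.with(|c| c.get())
}
```

Because the slot is restored when the scope ends, nothing needs to be destroyed at thread exit, which is the property the comment above calls convenient.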
Android does actually support static TLS and I've verified that C++11 I'll write the slow path for platforms without static TLS support (iOS, old Android versions) after the basics are in-tree and the API has been bikeshedded. There's nothing preventing an implementation of the current API on top of dynamic TLS, but the code is going to depend heavily on the API that's exposed. It's
Types with destructors are not permitted in
That may or may not be useful but it's not a replacement for this feature. Static thread-local storage is a hard requirement of a fast general purpose allocator and other library code reliant on thread caches for performance. A scoped implementation is an entirely different feature.
Setting up registers correctly on context switches and implementing dynamic TLS support is the responsibility of the green threading library. Not setting up the thread-local storage register is not the only problem with how libgreen implements context switches. It's missing support for other registers too but we still enable passes like auto-vectorization and allow work on SIMD. It's entirely possible for it to handle this stuff just like the C standard library does for native threads. To quote RFC 62, which was accepted:
It's no longer the responsibility of the standard library to hold back progress for libgreen. The memory unsafety is caused by fixable implementation issues in that library, not this one. Even if it wasn't possible for these to co-exist (it is), the RFC permits moving forwards regardless. |
#![feature(phase, macro_rules)]

#[phase(plugin, link)]
extern crate core;
Is it true that absolutely nothing is needed from a system libc to make this crate work? I would have expected LLVM to inject some silent dependencies, but that's pretty awesome if it works standalone!
It depends on a function provided by the linker in a dynamic library. In a static executable there are no external dependencies. Due to undefined behaviour in the standard library, Rust currently doesn't tell LLVM that it's not building a library so the linker call overhead isn't optimized out without -C dynamic-no-pic.
The fallback code for iOS / Android will add a dependency on libc though.
If the intention of this is to be the lowest-level cross-platform implementation of a TLS value, then it may be worth having an unsafe primitive from which to build the other primitives on top of. It may not work out, but we've benefited in the past from stripping away all layers of abstraction (even

What was the reasoning behind moving this to a separate crate? Was it because of the macros-in-prelude issues with the standard library?

From an API point of view, it would be nice to unify all these macro invocations and rationalize them with the existing
// Thread-local static, not too useful because it can't be modified
tls!(static FOO: uint = 3);
// Thread-local unsafe cell, useful because it can be modified!
tls!(static FOO: UnsafeCell<uint> = ...);
// mutable, but still unsafe because borrows can be sent across threads
tls!(static mut FOO: uint = 3);

That's kinda just a sketch, it's more of conceptually what I might expect out of the TLS implementation from the standard distribution. I'm not sure how far along that tangent this can go, but it's also kinda just along the lines of an absolute bare-bones implementation which imposes 0 overhead, but may be unsafe still. From that perhaps we could build up a

Just some ideas, I'm curious what you think about them.

Also, in general I don't really want to consider the standard distribution to be a ground where "anything goes if it's experimental". We're trying to cut down in scope for a 1.0 release, and most things currently marked as experimental are either "not triaged" or "triaged, and we'd really like to find a replacement, but we just can't remove this". I suspect that a TLS implementation is in the category of "we'd really like to have this", but as long as it's backwards compatible to add it's not super urgent that it land now.
You may want to reword the PR title to make it clear that this is about Thread Local Storage, not Transport Layer Security. It had me going for a minute. |
Yes, it's to work around not being able to mark macros as experimental. Instead, the crate as a whole can be marked experimental and then deprecated with the functionality moved to the standard library when it's finished. |
It would make sense to factor out the portability shim for iOS / old Android like that. Since I hadn't started on it I didn't bother trying. The details of the portable implementation won't impact the higher-level safe wrappers but it would play into the design of the
Rust is going to need thread-local storage, and developing it in-tree means the tests will be run to prevent regressions and it won't bitrot. It also makes it sane for more than one person to put work into it because it won't be getting rebased over and over. |
If having |
With C++, you also have global initialization and destruction of statics, and I don't believe that we don't expose this for technical reasons, but moreso safety/engineering/etc reasons. I view this as similar to TLS statics in that where it is definitely possible, we may choose to not expose it at this time (to keep the two in sync). |
This begins the work towards a new TLS implementation. It will be useful to have the initial work in-tree for test coverage across various platforms and to allow for collaborative work on the API. The current fast implementation using `#[thread_local]` is expected to work on Linux, Windows (with #17563), OS X, FreeBSD and the current Android version (but not the bot). Coming up with a good API will need to wait until there's a full implementation with destructor support (#17572) and a fallback path (#17579) for platforms like iOS and old versions of Android. A set of 6 macros is probably not the ideal API... but it's an obvious way of covering all of the use cases with minimal overhead. Closes #17569
@alexcrichton: There are no safety or engineering reasons to avoid exposing destructors for TLS. It would make the API crippled relative to C++. It would be unusable for something like a general purpose allocator without a way to clean up the data. I don't see why Rust should settle for a lower quality TLS implementation than C++11. |
Current Android versions and 64-bit iOS (iPhone 5S / iOS7 and later) support TLS so I don't consider the lack of a fallback to be a pressing issue. The only relevant versions of iOS for Rust will be 64-bit, so the only loss would be not working within legacy 32-bit software targeting a 64-bit OS. |
I fixed the one soundness issue involving lack of |
Currently the only supported iOS version is 32 bit and I think at But since 32bit is fading away - I can suggest unsafe option, we had a |
I could implement this API for it, but I have no way to test it because there is no Rust bot. Someone with access to iOS can do it after this lands. In-tree development means people can collaborate. It seems that privilege is reserved for people who work for Mozilla though. |
I can test it on iOS 32 and I can do it out of tree - as I maintain iOS on |
@alexcrichton @aturon Is there anything blocking this, or can it be merged as a first step? |
@eddyb, I'd like to point out again that as part of library stabilization we're trying to cut down on the surface area of the standard distribution. I, however, also believe that everyone is in alignment that we'd like to have a nice TLS implementation.

I would also like to point out that there is nothing stopping this from being a cargo package which could be quickly iterated on. The benefits of being a cargo package are that quicker iteration is possible (bors isn't exactly speedy), versioning is independent from Rust itself, and other implementations can have more interoperability. The downsides, of course, are that a cargo package is not updated automatically when breaking changes are made and that discoverability is currently difficult. The breaking-changes problem will ease as Rust becomes more stable, and discoverability will improve once we have a central Registry (working quite hard on it right now!).

I'm not saying that this shouldn't be in the standard distribution, I'd just like to point out that this implementation is not dead in the water if we don't merge it at this time.

Specifically speaking, I don't think any of my concerns have been addressed, so I'll reiterate them below for clarity. I hope this answers your question about what is blocking this PR.
There are also some broader concerns which I was going to bring up once some of the above concerns were addressed:
Both of these points, while somewhat minor, are important in terms of being integrated with the standard distribution. These are also precisely where a cargo package may help, because arbitrary cargo packages don't have the same restrictions as the standard library.

As-is, I would be uneasy to merge this mostly because of the inconsistency with existing statics today. This is duplicating exactly how

If there are technical problems blocking this goal, then those seem like challenges to overcome rather than to start officially supporting an API we would just wish to change later. Note that this is also precisely where a Cargo package helps because the package could have an entirely separate interface and be slowly deprecated over time if language changes elsewhere enable a better API.

That may sound all quite vague, so I would like to give an outline of an api which I think would address my concerns:
use std::cell::UnsafeCell;

// Generic structure representing a slot in TLS, its definition changes
// per-platform.
//
// Note that the inner field is private, and this would require some form of
// macro hygiene to allow the macros below to initialize the fields without
// allowing access to them. This is already highly desired for other structures
// like, for example, UnsafeCell, Cell, and RefCell.
pub struct Tls<T> { inner: T }

// If a platform didn't support #[thread_local], the definition would look more
// like:
pub struct Tls<T> {
    init: T,          // bit pattern to initialize thread statics with
    key: AtomicUint,  // init 0, lazily initialized
}

// These would be appropriately modified for platforms which didn't support
// #[thread_local] as just normal `static`/`static mut` globals.
macro_rules! tls(
    (static $name:ident: $t:ty = $init:expr) => (
        #[thread_local]
        static $name: Tls<$t> = Tls { inner: $init };
    );
    (static mut $name:ident: $t:ty = $init:expr) => (
        #[thread_local]
        static mut $name: Tls<$t> = Tls { inner: $init };
    );
)
macro_rules! cell(($e:expr) => (Cell { inner: UnsafeCell { value: $e } }))
macro_rules! refcell(($e:expr) => (...))

impl<T> Tls<T> {
    pub fn get(&'static self) -> TlsRef<T> {
        // On platforms which don't support #[thread_local], this would perform
        // lazy initialization of the corresponding OS-based TLS key.
        TlsRef { inner: &self.inner }
    }

    // This is a safe function, it should be unsafe to get a mutable reference
    // in the first place.
    pub fn get_mut(&'static mut self) -> TlsRefMut<T> {
        // A mutable borrow is required here to match `&'static mut T`.
        TlsRefMut { inner: &mut self.inner }
    }
}

pub struct TlsRef<T> { inner: &'static T }
pub struct TlsRefMut<T> { inner: &'static mut T }

impl<T> Deref<T> for TlsRef<T> {
    fn deref<'a>(&'a self) -> &'a T { self.inner }
}
impl<T> Deref<T> for TlsRefMut<T> {
    fn deref<'a>(&'a self) -> &'a T { &*self.inner }
}
impl<T> DerefMut<T> for TlsRefMut<T> {
    fn deref_mut<'a>(&'a mut self) -> &'a mut T { &mut *self.inner }
}

// Note that this is stored in a `static`, which normally requires `Sync`. The
// compiler would understand that a `#[thread_local]` static does not require
// `Sync`.
tls!(static FOO: Cell<uint> = cell!(1));
tls!(static BAR: RefCell<int> = refcell!(2));

fn main() {
    let foo = FOO.get();
    assert_eq!(foo.get(), 1);
    foo.set(1);

    let mut bar = BAR.get();
    assert_eq!(2, *bar.borrow());
    *bar.borrow_mut() = 3;
}

This addresses these concerns:
There are a number of tweaks which would have to be made to the compiler to make this work, however:
And, of course, there are a number of cons to this solution:
In general I would like to reiterate that everyone wants a good TLS interface. The practice which we've been encouraging for new functionality in the standard library is for it to be developed out of tree, and then migrated in-tree if possible. This helps us promote faster iteration of libraries such as this, along with reducing the surface area of the standard library that needs to be stable for 1.0 (which is quite soon!).

Also, @thestinger, after re-reading many of the comments in this thread I've noticed that you've added to them in many cases. Would you be ok pinging the issue when you update a comment? Github doesn't send out any notifications, so I won't know to check back on this thread if you update a past comment. I was, for example, completely unaware that you pasted bits and pieces about construction/destruction.
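As a point of comparison for the `Cell`/`RefCell` usage in the outline above, here is a minimal sketch in present-day Rust using the standard library's `thread_local!` macro and its closure-based `with` accessor. It is not the `tls!` API outlined in this comment; `bump` and `fresh_thread_values` are hypothetical helper names, and the `FOO`/`BAR` initial values mirror the outline only for familiarity.

```rust
use std::cell::{Cell, RefCell};
use std::thread;

thread_local! {
    // Each thread gets its own copy, initialized lazily on first access.
    static FOO: Cell<u32> = Cell::new(1);
    static BAR: RefCell<i32> = RefCell::new(2);
}

// Mutates the calling thread's copies and returns the new values.
fn bump() -> (u32, i32) {
    FOO.with(|foo| foo.set(foo.get() + 1));
    BAR.with(|bar| *bar.borrow_mut() = 3);
    (FOO.with(|foo| foo.get()), BAR.with(|bar| *bar.borrow()))
}

// Reads the values seen by a freshly spawned thread, which are unaffected
// by mutations made on other threads.
fn fresh_thread_values() -> (u32, i32) {
    thread::spawn(|| (FOO.with(|f| f.get()), BAR.with(|b| *b.borrow())))
        .join()
        .unwrap()
}
```

The closure-based accessor avoids handing out `&'static` references, so interior mutability through `Cell` or `RefCell` is the only way to modify a value.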
@thestinger docs and support for all supported platforms seem like blockers. Any reason not to iterate on this as a Cargo package? You mentioned mandatory compiler support. Can you say more about that? Maybe you can get what you need on that front without having to do all the design and iteration on master. |
@wycats: The current TLS implementation in the repository needs to be replaced or removed. It's used throughout the implementation despite being incredibly slow and bloated. This does cover all officially supported platforms already - it works on Windows, OS X, Linux, and Android. It does not work on 32-bit iOS but that has never been officially supported / tested. |
@thestinger did you test it on Linux 2.6.18 and glibc 2.5 (the oldest supported platforms)? |
@thestinger also, it would be really helpful (at least to me) if you responded to @alexcrichton's specific questions. He did a pretty thorough review and your reply is pretty dismissive of what seems like a good-faith effort to have a discussion about your patch. |
@wycats: Can you please quote this reply you're calling "dismissive"? |
@thestinger maybe "dismissive" means something stronger to you than I meant. What I meant was that @alexcrichton wrote a long, thoughtful and detailed review of your patch, and your response reiterated what everyone already knows: that the current implementation is not very good and needs to be replaced. I would honestly really appreciate responses to his specific review. |
@wycats: You haven't given me time to respond. I opened this thread and was going to respond to @alexcrichton until you started throwing accusations around. |
Sadly our auto/try bots are not CentOS 5.10 (the old linux we test on), just the snapshot builders are. I have manually verified, however, that @thestinger I know we've had trouble with I'd also like to point out that |
@thestinger apologies if something I said looked accusatory. I was just making an observation that @alexcrichton wrote a fairly long review of the patch, and your reply to his comments seemed curt. It sounds like you didn't mean it that way, though! Looking forward to a fuller response. Like you, I would like to see a better TLS implementation in |
I also just tested this out on the android toolchain that I had lying around, it's the same smoke test of thread local
The key part of the error I think being:
I suspect the sub-question in route 1 will give us more information and will help guide this decision. @thestinger, would you be ok figuring out what the minimum version of Android is to implement TLS in this fashion?
@wycats: I didn't reply to his comments, I was replying to you. |
It's far smaller than the surface area of the existing TLS implementation and will be able to replace it. It needs support for destructors to replace it and that's a bit tricky to do correctly - the existing implementation does it unsoundly. It deserves to be landed separately, and then there will be platform-specific optimizations to leverage features like |
TLS is used within the standard libraries (task-local RNG) and compiler. A replacement of the old TLS implementation needs to be done in-tree. |
It's already in the standard distribution as |
I did address your concerns. Perhaps you weren't satisfied with the replies, but I'm not convinced that these are serious / blocking issues. |
There isn't a proposed alternative with the same flexibility and performance. Using macros makes it possible to sanely support destructors and implement a fallback path. It's an |
I guess you weren't satisfied with my answer? As I pointed out, this is not incorrect with the current implementation of the compiler. It would not be incorrect if fields were reordered based on size and aligned. It's a workaround that's used throughout the standard libraries already and I think it would be unreasonable to change the compiler implementation in a way that would prevent working around issues like the inability to statically initialize a |
The ultimate solution is a fully working |
There is already a TLS implementation in-tree and it needs to be replaced. It would be better to stabilize this API than the one we already have that's unsound and incredibly slow. However, this is explicitly marked as |
TLS needs support for dynamic initialization and destruction to be usable. It is completely unrelated to those features in globals because the implementation and semantics are much different. There are no safety or ordering issues with it. It is required to replace the current TLS implementation which already supports those features. Why are my pull requests singled out for the application of all of these strict double standards? |
TLS is already a feature that's in-tree and replacing the implementation was accepted in an RFC. I went ahead with a simplistic initial implementation to iterate on in-tree because I had already put substantial thought and effort into it. The surface area claim doesn't make any sense because it's going to replace a far larger unsound TLS implementation. The API is marked Most possible improvements are blocked on fixing compiler bugs and compiler / language limitations such as the lack of a way to make an uninitialized global variable, inability to store a type with a destructor directly in a |
I identified the problem on Windows and fixed it in 6bb648f. The problem was just that the MinGW-w64 linker's implementation of ASLR is thoroughly broken. It seems to work fine on 32-bit too... |
@alexcrichton: I don't know which version of Android started having working TLS. I do know that the |
@alexcrichton: |
Closing for now. I've filed #18004 about this and will submit the same changes in a different order than I planned. |
I'd love to be able to replace the current
I agree! We may be able to get by without implementing destructor support, however, by using the scoped-approach that we laid out in the work week. The "unsafe building block" could serve as the foundation for that strategy of TLS. The compiler is essentially entirely scoped, and the other major use case I know about is the task-local RNG which could probably be implemented specially.
That's not good! Can you open a bug on this?
Can you explain to me why you didn't think my example would provide either the flexibility or performance?
I didn't really see "it's not a worry" as an answer to the fact that struct layout is undefined. We are reserving the right to reorder struct fields at will, not only for performance but perhaps randomly for security. Today a
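To make the layout concern concrete: in present-day Rust the default representation leaves field order unspecified, while `#[repr(C)]` fixes fields in declaration order. The sketch below is illustrative only; `Reorderable`, `Fixed` and `fixed_offsets` are hypothetical names, and exact offsets beyond the ordering depend on the target's alignment rules.

```rust
// Default representation: the compiler may reorder these fields.
#[allow(dead_code)]
struct Reorderable {
    a: u8,
    b: u32,
    c: u8,
}

// C representation: fields are laid out in declaration order.
#[repr(C)]
struct Fixed {
    a: u8,
    b: u32,
    c: u8,
}

// Returns the byte offsets of `a`, `b` and `c` within `Fixed`.
fn fixed_offsets() -> [usize; 3] {
    let v = Fixed { a: 0, b: 0, c: 0 };
    let base = &v as *const Fixed as usize;
    [
        &v.a as *const u8 as usize - base,
        &v.b as *const u32 as usize - base,
        &v.c as *const u8 as usize - base,
    ]
}
```

Code that depends on a particular field order would need `#[repr(C)]` (or an equivalent guarantee) rather than relying on what the compiler currently happens to do.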
Can you explain a little more about how a bare-bones layer over Could you also explain how the example I gave was less efficient than the module-expansions you proposed here? I was under the impression that it's just a few pointers lying around and would optimize away completely.
I'd like to note that while I personally do not feel that
Remember though that I'm not proposing an end-all be-all interface, this was just the bare bones interface to the rest of the TLS subsystem, it's likely that other abstractions, such as those with dynamic initialization/destruction are built on top.
I do agree that to be a drop-in replacement it needs to support these features, but I think we may be able to get by with some unsafe code perhaps in the meantime and scoped TLS (which doesn't require dynamics). Long term, we definitely need to support this though!
I'm sorry you feel this way, and I apologize if any of my comments have led you to believe this. It is certainly not what I intended to convey. I may be a little more nitpickity at reviews, but I do not at all intend to single you out and apply a double standard.
Currently almost everything we provide in the standard distribution, apart from
Could you elaborate some more on these issues? Currently this code compiles:
static FOO: Option<Box<int>> = None;
fn main() {}
I think Saturday's nightly (the one I tested) had that commit. Have you tested recently on 32-bit?
Ok, I'm also fine if you want to leave this open! I wanted to talk more with others about this today and get their thoughts on this approach. |
@alexcrichton: I'll start with an |
Hello, I was curious about the state of rust on Android and I stumbled across this thread. Since D was mentioned earlier, I've been working on getting D on Android, hence my interest. :) @thestinger, I don't believe Android supports native TLS for apps even today. They're still storing the pointers for bionic's pthread TLS implementation in native TLS, so that's going to write over any data that apps put in there. It is true that purely static TLS without any relocations will sometimes randomly work, which is why it might have worked for you when doing small tests with C++, but when I went over 4K in TLS data on Android/x86 it started to break. That's because they did not take native TLS support for C/C++ out of their lightly patched gcc/clang/binutils/gold native toolchain in the NDK, but it's unsupported by bionic and the dynamic linker so native TLS won't work beyond such random corner cases. I had to wrap |