-
-
Notifications
You must be signed in to change notification settings - Fork 747
Get rid of static this for initializing std{in,out,err} #5421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
std/stdio.d
Outdated
| .stderr._p = &stderrImpl; | ||
| static __gshared File.Impl impl; | ||
| static __gshared File result; | ||
| static shared bool lilyWasHere; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lilyWasHere
What?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahem... not that it's bad to have fun, but I don't think such names should be present in a standard library. We already have a few words in there like "Stuff", that are really non-descript. Or messages worded in terms of "Dunno how to do ", which someone will eventually find difficult to understand. This one is completely internal and not supposed to be looked at all that much, but a person who eventually does have to look is also most likely to go "What???"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing wrong with having a bit of fun, but no problem.
| return true; | ||
| }()); | ||
| return result; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear on why this appears uncovered. It is covered in unittests that assign to stdout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unittest just defines a main method, it doesn't actually execute any code...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's another unittest that assigns to stdout.
|
This seems more complex than necessary. Just make stdin a pointer to a function that returns a ref. Statically initialize it to a function that initializes stdin, and rewrites the function pointer to point to a function that returns a ref without checking initialization. |
|
@WalterBright that won't work for e.g. functions that take a |
|
Updated |
|
I'm guessing the performance implication of this is likely negligible? Tot that it really matters, but I wonder if an indirect function call is going to be less expensive than an atomic load... |
|
Been thinking of the pointer to function approach. In summary: it can be made to work and in all likelihood it is marginally more efficient, but it's more complicated than the code in this PR. An arbitrary number of threads may enter the "initialize" pointer to function simultaneously. So inside the function that does the initialization code must exist that does essentially what Then, for each call in the steady state, we have an atomic load of the pointer to function and an indirect call. On x86 the atomic load is free. |
std/stdio.d
Outdated
| result._p = &impl; | ||
| return true; | ||
| }()); | ||
| if (!initialized) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems redundant, initOnce does effectively the same thing internally. If this is to help with inlining the atomicLoad, aren't we better off fixing the compiler instead of doing this "triple checked singleton"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scratch that, it's still better for platforms where atomicLoad won't be cheap.
std/stdio.d
Outdated
| } | ||
|
|
||
| extern(C) void std_stdio_static_this() | ||
| // Undocummented but public because std* handles are aliasing it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmmmmmmh, typos.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
std/stdio.d
Outdated
| static shared bool initialized; | ||
| if (!initialized) | ||
| { | ||
| // Use double-checking because initOnce is inefficient. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain how adding a third thread-global flag that is set in exactly the same way is more efficient? (I might be missing something obvious.)
Also, depending on what the semantics for shared in D are, using a plain if might strictly speaking not be well-defined – it would be a data race in C++ and hence undefined behaviour. This is why we need to define a memory model! (I don't think LLVM or GCC LTO would currently cause problems, though.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not third, it's second, the same it was before.
I tripped over that at first too. initOnce() internally uses the double-checking pattern with atomicLoad, which may or may not be more expensive than a plain load depending on platform. But 'true' can never be observed on this new extra check before initOnce() has completed, so in effect this simply short-circuits a potentially more expensive load. It may come at a cost of a few extra calls to initOnce() though, depending on how memory works on a particular platform.
If it was allowed for initialized to flip from true to false during runtime, that new if would indeed have been a mistake.
IIRC, TDPL initially proposed that reads and writes from/to shared should be both atomic and fenced. That last part is a very heavy restriction though, and, among other things, shows exactly why we need a DIP to make shared actually work.
The reason why it was found "slow" on x86 is due to dmd's poor inlining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem I see here is that initialized is perfectly race-free. But the setting of the handle is a side effect. So is it possible one thread reads true while the handle has not yet been set?
I'm somewhat lost with the way this is done. You are protecting a boolean with a complicated set of calls using atomics and a mutex, and I don't fully understand why they are all there. The result value seems unprotected, as it's never loaded or stored atomically, whereas the booleans explicitly are inside initOnce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tripped over that at first too.
So far, I don't think I tripped over anything. ;)
initOnce() internally uses the double-checking pattern with atomicLoad, which may or may not be more expensive than a plain load depending on platform. But 'true' can never be observed on this new extra check before initOnce() has completed, so in effect this simply short-circuits a potentially more expensive load.
What makes you think that if this optimization is valid, initOnce can't employ it all the same? It's specifically designed for this use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initOnce can't optimize it. It has to take the (potentially) fenced approach in order to make sure it does indeed store the initializer only once, and that may be slower on non-x86. It may be called multiple times, but only once should it actually store. But there is no reason to call it at all, and thus incur a penalty, if true is already observed.
@schveiguy, initializer is stored under a mutex, which should imply a fence. Sigh... "should". @klickverbot is right, we need a formalized model.
I'm somewhat lost with the way this is done. You are protecting a boolean with a complicated set of calls using atomics and a mutex, and I don't fully understand why they are all there. The result value seems unprotected, as it's never loaded or stored atomically, whereas the booleans explicitly are inside initOnce.
initOnce employs a double-checked locking pattern. Which is check - lock - check again. Actual store, both of the initializer and the flag, is performed only under a lock, and only if the second check agrees with the first one. This prevents second thread from writing an initializer a second time, if it blocked on a mutex after the first check but before another thread has finished storing. Storing itself does not have to be atomic, so long as the observability upon releasing the lock is guaranteed.
Why this was slow in Andrei's measurements was because dmd does a poor job inlining the first check. So he added an unencumbered read to circumvent that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking a look at the code again, it seems I am wrong. It still stores the value (in this case the external boolean initialized) non-atomically. So I think it's fine the way this is done (albeit very convoluted). initOnce needs an overload that doesn't assume the initialization is done by assignment (as in this case, it cannot be).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@klickverbot I think the optimization is valid, because initialized will never go from true to false. A similar technique could be employed inside initOnce, but I think it still doesn't help because of the way initialization is assumed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The short version:
But there is no reason to call it at all, and thus incur a penalty, if true is already observed.
Careful with the word observed. Non-atomic reads – for the sake of the argument, let's use the C++11 memory model and consider the read from initialized relaxed instead of plainly being a data race – don't come with any sequential consistency guarantees, and so the execution order constraints you can infer from not taking the if are minimal.
The longer, hopefully more enlightening version in very much of a Socratic spirit (not that there would much of a basis for discussion if we don't assume, say, C++11 semantics for D):
initOnce can't optimize it.
What about literally just moving the if inside initOnce?
It has to take the (potentially) fenced approach in order to make sure it does indeed store the initializer only once
That's already guaranteed by the check that makes the "doubled" part of the double-checked locking pattern, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not about storing the initializer atomically at all. It's about observing both it's value and the flag value after they've been stored.
It is exactly the same technique inside initOnce already, only initOnce can't count on plain load, it has to do an acquire (which, on some platforms, requires a memory fence).
As for overloads, it also needs a @nogc and nothrow one. The latter is a bug with lazy, for the former we'd need to allocate the mutex without the GC. I'd propose ditching the lazy parameter in favor of template callable, but I fear that ship has sailed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Careful with the word observed. Non-atomic reads – for the sake of the argument, let's use the C++11 memory model and consider the read from initialized relaxed instead of plainly being a data race – don't come with any sequential consistency guarantees, and so the execution order constraints you can infer from not taking the if are minimal.
Reads from shared integrals should be atomic, according to TDPL. They should not interleave with writes. I.e. there should not be a data race there.
The longer, hopefully more enlightening version in very much of a Socratic spirit (not that there would much of a basis for discussion if we don't assume, say, C++11 semantics for D):
initOnce can't optimize it.What about literally just moving the if inside initOnce?
We can't. It's of arbitrary type, might not even be convertible to bool, or true might not actually be the expected value.
It has to take the (potentially) fenced approach in order to make sure it does indeed store the initializer only onceThat's already guaranteed by the check that makes the "doubled" part of the double-checked locking pattern, isn't it?
You're looking at the wrong boolean :) initOnce has its own flag, which it uses for its' double-checked locking. Variable initialized just happens to be a boolean.
So, what happens is:
- thread reads (atomically, but perhaps out of order)
initialized, not using any fences. - it may read
falseincorrectly, and consequently callinitOnce. But it will never readtrueincorrectly. - if it read
falseand calledinitOnce,initOncehas to make sure it doesn't read it's own flag out of order w.r.t. writes.
So Andrei's check looks redundant w.r.t. to initOnce, but it isn't, as it's guaranteed cheap.
|
Thanks for all reviews. May I suggest a means to improve the efficiency of reviews. The PR is an engineering solution in the sense that it uses the tools available (in this case:
What happens sometimes is that discussion on the first topic (appropriate for the PR) get into the second. At best, we would enhance collaboration by helping the PR creator instead of assigning him/her the second task in addition to the first one. Right now we are in the position where a couple of collective hours have already been spent in debating what would be correct and what wouldn't. The same time could be spent infinitely more productively by simply writing a better tool (e.g. So, @klickverbot you clearly have the know-how and the enthusiasm in the topic. Could you please implement |
|
(I'll note that @radcapricorn would be just as indicated for the task but if I asked "could @klickverbot or @radcapricorn ..." each would assume the other would do it and there'd be no progress.) |
|
Thanks :) I would've asked for another candidate in any case, I already have a few PRs which I want to see through into Phobos or rejected first. |
6ef3b5a to
ee31837
Compare
|
I committed an alternative implementation that does not use |
I'm all for that – this is actually my main frustration with contributing to the core language.
It isn't, it is racy. (See below for assumptions.)
It's already there (
Disagree. I'd claim my time is much better spent reminding people of the finer points in topics I'm knowledgeable about than doing all the work myself, so that they'll get it right themselves next time around. In this particular case, the tool you need already exists and works, unless one breaks it by trying to be clever. ;) To meaningfully discuss the correctness of multi-threaded programs, we need to have execution semantics that allow for multiple threads of execution. D currently has no formal memory model, but given that the good old "hope that the compiler doesn't transform the code too much" approach works just fine with DMD (helped by the almost-sequentially-consistent guarantees x86 provides) and both GDC/LDC use C++ semantics, let's just consider the C++11 memory model. We really need to formalise this for D to have a claim to be a "no space below" systems programming language, but that's a topic for another day. Furthermore, let's not worry about the fact that data races are undefined behaviour in C++11 (and a compiler detecting this statically could apply arbitrary transformations), and just concentrate on the fact whether unintended outcomes can be observed by considering all accesses to The problem with the proposed change is that loading |
Just saw it, will have a look. As far as I can see, |
std/stdio.d
Outdated
| { | ||
| impl.handle = handle; | ||
| result._p = &impl; | ||
| atomicOp!"+="(initialized, uint.max / 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one will not be correct even on x86, it would require a memory fence. I'm for the initOnce version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying there is a bug in atomicOp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. impl.handle, result._p and initialized are different memory locations. A further load from result._p may get reordered w.r.t. store to impl.handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@radcapricorn as @klickverbot said, that's a topic for another day. For now let's simplify our lives by using the same assumptions that initOnce makes: https://github.com/dlang/phobos/blob/master/std/concurrency.d#L2408
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
@klickverbot The fly in the ointment is |
LDC as well as, from what I understand, GDC currently treat
I haven't really looked at the Java situation in any detail, but I believe |
|
@klickverbot
This one should not apply. According to TDPL,
Yes it is in the initOnce version, because both stores are protected by a memory barrier (incurred by the mutex). It can load |
|
@klickverbot could you please please kindly please write five lines of code that clarify what you say? Essentially could you please paste the code as you think LDC would find it correct? Thanks! |
Unless something has changed very recently, in spite of what TDPL says, |
|
@jmdavis compiler barrier, not memory. The part about |
|
@radcapricorn: We might want to take this elsewhere to keep the PR on-topic. It boils down to compiler barriers not being sufficient to guarantee sequential consistency of the memory effects. Of course, it will still be fine on x86 as MOVs for loads/stores are implicitly acquire/release.
I didn't have a chance to properly think about your alternative code yet, but it seems correct at first glance. |
|
@andralex done. Yeah, that old address should be dead. |
dnadlinger
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be okay now if we are fine with the added complexity.
|
thx! |
MartinNowak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use initOnce for that!
|
This function was added about a year ago to simplify lazy initialization, no need to roll your own spinlock either @andralex. |
|
@MartinNowak the initial version used initOnce indeed. It marked a performance regression for using std* handles. |
I'll have a look at http://forum.dlang.org/post/og4d2d$fkc$1@digitalmars.com, doesn't seem like much investigation happened. Meanwhile this PR broke unit-threaded Issue 17472 – [Reg 2.075] typeof(stdin) is no longer a File. |
|
@MartinNowak: Agreed, as mentioned above I don't think the inline spinlock thing is worth it (my green checkmark thing concerned the actual implementation, which was broken before). Not too sure if/how we want to tackle the type issue. As far as I can see, we'd need to keep the static constructor (but maybe could offer an explicit initialization function for runtime-less builds, if this is where @andralex et al. are aiming. |
|
Generally we should offer implementations the freedom to swap properties for variables. Granted, that's not present in the specification so yes this PR is liable to break code that queries the type of |
| .stderr._p = &stderrImpl; | ||
| } | ||
| static __gshared File.Impl impl; | ||
| static __gshared File result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The static is redundant here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will edit the file in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only you didn't make it a struct S {}
S foo() { return S(); }
@property S bar() { return S(); }
pragma(msg, typeof(foo));
pragma(msg, typeof(bar)); |
Performance of initOnce isn't that well because a common Mutex is passed as default argument, but this one is also created lazily. Combined with dmd's incapable inliner, this makes 2 function calls, and 2 atomic loads (in non-inlineable asm block functions). |
|
@MartinNowak thanks for the PR. @mdparker interested in a blog on "The One Use Case of |
|
Yeah, I'll add it to my list for after my |
Are you sure this is a good idea? I mean... https://wiki.dlang.org/Language_issues#Properties |
|
I think |
|
Perhaps I missed this in the discussion about, but this P.R. breaks code that takes the address of |
|
Is there a reason why |
|
@John-Colvin missed that. Could you please create a short PR? Thanks! |
In case someone is wondering whether that's a regression - it's luckily not. import std.stdio;
void main() @safe {
stderr.writeln("Hello World");
} |
|
Yes, and until that issue is addressed I'm strongly against marking that function trusted. I'll try to take a look at that issue again today; maybe it's easier to fix now, but it may still turn out that File needs major restructuring for shared. |
Convert awkward globals to functions that return ref File and lazily initialize themselves. This should rid us of a bunch of troubles linked to circular dependencies, betterC etc.