
Conversation

@thewilsonator
Contributor

so that methods that are shared can be called using unshared objects. This is fine because any extra synchronisation done as a result of being shared will not cause race conditions, see https://forum.dlang.org/thread/mailman.4068.1538360998.29801.digitalmars-d@puremagic.com

This probably doesn't catch all edge cases of this change, hence WIP. Also test cases etc.

@dlang-bot
Contributor

Thanks for your pull request and interest in making D better, @thewilsonator! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the annotated coverage diff directly on GitHub with CodeCov's browser extension)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment set up, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + dmd#8782"

@dlang-bot dlang-bot added the Review:WIP Work In Progress - not ready for review or pulling label Oct 1, 2018
@jmdavis
Member

jmdavis commented Oct 1, 2018

I can see the argument, but at the same time, if this type was designed to be used as shared, why do you have a thread-local instance of it anywhere? It makes no sense. I'm very much afraid that if we start allowing any kind of implicit conversion here, we're risking subtle bugs because of a corner case we haven't considered, and I don't see why it's even useful.

For a type to work as shared, it either must be designed that way (in which case it has shared member functions and the thread-safety is managed internally - and it really doesn't make sense to have thread-local instances of it), or the type is not designed to work as shared at all, and the thread-safety must be managed externally, either with atomics or with mutexes being used and shared being temporarily cast away while they're locked. In neither case do you have a thread-local object that you end up passing to a function that takes a shared object - let alone a member function. As far as I can tell, if the code is doing that, it implies that it's doing something seriously wrong. What's the use case here that actually makes sense and would occur in real code?

@thewilsonator
Contributor Author

What's the use case here that actually makes sense and would occur in real code?

@TurkeyMan ?

@TurkeyMan
Contributor

Literally any thread-safe thing I write doesn't work single-threaded...

I have a pattern where a scheduler manages jobs that specify the data they interact with, and in what way they interact; mutable, thread-safe, read-only.
Concurrent scheduling of jobs can be optimised with respect to access patterns.
I have some data source that is designed to be thread-safe, such that jobs that interact with the data can be scheduled concurrently; as such, many methods are shared.
There are some methods of the object that are not thread-safe though, and those require acquiring an exclusive (i.e. mutable) lease on the object. The scheduler knows to serialise access... but the problem is, you receive a mutable pointer, and you can't call the methods that are shared... which is kinda dumb.
Since most of the object's functionality is implemented as shared functions, it becomes very inconvenient to wrap everything up in mutable overloads.

@TurkeyMan
Contributor

@jmdavis The 2 cases you describe are not this case. You describe shared -> shared (intended to be shared, written as such), or shared -> mutable (not-thread-safe; manually acquire a lock and cast shared away)... mutable -> shared is safe, and very useful!

It's unlikely that such an object exists that is "shared and only shared". First, almost all objects begin life unique and initialise before being distributed. Initialisation is probably not thread-safe, so init methods aren't shared. Likewise, implementing thread-safe functionality is expensive, and in my experience, only a subset of functionality should be shared. Those functions that are not shared are, like you say, ones that require you to acquire a lock (or enforce synchronisation in some way; in my case, the scheduler knows how to do it) such that you have an exclusive lease, and then you're good.

@jmdavis
Member

jmdavis commented Oct 1, 2018

Thinking about this further, I can point out exactly why implicitly converting to shared doesn't work, and that's because while access to the shared object may be protected by a mutex, the thread-local reference has no such protection. So, in the general case, implicitly converting a thread-local object to shared actually breaks the guarantees that the compiler is supposed to be able to give for thread-local variables and is allowed to assume when it does stuff like optimizations (especially when the function that the object was handed off to isn't pure and therefore could have done who-knows-what with the shared reference). It's quite possible that there are specific functions where there wouldn't actually be a problem in practice, but the compiler needs to be able to guarantee it.

It's like with std.concurrency. It can't implicitly convert from thread-local to shared to pass across threads, because D's type system has no real concept of ownership. So, we're forced to cast to immutable and cast back to pass across threads (not shared because of issues with Variant's implementation). Having it work with implicit conversions would be a serious hole in the type system which could easily result in objects being on multiple threads but being typed as thread-local on each thread. It would be fantastic to fix that, but without some sort of ownership semantics in D's type system, I don't know how that would be possible, and an ownership system would likely be far too onerous. So, we're stuck with the annoying casting.

@thewilsonator
Contributor Author

Right, but two things to note:

Implicit conversion to const shared will work (this is Rust's model).

I realised I have neglected to mention that this is supposed to be for methods (i.e. the this reference is what undergoes the implicit conversion), not for parameters (where anything could happen if the function is not pure). As such it should be fine provided the method is also scope.
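As a rough sketch of the method case (hypothetical Queue type, nothing from this PR's actual tests), this is what the relaxation would permit; today the explicit cast is needed:

struct Queue
{
  // Thread-safe by design; scope promises that this does not escape the call.
  void push(int value) shared scope { /* synchronised internally */ }
}

void producer(ref Queue q) // q is thread-local here
{
  (cast(shared(Queue)*) &q).push(1); // the explicit cast required today
  // q.push(1);                      // what the implicit conversion of this would allow
}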

@TurkeyMan
Contributor

TurkeyMan commented Oct 1, 2018

@jmdavis Right, you've identified the key gotcha... the reference should be scope.
The promoted now-shared pointer can't escape.

Implicit conversion to const shared will work (this is Rust's model).

I don't think that's true... while a non-shared reference exists, no shared references should exist, otherwise the thread-local instance might do something not-threadsafe even while the others are read-only-ing.

I think scope shared is the only acceptable promotion; the reference can't escape and become shared via a promotion for a shared method call.

@thewilsonator
Contributor Author

Actually, it should be fine for parameters as long as they are scope, because the function call will be synchronous w.r.t. the current thread, and scope guarantees that while the referenced data may be mutated as a result of the call, the reference will not outlive the call, and so all is good.
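A sketch of the parameter case under that assumption (hypothetical names; assumes -dip1000 semantics for scope):

struct Stats { int hits; }

// scope says the reference may be used during the call but must not outlive it;
// under -dip1000, attempting to store it in a global inside record() is rejected.
void record(scope shared(Stats)* s) { /* thread-safe work, e.g. via atomics */ }

void caller(Stats* local) // thread-local pointer
{
  record(cast(shared(Stats)*) local); // the explicit cast required today
  // record(local);                   // what the proposed relaxation would permit
}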

@TurkeyMan
Contributor

the reference will not outlive the call and so all is good.

Right. So I think it's the case that this probably hasn't been revisited since scope happened (and kinda is-still-happening)

@thewilsonator
Contributor Author

I don't think that's true... while a non-shared reference exists, no shared references should exist, otherwise the thread-local instance might do something not-threadsafe even while the others are read-only-ing.

If those shared references never modify the data (which const guarantees) and they access it with atomics (n.b. they cannot hold a mutex), then all is well, because there is only one owner who can change the object.
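A minimal sketch of that arrangement (hypothetical Gauge type; the single thread-local owner mutates, and const shared readers only use atomic loads):

import core.atomic : atomicLoad, atomicStore;

struct Gauge
{
  int level;

  // Read-only, thread-safe access: const shared plus atomic loads only.
  int read() const shared { return atomicLoad(level); }

  // Only the single, thread-local owner mutates - also atomically, so that
  // concurrent readers never observe a torn value.
  void set(int v) { atomicStore(*cast(shared int*) &level, v); }
}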

@jmdavis
Member

jmdavis commented Oct 1, 2018

I agree with Manu about const shared. It couldn't screw with the thread-local instance, but it's still not okay, because you still need a mutex or atomics to deal with a const shared object. The fact that it's const just means that you can't mutate it, not that you don't need to worry about protecting access to it, and the thread-local reference isn't going to be doing anything with mutexes. So, it only stops the problems in one direction. It can work in the case that atomics are used, but there's no guarantee that the code uses atomics and not mutexes, so I don't see how the type system can assume that it's safe to do the implicit conversion.

scope shared is probably fine, though DIP 1000 completely breaks my brain, so I don't know what you actually end up with in practice. The key thing is that when the function returns, it must be guaranteed that there are no shared references to muck with the thread-local reference. As long as the type system can guarantee that the conversion to shared is temporary and that no references escape, then I don't think that it would break the type system.

@TurkeyMan
Contributor

and they access it with atomics

That's a really big and unprovable 'if'. I don't think that's reasonable.
Also, atomics aren't a silver bullet; if 'valid state' is defined by a conjunction of 2 values, atomic access to each one of them can't help you, because you might access in between the 2 atomic mutations.
This is a detail for a method that advertises 'shared' to guarantee.
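A tiny illustration of that point (hypothetical pair of values whose invariant is begin <= end; every access is atomic, yet a reader can still observe the pair mid-update):

import core.atomic : atomicLoad, atomicStore;

shared int begin = 0;
shared int end = 0;

void writer()
{
  atomicStore(begin, 10); // invariant (begin <= end) broken here: begin = 10, end = 0
  atomicStore(end, 20);
}

int readerLength()
{
  // May run between the two stores above and compute 0 - 10 = -10,
  // even though every individual access was atomic.
  return atomicLoad(end) - atomicLoad(begin);
}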

I think it's a strong specification in D that a thing is either thread-local, or it is shared. I am happy with that design decision; it makes implementation of shared methods possible within reason, and it will also improve the perf of shared methods, since they don't have to account for the possibility of a non-shared reference doing random mutations.

What is upsetting is that a thread-local thing can't call a shared method that definitely works... and as we've determined here, the reason is that the shared method might be impure. So, scope is our friend I think. That's the direction D is taking towards ownership-related problems, and I think it's natural that that's the solution.

@thewilsonator
Contributor Author

Fair enough.

scope shared is probably fine

Let's try doing that then; unfortunately it's easier said than done.

The key thing is that when the function returns, it must be guaranteed that there are no shared references to muck with the thread-local reference

Agreed. I'm pretty sure that's what scope is for, but the exact inner workings of -dip1000 hurt my head too (any feedback on dlang/dlang.org#2453 would be useful to try to improve that situation).

@jmdavis
Member

jmdavis commented Oct 1, 2018

Agreed. I'm pretty sure that's what scope is for, but the exact inner workings of -dip1000 hurt my head too

Yeah. In principle, that should be what it's for, but in practice, I've found that I've had to slap scope in all kinds of weird places to make -dip1000 happy - e.g. pretty much every member function of SysTime has to have scope on it, which really makes no sense to me. I believe that it's because of the Rebindable!(immutable TimeZone) member - which obviously has indirections, but what is there to escape? The TimeZone itself doesn't live on the stack even if the SysTime itself is scope. If you give the SysTime a new TimeZone, of course, it can't be scope, since the reference has to be copied, and of course, when you get the TimeZone from the getter property, it's not scope. The fact that scope is involved here makes no sense to me. It's probably yet another case where the compiler doesn't play nicely with Rebindable (though since it arguably violates the type system, I suppose that that should be no surprise), but regardless, it's maddening.

In any case, the whole mess just about makes me want to throw up my hands in the air over scope. In the simple cases with pointers, it's clearly valuable and makes sense, but once you get more complex and add user-defined types into the mix, it gets so messy, so fast, and I'm not sure that anyone other than Walter actually understands it (assuming that even he does).

@thewilsonator
Contributor Author

Well, @TurkeyMan can be our guinea pig. ;) And hope that it's Rebindable that makes it annoying.

@TurkeyMan
Contributor

If you can make it work, I will pig.

@thewilsonator
Contributor Author

Oh it can be done, I've just got to find all the places in the function matching code where I need to test for the conversion.

@WalterBright
Member

scope would work better if anybody would approve the PRs I made for it.

@thewilsonator
Contributor Author

Your PRs would get approved if we knew what you were trying to accomplish with them. Please review dlang/dlang.org#2453

@jmdavis
Member

jmdavis commented Oct 1, 2018

scope would work better if anybody would approve the PRs I made for it.

Maybe I should find the time to look at them, but honestly, the only piece of dmd that I've spent much time looking at was the lexer - and that was before the conversion to D (though it probably still looks mostly the same). So, I just don't generally feel very qualified to review dmd PRs as far as the actual code goes. I can comment on language stuff like here, but the focus of my time on D simply hasn't been on the compiler, and any comments I would make would almost certainly be superficial. So, I feel kind of bad whenever you make comments like that, but I'm not sure that there's much that I can really do that would actually be useful. :| I certainly can't approve anything.

@RazvanN7
Contributor

RazvanN7 commented Oct 1, 2018

I, personally, don't think that this change is justified. The fact that types that don't need synchronization are not implicitly convertible to types that do is a cool thing, and I can't think of any situation where this is actually a liability. Of course, shared methods cannot be called on mutable objects, but that's not a problem, and in most situations it's a good thing because it makes it clear that a specific function adds the overhead of synchronization.

I don't see a valid use case where existing methods fail to solve the problem and where this conversion is actually needed.

@TurkeyMan
Contributor

TurkeyMan commented Oct 1, 2018

I, personally, don't think that this change is justified.

I have a code-base that a billion dollar company is staking its future on that says otherwise. This problem space is modern and critically important. Also, Walter is all over this space; trying to gear D for the future is probably his hottest ambition (judging by his various lectures).
This is a particular struggle that C++ can't express, and D can theoretically express the problem space, but falls slightly short.
This particular detail is a massive fly in the ointment of an otherwise very-compelling argument to look into D.
It's the difference between "this design works", and "this design doesn't actually work. shame, it looked good on paper!".
These silly pass-through overloads that perform the blunt casts when it's statically knowable that it's safe to do so are not going to win any friends, and kinda reek of design failure.

I don't see a valid use case where existing methods fail to solve the problem and where this conversion is actually needed.

It's weird, because I don't see a single use case where this isn't needed. Everything that may be shared may also not be shared (or not shared yet, or was shared but isn't now)... It's unacceptable that the thread-safe API for the object is inaccessible because there's only one instance.

Here's one to consider:

struct Logger
{
  void log(string line) shared scope; // <- logging API is thread-safe, naturally.

  void flush(); // flush function not-threadsafe. called once per cycle during a sync-point
}

void work(ref shared Logger b)
{
  b.log("Hello"); // i'm a worker that does logging
}

void finaliseCycle(ref Logger b) // <- I have the only one in existence
{
  b.log("Finalising..."); // ERROR: wat?!
  //...
  b.flush(); // natural time to flush for this cycle
}

A thread-safe API like this collects data into thread-local pools and then reconciles later... the idea that you can't log during the sync-phases is just... really annoying.

This pattern propagates to literally everything I have that would make use of shared.
Rendering, job creation, etc. Everything I have with a thread-safe API follows this pattern.

@jmdavis
Member

jmdavis commented Oct 1, 2018

Honestly, I find it concerning if an API is designed to work as both thread-local and shared, given that that means it has to deal with the thread-synchronization internally to make the thread-safety work, but none of that is needed for thread-local objects. Creating objects as thread-local and then casting them to shared makes sense, and some of the shutdown code may make sense as thread-local, but I wouldn't expect there to be any thread-local instances in between (outside of the object's member functions when a mutex has been locked and shared has been cast away anyway) or for the shared functions that were used while the object was shared to be used during creation (or destruction if it's destroyed as thread-local).

Having a thread-local instance treated as shared means that it's going to have synchronization primitives used on it in order to make it thread-safe - which makes no sense. Unless atomics are being used, shared is already going to have to be cast away internally (after a mutex has been locked) in order to do anything useful, and the code after that can be a function that takes the object as thread-local. So, if you really do have a thread-local object, it should be using that function, not being converted to shared and having mutexes locked for it and whatnot. So, I'm not at all convinced that any kind of design which would benefit from an implicit conversion like discussed here really makes sense, and in fact I'm inclined to think that it implies that there's something wrong with the design.

That being said, what ultimately matters here is correctness of the type system. As far as I can tell, implicitly converting a thread-local object to scope shared does not violate the compiler guarantees of either thread-local objects or shared. So, if it makes Manu's design work, then maybe it makes sense to add it. And maybe Manu's design really does make sense when seen in its entirety. From what I can see, I don't agree with his approach, but I'm not omniscient, and as far as the type system goes, it seems sound to implicitly convert thread-local objects to scope shared. And pointlessly locking a mutex (or using some other type of thread synchronization mechanism on a thread-local object) doesn't actually break anything. It's just inefficient and pointless.

@TurkeyMan
Contributor

TurkeyMan commented Oct 2, 2018

given that that means that it has to deal with the thread-synchronization internally to make the thread-safety work

Surely that's the entire point of a shared method...?

but I wouldn't expect there to be any thread-local instances in between

An object is thread-local whenever it isn't distributed... and that could be as often (or not) as the program likes. During those moments it's possible to access a wider range of not-threadsafe functionality, but that doesn't mean the threadsafe functionality should go away.
Managing access to resources is a problem for scheduling.

[...] outside of the object's member functions when a mutex has been locked and shared has been cast away [...]

That's a tremendously shit way to interact with shared stuff... it's not really even 'shared'. Locking an external mutex asserts a state where it's not shared at all, and casting it away just commits to that situation. What you're describing is NOT interacting with a shared thing... you're just interacting with a thread-local thing. Using an outer mutex to force a context where the thing is thread-local for some period while you do thread-local stuff is not shared access.

For my money, this has absolutely nothing to do with shared. A shared method is shared, and as such it must self-synchronise. Acquiring an outer lock and doing thread-local interaction is not shared, and by that definition, there's no reason to attribute methods as shared. I can't do anything interesting or useful with shared if that's how usage is intended, and you have to wonder why it's possible to attribute methods as shared at all.

So, I'm not at all convinced that any kind of design which would benefit from an implicit conversion like discussed here [...]

You said 'mutex' and 'cast away' 4-5 times... you are evidently not writing super-scalar code.
There are atomics, TLS, semaphores, and mutexes... there are also matters of scheduling that guarantee appropriate access rights. There are a whole bunch of tools in the toolbox, and it's the business of the shared method to implement the appropriate solution to assure the guarantee that the attribution requires.
I don't think there's a single mutex in our codebase...

pointlessly locking a mutex (or using some other type of thread synchronization mechanism on a thread-local object) doesn't actually break anything. It's just inefficient and pointless.

It's certainly not 'pointless', because the function does the function that the function does, which is intended by the call ;)

Let's say the shared method is called 100's or 1000's of times from shared code, and exactly once from thread-local (and not-hot) code... is it worth the busy work to write an 'efficient' non-shared overload for that one call? That's a DRY violation, and added maintenance burden... increasing tech-debt for what gain? The thread-safe version of the method can obviously handle a thread-local call.

Anyway, there's all this talk of mutexes and casting. That's not 'shared' at all, that's just controlled thread-local access. The feature is completely worthless in those terms. If that's the design goal of shared, then I suspect it was designed by people that had no idea what they were trying to achieve... that's certainly not what the future of software looks like!

@thewilsonator
Contributor Author

Hmm, you could auto-implement the unshared methods to cast this to shared, with forced inlining, via a (template) mixin. Then that would be a one-time cost plus one line per struct.
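A rough sketch of that idea (hypothetical helper, not something that exists in druntime or Phobos), using a string mixin so the generated overload participates in normal this-qualifier overload resolution:

// Generates a thread-local overload that forwards to the existing shared method
// by casting this; safe on the assumption that the thread-local instance is the
// only reference in existence.
enum unsharedForwarder(string method, string params, string args) =
  "pragma(inline, true) auto " ~ method ~ "(" ~ params ~ ")" ~
  " { return (cast(shared(typeof(this))*) &this)." ~ method ~ "(" ~ args ~ "); }";

struct Logger
{
  void log(string line) shared { /* thread-safe logging */ }
  void flush() { /* not thread-safe */ }

  mixin(unsharedForwarder!("log", "string line", "line")); // one line per wrapped method
}

void finaliseCycle(ref Logger b)
{
  b.log("Finalising..."); // resolves to the generated thread-local overload
  b.flush();
}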

@TurkeyMan
Contributor

@thewilsonator Possible. But that's basically an admission of defeat that the design of shared is a failure.
It's hard to get people excited and buy-in to a fundamental piece of language design when it's demonstrably broken.

@jmdavis
Member

jmdavis commented Oct 2, 2018

shared is really supposed to do three things:

  1. Allow the compiler to assume that everything that isn't explicitly marked as shared is thread-local.
  2. Make it so that the data that is shared across threads is explicitly marked as such and that that code is greppable by the programmer (just like @trusted code and casts can be easily grepped for).
  3. Ensure that anything that's marked as shared cannot do anything which is not thread-safe.

On some level, the compiler currently fails on the third one. It's improved over time, but a number of operations which are not thread-safe are still allowed (like assignment or copying). That's likely to be sorted out when Walter and Andrei finally sit down and finish the final details of shared (and without the copy constructor DIP, it's not actually possible to have thread-safe copying of shared objects without an external mutex). The basics are well-known, but the fine details still need some work.

In general, shared has to be dealt with in one of two ways:

  1. The type is not designed to work as shared and has no member functions marked as shared. So, for it to be used as shared, any synchronization on it must be done externally - be it via atomics so that you can safely access the data while it's still shared or by protecting access to the variable with a mutex and temporarily casting away shared while the mutex is locked.

  2. The type is designed to work as shared. So, it has shared member functions, and the thread-safety issues are managed internally. The basic issue is still the same, albeit with the difference that the object can then hold other shared objects that deal with synchronization on their own. But ultimately, each shared object has to do internally what would have had to be done externally if the object weren't designed to be shared. Either atomics need to be used for thread-safe access, or it needs to lock a mutex and temporarily cast away shared so that it can safely operate on it as thread-local. Because shared is part of the API, such an approach is cleaner to use, but the type is also less reusable, because it's specifically designed to work as a shared object and not as a thread-local object. It works wonderfully for stuff like producer-consumer queues but not so well for types that are sometimes used as thread-local and sometimes used as shared. synchronized classes were supposed to be a form of this that did the casting away of shared for you (albeit just casting away the outer layer of shared, since that's all it could guarantee to be safe), but they've never been implemented.

Of course, other synchronization tools such as semaphores also exist, but in principle, they're just shared objects that use atomics internally (though looking at druntime, our semaphore implementation is one of those things that has never been updated for shared like it should be). Ultimately, when dealing with a shared object, you have the choice of

  1. using an object that's designed to work as shared and deals with its synchronization internally, with the same tools that would otherwise be used externally - just encapsulated.
  2. using an object that's not designed to work as shared, and you use atomics to safely access the data.
  3. using an object that's not designed to work as shared, and you use a mutex to protect it so that you can temporarily cast away shared and treat it as thread-local.

If you don't want to use mutexes, and you'd prefer to do all of your multi-threaded stuff with atomics and semaphores and the like, that's fine. There are obviously use cases where atomics are more appropriate than mutexes, though I would have expected mutexes to be by far the more common use case. Regardless, shared is about the fact that the data is shared across threads, not about which synchronization primitives are used to access it safely. So, if you want to use atomics or semaphores rather than mutexes, more power to you, but using mutexes and casting away shared is a big part of using shared in general. A whole language feature was designed around the idea (though it has yet to be completed).
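For concreteness, a minimal sketch of that mutex-and-cast pattern (hypothetical Registry type with no shared member functions of its own):

import core.sync.mutex : Mutex;

struct Registry
{
  string[int] names;
  void add(int id, string name) { names[id] = name; } // not thread-safe on its own
}

shared Registry registry;
__gshared Mutex registryLock;

shared static this() { registryLock = new Mutex; }

void register(int id, string name)
{
  registryLock.lock();
  scope (exit) registryLock.unlock();
  // Safe only while the lock is held: cast shared away and use the thread-local API.
  auto local = cast(Registry*) &registry;
  local.add(id, name);
}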

@WalterBright
Member

More discussion here: https://digitalmars.com/d/archives/digitalmars/D/shared_..._319797.html which pretty much renders this unworkable in its current form. Reopen if these problems are addressed.

@jmdavis
Member

jmdavis commented Oct 2, 2018

I suppose that it should be unsurprising that Timon noticed something that we missed. He tends to do that.

@WalterBright
Member

Timon is right far too often :-)

But hey, we sure need him. Having to swallow our pride is better than investing a lot in huge, embarrassing mistakes. I'm glad when he proves me wrong.

@TurkeyMan
Contributor

...are we talking about the "scope is not transitive" comment? Is that the case? How is it that scope is not transitive? That seems unworkable?

@Geod24
Member

Geod24 commented Oct 4, 2018

Is that the case?

Unfortunately, yes.
This is actually stated in many places, all the way back to DIP69.

How is it that scope is not transitive?

It was always this way AFAIK, even if some people (including myself) argued for it to be.

@TurkeyMan
Contributor

Can we continue riffing on this with a test for shared-can-not-access-members?
I reckon I can make interesting and useful infrastructure with that tweak.

@thewilsonator
Contributor Author

I won't have time to work on this before I see you, we can talk about it then.


Labels

Review:WIP Work In Progress - not ready for review or pulling
