Condition/RecursiveLock: add ability to handle threads #30061
This `*MT` naming convention seems a bit unfortunate. Is it correct to use `Condition` when Julia is multithreaded? Or `ConditionMT` when Julia is single-threaded?
I agree: code should be independent of the number of threads as much as possible. I think these can be seen as different ways of using conditions, rather than different kinds of conditions: in one case, you just want to be woken up and don't need to test a predicate, and in the other case you need to test a predicate and so need to continue holding a lock when
It's a statement of the algorithm being used (whether concurrent modifications are supported), not the configuration of Julia. You can always use a thread-safe variant of any object (hence in 2.0, we might decide to make that the default) by just accepting the performance hit, but the ability to avoid the extra overhead of a lock while using the same code might remain desirable.
The only way to do that is to make everything very slow. I strongly disagree with this.
I don't think a caller can be local, since something external ("the caller" could be anyone on the callstack) is, by definition, non-local to the callee.
Um, that's what C does, and is precisely what we just said we don't want to do (#30026).
No, what I'm saying is that we should try to skip ahead to making all of these thread-safe by default. I think it might be possible to arrange the API to enable that. For example, I feel more strongly about not having, or strongly discouraging using, a thread-unsafe lock. That seems like a logical place to be thread-safe by default. If it's truly necessary to have a task-only lock for performance tuning, that can be added later when needed and hidden away somewhere, possibly not even exported.
No. I don't think you understand what's going on here. I've been thinking about this a lot for the past couple of months, and haven't been able to find a substitute (sadly), and hence am finally turning this into a PR. If you aren't already holding the lock when you call `wait`, your code is broken. It's absolutely useless to break the API just to return with the lock held (I do it here because it's convenient and makes the implementation a bit more powerful around RecursiveLock, but it's purely optional code). The options I'm aware of are: (a) break backwards compatibility and require you to hold the lock first; (b) declare in the constructor whether correct locking is mandatory (this PR); or (c) declare or detect in the `wait` call whether you previously acquired the lock, and branch to the appropriate implementation at runtime.
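The "hold the lock before you `wait`" rule can be sketched with the thread-safe condition that this work eventually shipped as `Threads.Condition` (the merged name, not the draft `ConditionMT`). This is a minimal sketch assuming Julia 1.3 or later for `Threads.@spawn`; the `ready` flag is a hypothetical shared predicate, not something from the PR.

```julia
c = Threads.Condition()
ready = Ref(false)          # hypothetical shared state, only touched while holding c's lock

waiter = Threads.@spawn begin
    lock(c)
    try
        # Re-test the predicate in a loop: the notifier may already have run,
        # in which case we must not sleep at all.
        while !ready[]
            wait(c)         # releases c's lock while sleeping, reacquires it before returning
        end
        :saw_ready
    finally
        unlock(c)
    end
end

# The notifier also holds the lock, so "set flag + notify" cannot interleave
# with the waiter's "check flag + wait" sequence (no lost wakeup).
lock(c)
try
    ready[] = true
    notify(c)
finally
    unlock(c)
end

result = fetch(waiter)
```

Because both sides hold the same lock, it does not matter which task runs first: either the waiter sees `ready[] == true` and never sleeps, or it is already queued when `notify` runs.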
I think that's fairly close to what I'm proposing.
Yes, I was actively testing the branch with the switch to change the default lock type to MT-safe, and that seems to build and work. Since those have the same API, your proposal seems like an obvious improvement. I could switch Condition to ConditionMT, but (for backwards compatibility) that requires removing the concurrency-violation errors in some places and replacing them with logic that bypasses thread safety when we detect that case. It's not my favorite option: I like that, with the current state of the PR, the user states, via the choice of constructor, whether their code has been updated to be thread-safe, and I think it's a bit easier to assert than to handle this case in general. However, I don't foresee any actual blockers in getting the test suite to pass with that option. (Implementation-wise, this might mean adding a flag to RecursiveLock of whether to disable error checking, or perhaps just handling it specially inside
Actually, in talking with Kiran today, it turns out there is another option, (d). In this PR, I'm using the word Condition in the same sense as pthreads/C/Java/TBB/Winnt/etc.: a queue of threads waiting for an event. But we could also use the word in the sense of the simple notification trigger itself, or what might be called an Event in some other contexts (such as kevent/epoll/Winnt/Julia). This would be an intermediate approach that alters some results but doesn't outright break all usage. In exchange, we might be able to gain partial MT-safety for the existing The trade-off here (pros and cons, in no particular order) is that
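To make the two meanings concrete: a Condition-style queue simply drops a notification when nobody is waiting yet, while an Event-style trigger remembers that it fired. The sketch below uses `Base.Event`, which exists in current Julia releases (it is not part of the diff discussed here), alongside the task-only `Condition`; it assumes a single thread of cooperative tasks.

```julia
# Condition: a queue of waiters. Notifying an empty queue has no lasting effect.
c = Condition()
dropped = notify(c)         # returns the number of tasks woken; nobody is queued yet

# Event: a level-triggered flag. Once notified, it stays set.
ev = Base.Event()
t = @async begin
    wait(ev)                # blocks until notify(ev) has been called
    :event_seen
end
yield()                     # let t start and block on the event
notify(ev)
seen = fetch(t)
wait(ev)                    # returns immediately: the event remembers it was set
```

A late `wait(c)` after the `notify(c)` above would block forever, whereas the late `wait(ev)` returns at once; that difference is what option (d) would trade on.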
I've now added another commit demonstrating the ability for the As stated in the commit description, this is a fully functional prototype, but not necessarily a final implementation. As such, it still leaves in the machinery to define, use, and experiment with the *ST and *MT aliases. Once we settle on a design, I'll go back through and remove any content and flexibility that we didn't end up wanting.
We talked some last week about other options besides what is currently implemented in this PR. Is what we discussed then basically equivalent to option (c) above? If so, if I remember correctly, the tradeoff was essentially: Is that more or less right? If so, it seems to me that this tradeoff is not worth it. Am I characterizing the discussion correctly? I'm sorry that I don't remember all of the details. Did we come to a consensus that I'm not remembering?
@vtjnash brought up the point that in the future there might be multi-threaded and single-threaded versions of various things, like HashMap vs. ConcurrentHashMap. So we might want to think about this from the perspective of having a general naming convention, which we can then also use here. Some possibilities:
Talking more with Jeff yesterday, we proposed the following actions:
+10 I think that all sounds really great to me! I agree that the current behavior of Condition belongs with Event, and that nicely solves the naming problems. +1 and thanks for thinking hard about this! :)
Another option for resolving the
👍 Looks good. We'll have to wait for the 1.1 news to be moved aside, then rebase.
Yay! Thanks for the hard work on this, Jameson! It looks really good! :))
This extends `Condition` to assert that it may only be used in the single-threaded (cooperatively scheduled) case, and then adds a thread-safe version of the same: `Threads.Condition`. It also upgrades `ReentrantLock`, etc., to be thread-safe.
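A minimal sketch of what a thread-safe `ReentrantLock` provides, assuming Julia 1.3+ for `Threads.@spawn`. Start Julia with several threads (for example via the `JULIA_NUM_THREADS` environment variable) to exercise real parallelism; the result should be the same with one thread. The counter is illustrative only.

```julia
counter_lock = ReentrantLock()
counter = Ref(0)              # shared state guarded by counter_lock

@sync for _ in 1:1_000
    Threads.@spawn begin
        # lock(f, l) acquires l, runs f, and releases l even if f throws
        lock(counter_lock) do
            counter[] += 1
        end
    end
end

# Reentrancy: the task that owns the lock may acquire it again without deadlocking,
# provided every lock() is paired with an unlock().
lock(counter_lock)
lock(counter_lock)
held_twice = islocked(counter_lock)
unlock(counter_lock)
unlock(counter_lock)
```

Without the lock, concurrent `counter[] += 1` updates from different threads could be lost; with it, every increment is serialized.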
assert_havelock(l::AbstractLock, tid::Nothing) = error("concurrency violation detected")
"""
AlwaysLockedST
This seems to be the only case where the original `*ST` and `*MT` naming has remained in the merged PR. Was that intentional, or was this just overlooked?
Well, there isn't an `MT` variant, and this type is only meant to be used internally. We could rename it, though.
Right, that's what I was thinking: just calling it `AlwaysLocked` would be good.
In Julia 1.2, the error that occurs when releasing a semaphore with mismatched release/acquire counts is now an ErrorException, whereas previously it was an AssertionError. See JuliaLang/julia#30061
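A minimal illustration of that behavior change, assuming Julia 1.2 or later and using the non-exported `Base.Semaphore`, `Base.acquire`, and `Base.release`:

```julia
s = Base.Semaphore(1)         # at most one concurrent holder

Base.acquire(s)
Base.release(s)               # balanced acquire/release: fine

# Releasing more times than acquired is rejected with an error.
err = try
    Base.release(s)
    nothing
catch e
    e
end
```

Code that previously caught `AssertionError` for this mismatch would need to catch `ErrorException` instead.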
@vtjnash I wonder if it was a mistake to allow the old-style, non-thread-safe This makes it somewhat easy to hit the following gotcha (which happened to me today):
julia> c = Condition()
Base.GenericCondition{Base.AlwaysLockedST}(Base.InvasiveLinkedList{Task}(nothing, nothing), Base.AlwaysLockedST(1))
julia> lock(c)
julia> Threads.@spawn begin lock(c); @info "HI"; unlock(c) end
[ Info: HI
Task (done) @0x000000011ce3ead0
I forgot that I was using I think it would be better to have Can we throw an error to enforce that, so that users aren't caught by this surprise like I was? :) What do you think? Thanks! :)
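For comparison, the same session with the thread-safe `Threads.Condition` makes the spawned task genuinely wait for the lock, because its underlying `ReentrantLock` is owned by one task at a time. A minimal sketch, assuming Julia 1.3+; the `sleep(0.1)` only gives the spawned task time to reach `lock` and is not a synchronization mechanism.

```julia
c = Threads.Condition()        # backed by a ReentrantLock

lock(c)
t = Threads.@spawn begin
    lock(c)                    # blocks: the main task still holds c's lock
    try
        :got_lock
    finally
        unlock(c)
    end
end

sleep(0.1)                     # let t run and block on lock(c)
blocked_while_held = !istaskdone(t)
unlock(c)                      # now t can acquire the lock and finish
result = fetch(t)
```

With the task-only `Condition()` shown above, `lock(c)` in the spawned task returns immediately instead, which is exactly the surprise described.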
I believe
It seems that the base The full situation I was in was trying to support When I was experimenting with it, I was accidentally using the wrong kind of Condition and didn't notice, and I was really confused by the behavior. I finally managed to figure it out by adding print statements to this toy example:
julia> using Test; using Base.Threads: @spawn
julia> @sync begin
           coordinator = Channel()
           c1 = Condition()
           c2 = Condition()
           wait_on_conds() = begin
               @info "lock(c1)"
               lock(c1)
               @info "lock(c2)"
               lock(c2)
               put!(coordinator, 0) # Notify main task that we're waiting. (Note that it will block on lock until we're actually asleep.)
               @async begin wait(c1); @info "c1" end
               @async begin wait(c2); @info "c2" end
               unlock(c2)
               unlock(c1)
           end
           t1 = @async wait_on_conds()
           t2 = @async wait_on_conds()
           take!(coordinator); take!(coordinator) # Wait for both tasks to start waiting
           # Now, everyone is ready, so notifying c1 will wake up _both_ tasks
           lock(c1)
           @test notify(c1) == 2 # ERROR: This returns `1` instead, because the `lock` doesn't actually block until `t2` has waited like I expected.
           unlock(c1)
       end
(In fact, I finally realized that this test wasn't going to work anyway, because
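For reference, here is one way to express the toy example's intent (both waiters parked on the condition before `notify` runs, so it reports 2) with `Threads.Condition`, whose lock really does exclude other tasks. This is a sketch assuming Julia 1.3+; it simplifies the original to a single condition, and the `ready` channel is an illustrative handshake rather than part of the original code.

```julia
c1 = Threads.Condition()
ready = Channel{Nothing}(2)     # buffered, so put! does not block here

waiters = map(1:2) do _
    Threads.@spawn begin
        lock(c1)
        try
            put!(ready, nothing)   # announce "about to wait" while still holding c1's lock
            wait(c1)               # enqueues this task, then releases c1's lock
            :woken
        finally
            unlock(c1)
        end
    end
end

take!(ready); take!(ready)      # both waiters have acquired c1's lock at least once
lock(c1)                        # succeeds only once both waiters are parked inside wait(c1)
woken = notify(c1)              # returns the number of tasks woken
unlock(c1)
results = fetch.(waiters)
```

The key difference from the original is that each waiter calls `wait` on the same task that took the lock, and the lock is a real mutual-exclusion lock, so the main task's `lock(c1)` cannot succeed while a waiter is between `put!` and `wait`.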
I get
I suppose the issue is that
Weeirrrrd. It happily continues in parallel for me on both 1.3 and my 1-day old master 1.4.
But yeah, exactly. That's my thinking as well. I think having the two types can lead to surprises like this, so any amount of failing early would be much appreciated! ❤️ Erroring on Ideallllly, it seems like it would be nice to consider renaming
This is kind of a corny solution, but I've opened #33162 as a straw person. EDIT: 👍 Thanks, this was merged. :) That should address my concerns.
This extends Condition and RecursiveLock to assert that they may only be used in the single-threaded (cooperatively scheduled) case, and then adds thread-safe versions of the same (ConditionMT and RecursiveLockMT). Additionally, the constructor for `ConditionMT` lets you pass in an existing `RecursiveLockMT` to allow having multiple notifications off of a single lock.
Unlike the existing thread-safe primitives (Threads.AsyncCondition, Threads.SpinLock, Threads.Mutex), these new types also integrate with the Task system, and thus should typically be preferred to those for user code. Unlike the existing task-only primitives, these work in all situations, but require the user to write explicit lock code annotating the protected code regions.
In version 2.0, we can consider switching the default for `Condition` from `ConditionST` to `ConditionMT`, but that would be a breaking change. (`ConditionMT` requires the user to hold a lock before calling `wait`, while that lock is currently implicit in `Condition` under the assumption of cooperative tasking, as explicitly represented in this PR by the `NotALock` type.)
Implement and close #30026
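The difference described in the parenthetical above can be seen directly using the merged names: the task-only `Condition` lets you `wait` and `notify` without calling `lock`, while `Threads.Condition` rejects a `wait` from a task that does not hold its lock. A minimal sketch, assuming Julia 1.3+ and a single cooperative thread for the first half. The exact exception type for the second half has varied across Julia versions (the code quoted earlier uses `error("concurrency violation detected")`), so only a generic exception is assumed.

```julia
# Task-only Condition: the lock is implicit, so no lock() call is needed.
c_st = Condition()
t = @async wait(c_st)
yield()                  # let t start and block in wait
notify(c_st)             # allowed without lock() for the cooperative variant
wait(t)

# Thread-safe Threads.Condition: waiting without holding its lock is an error.
c_mt = Threads.Condition()
err = try
    wait(c_mt)
    nothing
catch e
    e
end
```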