-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move mutable globals into a central structure #7722
Conversation
Makefile
Outdated
@@ -676,7 +677,7 @@ HEADER_FILES = \ | |||
CodeGen_PyTorch.h \ | |||
CodeGen_Targets.h \ | |||
CodeGen_WebGPU_Dev.h \ | |||
CompilerLogger.h \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unintended change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops
src/CompilerGlobals.cpp
Outdated
|
||
Globals the_real_globals; | ||
|
||
size_t compute_stack_size() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the compiler stack size doesn't belong in this struct. It's not mutable state - it's configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be changed by set_stack_size()
at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it's not mutated as part of a compilation of a Halide pipeline. It's libHalide configuration state rather than Generator state. Someone might feasibly want to set it with code outside of a loop that calls compile_multitarget a bunch of times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair. Will change.
src/CompilerGlobals.h
Outdated
std::atomic<int> unique_name_counters[num_unique_name_counters]; | ||
|
||
// The stack for tic/toc utilities | ||
std::vector<TickStackEntry> tick_stack; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no point putting this in the struct either. It's not mutable compiler state that it makes sense to reset. In fact resetting it could result in toc() trying to pop an empty vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more on this -- you're not wrong, but if a multithreaded caller used this facility, we could end up with similarly mismatched vectors. Since this is mostly for transitory debugging usage, maybe it's no big deal, but it is a minor footgun waiting for code that wants to drive Halide from multiple threads.
This is a stab at an alternative to #7717, per discussion there. It doesn't necessarily have all the mutable globals moved here, but at least the ones we discussed. Better? Worse? What do you think?
1528a37
to
1208e22
Compare
PTAL -- sorry for the force-push :-/ |
Oooh, this is interesting, looks like a real injection, presumably caused by interaction between this change and the recently-landed #7715, investigating:
|
OK, there's a subtle bug here that I'm not sure how to resolve, the issue is this:
Thinking out loud, possible ways to fix this might be:
It's not pretty, but I think it would work... am I missing something? |
Perhaps the interface should be the ability to push/pop state, instead of the ability to reset state. popping would roll it back to just before the matching push. |
...let's discuss this design offline before I do any more work on it and come up with a concrete goal. |
Despite my comment above, I couldn't resist, so I restructured to make it an entirely opaque API in order to allow save-restore as you requested. One possible issue here re: thread safety: All the |
Failures unrelated, but changes since you approved are pretty major, PTAL before I consider landing |
@@ -640,6 +641,9 @@ HTML_TEMPLATE_FILES = \ | |||
# Don't include anything here that includes llvm headers. | |||
# Also *don't* include anything that's only used internally (eg SpirvIR.h). | |||
# Keep this list sorted in alphabetical order. | |||
# | |||
# Note that CompilerGlobals.h is deliberately omitted, as we don't |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems redundant with the one above. There are lots of .h files not in this list.
Thinking more on this... if we want this to be properly thread-safe, I think what we need to do is:
|
Monday morning review ping |
IIRC at the dev meeting we discussed an API where instead of reset() and restore() we just have get and set, and then the scoped widget does a get in its constructor and a set in the destructor, so that the state is restored to whatever it was before the scoped code ran. I believe the difference between that and this is that get() is non-mutating, whereas reset() is. The advantage of get()/set() over reset()/restore() is that the semantics are simpler, and they can be nested without accidentally reusing names, though I'm not sure what the point of nesting it would be. None of this addresses multi-threading though. I'm not sure what to do about that. If a set()/restore() happens on one thread while another thread is mid-compilation, we're going to have a bad time... That seems like a show-stopper, because I already know of one case in the wild where different generators are being run different threads, and this change would break it. Adding a mutex would just serialize the whole thing. I can't immediately think of a solution. |
Are we sure that that situation is actually robust and not getting lucky? (Have we tried running it with TSAN?)
It wouldn't break it if we added a mutex to serialize the whole thing... but it would clearly impact performance to some extent. How much, I have no idea -- might be interesting to make a fully-mutexed version out of this and actually try it in that scenario to see how much effect we're talking about. (As the saying goes, speed doesn't matter unless the answer is correct...) |
There's no cross-talk between the threads involved except for accessing the same set of atomic counters, so I'm pretty sure TSAN would be clean. Serializing the whole thing would remove the point of threading it in the first place. |
Well, no; seems to me that the mutex contention would only occur when we request unique names (or random-var seeds, but that's so infrequently used that it's unlikely to matter). And yes, our code requests unique names a lot, but it's also the case that a lot of code happens in between those instances. It's not so obvious to me that the contention would destroy the point of threading, but testing would give us data pro or con. |
You'd have to hold the mutex for the entirety of lowering, not just when asking for unique names, because you need to prevent another thread from calling restore() while you're in the process of lowering, or you'll get non-unique temporary names. |
hmm, not necessarily -- you could get away with just holding the mutex while there is a save active (and release it upon restore) -- but I'm not sure that would help much in practice, because the main place this would apply would be in compile_too_multitarget, which does a lot of lowering internally. OK, so, separate thought: what about using thread-local storage? |
That would add the constraint that we can't use multi-threading internally in lowering (which we don't currently do), and you can't construct Halide IR for a single pipeline using multiple threads (which I doubt anyone does). Not sure if repeatable naming is worth those constraints. Another option might be to add the functionality but not turn any of it on by default, and if people want to save/restore state in their own generator drivers where they're familiar with what they do and don't do, they can. In the case you had where you wanted to reset the random number counters, that could just call reset() at the top of generate() for the affected pipeline. |
Sigh... maybe not, but it would sure make life easier when debugging.
...maybe? Not sure how to refactor this so that making it conditional wouldn't be ugly, though.
Fair... I guess I'll go resurrect my original PR that basically just did that as a stopgap.
Can you expand a little bit on the use case here and what specifically is happening (and why), assuming it's not proprietary information? (It might help me wrap my head around the problem space better.) |
This whole thing makes me a little sad... we both know that it's really hard to get things right the first time, but in hindsight, it seems like having Halide objects belong to a Context of some sort would probably have been a win, despite the possible extra overhead involved. Ah, well, hindsight. |
This is a stab at an alternative to #7717, per discussion there. It doesn't necessarily have all the mutable globals moved here, but at least the ones we discussed. Better? Worse? What do you think?