-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce memory & redundant work for concurrent TimeZones construction #356
base: master
Are you sure you want to change the base?
Conversation
Sorry the diff is hard to read; it'll be easier after #344 is merged. The entirety of this PR is just in this single commit, here: bd68cf9 I think this PR is maybe worth merging separately, since it's nice to see the performance difference and keep it separate from the correctness of the previous PR. Also, we'll want to measure the CPU perf impact to make sure that it's okay before merging this one. I'm not sure how you were measuring it, @omus - can you run that measurement again, for consistency? Thanks! |
This commit improves the thread-local caching scheme introduced in JuliaTime#344, by sharing TimeZones across _all_ thread-local caches (so there's only ever one TimeZone object instance created per process), and reduces redundant work caused by multiple concurrent Tasks starting to construct the same TimeZone at the same time. Before this commit, multiple Tasks that try to construct the same TimeZone on different threads would have constructed the same object multiple times. Whereas now, they would share it, via the "Future". ------------------------------------------------------------ It's difficult to show this effect, but one way to show it is by logging when a TimeZone is constructed, via this diff: ```julia diff --git a/src/types/timezone.jl b/src/types/timezone.jl index 25d36c3..1cea69e 100644 --- a/src/types/timezone.jl +++ b/src/types/timezone.jl @@ -68,6 +68,7 @@ function TimeZone(str::AbstractString, mask::Class=Class(:DEFAULT)) # Note: If the class `mask` does not match the time zone we'll still load the # information into the cache to ensure the result is consistent. tz, class = get!(_tz_cache(), str) do + Core.println("CONSTRUCTING $str") tz_path = joinpath(TZData.COMPILED_DIR, split(str, "/")...) if isfile(tz_path) ``` Before this commit, you can see that every thread constructs the object twice - once before the clear and once after (total of 8 times): ```julia julia> Threads.nthreads() 4 julia> TimeZones.TZData.compile(); julia> foo() = begin @sync for i in 1:20 if (i == 10) @info "---------clear-------" TimeZones.TZData.compile() end Threads.@Spawn begin TimeZone("US/Eastern", TimeZones.Class(:LEGACY)) end end @info "done" end foo (generic function with 1 method) julia> @time foo() [ Info: ---------clear------- CONSTRUCTING US/Eastern CONSTRUCTING US/Eastern CONSTRUCTING US/Eastern CONSTRUCTING US/Eastern CONSTRUCTING US/Eastern CONSTRUCTING US/Eastern CONSTRUCTING US/Eastern CONSTRUCTING US/Eastern [ Info: done 0.391298 seconds (1.51 M allocations: 64.544 MiB, 2.46% gc time, 0.00% compilation time) ``` After this commit, it's constructed only twice - once before the clear and once after: ```julia julia> @time foo() [ Info: ---------clear------- [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: done 0.414059 seconds (1.46 M allocations: 61.972 MiB, 4.55% gc time) ``` ------------------------------------------------------------------ Finally, the other problem this avoids is if we ever accidentally introduce a Task yield inside the constructor, which could happen if we used `@info` instead of `Core.println()`, then without this PR, the old code could potentially do _redundant work_ to construct the TimeZone multiple times - even on the same thread - since each Task's constructor would see that there's no TZ in the cache, start the work, and then yield to the next Task, which would do the same, until finally they all report their work into the cache, overwriting each other. This is what happens if we use `@info` in the above diff, instead: Before this commit - the results are nondeterministic, but in this run, you can see it redundantly constructed the value all 20 times!: ```julia julia> @time foo() [ Info: ---------clear------- [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: done 0.494492 seconds (1.55 M allocations: 66.754 MiB, 16.67% gc time) ``` After this commit, just the two we expect. 😊 : ```julia julia> @time foo() [ Info: ---------clear------- [ Info: CONSTRUCTING US/Eastern [ Info: CONSTRUCTING US/Eastern [ Info: done 0.422677 seconds (1.47 M allocations: 62.228 MiB, 4.66% gc time) ```
bd68cf9
to
5bffed5
Compare
I'm not really sure if this is necessary because probably no one is using threading on 1.0 anyway, but i guess they could be doing something with `@threads`, so better safe than sorry. I didn't find `@lock` in Compat.jl, so this seemed like the easiest approach?
@omus: I think this PR is ready for review whenever you get time. if there's anyone else who you'd prefer to do the review, just let me know. 👍 🙁 |
# Since we use thread-local caches, we spawn a task on _each thread_ to clear that | ||
# thread's local cache. | ||
Threads.@threads for i in 1:Threads.nthreads() | ||
@assert Threads.threadid() === i "TimeZones.TZData.compile() must be called from the main, top-level Task." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this guaranteed when calling compile/_reset_tz_cache
from the main task?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember / understand the reason, but the Threads.@threads
macro only works from the top-level task.
The behavior of @threads
is that it evenly divides the for-loop across the number of threads, so if you have exactly nthreads()
iterations, exactly one iteration will go on each thread.
help?> Threads.@threads
Threads.@threads [schedule] for ... end
A macro to parallelize a for loop to run with multiple threads. Splits the iteration space among multiple tasks and runs those tasks on threads according
to a scheduling policy. A barrier is placed at the end of the loop which waits for all tasks to finish execution.
The schedule argument can be used to request a particular scheduling policy. The only currently supported value is :static, which creates one task per
thread and divides the iterations equally among them. Specifying :static is an error if used from inside another @threads loop or from a thread other than
1.
The default schedule (used when no schedule argument is present) is subject to change.
│ Julia 1.5
│
│ The schedule argument is available as of Julia 1.5.
It only works from thread 1, for reasons i can't quite remember, but so this basically means you have to start it from the main Task. (I didn't remember that it was a "thread 1" requirement - i thought it was actually a "main Task" requirement.. i can consider changing the assertion message, perhaps? But i think it's easier guidance to say "don't call this concurrently, dude")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about being slow to review. Overall this is good just needs a couple of touch ups.
Just wanted to call out that the TimeZone
mutex changes is essentially equivalent to a readers/writers lock similar to this implementation: 6522f7a. I'm just calling this out as a possible cleanup to this implementation in the future if a well tested readers/writer lock package emerges.
- Clean up variable namings - Add TODO reference to Compat.jl for `Base.@lock`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, @omus! Sorry it was me who took so long to get back to it this time 😁
Hope you've had a nice holiday! ⭐
FWIW, i still think this will be more performant than a reader/writer lock could be, since this kind of cache locking takes advantage of the cache being append-only, so that you don't need to lock at all on cache hits. I could imagine we could separate out this kind of thread-safe shared cache into a separate package though, so that others could benefit from this implementation 👍 |
12d306e
to
0258929
Compare
Kick starting CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There definitely seems to be an issue with this implementation as the CI consistently stalls out and I see the same thing when running the tests locally. Some additional work needs to be done here before we can proceed
"Unable to find time zone \"$str\". Try running `TimeZones.build()`." | ||
)) | ||
else | ||
throw(ArgumentError("Unknown time zone \"$str\"")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exceptions while constructing will cause threads to be blocked upon waiting for a channel that will never be populated
@NHDaly sorry for not giving this the attention it deserves. Been swamped lately. I'll try to be faster on additional reviews |
No worries at all - i fully believe that there are bugs in the implementation here. We actually split this out into its own package, here: https://github.com/NHDaly/MultiThreadedCaches.jl We found several bugs in the implementation while extracting it out, so the code is much cleaner now. I think this PR can wait until we scrub out all this and use that package instead! :) It's also worth doing another comparison against a Read/Write Lock and also against a ConcurrentHashTable. Feel free to review that package directly, if you'd like - most of the code was added in: JuliaConcurrent/MultiThreadedCaches.jl#1, plus a couple follow ups. Thanks for your patience @omus, sorry for the delay on my end. ALSO i'm about to go on vacation for three weeks, so 👋 we can talk about this again soon! 😅 |
This commit improves the thread-local caching scheme introduced in
#344, by sharing TimeZones
across all thread-local caches (so there's only ever one TimeZone
object instance created per process), and reduces redundant work caused
by multiple concurrent Tasks starting to construct the same TimeZone at
the same time.
Before this commit, multiple Tasks that try to construct the same
TimeZone on different threads would have constructed the same object
multiple times. Whereas now, they would share it, via the "Future".
It's difficult to show this effect, but one way to show it is by logging
when a TimeZone is constructed, via this diff:
Before this commit, you can see that every thread constructs the object
twice - once before the clear and once after (total of 8 times):
After this commit, it's constructed only twice - once before the clear
and once after:
Finally, the other problem this avoids is if we ever accidentally
introduce a Task yield inside the constructor, which could happen if we
used
@info
instead ofCore.println()
, then without this PR, the oldcode could potentially do redundant work to construct the TimeZone
multiple times - even on the same thread - since each Task's constructor
would see that there's no TZ in the cache, start the work, and then
yield to the next Task, which would do the same, until finally they all
report their work into the cache, overwriting each other.
This is what happens if we use
@info
in the above diff, instead:Before this commit - the results are nondeterministic, but in this run,
you can see it redundantly constructed the value all 20 times!:
After this commit, just the two we expect. 😊 :