-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
move timezone support to extension #482
base: main
Are you sure you want to change the base?
Conversation
Regardless of breakingness, I'm not sure this is a good idea, since the Arrow format itself has a notion of IMO it would be better to try to fix invalidation / other sources of re-compilation in TimeZones itself, or otherwise improve it's load time, rather than making it optional. |
Codecov Report
@@ Coverage Diff @@
## main #482 +/- ##
==========================================
+ Coverage 87.45% 87.46% +0.01%
==========================================
Files 26 27 +1
Lines 3283 3286 +3
==========================================
+ Hits 2871 2874 +3
Misses 412 412
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
I also wasn't sure that this was the best path -- I really did it just to see what would happen. And I also agree that the (re)compilation overhead in TimeZones is a good target for optimization. That said, there is a different path forward (which I'm not saying is a good idea): define a very restricted |
Ugh, we just really need a cleaned up/nice TimeZones.jl package where people don't feel like they need to work-around/avoid it. That's possibly the problem of the Dates stdlib making it easier for TimeZones.jl to integrate cleanly. |
Perhaps this is no longer needed when these TimeZones load time improvements are merged: JuliaTime/TimeZones.jl#457. |
With the release of TimeZones.jl 1.16 I decide to re-run the timingsJulia 1.9.4 with TimeZones 1.15: (Arrow) pkg> st
Project Arrow v2.7.2
Status `~/.julia/dev/Arrow/Project.toml`
[31f734f8] ArrowTypes v2.3.0 `src/ArrowTypes`
[c3b6d118] BitIntegers v0.3.1
[5ba52731] CodecLz4 v0.4.3
[6b39b394] CodecZstd v0.8.2
[f0e56b4a] ConcurrentUtilities v2.4.1
[9a962f9c] DataAPI v1.16.0
[4e289a0a] EnumX v1.0.4
[e6f89c97] LoggingExtras v1.0.3
[2dfb63ee] PooledArrays v1.4.3
[91c51154] SentinelArrays v1.4.3
[bd369af6] Tables v1.11.1
⌃ [f269a46b] TimeZones v1.15.0
[3bb67fe8] TranscodingStreams v0.10.8
[ade2ca70] Dates
[a63ad114] Mmap
[cf7118a7] UUIDs
Info Packages marked with ⌃ have new versions available and may be upgradable.
julia> @time @time_imports using Arrow
1.6 ms LoggingExtras
0.8 ms DataAPI
0.4 ms DataValueInterfaces
0.4 ms IteratorInterfaceExtensions
0.3 ms TableTraits
4.2 ms OrderedCollections
17.0 ms Tables
17.1 ms SentinelArrays
10.9 ms PooledArrays
7.7 ms Preferences
0.5 ms JLLWrappers
27.5 ms Lz4_jll 93.79% compilation time
1.4 ms TranscodingStreams
0.4 ms TranscodingStreams → TestExt
3.9 ms CodecLz4
0.8 ms Zstd_jll
2.6 ms CodecZstd
0.4 ms Scratch
0.3 ms PrecompileTools
16.6 ms Parsers
4.3 ms InlineStrings
0.4 ms TZJData
0.7 ms Compat
0.4 ms Compat → CompatLinearAlgebraExt
0.5 ms ExprTools
0.8 ms Mocking
253.5 ms TimeZones 55.54% compilation time (79% recompilation)
10.9 ms BitIntegers
1.8 ms ConcurrentUtilities
0.5 ms EnumX
2.8 ms ArrowTypes
152.1 ms Arrow
0.621405 seconds (1.35 M allocations: 83.477 MiB, 4.91% gc time, 31.84% compilation time: 56% of which was recompilation) Julia 1.9.4 and TimeZones 1.16: (Arrow) pkg> st
Project Arrow v2.7.2
Status `~/.julia/dev/Arrow/Project.toml`
[31f734f8] ArrowTypes v2.3.0 `src/ArrowTypes`
[c3b6d118] BitIntegers v0.3.1
[5ba52731] CodecLz4 v0.4.3
[6b39b394] CodecZstd v0.8.2
[f0e56b4a] ConcurrentUtilities v2.4.1
[9a962f9c] DataAPI v1.16.0
[4e289a0a] EnumX v1.0.4
[e6f89c97] LoggingExtras v1.0.3
[2dfb63ee] PooledArrays v1.4.3
[91c51154] SentinelArrays v1.4.3
[bd369af6] Tables v1.11.1
[f269a46b] TimeZones v1.16.0
[3bb67fe8] TranscodingStreams v0.10.8
[ade2ca70] Dates
[a63ad114] Mmap
[cf7118a7] UUIDs
julia> @time @time_imports using Arrow
1.7 ms LoggingExtras
0.7 ms DataAPI
0.4 ms DataValueInterfaces
0.3 ms IteratorInterfaceExtensions
0.3 ms TableTraits
4.3 ms OrderedCollections
16.6 ms Tables
17.3 ms SentinelArrays
11.0 ms PooledArrays
7.4 ms Preferences
0.4 ms JLLWrappers
27.8 ms Lz4_jll 92.37% compilation time
1.4 ms TranscodingStreams
0.4 ms TranscodingStreams → TestExt
3.6 ms CodecLz4
0.8 ms Zstd_jll
2.7 ms CodecZstd
0.4 ms Scratch
0.3 ms PrecompileTools
16.2 ms Parsers
4.3 ms InlineStrings
0.5 ms TZJData
0.8 ms Compat
1.8 ms Compat → CompatLinearAlgebraExt
0.5 ms ExprTools
1.1 ms Mocking
28.9 ms TimeZones
25.9 ms BitIntegers
1.7 ms ConcurrentUtilities
0.6 ms EnumX
2.9 ms ArrowTypes
259.1 ms Arrow
0.523419 seconds (574.55 k allocations: 40.298 MiB, 2.85% gc time, 11.04% compilation time) The increase in import time for Arrow is most likely to is due to having Arrow.jl code triggering the loading of time zone data. |
It looks like the use of julia> @time @time_imports using Arrow
...
25.9 ms TimeZones
...
2.8 ms ArrowTypes
19.3 ms Arrow
0.284661 seconds (574.62 k allocations: 41.319 MiB, 14.86% gc time, 20.92% compilation time) I'll attempt to address this in TimeZones.jl |
When looking at
@time_imports
on a transitive reverse dependency, I saw that TimeZones adds a substantial time to a package that doesn't need that part of Arrow. So I took a stab at moving time zone support to an extension. I've done this in a way that should still be Julia pre 1.9 compatible. There is one slightly hacky thing: the extension mechanism is based primarily on the idea of adding additional methods to existing functions and not for defining additional types. As such, it's a little bit weird (setglobal!
in 1.9) to get types defined in an extension available at the top-level module again.This is almost assuredly a breaking change because deserialization behavior will change when TimeZones isn't loaded either directly or through some other package. For serialization, I feel like it's less of an issue because if you've writing
ZonedDateTime
to disk, then you've already got TimeZones loaded. Still, I thought this was a nice experiment to show a potential speed up. I'm opening this PR as the starting point of a discussion, not because I expect it to be merged immediately.current main
as an extension
to do if this change is desired