
Performance regression in h5read #752

Closed
laborg opened this issue Nov 30, 2020 · 16 comments

@laborg
Contributor

laborg commented Nov 30, 2020

My system:

julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)

On 0.13.7:

@btime h5read("3bc6206c-bad8-43ba-9642-b2e06c4c7cd4.h5","acc", (1,1))
  117.233 μs (53 allocations: 2.16 KiB)
-0.0027361426f0

On 0.14.1:

@btime h5read("3bc6206c-bad8-43ba-9642-b2e06c4c7cd4.h5", "acc", (1,1))
  1.799 ms (31 allocations: 1.22 KiB)
-0.0027361426f0

Any way to work around this, besides downgrading?

@musm
Member

musm commented Nov 30, 2020

Dupe of #742. We have outstanding PRs that address these performance issues, should be fixed in the next week or so.

@musm
Member

musm commented Nov 30, 2020

Best bet is to wait if you can.

@musm musm closed this as completed Nov 30, 2020
@jmert
Contributor

jmert commented Nov 30, 2020

Is this a duplicate? The output suggests this is just reading a single Float32 value, while PR #745 is all about avoiding the (ab)use of large tuples in fixed-length arrays/compound data types. I think this case should be independent of that.

Edit: I just noticed the indexing — what is the complete data type and shape of the "acc" field? If it's a fixed array, then @musm's comments are much more likely to apply.

@laborg — the one thing I'd be interested in seeing is whether you can show the same regression on a bare read rather than h5read, which also has to open and close the file on every invocation. That is to say, if there's been a regression in h5open/close time (maybe from switching to libhdf5 v1.12), the obvious suggestion is to open the file once and hold the handle rather than opening and closing it often.
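A minimal sketch of that pattern (the file path and dataset name are just the ones from the original report; adapt as needed):

```julia
using HDF5

# Open the file and dataset once and reuse the handles, instead of calling
# h5read in a loop, which re-opens and re-closes the file on every call.
fid = h5open("3bc6206c-bad8-43ba-9642-b2e06c4c7cd4.h5", "r")
try
    dset = fid["acc"]
    for _ in 1:1000
        val = dset[1, 1]   # indexed read against the already-open dataset
        # ... use val ...
    end
    close(dset)
finally
    close(fid)             # release the file handle when done
end
```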

@musm
Member

musm commented Nov 30, 2020

I'll leave this open until we have feedback from the OP.

@musm musm reopened this Nov 30, 2020
@laborg
Contributor Author

laborg commented Dec 1, 2020

An MWE, taken from the HDF5 docs, shown on 0.13.7 and master:

0.13.7

(jl_KT5sZW) pkg> st -m
Status `/tmp/jl_KT5sZW/Manifest.toml`
...
  [f67ccb44] HDF5 v0.13.7
  [0234f1f7] HDF5_jll v1.10.5+7
...

julia> f = tempname();

julia> h5write(f, "mygroup2/A", collect(reshape(1:120, 15, 8)));

julia> @btime h5read(f, "mygroup2/A", (2:3:15, 3:5))
  128.048 μs (73 allocations: 3.02 KiB)
5×3 Array{Int64,2}:
 32  47  62
 35  50  65
 38  53  68
 41  56  71
 44  59  74

julia> file = h5open(f, "r")
HDF5 data file: /tmp/jl_7TdQ2Y

julia> @btime read(file, "mygroup2/A");
  18.658 μs (25 allocations: 2.06 KiB)

master

(jl_OQXA8A) pkg> st -m
Status `/tmp/jl_OQXA8A/Manifest.toml`
...
  [f67ccb44] HDF5 v0.14.1 `https://github.com/JuliaIO/HDF5.jl.git#master`
  [0234f1f7] HDF5_jll v1.12.0+1
...


julia> f = tempname();

julia> h5write(f, "mygroup2/A", collect(reshape(1:120, 15, 8)));

julia> @btime h5read(f, "mygroup2/A", (2:3:15, 3:5))
  4.857 ms (45 allocations: 2.00 KiB)
5×3 Array{Int64,2}:
 32  47  62
 35  50  65
 38  53  68
 41  56  71
 44  59  74

julia> file = h5open(f, "r")
🗂️ HDF5.File: (read-only) /tmp/jl_Z4tLyB
└─ 📂 mygroup2
   └─ 🔢 A

julia> @btime read(file, "mygroup2/A");
  50.675 μs (16 allocations: 1.70 KiB)

So there is also a regression in reading data from an already-opened file, without any fancy indexing...

@jmert
Contributor

jmert commented Dec 1, 2020

With that extra bit of info, I don't think it's the read time that has changed significantly — rather, the time to open the dataset itself is longer.

julia> using HDF5, BenchmarkTools
[ Info: Precompiling HDF5 [f67ccb44-e63f-5c2f-98bd-6dc0ccc4ba2f]

julia> h5write("/tmp/test.h5", "mygroup/A", collect(reshape(1:120, 15, 8)))

julia> fid = h5open("/tmp/test.h5", "r")
HDF5 data file: /tmp/test.h5


# HDF5 v0.13.7:

julia> HDF5.libversion
v"1.10.4"

julia> @btime read($(fid["mygroup/A"]));
  6.478 μs (21 allocations: 1.94 KiB)

julia> @btime $(fid["mygroup/A"])[$(2:3:15), $(3:5)];
  17.027 μs (64 allocations: 2.83 KiB)

julia> @btime fid["mygroup/A"];
  3.514 μs (1 allocation: 32 bytes)


# HDF5 master:

julia> HDF5.libversion
v"1.12.0"

julia> @btime read($(fid["mygroup/A"]));
  7.566 μs (13 allocations: 1.61 KiB)

julia> @btime $(fid["mygroup/A"])[$(2:3:15), $(3:5)];
  11.769 μs (36 allocations: 1.83 KiB)

julia> @btime fid["mygroup/A"];
  13.872 μs (3 allocations: 96 bytes)



# HDF5 PR #745

julia> HDF5.libversion
v"1.12.0"

julia> @btime read($(fid["mygroup/A"]));
  8.718 μs (14 allocations: 1.62 KiB)

julia> @btime $(fid["mygroup/A"])[$(2:3:15), $(3:5)];
  13.382 μs (36 allocations: 1.80 KiB)

julia> @btime fid["mygroup/A"];
  13.592 μs (3 allocations: 96 bytes)

Read-all is slightly slower, but there's a corresponding improvement in indexed reads going from HDF5.jl v0.13.7 to 0.14/master when the opened HDF5.Dataset is interpolated into the @btime expression. The big regression appears to be just in opening the dataset itself, as the third benchmark in each grouping shows — 3 μs on v0.13.7 vs ~14 μs on v0.14.

I haven't yet tried isolating whether that's due to libhdf5 moving from v1.10 to v1.12 (in which case there's not much to do here) or whether it's something else in the restructuring of this package between the two releases.
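For anyone reproducing the numbers above: the separation works by interpolating the already-opened HDF5.Dataset (or the file handle) into the @btime expression, so the dataset-open cost and the raw read cost can be timed independently. A minimal sketch using the same test file:

```julia
using HDF5, BenchmarkTools

fid = h5open("/tmp/test.h5", "r")

@btime $fid["mygroup/A"];            # dataset-open cost only
@btime read($(fid["mygroup/A"]));    # read cost only, dataset pre-opened
@btime read($fid, "mygroup/A");      # open + read combined, as h5read effectively does
close(fid)
```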

jmert added a commit to jmert/HDF5.jl that referenced this issue Dec 1, 2020
When there are no keyword arguments, just use the global
`DEFAULT_PROPERTIES` rather than creating new `HDF5.Properties` with
default values.

This improves the regression identified in issue JuliaIO#752.
Specifically, the benchmark used in
JuliaIO#752 (comment)
is improved from
```julia
julia> @btime fid["mygroup/A"];
  13.872 μs (3 allocations: 96 bytes)
```
on master to
```julia
julia> @btime fid["mygroup/A"];
  6.778 μs (1 allocation: 32 bytes)
```
with this commit.
@jmert
Contributor

jmert commented Dec 1, 2020

PR #753 helps, but it's not conclusive.

There's a regression in reads on Julia v1.5 that doesn't show up as clearly on Julia's master branch. I've run the test

julia> @btime read(fid, "mygroup/A")

using both Julia versions for various combinations of HDF5.jl: v0.13.7, master (~0.14.1), PR #753 (makes fewer properties objects), PR #745 (reworks read type normalization), and finally a branch which merges 745 and 753 onto master. The results look good for Julia 1.6/master, but there's still a regression on v1.5 (and presumably all v1.3–1.5):

|                 | Julia v1.5.3                         | Julia v1.6.0-DEV.1595                |
| --------------- | ------------------------------------ | ------------------------------------ |
| HDF5.jl v0.13.7 | 23.818 μs (25 allocations: 2.06 KiB) | 24.344 μs (22 allocations: 1.97 KiB) |
| HDF5.jl master  | 54.040 μs (16 allocations: 1.70 KiB) | 24.759 μs (16 allocations: 1.70 KiB) |
| HDF5.jl PR #753 | 43.468 μs (14 allocations: 1.64 KiB) | 15.004 μs (14 allocations: 1.64 KiB) |
| HDF5.jl PR #745 | 56.561 μs (17 allocations: 1.72 KiB) | 26.645 μs (17 allocations: 1.72 KiB) |
| HDF5.jl 753+745 | 44.348 μs (17 allocations: 1.73 KiB) | 15.935 μs (17 allocations: 1.73 KiB) |

@musm
Member

musm commented Dec 1, 2020

Is the regression related to v1.5 compiler issues? The v1.6 numbers look very good. If it's related to the compiler failing to optimize, perhaps there's not much we can do for now.

@jmert
Contributor

jmert commented Dec 1, 2020

Yes, there must be something in the code that Julia v1.6 can better optimize than older versions can. A while back when I started investigating for #745 I had assumed there'd be a performance break between Julia v1.4 and v1.5 due to v1.5's ability to inline more structs, but since it's actually a break between v1.5 and v1.6, the cause must be something else. I haven't yet figured out what new feature it must be, though.

There are probably more fixes like #753 which can eat away at the regression and slowly improve performance (across all Julia versions), but I suspect there's always going to be a performance gap between v1.5- versus v1.6+ just due to the compiler being more capable.

musm pushed a commit that referenced this issue Dec 2, 2020
(Same commit message as the jmert/HDF5.jl commit above.)
@jmert
Contributor

jmert commented Dec 2, 2020

So, some unfortunate news: I can't reproduce my own timings from my earlier comment in this thread, particularly where Julia v1.6 seemed able to outperform v1.5. My guess at this point is that I somehow picked up extra method specializations from whatever sequence of branch-switching I did while Revise kept updating the methods in the background.

I wanted to figure out what feature might be most responsible for the apparent performance difference between Julia v1.5 and v1.6, so in prepping for the ability to run across more git branches, I coded up a script to do the timing rather than doing it all by hand. Notably, the script launches a new Julia process (without my startup file) for each timing, so there can't be any interaction with Revise. The following is what I get on Julia v1.5:
[screenshot: table of timings per HDF5.jl commit/branch on Julia v1.5]
versus what I get on Julia v1.6:
[screenshot: table of timings per HDF5.jl commit/branch on Julia v1.6]
The rows step through various points in the commit history and made-up branches (left-most column), and the remaining two columns show the timings for the operations named in the column headers.

  • v0.13.7 — latest release on previous major version
  • v0.14.1 — latest release on current major version
  • v0.14.1+libhdf5_1.10 — HDF5_jll isn't actually bounded to only libhdf5 1.10 (see #743, "Restrict HDF5_jll to v1.10 series", for the 0.13 branch), so HDF5.jl is actually using libhdf5 v1.12 before #739 ("Upgrade HDF5 to v1.12") updates the compat requirements. This branch adds the version restriction.
  • master — up to commit 11ba575
  • master+revert_libhdf5_1.12 — forces revert back down to libhdf5 1.10
  • master+fix_742 — merge between master and #745 ("Fix #742, avoid instantiating large tuples")
  • master+fix_742+revert_libhdf54_1.12 — like previous, but also forces back down to libhdf5 1.10

Quick observations:

  1. Julia v1.5 and v1.6 generally have the same performance.
  2. v0.13.7 vs v0.14.1+libhdf5_1.10 shows the bulk of the read(fid, "name") regression is in the large number of commits across major versions, but the switch from v0.14.1+libhdf5_1.10 up to v0.14.1 shows the underlying library switch has a non-negligible impact.
  3. v0.14.1 up to master shows a few performance improvements again, but the gains are small.
  4. read(::Group, ::String) varies more than read(::Dataset), suggesting changes in the higher levels (such as maybe HDF5 type -> Julia type translation??) are probably where gains can be found to close the gap again...
  5. ...but there's apparently just a flat performance hit from upgrading library versions that we probably can't do much about.

@laborg
Contributor Author

laborg commented Dec 3, 2020

I think this might be a separate issue, but successive calls to h5read get slower in a roughly linear fashion...

On Julia master + HDF5 0.14.1, with both HDF5_jll 1.12 and 1.10:

julia> @btime h5read("3bc6206c-bad8-43ba-9642-b2e06c4c7cd4.h5", "acc", (1,1))
  709.036 μs (26 allocations: 1.14 KiB)
-0.0027361426f0

julia> @btime h5read("3bc6206c-bad8-43ba-9642-b2e06c4c7cd4.h5", "acc", (1,1))
  1.950 ms (26 allocations: 1.14 KiB)
-0.0027361426f0

julia> @btime h5read("3bc6206c-bad8-43ba-9642-b2e06c4c7cd4.h5", "acc", (1,1))
  2.881 ms (26 allocations: 1.14 KiB)
-0.0027361426f0

julia> @btime h5read("3bc6206c-bad8-43ba-9642-b2e06c4c7cd4.h5", "acc", (1,1))
  3.637 ms (26 allocations: 1.14 KiB)
-0.0027361426f0

Maybe some resources aren't freed properly?

On Julia master + HDF5 0.13.7 + HDF5_jll 1.10:

julia> @btime h5read("3bc6206c-bad8-43ba-9642-b2e06c4c7cd4.h5", "acc", (1,1))
  122.234 μs (53 allocations: 2.16 KiB)
-0.0027361426f0

julia> @btime h5read("3bc6206c-bad8-43ba-9642-b2e06c4c7cd4.h5", "acc", (1,1))
  121.866 μs (53 allocations: 2.16 KiB)
-0.0027361426f0

julia> @btime h5read("3bc6206c-bad8-43ba-9642-b2e06c4c7cd4.h5", "acc", (1,1))
  122.208 μs (53 allocations: 2.16 KiB)
-0.0027361426f0

@laborg
Contributor Author

laborg commented Dec 3, 2020

I had a hunch, and with these changes:

diff --git a/src/HDF5.jl b/src/HDF5.jl
index 2c1fddc..570c587 100644
--- a/src/HDF5.jl
+++ b/src/HDF5.jl
@@ -536,6 +536,7 @@ function h5read(filename, name::AbstractString; pv...)
     try
         obj = getindex(fid, name; pv...)
         dat = read(obj)
+        close(obj)
     finally
         close(fid)
     end
@@ -548,6 +549,8 @@ function h5read(filename, name_type_pair::Pair{<:AbstractString,DataType}; pv...
     try
         obj = getindex(fid, name_type_pair[1]; pv...)
         dat = read(obj, name_type_pair[2])
+        close(obj)
+
     finally
         close(fid)
     end
@@ -560,6 +563,8 @@ function h5read(filename, name::AbstractString, indices::Tuple{Vararg{Union{Abst
     try
         dset = getindex(fid, name; pv...)
         dat = dset[indices...]
+        close(dset)
+
     finally
         close(fid)
     end

...performance of h5read is on par with 0.13.7 again. It looks like HDF5's "delayed close" behavior comes into play. I guess you know the architectural changes since 0.13.7 in enough detail to assess why this is suddenly being triggered...
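Until a fix along those lines lands in a release, a user-side workaround is a small helper (hypothetical, not part of HDF5.jl) that mirrors the patched h5read and closes the dataset handle explicitly:

```julia
using HDF5

# Hypothetical helper mirroring the patch above: close the dataset handle
# explicitly so libhdf5's delayed-close machinery never piles up open objects.
function h5read_closing(filename, name::AbstractString, indices::Tuple)
    fid = h5open(filename, "r")
    try
        dset = fid[name]
        dat = dset[indices...]
        close(dset)
        return dat
    finally
        close(fid)
    end
end

h5read_closing("3bc6206c-bad8-43ba-9642-b2e06c4c7cd4.h5", "acc", (1, 1))
```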

@jmert
Contributor

jmert commented Dec 3, 2020

Thanks for that tip — that's easy enough to patch; see #754.

Adding another test to my script (but reducing the range of commits being tested) on Julia v1.5.3:
[screenshot: table of timings including the dataset-close fix, on Julia v1.5.3]

@musm
Member

musm commented Dec 3, 2020

@jmert that script looks very useful — do you mind sharing it?

@jmert
Contributor

jmert commented Dec 3, 2020

Sure — it's a bit of a mess and should probably make proper use of the full BenchmarkTools tooling, but it grew organically from what I thought was going to be a much smaller test.

https://gist.github.com/jmert/b4af1ba8928985f990afbe0284dcaf2e
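Not the actual gist, but the core pattern described above — one fresh Julia process per timing, launched without the user startup file so Revise can't interfere — looks roughly like this (the per-branch project directories are hypothetical):

```julia
# Illustrative sketch only, not jmert's gist.
const BENCH = raw"""
using HDF5, BenchmarkTools
fid = h5open("/tmp/test.h5", "r")
@btime read($fid, "mygroup/A")
close(fid)
"""

for proj in ["hdf5-v0.13.7", "hdf5-master"]   # hypothetical project dirs, one per HDF5.jl branch
    # --startup-file=no keeps Revise and other startup.jl customizations out of the timing.
    run(`julia --startup-file=no --project=$proj -e $BENCH`)
end
```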

@musm
Member

musm commented Dec 9, 2020

Closing this, as I think we've addressed most of the "major" regressions.

@musm musm closed this as completed Dec 9, 2020