Performance regression in h5read #752
Dupe of #742. We have outstanding PRs that address these performance issues; this should be fixed in the next week or so.
Best bet is to wait if you can.
Is this a duplicate? The output suggests this is just reading a single Float32 value, while PR #745 is all about avoiding the (ab)use of large tuples in fixed-length arrays/compound data types. I think this case should be independent of that. Edit: I just noticed the indexing — what is the complete data type and shape of the dataset? @laborg — the one thing I'd be interested in seeing further is whether you can show the same regression on a bare `read` of an already-opened dataset.
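(For anyone wanting to answer that question, a minimal way to report a dataset's element type and shape with HDF5.jl is sketched below; the file and dataset names are placeholders, not the ones from the original report.)

```julia
using HDF5

# Placeholder path and dataset name; substitute the real ones.
h5open("data.h5", "r") do fid
    dset = fid["mygroup/A"]
    @show eltype(dset)  # Julia element type that the stored datatype maps to
    @show size(dset)    # dataset extents
end
```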
I'll leave this open until we have feedback from the OP.
An MWE, taken from the HDF5 docs, shown on 0.13.7 and master:

0.13.7

master

So there is also a regression in …
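(The MWE itself didn't survive the copy; below is a guess at its shape, reusing the data from the benchmark later in this thread — treat the names and sizes as illustrative only.)

```julia
using HDF5, BenchmarkTools

A = collect(reshape(1:120, 15, 8))
h5write("/tmp/test.h5", "mygroup/A", A)

# Convenience-function round trip: each call opens the file, opens the
# dataset, reads it, and closes everything again.
@btime h5read("/tmp/test.h5", "mygroup/A");
```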
With that extra bit of info, I don't think it's the read time which has changed significantly — rather, the time to open the dataset itself is longer.

```julia
julia> using HDF5, BenchmarkTools
[ Info: Precompiling HDF5 [f67ccb44-e63f-5c2f-98bd-6dc0ccc4ba2f]

julia> h5write("/tmp/test.h5", "mygroup/A", collect(reshape(1:120, 15, 8)))

julia> fid = h5open("/tmp/test.h5", "r")
HDF5 data file: /tmp/test.h5

# HDF5 v0.13.7:
julia> HDF5.libversion
v"1.10.4"

julia> @btime read($(fid["mygroup/A"]));
  6.478 μs (21 allocations: 1.94 KiB)

julia> @btime $(fid["mygroup/A"])[$(2:3:15), $(3:5)];
  17.027 μs (64 allocations: 2.83 KiB)

julia> @btime fid["mygroup/A"];
  3.514 μs (1 allocation: 32 bytes)

# HDF5 master:
julia> HDF5.libversion
v"1.12.0"

julia> @btime read($(fid["mygroup/A"]));
  7.566 μs (13 allocations: 1.61 KiB)

julia> @btime $(fid["mygroup/A"])[$(2:3:15), $(3:5)];
  11.769 μs (36 allocations: 1.83 KiB)

julia> @btime fid["mygroup/A"];
  13.872 μs (3 allocations: 96 bytes)

# HDF5 PR #745:
julia> HDF5.libversion
v"1.12.0"

julia> @btime read($(fid["mygroup/A"]));
  8.718 μs (14 allocations: 1.62 KiB)

julia> @btime $(fid["mygroup/A"])[$(2:3:15), $(3:5)];
  13.382 μs (36 allocations: 1.80 KiB)

julia> @btime fid["mygroup/A"];
  13.592 μs (3 allocations: 96 bytes)
```

Read-all is slightly slower, but there's a corresponding improvement in indexed reads going from HDF5.jl v0.13.7 to 0.14/master when the opened dataset handle is interpolated into the benchmark; the big jump is in opening the dataset handle itself (`fid["mygroup/A"]`), from ~3.5 μs to ~14 μs. I haven't yet tried isolating whether that's due to changes in HDF5.jl or to the underlying libhdf5 upgrade (v1.10.4 → v1.12.0).
When there are no keyword arguments, just use the global `DEFAULT_PROPERTIES` rather than creating new `HDF5.Properties` with default values. This improves the regression identified in issue JuliaIO#752. Specifically, the benchmark used in JuliaIO#752 (comment) is improved from

```julia
julia> @btime fid["mygroup/A"];
  13.872 μs (3 allocations: 96 bytes)
```

on master to

```julia
julia> @btime fid["mygroup/A"];
  6.778 μs (1 allocation: 32 bytes)
```

with this commit.
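(A self-contained sketch of the pattern that commit describes — the type and function names below only mirror the idea and are not HDF5.jl's actual internals.)

```julia
# Toy stand-ins for the library's property-list type and its shared default.
struct Properties
    settings::Dict{Symbol,Any}
end

const DEFAULT_PROPERTIES = Properties(Dict{Symbol,Any}())

# Only construct a fresh Properties object when keyword arguments are given;
# the common no-keyword path reuses the global default and allocates nothing.
function properties_from_kwargs(; kwargs...)
    isempty(kwargs) && return DEFAULT_PROPERTIES
    return Properties(Dict{Symbol,Any}(kwargs))
end
```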
PR #753 helps, but it's not conclusive. There's a regression in reads on Julia v1.5 that doesn't show up as clearly on Julia's master branch. I've run the test `@btime read(fid, "mygroup/A")` on both Julia versions for various combinations of HDF5.jl: v0.13.7, master (~0.14.1), PR #753 (makes fewer properties objects), PR #745 (reworks read type normalization), and finally a branch which merges #745 and #753 onto master. The results look good for Julia 1.6/master, but there's still a regression on v1.5 (and presumably all of v1.3–1.5):
Is the regression related to v1.5 compiler issues? The v1.6 numbers look very good. If it's related to the compiler failing to optimize, perhaps there's not much we can do for now.
Yes, there must be something in the code that Julia v1.6 can optimize better than older versions can. A while back, when I started investigating for #745, I had assumed there'd be a performance break between Julia v1.4 and v1.5 due to v1.5's ability to inline more structs, but since it's actually a break between v1.5 and v1.6, the cause must be something else. I haven't yet figured out what new feature is responsible, though. There are probably more fixes like #753 which can eat away at the regression and slowly improve performance (across all Julia versions), but I suspect there's always going to be a performance gap between v1.5-and-earlier versus v1.6+, just due to the compiler being more capable.
So, some unfortunate news: I can't reproduce my own timings from #752 (comment), particularly where Julia v1.6 seemed to be able to outperform v1.5. My guess at this point is that I somehow managed to get extra method specializations from whatever route of branch-switching I did. Since I wanted to figure out what feature might be most responsible for the apparent performance difference between Julia v1.5 and v1.6, and in prepping for the ability to run across more git branches, I coded up a script to do the timing rather than doing it all by hand. Notably, the script launches a new Julia process (without my startup file) for each timing, so there can't be any interaction with leftover state from a previous session.
Quick observations:
I think this might be a separate issue, but repeated calls appear to slow down over time. On Julia master + HDF5.jl v0.14.1, with HDF5_jll 1.12 and 1.10:

Maybe some resources aren't freed properly? On Julia master + HDF5.jl 0.13.7 + HDF5_jll 1.10:
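(A rough sketch of how one might check for that kind of degradation — not the exact code from the comment above, which didn't survive the copy; it reuses the test file from the earlier benchmark.)

```julia
using HDF5

h5write("/tmp/test.h5", "mygroup/A", collect(reshape(1:120, 15, 8)))

# Time successive batches of reads: if resources leak, later batches get slower.
for batch in 1:5
    t = @elapsed for _ in 1:1_000
        h5read("/tmp/test.h5", "mygroup/A")
    end
    println("batch $batch: ", round(t; digits = 3), " s for 1000 reads")
end
```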
I had a hunch, and with these changes:
...the performance of the repeated calls is restored.
Thanks for that tip — that's easy enough to patch; see #754. Adding another test to my script (but reducing the range of commits being tested) on Julia v1.5.3:
@jmert that script looks very useful — do you mind sharing it?
Sure — it's a bit of a mess and should probably be making proper use of the full benchmarking tooling, but here it is: https://gist.github.com/jmert/b4af1ba8928985f990afbe0284dcaf2e
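(A sketch in the spirit of that script — the real one is in the gist above. Each measurement runs in a fresh Julia process with the startup file disabled, so no method specializations or session state carry over between runs. The list of Julia binaries is a placeholder, and the HDF5.jl branch switching that the real script handles is omitted here.)

```julia
# Placeholder names for Julia executables on PATH.
julia_binaries = ["julia-1.5", "julia-1.6"]

# Benchmark code passed to each fresh process via `-e`.
bench = """
using HDF5, BenchmarkTools
h5write("/tmp/test.h5", "mygroup/A", collect(reshape(1:120, 15, 8)))
fid = h5open("/tmp/test.h5", "r")
@btime read(\$fid, "mygroup/A")
close(fid)
"""

for jl in julia_binaries
    println("== ", jl)
    run(`$jl --startup-file=no -e $bench`)
end
```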
Closing this, as I think we've addressed most of the "major" regressions.
My system:
On 0.13.7:
On 0.14.1:
Any way to work around this, besides downgrading?
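(One possible mitigation while waiting for the fixes, based on the observation earlier in this thread that most of the extra time is spent opening the dataset rather than reading it: open the file and dataset once and reuse the handle instead of calling `h5read` repeatedly. A minimal sketch, assuming the same test file as in the benchmarks above:)

```julia
using HDF5

fid = h5open("/tmp/test.h5", "r")
dset = fid["mygroup/A"]   # pay the (now more expensive) open cost once

for _ in 1:1_000
    A = read(dset)        # repeated reads reuse the already-open handle
end

close(fid)
```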