Tests for parallel I/O #1486

Closed
Tracked by #1332
JoshuaLampert opened this issue May 24, 2023 · 16 comments
Labels
parallelization Related to MPI, threading, tasks etc. testing


@JoshuaLampert
Member

I was also thinking about testing the parallel I/O. As long as HDF5_jll.jl doesn't support MPI, we would need to install a custom HDF5 with MPI support on GitHub Actions and set the environment variables and preferences accordingly. We could use a similar setup as we have in P4est.jl, I guess.

That would be an option. We could add another CI job running MPI tests on Ubuntu with parallel HDF5 enabled. This shouldn't be too expensive.

Ok, I can try that. Maybe I'll need some help from you to set up the CI job correctly.

Not sure if this is necessary. If we get parallel I/O support from an MPI-enabled HDF5_jll soon, I don't think we need to invest time in a sophisticated CI setup right now. I'd be ok with omitting this but creating an issue for it once JuliaPackaging/Yggdrasil#6551 is merged and HDF5 is updated to use it.

Originally posted by @sloede in #1399 (comment)

JoshuaLampert added the testing and parallelization (Related to MPI, threading, tasks etc.) labels on May 25, 2023
JoshuaLampert mentioned this issue on May 26, 2023
@JoshuaLampert
Member Author

With the new version of HDF5_jll (v1.14.0+0), parallel HDF5 should be enabled by default; however, our CI tests still use the old version, see e.g. this run. Do you know why it doesn't use the new version, @ranocha? The compat of HDF5.jl allows the new version, see here.

@ranocha
Member

ranocha commented Jun 14, 2023

That's a Project.toml stating version v0.17.0 of HDF5.jl. Is it already released?

@JoshuaLampert
Member Author

No, v0.17 is not yet released, but the compat is already included in the current release v0.16.15, see here.

@ranocha
Member

ranocha commented Jun 14, 2023

Maybe something else is holding it back? Do you get the new JLL locally? You could create a temporary project with Trixi.jl and all test dependencies and then try to install the new HDF5 JLL, too.

@JoshuaLampert
Member Author

When I run the following locally

julia> Pkg.activate(temp = true); Pkg.add("Trixi")

it adds HDF5_jll.jl v1.14.0+0. After

julia> Pkg.test("Trixi")

it reports that HDF5_jll.jl v1.12.2+2 is used. In contrast,

julia> Pkg.add("HDF5_jll")

gives the new version. What can I do now?
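To see which HDF5_jll version an environment actually resolved (for instance from inside the test suite, so that the CI log shows it), one option is to query Pkg.dependencies(). This is a minimal sketch; the helper name resolved_version is hypothetical and not part of Pkg or Trixi.jl:

```julia
using Pkg

# Hypothetical helper: return the version of package `name` as resolved in the
# manifest of the currently active environment, or `nothing` if it is absent.
function resolved_version(name::AbstractString)
    for (_, info) in Pkg.dependencies()
        info.name == name && return info.version
    end
    return nothing
end

# During Pkg.test the active environment is the test environment, so this
# would report the HDF5_jll version the tests actually run against.
println("HDF5_jll: ", something(resolved_version("HDF5_jll"), "not installed"))
```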

@ranocha
Member

ranocha commented Jun 16, 2023

What happens if you create a new directory, copy test/Project.toml into it, add your local clone of Trixi.jl to it, and Pkg.instantiate it? By doing this, you should basically get the test environment that we use. Then, you can use the Pkg tools to see why you do not get the newest release of the HDF5 JLL package.
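The steps described above can be scripted roughly as follows. This is a sketch under assumptions: the helper name recreate_test_env is hypothetical, the local clone is tracked with Pkg.develop (rather than Pkg.add) so that local changes are picked up, and the clone path in the usage comment is a placeholder.

```julia
using Pkg

# Hypothetical helper: rebuild the Trixi.jl test environment from a local clone
# in a fresh temporary directory so its package resolution can be inspected.
function recreate_test_env(trixi_path::AbstractString)
    env_dir = mktempdir()
    # Start from a copy of the test dependencies.
    cp(joinpath(trixi_path, "test", "Project.toml"), joinpath(env_dir, "Project.toml"))
    Pkg.activate(env_dir)
    # Track the local clone instead of the registered Trixi release.
    Pkg.develop(path = trixi_path)
    Pkg.instantiate()
    return env_dir
end

# Usage (placeholder path):
# recreate_test_env(expanduser("~/path/to/Trixi.jl"))
# Pkg.why("HDF5_jll")
# Pkg.status(; outdated = true, mode = PKGMODE_MANIFEST)
```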

@JoshuaLampert
Member Author

I did as you described and then tried:

julia> Pkg.why("HDF5_jll")
  Trixi → HDF5 → HDF5_jll
  Trixi → StartUpDG → HDF5 → HDF5_jll
julia> Pkg.status(; outdated = true, mode = PKGMODE_MANIFEST)
Status `Manifest.toml`
⌅ [587475ba] Flux v0.13.12 (<v0.13.16) [compat]
⌅ [61eb1bfa] GPUCompiler v0.20.3 (<v0.21.0): CUDA
⌅ [872c559c] NNlib v0.8.21 (<v0.9.1): Flux, NNlibCUDA
⌅ [356022a1] NamedDims v0.2.50 (<v1.2.1): Kronecker
⌅ [c0aeaf25] SciMLOperators v0.2.12 (<v0.3.0): DiffEqBase, LinearSolve, OrdinaryDiffEq, SciMLBase, SparseDiffTools
⌅ [2913bbd2] StatsBase v0.33.21 (<v0.34.0): Flux
⌅ [62b44479] CUDNN_jll v8.8.1+0 (<v8.9.2+0): cuDNN
⌃ [0234f1f7] HDF5_jll v1.12.2+2 (<v1.14.0+0)
⌅ [e9f186c6] Libffi_jll v3.2.2+1 (<v3.4.4+0): Glib_jll, HarfBuzz_jll, Wayland_jll
⌅ [856f044c] MKL_jll v2022.2.0+0 (<v2023.1.0+0): FFTW
⌅ [458c3c95] OpenSSL_jll v1.1.21+0 (<v3.0.9+0): FFMPEG_jll, HDF5_jll, Qt5Base_jll
⌅ [784f63db] Qhull_jll v8.0.1003+0 (<v2020.2.0+0): QhullMiniWrapper_jll
⌅ [214eeab7] fzf_jll v0.29.0+0 (<v0.35.1+0): JLFzf
julia> Pkg.update("HDF5_jll")
    Updating registry at `~/.julia/registries/General.toml`
  No Changes to `Project.toml`
  No Changes to `Manifest.toml`
julia> Pkg.update(mode = PKGMODE_MANIFEST)
    Updating registry at `~/.julia/registries/General.toml`
  No Changes to `Project.toml`
  No Changes to `Manifest.toml`
julia> Pkg.add(name = "HDF5_jll", version = "1.14")
   Resolving package versions...
ERROR: Unsatisfiable requirements detected for package FFMPEG_jll [b22a6f82]:
 FFMPEG_jll [b22a6f82] log:
 ├─possible versions are: 4.1.0-4.4.2 or uninstalled
 ├─restricted by compatibility requirements with OpenSSL_jll [458c3c95] to versions: uninstalled
 │ └─OpenSSL_jll [458c3c95] log:
 │   ├─possible versions are: 1.1.1-3.0.9 or uninstalled
 │   └─restricted by compatibility requirements with HDF5_jll [0234f1f7] to versions: 3.0.8-3.0.9
 │     └─HDF5_jll [0234f1f7] log:
 │       ├─possible versions are: 1.10.5-1.14.0 or uninstalled
 │       └─restricted to versions 1.14 by an explicit requirement, leaving only versions: 1.14.0

In the status overview, HDF5_jll is marked with a green ⌃ (i.e., an upgrade should be possible), but it can't be updated. Isn't that strange? Could the problem be FFMPEG_jll?

@ranocha
Member

ranocha commented Jun 17, 2023

It looks like FFMPEG_jll has OpenSSL_jll in its list of dependencies but does not set a [compat] entry for it. I guess this is the problem here, see
https://github.com/JuliaBinaryWrappers/FFMPEG_jll.jl/blob/main/Project.toml

Could you please file an issue there and ask the maintainers whether they could add this compat entry to fix the issue? An MWE should be to install both FFMPEG_jll and the new version of HDF5_jll in a temporary environment.
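A minimal sketch of such an MWE, assuming Julia 1.6 or newer. The helper name ffmpeg_hdf5_compatible is hypothetical, and the assumption that a resolution failure surfaces as a Pkg.Resolve.ResolverError (the "Unsatisfiable requirements detected" error shown above) should be double-checked against the Pkg version in use:

```julia
using Pkg

# Hypothetical helper: try to install FFMPEG_jll together with HDF5_jll v1.14
# in a throwaway environment. Returns `true` if the resolver finds a solution
# and `false` if it reports unsatisfiable requirements.
function ffmpeg_hdf5_compatible()
    Pkg.activate(; temp = true)
    try
        # Requesting both packages in one call lets the resolver report any
        # conflict between them (e.g. via their OpenSSL_jll compat bounds).
        Pkg.add([PackageSpec(name = "FFMPEG_jll"),
                 PackageSpec(name = "HDF5_jll", version = "1.14")])
        return true
    catch err
        # Assumption: resolver conflicts are thrown as ResolverError;
        # anything else (e.g. network problems) is re-thrown unchanged.
        err isa Pkg.Resolve.ResolverError || rethrow()
        @warn "Resolver conflict" exception = err
        return false
    end
end

# Usage (requires access to the General registry):
# ffmpeg_hdf5_compatible()
```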

@JoshuaLampert
Member Author

I filed an issue: JuliaIO/FFMPEG.jl#56.

@ranocha
Member

ranocha commented Jun 17, 2023

Thanks!

@JoshuaLampert
Copy link
Member Author

The issue regarding FFMPEG_jll is now fixed. However, CI still uses HDF5_jll v1.12.2+2. I did the same as in the post above and now get

julia> Pkg.status(; outdated = true, mode = PKGMODE_MANIFEST)
Status `~/Schreibtisch/test/Manifest.toml`
⌅ [47edcb42] ADTypes v0.1.6 (<v0.2.0): OrdinaryDiffEq, SciMLBase, SparseDiffTools
⌅ [67c07d97] Automa v0.8.4 (<v1.0.0): MathTeXEngine
⌅ [c3611d14] ColorVectorSpace v0.9.10 (<v0.10.0): FreeTypeAbstraction, ImageCore
⌅ [927a84f5] DelaunayTriangulation v0.7.2 (<v0.8.7): Makie
⌅ [587475ba] Flux v0.13.12 (<v0.14.2) [compat]
⌅ [61eb1bfa] GPUCompiler v0.21.4 (<v0.22.0): CUDA
⌃ [c817782e] ImageBase v0.1.5 (<v0.1.7)
⌅ [a09fc81d] ImageCore v0.9.4 (<v0.10.1): ImageBase
⌃ [692b3bcd] JLLWrappers v1.4.1 (<v1.4.2)
⌅ [7eb4fadd] Match v1.2.0 (<v2.0.0): Makie
⌅ [872c559c] NNlib v0.8.21 (<v0.9.4): Flux, NNlibCUDA
⌅ [356022a1] NamedDims v0.2.50 (<v1.2.1): Kronecker
⌅ [f27b6e38] Polynomials v3.2.13 (<v4.0.0): SimplePolynomials
⌅ [c5dd0088] StableHashTraits v0.3.1 (<v1.0.1): Makie
⌅ [2913bbd2] StatsBase v0.33.21 (<v0.34.0): Flux
⌅ [4ee394cb] CUDA_Driver_jll v0.5.0+1 (<v0.6.0+2): CUDA, CUDA_Runtime_jll
⌅ [76a88914] CUDA_Runtime_jll v0.6.0+0 (<v0.7.0+1): CUDA
⌃ [b22a6f82] FFMPEG_jll v4.4.2+2 (<v4.4.4+1)
⌅ [d7e528f0] FreeType2_jll v2.10.4+0 (<v2.13.1+0): FreeType
⌃ [0234f1f7] HDF5_jll v1.12.2+2 (<v1.14.1+0)
⌅ [e9f186c6] Libffi_jll v3.2.2+1 (<v3.4.4+0): Glib_jll, HarfBuzz_jll, Wayland_jll
⌅ [458c3c95] OpenSSL_jll v1.1.22+0 (<v3.0.10+0): FFMPEG_jll, HDF5_jll, Qt6Base_jll
⌅ [214eeab7] fzf_jll v0.29.0+0 (<v0.35.1+0): JLFzf

julia> Pkg.add(name = "HDF5_jll", version = "1.14")
   Resolving package versions...
ERROR: Unsatisfiable requirements detected for package Bzip2_jll [6e34b625]:
 Bzip2_jll [6e34b625] log:
 ├─possible versions are: 1.0.6-1.0.8 or uninstalled
 ├─restricted by compatibility requirements with Cairo_jll [83423d85] to versions: [1.0.6, 1.0.8]
 │ └─Cairo_jll [83423d85] log:
 │   ├─possible versions are: 1.14.12-1.16.1 or uninstalled
 │   └─restricted by compatibility requirements with Cairo [159f3aea] to versions: 1.16.0-1.16.1
 │     └─Cairo [159f3aea] log:
 │       ├─possible versions are: 0.5.3-1.0.5 or uninstalled
 │       └─restricted by compatibility requirements with CairoMakie [13f3f980] to versions: 1.0.4-1.0.5
 │         └─CairoMakie [13f3f980] log:
 │           ├─possible versions are: 0.0.1-0.10.8 or uninstalled
 │           ├─restricted to versions 0.6-0.10 by an explicit requirement, leaving only versions: 0.6.0-0.10.8
 │           └─restricted by compatibility requirements with Makie [ee78f7c6] to versions: [0.1.1-0.4.7, 0.8.4-0.10.8] or uninstalled, leaving only versions: 0.8.4-0.10.8
 │             └─Makie [ee78f7c6] log:
 │               ├─possible versions are: 0.9.0-0.19.8 or uninstalled
 │               ├─restricted by compatibility requirements with CairoMakie [13f3f980] to versions: [0.14.0-0.15.3, 0.16.1-0.19.8]
 │               │ └─CairoMakie [13f3f980] log: see above
 │               └─restricted by compatibility requirements with DocStringExtensions [ffbed154] to versions: [0.9.0-0.12.0, 0.17.4-0.19.8] or uninstalled, leaving only versions: 0.17.4-0.19.8
 │                 └─DocStringExtensions [ffbed154] log:
 │                   ├─possible versions are: 0.4.6-0.9.3 or uninstalled
 │                   ├─restricted by compatibility requirements with OrdinaryDiffEq [1dea7af3] to versions: 0.8.0-0.9.3
 │                   │ └─OrdinaryDiffEq [1dea7af3] log:
 │                   │   ├─possible versions are: 4.0.0-6.53.4 or uninstalled
 │                   │   └─restricted to versions 6.49.1-6 by an explicit requirement, leaving only versions: 6.49.1-6.53.4
 │                   └─restricted by compatibility requirements with DiffEqBase [2b5f629d] to versions: 0.9.0-0.9.3
 │                     └─DiffEqBase [2b5f629d] log:
 │                       ├─possible versions are: 3.13.2-6.128.1 or uninstalled
 │                       └─restricted by compatibility requirements with OrdinaryDiffEq [1dea7af3] to versions: 6.122.0-6.128.1
 │                         └─OrdinaryDiffEq [1dea7af3] log: see above
 ├─restricted by compatibility requirements with GR_jll [d2c73de3] to versions: 1.0.6
 │ └─GR_jll [d2c73de3] log:
 │   ├─possible versions are: 0.51.2-0.72.9 or uninstalled
 │   ├─restricted by compatibility requirements with GR [28b8d3ca] to versions: [0.53.0, 0.57.0-0.72.9]
 │   │ └─GR [28b8d3ca] log:
 │   │   ├─possible versions are: 0.35.0-0.72.9 or uninstalled
 │   │   ├─restricted by compatibility requirements with Plots [91a5bcdd] to versions: [0.53.0-0.66.2, 0.68.0, 0.69.1-0.72.9]
 │   │   │ └─Plots [91a5bcdd] log:
 │   │   │   ├─possible versions are: 0.12.1-1.38.17 or uninstalled
 │   │   │   └─restricted to versions 1.16.0-1 by an explicit requirement, leaving only versions: 1.16.0-1.38.17
 │   │   └─restricted by compatibility requirements with GR_jll [d2c73de3] to versions: 0.35.0-0.69.5 or uninstalled, leaving only versions: [0.53.0-0.66.2, 0.68.0, 0.69.1-0.69.5]
 │   │     └─GR_jll [d2c73de3] log: see above
 │   ├─restricted by compatibility requirements with Qt5Base_jll [ea2cea3b] to versions: [0.51.2-0.56.1, 0.72.9] or uninstalled, leaving only versions: [0.53.0, 0.72.9]
 │   │ └─Qt5Base_jll [ea2cea3b] log:
 │   │   ├─possible versions are: 5.15.2-5.15.3 or uninstalled
 │   │   └─restricted by compatibility requirements with OpenSSL_jll [458c3c95] to versions: uninstalled
 │   │     └─OpenSSL_jll [458c3c95] log:
 │   │       ├─possible versions are: 1.1.1-3.0.10 or uninstalled
 │   │       ├─restricted by compatibility requirements with HDF5_jll [0234f1f7] to versions: 3.0.8-3.0.10
 │   │       │ └─HDF5_jll [0234f1f7] log:
 │   │       │   ├─possible versions are: 1.10.5-1.14.1 or uninstalled
 │   │       │   └─restricted to versions 1.14 by an explicit requirement, leaving only versions: 1.14.0-1.14.1
 │   │       └─restricted by compatibility requirements with FFMPEG_jll [b22a6f82] to versions: 3.0.9-3.0.10
 │   │         └─FFMPEG_jll [b22a6f82] log:
 │   │           ├─possible versions are: 4.1.0-4.4.4 or uninstalled
 │   │           ├─restricted by compatibility requirements with OpenSSL_jll [458c3c95] to versions: 4.4.4 or uninstalled
 │   │           │ └─OpenSSL_jll [458c3c95] log: see above
 │   └─restricted by compatibility requirements with Qt6Base_jll [c0090381] to versions: 0.51.2-0.72.8 or uninstalled, leaving only versions: 0.53.0
 │     └─Qt6Base_jll [c0090381] log:
 │       ├─possible versions are: 6.0.3-6.4.2 or uninstalled
 │       └─restricted by compatibility requirements with OpenSSL_jll [458c3c95] to versions: uninstalled
 │         └─OpenSSL_jll [458c3c95] log: see above
 └─restricted by compatibility requirements with FFMPEG_jll [b22a6f82] to versions: 1.0.8 — no versions left
   └─FFMPEG_jll [b22a6f82] log: see above

So now the problem somehow seems to be with Bzip2_jll.

@ranocha
Member

ranocha commented Aug 17, 2023

So it looks like GR_jll would need to update its compat bounds on Bzip2_jll (allowing v1.0.8) to get it to work, or FFMPEG_jll would need to allow v1.0.6 of Bzip2_jll.

@sloede
Member

sloede commented Aug 17, 2023

GR_jll already allows Bzip2_jll@1.0.8, as far as I can tell:
https://github.com/JuliaPackaging/Yggdrasil/blob/85714fd83fef160410de74ce6f5c982588f85196/G/GR/build_tarballs.jl#L91

So it must be somewhere further down the rabbit hole...

@ranocha
Member

ranocha commented Aug 17, 2023

Maybe it's only the newest version of GR_jll that allows it? Note that GR_jll is restricted by Qt6Base_jll [c0090381] to versions 0.51.2-0.72.8 (excluding 0.72.9).

@JoshuaLampert
Member Author

The version conflict finally seems to be resolved (JuliaPackaging/Yggdrasil#7619). Let's see in the next CI run whether HDF5_jll@1.14, and thus the parallel HDF5 functionality, is used.

@JoshuaLampert
Member Author
