Support Biological Observation Matrix (BIOM) files #3715

Merged on Dec 2, 2024 (27 commits)

@chrisvanrun (Contributor) commented Nov 22, 2024

Part of the pitch:

This PR introduces support for BIOM files. It validates them by attempting to parse them via the biom-format library.

An explicit check is made that all uploaded BIOM files are in HDF5 format. This is the compressed binary version, and I'd argue it's the only one we should support.
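
For illustration only, a cheap pre-check for "is this HDF5 at all" can look for the HDF5 format signature before handing the file to the parser. This is a minimal sketch and not necessarily how this PR implements the check; the function name `looks_like_hdf5` and the `max_offset` cut-off are assumptions made for the example. The signature bytes and the allowed superblock offsets (0, 512, 1024, 2048, ...) come from the HDF5 file format.

```python
from pathlib import Path

# The 8-byte signature that opens every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path, max_offset=1 << 20):
    """Return True if the HDF5 signature sits at a valid superblock offset.

    The superblock may start at byte 0, 512, 1024, 2048, and so on.
    `max_offset` is an arbitrary cut-off chosen for this sketch.
    """
    size = Path(path).stat().st_size
    with open(path, "rb") as fh:
        offset = 0
        while offset <= max_offset and offset + len(HDF5_SIGNATURE) <= size:
            fh.seek(offset)
            if fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # Candidate offsets are 0, then 512 doubling each step.
            offset = 512 if offset == 0 else offset * 2
    return False
```

A JSON-formatted BIOM file starts with `{` and would be rejected by this check. It only inspects the container format, so a full parse is still needed to confirm the file is a valid BIOM table.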

Once this is merged:
- #3713

BIOM file validation is hooked to a larger worker queue.

Isolated virtualenv

After some (painful) debugging I've refactored the way the BIOM parser is called. This alleviates the problem of incorrectly versioned HDF5 libraries being pre-loaded.

The base container now also generates a 'biom' virtual environment under /opt/virtualenvs. Apart from fixing the import bug, it has the benefit of isolating the parser libraries from the web app.
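
A rough sketch of the idea, calling the parser through a separate interpreter so that the web app process never imports biom or h5py itself. The interpreter path `/opt/virtualenvs/biom/bin/python`, the function name `validate_in_venv`, the child script, and the JSON result shape are all assumptions made for this example and may differ from the actual implementation in this PR.

```python
import json
import subprocess

# Hypothetical child script, run by the interpreter of the isolated virtualenv.
# It reports the outcome as a single JSON line on stdout.
_CHILD_SCRIPT = """
import json, sys
try:
    import biom
    table = biom.load_table(sys.argv[1])
    print(json.dumps({"ok": True, "shape": list(table.shape)}))
except Exception as exc:
    print(json.dumps({"ok": False, "error": f"{type(exc).__name__}: {exc}"}))
"""


def validate_in_venv(path, python="/opt/virtualenvs/biom/bin/python", timeout=60):
    """Parse `path` with biom in a child process and return a result dict."""
    proc = subprocess.run(
        [python, "-c", _CHILD_SCRIPT, str(path)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    lines = proc.stdout.strip().splitlines()
    if proc.returncode != 0 or not lines:
        # A hard abort (e.g. the HDF5 version check) only kills the child.
        return {"ok": False, "error": f"parser exited with code {proc.returncode}"}
    return json.loads(lines[-1])
```

With this shape, a native-library crash in the parser shows up as a non-zero return code in the caller instead of taking down the calling process.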

@chrisvanrun
Contributor Author

/opt/poetry/.venv/lib/python3.11/site-packages/h5py/__init__.py:36: UserWarning: h5py is running against HDF5 1.10.8 when it was built against 1.14.4, this may cause problems
  _warn(("h5py is running against HDF5 {0} when it was built against {1}, "

Interesting, what is going on here?

@chrisvanrun
Contributor Author

chrisvanrun commented Nov 22, 2024

> /opt/poetry/.venv/lib/python3.11/site-packages/h5py/__init__.py:36: UserWarning: h5py is running against HDF5 1.10.8 when it was built against 1.14.4, this may cause problems
>   _warn(("h5py is running against HDF5 {0} when it was built against {1}, "
>
> Interesting, what is going on here?

The local HDF5 libs are likely outdated in the testing container, so they need to be rebuilt. I've honestly never bothered checking how it was built.

After installing hdf5-tools locally on my WSL Ubuntu 20.04.6, h5cc confirms that 1.10.4 is the default installed version:

$ h5cc -showconfig
  SUMMARY OF THE HDF5 CONFIGURATION
  =================================

General Information:
-------------------
         HDF5 Version: 1.10.4
        Configured on: Mon, 13 Apr 2020 12:15:08 +0000
        Configured by: Debian
          Host system: x86_64-pc-linux-gnu
    Uname information: Debian
             Byte sex: little-endian
   Installation point: /usr
          Flavor name: serial
          [...]

This sounds like a problem for Monday Chris.

@chrisvanrun
Contributor Author

Monday Chris, check out: https://github.com/HDFGroup/hdf5/releases

@chrisvanrun
Contributor Author

Or, Monday Chris, you might consider pinning the Python wrappers to the version of the Ubuntu binaries.

@chrisvanrun
Contributor Author

OK, I can replicate this locally, so that is a start. The testing container reports HDF5 1.10.8, which is newer than the 1.10.4 in my local setup, where the tests pass.
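
When comparing environments like this, it can help to read the installed Python package versions from metadata without importing them, since importing h5py against a mismatched HDF5 library can warn or abort. A small sketch; the function name `installed_versions` and the default list of distributions are assumptions made for the example.

```python
from importlib import metadata


def installed_versions(distributions=("h5py", "numpy", "biom-format")):
    """Map distribution names to installed versions, or None if absent.

    Reads package metadata only, so none of the packages are imported.
    """
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

This only covers the Python side. The version of the system HDF5 library still has to come from something like `h5cc -showconfig`, as shown earlier.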

@chrisvanrun
Contributor Author

chrisvanrun commented Nov 25, 2024

[WIP]

h5py version checks. The current version is 3.12.1

The HDF5 lib is currently at 1.14.5.

Different versions of the h5py wrapper:

3.11.0

Release: Apr 10, 2024

Errors on 1 with:

Warning! ***HDF5 library version mismatched error***
The HDF5 header files used to compile this application do not match
the version used by the HDF5 library to which this application is linked.
Data corruption or segmentation faults may occur if the application continues.
This can happen when an application was compiled by one version of HDF5 but
linked with a different version of static or shared HDF5 library.
You should recompile the application or check your shared library related
settings such as 'LD_LIBRARY_PATH'.
You can, at your own risk, disable this warning by setting the environment
variable 'HDF5_DISABLE_VERSION_CHECK' to a value of '1'.
Setting it to 2 or higher will suppress the warning messages totally.
Headers are 1.14.2, library is 1.10.8
            SUMMARY OF THE HDF5 CONFIGURATION
            =================================

General Information:
-------------------
                   HDF5 Version: 1.10.8
                  Configured on: Sun, 18 Dec 2022 17:20:33 +0000
                  Configured by: Debian
                    Host system: x86_64-pc-linux-gnu
              Uname information: Debian
                       Byte sex: little-endian
             Installation point: /usr
                    Flavor name: serial

Compiling Options:
------------------
                     Build Mode: production
              Debugging Symbols: no
                        Asserts: no
                      Profiling: no
             Optimization Level: high

Linking Options:
----------------
                      Libraries: static, shared
  Statically Linked Executables:
                        LDFLAGS: -Wl,-z,relro
                     H5_LDFLAGS: -Wl,--version-script,$(top_srcdir)/debian/map_serial.ver
                     AM_LDFLAGS:
                Extra libraries: -lcrypto -lcurl -lpthread -lsz -lz -ldl -lm
                       Archiver: ar
                       AR_FLAGS: cr
                         Ranlib: x86_64-linux-gnu-ranlib

Languages:
----------
                              C: yes
                     C Compiler: /usr/bin/gcc
                       CPPFLAGS: -Wdate-time -D_FORTIFY_SOURCE=2
                    H5_CPPFLAGS: -D_GNU_SOURCE -D_POSIX_C_SOURCE=200809L   -DNDEBUG -UH5_DEBUG_API
                    AM_CPPFLAGS:
                        C Flags: -g -O2 -ffile-prefix-map=$(top_srcdir)=. -fstack-protector-strong -Wformat -Werror=format-security
                     H5 C Flags:  -std=c99  -Wall -Wcast-qual -Wconversion -Wextra -Wfloat-equal -Wformat=2 -Winit-self -Winvalid-pch -Wmissing-include-dirs -Wno-c++-compat -Wno-format-nonliteral -Wshadow -Wundef -Wwrite-strings -pedantic -Wlarger-than=2560 -Wlogical-op -Wframe-larger-than=16384 -Wpacked-bitfield-compat -Wsync-nand -Wstrict-overflow=5 -Wno-unsuffixed-float-constants -Wdouble-promotion -Wtrampolines -Wstack-usage=8192 -Wmaybe-uninitialized -Wdate-time -Warray-bounds=2 -Wc99-c11-compat -Wduplicated-cond -Whsa -Wnormalized -Wnull-dereference -Wunused-const-variable -Walloca -Walloc-zero -Wduplicated-branches -Wformat-overflow=2 -Wformat-truncation=1 -Wrestrict -Wattribute-alias -Wcast-align=strict -Wshift-overflow=2 -Wattribute-alias=2 -Wmissing-profile -Wc11-c2x-compat -fstdarg-opt -fdiagnostics-urls=never -fno-diagnostics-color -s -Wno-aggregate-return -Wno-inline -Wno-missing-format-attribute -Wno-missing-noreturn -Wno-overlength-strings -Wno-jump-misses-init -Wno-suggest-attribute=const -Wno-suggest-attribute=noreturn -Wno-suggest-attribute=pure -Wno-suggest-attribute=format -Wno-suggest-attribute=cold -Wno-suggest-attribute=malloc  -Wbad-function-cast -Wimplicit-function-declaration -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpacked -Wpointer-sign -Wpointer-to-int-cast -Wredundant-decls -Wstrict-prototypes -Wswitch -Wunused-function -Wunused-variable -Wunused-parameter -Wcast-align -Wunused-but-set-variable -Wformat -Wincompatible-pointer-types -Wshadow -Wcast-function-type -Wmaybe-uninitialized -O3 @H5_ECFLAGS@
                     AM C Flags:
               Shared C Library: yes
               Static C Library: yes


                        Fortran: yes
               Fortran Compiler: /usr/bin/gfortran
                  Fortran Flags: -g -O2 -ffile-prefix-map=$(top_srcdir)=. -fstack-protector-strong
               H5 Fortran Flags:  -std=f2008  -Waliasing -Wall -Wcharacter-truncation -Wextra -Wimplicit-interface -Wsurprising -Wunderflow -pedantic -Warray-temporaries -Wintrinsics-std -Wimplicit-procedure -Wreal-q-constant -Wfunction-elimination -Wrealloc-lhs -Wrealloc-lhs-all -Wno-c-binding-type -Winteger-division -Wfrontend-loop-interchange  -fdiagnostics-urls=never -fno-diagnostics-color -s -O3
               AM Fortran Flags:
         Shared Fortran Library: yes
         Static Fortran Library: yes

                            C++: yes
                   C++ Compiler: /usr/bin/g++
                      C++ Flags: -g -O2 -ffile-prefix-map=$(top_srcdir)=. -fstack-protector-strong -Wformat -Werror=format-security
                   H5 C++ Flags:   -Wall -Wcast-qual -Wconversion -Wctor-dtor-privacy -Weffc++ -Wextra -Wfloat-equal -Wformat=2 -Winit-self -Winvalid-pch -Wmissing-include-dirs -Wno-format-nonliteral -Wnon-virtual-dtor -Wold-style-cast -Woverloaded-virtual -Wreorder -Wshadow -Wsign-promo -Wundef -Wwrite-strings -pedantic -Wlarger-than=2560 -Wlogical-op -Wframe-larger-than=16384 -Wpacked-bitfield-compat -Wsync-nand -Wstrict-overflow=5 -Wdouble-promotion -Wtrampolines -Wstack-usage=8192 -Wmaybe-uninitialized -Wdate-time -Wopenmp-simd -Warray-bounds=2 -Wduplicated-cond -Whsa -Wnormalized -Wnull-dereference -Wunused-const-variable -Walloca -Walloc-zero -Wduplicated-branches -Wformat-overflow=2 -Wformat-truncation=1 -Wrestrict -Wattribute-alias -Wcast-align=strict -Wshift-overflow=2 -Wattribute-alias=2 -Wmissing-profile -Wno-deprecated-copy -fstdarg-opt -fdiagnostics-urls=never -fno-diagnostics-color -s  -Wcast-align -Wmissing-declarations -Wpacked -Wredundant-decls -Wswitch -Wunused-but-set-variable -Wunused-function -Wunused-variable -Wunused-parameter -Wshadow -O3 @H5_ECXXFLAGS@
                   AM C++ Flags:  -DOLD_HEADER_FILENAME -DHDF_NO_NAMESPACE -DNO_STATIC_CAST
             Shared C++ Library: yes
             Static C++ Library: yes

                           Java: yes
                  Java Compiler: /usr/bin/java (openjdk 17.0.5 2022-10-18)


Features:
---------
                   Parallel HDF5: no
Parallel Filtered Dataset Writes: no
              Large Parallel I/O: no
              High-level library: yes
                Build HDF5 Tests: yes
                Build HDF5 Tools: yes
                    Threadsafety: yes
             Default API mapping: v18
  With deprecated public symbols: yes
          I/O filters (external): deflate(zlib),szip(encoder)
                             MPE: no
                      Direct VFD: no
                      Mirror VFD: no
              (Read-Only) S3 VFD: yes
            (Read-Only) HDFS VFD: no
                         dmalloc: no
  Packages w/ extra debug output: none
                     API tracing: no
            Using memory checker: no
 Memory allocation sanity checks: no
          Function stack tracing: no
                Use file locking: best-effort
       Strict file format checks: no
    Optimization instrumentation: no
Bye...
bash: line 1:     7 Aborted                 python manage.py migrate
make: *** [Makefile:101: development_fixtures] Error 134

3.8.0 / 3.10.0

Release: Jan 23, 2023

Errors:

 File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/app/grandchallenge/challenges/models.py", line 48, in <module>
    from grandchallenge.components.models import GPUTypeChoices
  File "/app/grandchallenge/components/models.py", line 54, in <module>
    from grandchallenge.components.validators import (
  File "/app/grandchallenge/components/validators.py", line 4, in <module>
    import biom
  File "/opt/poetry/.venv/lib/python3.11/site-packages/biom/__init__.py", line 51, in <module>
    from .table import Table
  File "/opt/poetry/.venv/lib/python3.11/site-packages/biom/table.py", line 176, in <module>
    import h5py
  File "/opt/poetry/.venv/lib/python3.11/site-packages/h5py/__init__.py", line 25, in <module>
    from . import _errors
  File "h5py/_errors.pyx", line 1, in init h5py._errors
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
make: *** [Makefile:101: development_fixtures] Error 1

Suspected

@chrisvanrun
Contributor Author

Guh, auto rebase seems to have messed some things up. Will clean up tomorrow.

@chrisvanrun
Contributor Author

Nudged the poetry.lock to force a rebuild of web-base. Let's see if that gets the ball rolling.

@chrisvanrun chrisvanrun marked this pull request as ready for review November 27, 2024 19:39
@chrisvanrun chrisvanrun requested a review from jmsmkn as a code owner November 27, 2024 19:39
@chrisvanrun
Contributor Author

chrisvanrun commented Nov 28, 2024

Can prob utilize the virtual envs to very quickly address:

@jmsmkn jmsmkn self-assigned this Dec 2, 2024
@jmsmkn jmsmkn assigned chrisvanrun and unassigned jmsmkn Dec 2, 2024
@jmsmkn jmsmkn merged commit efac102 into main Dec 2, 2024
8 checks passed
@jmsmkn jmsmkn deleted the BIOM-support branch December 2, 2024 16:14