Running the tests sometimes fails with OOM SIGKILL / MemoryError #112

lysnikolaou · 2024-10-03T13:08:01Z

Hey! 👋

I've been trying imagecodecs out with the 3.13 free-threaded build. Everything seems to be working as expected (most of the C heavy lifting is done by Cython), except for one thing.

Running test_compressors fails when testing the lz4h5 codec. I first encountered this on my PC, but only when enough other programs were open that RAM ran short. In that case a SIGKILL would kill the test process.

I then started testing in a Docker container. The test failure can be reproduced fairly consistently in a Linux aarch64 Docker container capped at roughly 6 GB of memory with ulimit -v 6000000. Under that limit, running pytest fails with a MemoryError and warnings are printed to stderr. The failure only happens with the free-threaded build when the GIL is actually disabled, which points to an upstream CPython bug.
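Since the failure depends on the GIL actually being disabled, it helps to confirm what the interpreter under test is doing. A minimal sketch, assuming CPython 3.13 or later for `sys._is_gil_enabled()` (older versions fall back to "GIL enabled"); the helper name `free_threading_status` is made up for illustration:

```python
import sys
import sysconfig


def free_threading_status():
    """Return (is_free_threaded_build, gil_enabled) for the running interpreter."""
    # Py_GIL_DISABLED is 1 only for free-threaded builds; it is 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() was added in CPython 3.13; older versions always have a GIL.
    probe = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = probe() if probe is not None else True
    return free_threaded_build, gil_enabled


if __name__ == "__main__":
    build, gil = free_threading_status()
    print(f"free-threaded build: {build}, GIL enabled: {gil}")
```

On a free-threaded build the GIL can still be turned back on at runtime (for example when an extension module does not declare free-threading support), so both values are worth recording next to a failing run.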

However, I still did a deep dive, and it turns out that lz4h5_encode does ask for a lot of memory: 1077952680 bytes, to be exact. It calls PyBytes_FromStringAndSize with a value on the order of the size returned by LZ4_compressBound.
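To put that number in context, here is a rough sketch of the worst-case size arithmetic. It mirrors the `LZ4_COMPRESSBOUND` macro from `lz4.h` in plain Python; the 1 GiB input size is an assumption chosen because it lands close to the reported allocation, not something read out of the imagecodecs source:

```python
LZ4_MAX_INPUT_SIZE = 0x7E000000  # upper limit defined in lz4.h


def lz4_compress_bound(input_size: int) -> int:
    """Worst-case LZ4 compressed size in bytes, following LZ4_COMPRESSBOUND."""
    if input_size < 0 or input_size > LZ4_MAX_INPUT_SIZE:
        return 0  # the C function returns 0 for out-of-range sizes
    return input_size + input_size // 255 + 16


one_gib = 1 << 30                    # assumed block size (1 GiB)
reported = 1077952680                # allocation size quoted in this report
bound = lz4_compress_bound(one_gib)

print(bound)             # 1077952592
print(reported - bound)  # 88
```

The bound for a 1 GiB block is within 88 bytes of the reported request. The remaining bytes are presumably framing and object overhead, but that attribution is a guess.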

My question is: this is likely a CPython bug, probably related to the GC, but allocating 1GB of RAM still seems excessive. Is this expected? Can it be reduced somehow, or is this the best we can do?

Full test log

_____________________________________________________ test_compressors[lz4h5-decode-3069-new] ______________________________________________________

codec = 'lz4h5', func = 'decode', output = 'new', length = 3069

    @pytest.mark.filterwarnings('ignore: PY_SSIZE_T_CLEAN')
    @pytest.mark.parametrize(
        'output', ['new', 'bytearray', 'out', 'size', 'excess', 'trunc']
    )
    @pytest.mark.parametrize('length', [0, 2, 31 * 33 * 3])
    @pytest.mark.parametrize('func', ['encode', 'decode'])
    @pytest.mark.parametrize(
        'codec',
        [
            'bitshuffle',
            'brotli',
            'blosc',
            'blosc2',
            'bz2',
            'deflate',
            'gzip',
            'lz4',
            'lz4h',
            'lz4h5',
            'lz4f',
            'lzf',
            'lzfse',
            'lzham',
            'lzma',
            'lzw',
            'snappy',
            'szip',
            'zlib',
            'zlibng',
            'zopfli',
            'zstd',
        ],
    )
    def test_compressors(codec, func, output, length):
        """Test various non-image codecs."""
        if length:
            data = numpy.random.randint(255, size=length, dtype='uint8').tobytes()
        else:
            data = b''
    
        level = None
        if codec == 'blosc':
            if not imagecodecs.BLOSC.available or blosc is None:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.blosc_encode
            decode = imagecodecs.blosc_decode
            check = imagecodecs.blosc_check
            level = 9
            encoded = blosc.compress(data, clevel=level, typesize=1)
        elif codec == 'blosc2':
            if not imagecodecs.BLOSC2.available or blosc2 is None:
                pytest.skip(f'{codec} missing')
            if IS_PYPY:
                pytest.xfail('blosc2.compress fails under PyPy')
            encode = imagecodecs.blosc2_encode
            decode = imagecodecs.blosc2_decode
            check = imagecodecs.blosc2_check
            level = 5
            encoded = blosc2.compress2(data, clevel=level, typesize=8)
        elif codec == 'zlib':
            if not imagecodecs.ZLIB.available or zlib is None:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.zlib_encode
            decode = imagecodecs.zlib_decode
            check = imagecodecs.zlib_check
            level = 5
            encoded = zlib.compress(data, level)
        elif codec == 'zlibng':
            if not imagecodecs.ZLIBNG.available or zlib is None:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.zlibng_encode
            decode = imagecodecs.zlibng_decode
            check = imagecodecs.zlibng_check
            level = 5
            encoded = zlib.compress(data, level)
        elif codec == 'deflate':
            if not imagecodecs.DEFLATE.available:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.deflate_encode
            decode = imagecodecs.deflate_decode
            check = imagecodecs.deflate_check
            level = 8
            # TODO: use a 3rd party libdeflate wrapper
            # encoded = deflate.compress(data, level)
            encoded = encode(data, level)
        elif codec == 'gzip':
            if not imagecodecs.GZIP.available:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.gzip_encode
            decode = imagecodecs.gzip_decode
            check = imagecodecs.gzip_check
            level = 8
            encoded = encode(data, level)
            # encoded = gzip.compress(data, level)
        elif codec == 'lzma':
            if not imagecodecs.LZMA.available or lzma is None:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.lzma_encode
            decode = imagecodecs.lzma_decode
            check = imagecodecs.lzma_check
            level = 6
            encoded = lzma.compress(data)
        elif codec == 'lzw':
            if not imagecodecs.LZW.available:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.lzw_encode
            decode = imagecodecs.lzw_decode
            check = imagecodecs.lzw_check
            encoded = encode(data)
        elif codec == 'zstd':
            if not imagecodecs.ZSTD.available or zstd is None:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.zstd_encode
            decode = imagecodecs.zstd_decode
            check = imagecodecs.zstd_check
            level = 5
            if length == 0:
                # bug in zstd.compress?
                encoded = encode(data, level)
            else:
                encoded = zstd.compress(data, level)
        elif codec == 'lzf':
            if not imagecodecs.LZF.available or lzf is None:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.lzf_encode
            decode = imagecodecs.lzf_decode
            check = imagecodecs.lzf_check
            encoded = lzf.compress(data, ((len(data) * 33) >> 5) + 1)
            if encoded is None:
                pytest.xfail("lzf can't compress empty input")
        elif codec == 'lzfse':
            if not imagecodecs.LZFSE.available or lzfse is None:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.lzfse_encode
            decode = imagecodecs.lzfse_decode
            check = imagecodecs.lzfse_check
            encoded = lzfse.compress(data)
        elif codec == 'lzham':
            # TODO: test against pylzham?
            if not imagecodecs.LZHAM.available:  # or lzham is None
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.lzham_encode
            decode = imagecodecs.lzham_decode
            check = imagecodecs.lzham_check
            level = 5
            encoded = encode(data, level)
        elif codec == 'lz4':
            if not imagecodecs.LZ4.available or lz4 is None:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.lz4_encode
            decode = imagecodecs.lz4_decode
            check = imagecodecs.lz4_check
            level = 1
            encoded = lz4.block.compress(data, store_size=False)
        elif codec == 'lz4h':
            if not imagecodecs.LZ4.available or lz4 is None:
                pytest.skip(f'{codec} missing')
    
            def encode(*args, **kwargs):
                return imagecodecs.lz4_encode(*args, header=True, **kwargs)
    
            def decode(*args, **kwargs):
                return imagecodecs.lz4_decode(*args, header=True, **kwargs)
    
            check = imagecodecs.lz4_check
            level = 1
            encoded = lz4.block.compress(data, store_size=True)
        elif codec == 'lz4h5':
            if not imagecodecs.LZ4H5.available:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.lz4h5_encode
            decode = imagecodecs.lz4h5_decode
            check = imagecodecs.lz4h5_check
            level = 1
>           encoded = encode(data)

tests/test_imagecodecs.py:1621: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
imagecodecs/_lz4.pyx:267: in imagecodecs._lz4.lz4h5_encode
    out = _create_output(outtype, dstsize)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   obj = PyBytes_FromStringAndSize(string, size)
E   MemoryError

imagecodecs/_shared.pyx:100: MemoryError
--------------------------------------------------------------- Captured stderr call ---------------------------------------------------------------
mimalloc: warning: unable to allocate OS memory (error: 12 (0xc), size: 0x40c00000 bytes, align: 0x2000000, commit: 1, allow large: 1)
mimalloc: warning: unable to allocate OS memory (error: 12 (0xc), size: 0x40c00000 bytes, align: 0x2000000, commit: 1, allow large: 1)
mimalloc: error: unable to allocate memory (1077952680 bytes)


cgohlke · 2024-10-03T15:31:42Z

> allocating 1GB of RAM still seems excessive. Is this expected? Can it be reduced somehow or is this the best we can do?

1 GB is the default block size for the HDF5 filter, but that should be capped at the input size. The fix will be in the next release.
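The effect of capping the block size at the input size can be sketched as follows. This is only an illustration of the idea, not the actual fix: `DEFAULT_BLOCK_SIZE` and both helper names are hypothetical, and the per-block bound reuses the LZ4 worst-case formula (size + size // 255 + 16).

```python
DEFAULT_BLOCK_SIZE = 1 << 30  # assumed value for the "1 GB" default block size


def capped_block_size(input_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> int:
    """Hypothetical helper: never use a block larger than the input itself."""
    return min(block_size, input_size)


def worst_case_block_bytes(block_size: int) -> int:
    """Worst-case LZ4 output for one block of the given size."""
    return block_size + block_size // 255 + 16


input_size = 3069  # the 'length' parameter of the failing test case

uncapped = worst_case_block_bytes(DEFAULT_BLOCK_SIZE)
capped = worst_case_block_bytes(capped_block_size(input_size))

print(uncapped)  # 1077952592
print(capped)    # 3097
```

For the 3069-byte test input, the worst-case buffer drops from about 1 GB to about 3 KB, before any header bytes the HDF5 framing adds.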

Thanks for reporting.

lysnikolaou mentioned this issue Oct 3, 2024

Add Cython flag to signal free-threading compatibility #113

Open

cgohlke closed this as completed Oct 3, 2024

cgohlke added the bug Something isn't working label Oct 3, 2024
