Running the tests sometimes fails with OOM SIGKILL / MemoryError #112

Closed
lysnikolaou opened this issue Oct 3, 2024 · 1 comment
Labels
bug Something isn't working


Hey! 👋

I've been trying imagecodecs out with the 3.13 free-threaded build. Everything seems to be working as expected (most of the C heavy lifting is done by Cython), except for one thing.

Running test_compressors fails when testing the lz4h5 codec. I first encountered this on my PC, but only when enough other programs were open that I ran out of RAM. In that case a SIGKILL would kill the test process.

I then started testing in a Docker container. The test failure can be reproduced fairly consistently in a Linux aarch64 Docker container limited to roughly 6 GB of memory with ulimit -v 6000000. When doing so, running pytest fails with a MemoryError and CPython outputs warnings. The failure only happens with the free-threaded build when the GIL is actually disabled, which points to an upstream CPython bug.
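For anyone who wants to approximate that memory cap without the shell, here is a rough sketch using the standard-library resource module (Unix only). The helper names are made up for illustration, and it assumes that ulimit -v counts in 1024-byte units, as bash does.

```python
import resource
import subprocess


def ulimit_v_to_bytes(kib: int) -> int:
    """Convert a bash `ulimit -v` value (1024-byte units) to bytes."""
    return kib * 1024


def run_with_address_space_limit(argv, limit_bytes):
    """Run argv in a child process whose address space is capped (hypothetical helper)."""

    def apply_limit():
        # Runs in the child just before exec; RLIMIT_AS is what `ulimit -v` sets.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.run(argv, preexec_fn=apply_limit)


print(ulimit_v_to_bytes(6_000_000))  # 6144000000

# Possible usage (not executed here):
# run_with_address_space_limit(
#     ["python", "-m", "pytest", "tests/test_imagecodecs.py", "-k", "lz4h5"],
#     ulimit_v_to_bytes(6_000_000),
# )
```

Whether this reproduces the failure will depend on the platform and build, so treat it as a starting point rather than a guaranteed repro.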

However, I still did a deep dive, and it turns out that lz4h5_encode does ask for a lot of memory, 1077952680 bytes to be exact. It calls PyBytes_FromStringAndSize with a value on the order of the size returned by LZ4_compressBound.
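To put that number in context, here is a small Python transcription of the LZ4_COMPRESSBOUND macro from lz4.h. It suggests the request is sized for an input of about 1 GiB rather than for the 3069-byte test input. Where the remaining 88 bytes come from (headers, object overhead) is not something I verified, so take that part as a guess.

```python
LZ4_MAX_INPUT_SIZE = 0x7E000000  # limit defined in lz4.h


def lz4_compress_bound(input_size: int) -> int:
    """Worst-case compressed size, following the LZ4_COMPRESSBOUND macro."""
    if input_size < 0 or input_size > LZ4_MAX_INPUT_SIZE:
        return 0
    return input_size + input_size // 255 + 16


one_gib = 1 << 30
requested = 1077952680  # size reported by mimalloc in the log

print(lz4_compress_bound(one_gib))              # 1077952592
print(requested - lz4_compress_bound(one_gib))  # 88
print(lz4_compress_bound(3069))                 # 3097
```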

My question is: This is a CPython bug, probably related to the GC, but allocating 1GB of RAM still seems excessive. Is this expected? Can it be reduced somehow or is this the best we can do?

Full test log
_____________________________________________________ test_compressors[lz4h5-decode-3069-new] ______________________________________________________

codec = 'lz4h5', func = 'decode', output = 'new', length = 3069

    @pytest.mark.filterwarnings('ignore: PY_SSIZE_T_CLEAN')
    @pytest.mark.parametrize(
        'output', ['new', 'bytearray', 'out', 'size', 'excess', 'trunc']
    )
    @pytest.mark.parametrize('length', [0, 2, 31 * 33 * 3])
    @pytest.mark.parametrize('func', ['encode', 'decode'])
    @pytest.mark.parametrize(
        'codec',
        [
            'bitshuffle',
            'brotli',
            'blosc',
            'blosc2',
            'bz2',
            'deflate',
            'gzip',
            'lz4',
            'lz4h',
            'lz4h5',
            'lz4f',
            'lzf',
            'lzfse',
            'lzham',
            'lzma',
            'lzw',
            'snappy',
            'szip',
            'zlib',
            'zlibng',
            'zopfli',
            'zstd',
        ],
    )
    def test_compressors(codec, func, output, length):
        """Test various non-image codecs."""
        if length:
            data = numpy.random.randint(255, size=length, dtype='uint8').tobytes()
        else:
            data = b''
    
        level = None
        if codec == 'blosc':
            if not imagecodecs.BLOSC.available or blosc is None:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.blosc_encode
            decode = imagecodecs.blosc_decode
            check = imagecodecs.blosc_check
            level = 9
            encoded = blosc.compress(data, clevel=level, typesize=1)
        elif codec == 'blosc2':
            if not imagecodecs.BLOSC2.available or blosc2 is None:
                pytest.skip(f'{codec} missing')
            if IS_PYPY:
                pytest.xfail('blosc2.compress fails under PyPy')
            encode = imagecodecs.blosc2_encode
            decode = imagecodecs.blosc2_decode
            check = imagecodecs.blosc2_check
            level = 5
            encoded = blosc2.compress2(data, clevel=level, typesize=8)
        elif codec == 'zlib':
            if not imagecodecs.ZLIB.available or zlib is None:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.zlib_encode
            decode = imagecodecs.zlib_decode
            check = imagecodecs.zlib_check
            level = 5
            encoded = zlib.compress(data, level)
        elif codec == 'zlibng':
            if not imagecodecs.ZLIBNG.available or zlib is None:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.zlibng_encode
            decode = imagecodecs.zlibng_decode
            check = imagecodecs.zlibng_check
            level = 5
            encoded = zlib.compress(data, level)
        elif codec == 'deflate':
            if not imagecodecs.DEFLATE.available:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.deflate_encode
            decode = imagecodecs.deflate_decode
            check = imagecodecs.deflate_check
            level = 8
            # TODO: use a 3rd party libdeflate wrapper
            # encoded = deflate.compress(data, level)
            encoded = encode(data, level)
        elif codec == 'gzip':
            if not imagecodecs.GZIP.available:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.gzip_encode
            decode = imagecodecs.gzip_decode
            check = imagecodecs.gzip_check
            level = 8
            encoded = encode(data, level)
            # encoded = gzip.compress(data, level)
        elif codec == 'lzma':
            if not imagecodecs.LZMA.available or lzma is None:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.lzma_encode
            decode = imagecodecs.lzma_decode
            check = imagecodecs.lzma_check
            level = 6
            encoded = lzma.compress(data)
        elif codec == 'lzw':
            if not imagecodecs.LZW.available:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.lzw_encode
            decode = imagecodecs.lzw_decode
            check = imagecodecs.lzw_check
            encoded = encode(data)
        elif codec == 'zstd':
            if not imagecodecs.ZSTD.available or zstd is None:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.zstd_encode
            decode = imagecodecs.zstd_decode
            check = imagecodecs.zstd_check
            level = 5
            if length == 0:
                # bug in zstd.compress?
                encoded = encode(data, level)
            else:
                encoded = zstd.compress(data, level)
        elif codec == 'lzf':
            if not imagecodecs.LZF.available or lzf is None:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.lzf_encode
            decode = imagecodecs.lzf_decode
            check = imagecodecs.lzf_check
            encoded = lzf.compress(data, ((len(data) * 33) >> 5) + 1)
            if encoded is None:
                pytest.xfail("lzf can't compress empty input")
        elif codec == 'lzfse':
            if not imagecodecs.LZFSE.available or lzfse is None:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.lzfse_encode
            decode = imagecodecs.lzfse_decode
            check = imagecodecs.lzfse_check
            encoded = lzfse.compress(data)
        elif codec == 'lzham':
            # TODO: test against pylzham?
            if not imagecodecs.LZHAM.available:  # or lzham is None
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.lzham_encode
            decode = imagecodecs.lzham_decode
            check = imagecodecs.lzham_check
            level = 5
            encoded = encode(data, level)
        elif codec == 'lz4':
            if not imagecodecs.LZ4.available or lz4 is None:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.lz4_encode
            decode = imagecodecs.lz4_decode
            check = imagecodecs.lz4_check
            level = 1
            encoded = lz4.block.compress(data, store_size=False)
        elif codec == 'lz4h':
            if not imagecodecs.LZ4.available or lz4 is None:
                pytest.skip(f'{codec} missing')
    
            def encode(*args, **kwargs):
                return imagecodecs.lz4_encode(*args, header=True, **kwargs)
    
            def decode(*args, **kwargs):
                return imagecodecs.lz4_decode(*args, header=True, **kwargs)
    
            check = imagecodecs.lz4_check
            level = 1
            encoded = lz4.block.compress(data, store_size=True)
        elif codec == 'lz4h5':
            if not imagecodecs.LZ4H5.available:
                pytest.skip(f'{codec} missing')
            encode = imagecodecs.lz4h5_encode
            decode = imagecodecs.lz4h5_decode
            check = imagecodecs.lz4h5_check
            level = 1
>           encoded = encode(data)

tests/test_imagecodecs.py:1621: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
imagecodecs/_lz4.pyx:267: in imagecodecs._lz4.lz4h5_encode
    out = _create_output(outtype, dstsize)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   obj = PyBytes_FromStringAndSize(string, size)
E   MemoryError

imagecodecs/_shared.pyx:100: MemoryError
--------------------------------------------------------------- Captured stderr call ---------------------------------------------------------------
mimalloc: warning: unable to allocate OS memory (error: 12 (0xc), size: 0x40c00000 bytes, align: 0x2000000, commit: 1, allow large: 1)
mimalloc: warning: unable to allocate OS memory (error: 12 (0xc), size: 0x40c00000 bytes, align: 0x2000000, commit: 1, allow large: 1)
mimalloc: error: unable to allocate memory (1077952680 bytes)
cgohlke (Owner) commented Oct 3, 2024

> allocating 1GB of RAM still seems excessive. Is this expected? Can it be reduced somehow or is this the best we can do?

1 GB is the default block size for the HDF5 filter, but that should be capped at the input size. The fix will be in the next release.
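As a rough illustration of what capping the block size at the input size does to the worst-case buffer, here is a sketch in Python. It is not the actual imagecodecs code: the helper names are hypothetical, header bytes are ignored, and the uncapped estimate simply assumes the buffer is sized from the block size alone, as the reported allocation suggests.

```python
DEFAULT_BLOCK_SIZE = 1 << 30  # 1 GiB, taking "1 GB" above as 2**30 bytes (assumption)


def lz4_compress_bound(input_size: int) -> int:
    """Worst-case LZ4 output size, per the LZ4_COMPRESSBOUND macro in lz4.h."""
    return input_size + input_size // 255 + 16


def uncapped_estimate(input_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> int:
    """Buffer sized from the block size only, regardless of the input (assumed behavior)."""
    return lz4_compress_bound(block_size)


def capped_estimate(input_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> int:
    """Buffer sized after capping the block size at the input size."""
    block = min(block_size, input_size)
    if block == 0:
        return lz4_compress_bound(0)
    full_blocks, remainder = divmod(input_size, block)
    total = full_blocks * lz4_compress_bound(block)
    if remainder:
        total += lz4_compress_bound(remainder)
    return total


# The three input lengths used by test_compressors.
for length in (0, 2, 31 * 33 * 3):
    print(length, uncapped_estimate(length), capped_estimate(length))
```

For the 3069-byte test input, the capped estimate is 3097 bytes, compared with roughly 1 GB when the buffer is sized from the default block size.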

Thanks for reporting.

@cgohlke cgohlke closed this as completed Oct 3, 2024
@cgohlke cgohlke added the bug Something isn't working label Oct 3, 2024