[c++/python] Memory leak in Python sparse-array write #2928

Closed
johnkerl opened this issue Aug 26, 2024 · 1 comment · Fixed by #2989
johnkerl (Member) commented Aug 26, 2024

Problem

There is a memory leak when writing SOMA sparse arrays (and possibly other array types); the Arrow table being written may be what is leaked. Test code is below.

Note:

  • The issue only occurs in tiledbsoma 1.13 or later
  • Workaround: pin tiledbsoma to <1.13

The leak is large enough to prevent copying any moderately large array. For example, reading and rewriting a Census X layer (with a new schema) OOMs quickly on a 128 GiB host.

Reported by @bkmartinjr.

[sc-53264]

Repro

Test code (creates a test_array in the current working directory):

from __future__ import annotations

import gc
import sys

import tiledbsoma as soma


def copy(from_uri, to_uri, context):
    with soma.SparseNDArray.open(from_uri, mode="r", context=context) as X_from:
        with soma.SparseNDArray.open(to_uri, mode="w", context=context) as X_to:
            for i, tbl in enumerate(X_from.read().tables()):
                print(f"Read {len(tbl)}")
                X_to.write(tbl)

                del tbl
                gc.collect()

                if i == 10:  # OOMs w/o this break
                    break


def create(src_uri, dst_uri, context):
    with soma.open(src_uri, mode="r", context=context) as X:
        n_obs, n_var = X.shape
        type = X.schema.field("soma_data").type

    a = soma.SparseNDArray.create(
        dst_uri,
        type=type,
        shape=(n_obs, n_var),
        platform_config={
            "tiledb": {
                "create": {
                    "capacity": 2**16,
                    "dims": {
                        "soma_dim_0": {
                            "tile": 64,
                            "filters": [{"_type": "ZstdFilter", "level": 9}],
                        },
                        "soma_dim_1": {
                            "tile": 2048,
                            "filters": [
                                "ByteShuffleFilter",
                                {"_type": "ZstdFilter", "level": 9},
                            ],
                        },
                    },
                    "attrs": {
                        "soma_data": {
                            "filters": [
                                "ByteShuffleFilter",
                                {"_type": "ZstdFilter", "level": 9},
                            ]
                        }
                    },
                    "cell_order": "row-major",
                    "tile_order": "row-major",
                    "allows_duplicates": True,
                },
            }
        },
        context=context,
    )
    a.close()
    print(f"Array created at {dst_uri}")


def main():
    src_uri = "s3://cellxgene-census-public-us-west-2/cell-census/2024-07-01/soma/census_data/homo_sapiens/ms/RNA/X/raw/"
    dst_uri = "./test_array/"

    context = soma.SOMATileDBContext(
        tiledb_config={
            "vfs.s3.region": "us-west-2",
            "soma.init_buffer_bytes": 1 * 1024**3,
        }
    )

    create(src_uri, dst_uri, context=context)
    copy(src_uri, dst_uri, context=context)


if __name__ == "__main__":
    sys.exit(main())
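
To quantify the leak from the Python side, here is a minimal sketch of the same copy loop with a resident-set-size readout after each batch. It assumes psutil is installed; psutil and the copy_with_rss name are illustrative additions, not part of the original repro.

import gc
import os

import psutil  # assumed available; not used in the original repro
import tiledbsoma as soma


def copy_with_rss(from_uri, to_uri, context, n_batches=10):
    # Hypothetical variant of copy() above, adding an RSS readout per batch.
    proc = psutil.Process(os.getpid())
    with soma.SparseNDArray.open(from_uri, mode="r", context=context) as X_from:
        with soma.SparseNDArray.open(to_uri, mode="w", context=context) as X_to:
            for i, tbl in enumerate(X_from.read().tables()):
                X_to.write(tbl)
                del tbl
                gc.collect()
                # On affected versions (>= 1.13) RSS climbs steadily even
                # after del + gc.collect(); on < 1.13 it stays near-flat.
                rss_gib = proc.memory_info().rss / 2**30
                print(f"batch {i}: rss = {rss_gib:.2f} GiB")
                if i + 1 >= n_batches:
                    break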
@johnkerl added the bug and blocker labels Aug 26, 2024
eddelbuettel (Contributor) commented Aug 26, 2024

I had similar issues in both tiledb-r and tiledbsoma-r, which were identified and addressed by measuring with valgrind and making (much) more extensive use of nanoarrow. In principle it is very easy to ensure each Arrow-allocated object has a finalizer; in practice I found not all ended up being freed. Of course, 'code being code', this leak may also be caused by something completely different, but valgrind is still gold.

(I had noticed the nightlies for tiledb-r changed from '216 bytes definitely lost' to '224 bytes', but as I just verified, both are due to the same initialization within the AWS SDK that we can do nothing about, and for which CRAN does not point fingers at us.)
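
As a Python-side complement to valgrind, one can check whether the retained memory sits in pyarrow's default memory pool. This is a hedged sketch: the arrow_pool_probe helper is hypothetical, and pa.total_allocated_bytes() reports only pyarrow's default pool, so a flat reading while process RSS climbs would point at a retention outside that pool (e.g. in the C++ write path).

import gc

import pyarrow as pa


def arrow_pool_probe(X_from, X_to, n_batches=5):
    # Hypothetical probe: watch pyarrow's default memory pool across writes.
    for i, tbl in enumerate(X_from.read().tables()):
        X_to.write(tbl)
        del tbl
        gc.collect()
        # A monotonically growing value means Arrow buffers are never being
        # released; a flat value while RSS grows means the retained memory
        # lives outside pyarrow's default pool.
        print(f"batch {i}: arrow pool = {pa.total_allocated_bytes()} bytes")
        if i + 1 >= n_batches:
            break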

@johnkerl self-assigned this Aug 26, 2024
@johnkerl changed the title from "[c++/python] Memory leak in Python sparse array" to "[c++/python] Memory leak in Python sparse-array write" Aug 26, 2024
@nguyenv linked a pull request (Sep 13, 2024) that will close this issue