[Parquet Metadata Cache]: Limit memory used #17001

@alamb

Description

Is your feature request related to a problem or challenge?

As implemented, there is no bound on the amount of memory held in the cache, which results in a "leak" over time (memory usage only goes up and never comes down).

Describe the solution you'd like

I would like the cache to have an upper memory limit so people can turn it on / off and its resource use is capped.

Describe alternatives you've considered

I personally recommend:

  1. Add another runtime configuration setting, datafusion.runtime.file_metadata_cache_limit, with the same interface as datafusion.runtime.memory_limit
  2. Implement a basic LRU strategy for the cache (when the limit is exceeded, evict the least recently used elements until there is space)
  3. Add tests for the above
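
To make step 2 concrete, here is a minimal sketch of a memory-bounded LRU cache using only the standard library. It is not DataFusion code: `BoundedLruCache`, its methods, and the idea of passing the entry size explicitly to `put` are all assumptions for illustration. In a real implementation the size would presumably come from something like `ParquetMetaData::memory_size`, and the limit from the proposed configuration setting.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical memory-bounded LRU cache (illustrative names, not a DataFusion API).
pub struct BoundedLruCache<V> {
    limit_bytes: usize,
    used_bytes: usize,
    /// Cached value together with the size it was accounted for.
    entries: HashMap<String, (V, usize)>,
    /// Keys ordered from least recently used (front) to most recently used (back).
    order: VecDeque<String>,
}

impl<V> BoundedLruCache<V> {
    pub fn new(limit_bytes: usize) -> Self {
        Self {
            limit_bytes,
            used_bytes: 0,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    pub fn used_bytes(&self) -> usize {
        self.used_bytes
    }

    pub fn contains(&self, key: &str) -> bool {
        self.entries.contains_key(key)
    }

    /// Returns the value and marks the key as most recently used.
    pub fn get(&mut self, key: &str) -> Option<&V> {
        if !self.entries.contains_key(key) {
            return None;
        }
        // O(n) scan: fine for a sketch, a linked structure would avoid it.
        if let Some(pos) = self.order.iter().position(|k| k.as_str() == key) {
            if let Some(k) = self.order.remove(pos) {
                self.order.push_back(k);
            }
        }
        self.entries.get(key).map(|(v, _)| v)
    }

    /// Removes an entry and releases the memory accounted for it.
    pub fn remove(&mut self, key: &str) -> Option<V> {
        let (value, size) = self.entries.remove(key)?;
        self.used_bytes -= size;
        self.order.retain(|k| k.as_str() != key);
        Some(value)
    }

    /// Inserts an entry, evicting least recently used entries until it fits.
    /// Returns false (and caches nothing) if the entry alone exceeds the limit.
    pub fn put(&mut self, key: &str, value: V, size: usize) -> bool {
        // Replace any previous entry for this key so its size is not double counted.
        let _ = self.remove(key);
        if size > self.limit_bytes {
            return false;
        }
        while self.used_bytes + size > self.limit_bytes {
            match self.order.pop_front() {
                Some(oldest) => {
                    if let Some((_, evicted_size)) = self.entries.remove(&oldest) {
                        self.used_bytes -= evicted_size;
                    }
                }
                None => break,
            }
        }
        self.entries.insert(key.to_string(), (value, size));
        self.used_bytes += size;
        self.order.push_back(key.to_string());
        true
    }
}
```

With a 100-byte limit and three 40-byte entries, inserting the third evicts whichever of the first two was touched least recently. Refusing entries larger than the whole limit is one possible policy; another would be to evict everything and still not cache the entry, which has the same observable memory bound.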

You can get the memory usage for ParquetMetaData using the following API: https://docs.rs/parquet/latest/parquet/file/metadata/struct.ParquetMetaData.html#method.memory_size

Some care will be needed to make this work with the traits (e.g., you may have to change FileMetadata into a trait).
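
One way the trait idea might look is sketched below. Only the name `FileMetadata` comes from this issue; the `as_any` and `memory_size` methods, the `FakeParquetMetadata` stand-in type, and the `total_memory` helper are hypothetical and are not the existing DataFusion definitions. The point is that a cache holding trait objects can ask each entry for its size without knowing the file format.

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical trait form of `FileMetadata` (methods are assumptions).
pub trait FileMetadata: Any + Send + Sync {
    /// Lets callers downcast back to the format-specific type.
    fn as_any(&self) -> &dyn Any;
    /// Approximate heap plus inline size of this entry, in bytes.
    fn memory_size(&self) -> usize;
}

/// Stand-in for a Parquet-specific wrapper; a real one would presumably
/// hold parsed metadata and delegate to `ParquetMetaData::memory_size`.
pub struct FakeParquetMetadata {
    pub footer_bytes: Vec<u8>,
}

impl FileMetadata for FakeParquetMetadata {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn memory_size(&self) -> usize {
        std::mem::size_of::<Self>() + self.footer_bytes.capacity()
    }
}

/// Sums the reported size of every cached entry, independent of file format.
pub fn total_memory(entries: &[Arc<dyn FileMetadata>]) -> usize {
    entries.iter().map(|e| e.memory_size()).sum()
}
```

A cache built this way can enforce its limit through `memory_size` alone, while readers that need the concrete metadata recover it with `as_any().downcast_ref::<T>()`.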

Additional context

No response

Labels: enhancement
