
Add disk usage limit configuration to datafusion-cli #15553

@2010YOUY01

Description


Is your feature request related to a problem or challenge?

During external queries, temporary computation results can be spilled to disk so that queries can complete under limited memory.
A new configuration option that limits the maximum total disk usage of spill files is being added in #15520.
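The following minimal Rust sketch shows how a limit on total spill-file size could be enforced. It assumes that each spill file reserves its size before it is written and releases that size when it is deleted. `DiskUsageTracker`, `try_grow`, `shrink` and `DiskLimitExceeded` are hypothetical names and are not taken from #15520. `None` models the "no limit" default.

```rust
use std::fmt;

/// Hypothetical tracker for the total size of spill files on disk.
/// `max_bytes == None` means disk usage is not limited.
struct DiskUsageTracker {
    max_bytes: Option<u64>,
    used_bytes: u64,
}

/// Error returned when a spill would push usage past the limit.
#[derive(Debug, PartialEq)]
struct DiskLimitExceeded {
    requested: u64,
    used: u64,
    limit: u64,
}

impl fmt::Display for DiskLimitExceeded {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "spilling {} more bytes would exceed the disk limit ({} of {} bytes used)",
            self.requested, self.used, self.limit
        )
    }
}

impl DiskUsageTracker {
    fn new(max_bytes: Option<u64>) -> Self {
        Self { max_bytes, used_bytes: 0 }
    }

    /// Reserve space for a spill file of `bytes` bytes, failing if the new
    /// total would exceed the configured limit. Nothing is reserved on failure.
    fn try_grow(&mut self, bytes: u64) -> Result<(), DiskLimitExceeded> {
        let new_total = self.used_bytes.saturating_add(bytes);
        if let Some(limit) = self.max_bytes {
            if new_total > limit {
                return Err(DiskLimitExceeded { requested: bytes, used: self.used_bytes, limit });
            }
        }
        self.used_bytes = new_total;
        Ok(())
    }

    /// Release space when a spill file is deleted.
    fn shrink(&mut self, bytes: u64) {
        self.used_bytes = self.used_bytes.saturating_sub(bytes);
    }
}

fn main() {
    // A 100-byte limit: the second spill is rejected until the first is freed.
    let mut tracker = DiskUsageTracker::new(Some(100));
    assert!(tracker.try_grow(60).is_ok());
    let err = tracker.try_grow(50).unwrap_err();
    println!("{err}");
    tracker.shrink(60);
    assert!(tracker.try_grow(50).is_ok());

    // Without a limit, every reservation succeeds.
    let mut unlimited = DiskUsageTracker::new(None);
    assert!(unlimited.try_grow(u64::MAX).is_ok());
}
```

Failing at reservation time, before any bytes are written, keeps the limit strict. The query then gets an error it can report, rather than partially filling the disk.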

Describe the solution you'd like

Add a command-line option to datafusion-cli, for example:

# By default, disk usage is not limited
datafusion-cli -c 'select 1, 2 from foo';

# Limit disk usage to 10GB
datafusion-cli --disk-limit 10G -c 'select 1, 2 from foo'; 
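The proposed `--disk-limit 10G` value has to be turned into a byte count. Here is a minimal Rust sketch of that step. It assumes case-insensitive single-letter suffixes (`K`, `M`, `G`, `T`) with 1024-based multipliers. `parse_size` is a hypothetical helper, and the issue does not specify whether `G` means GiB or GB.

```rust
/// Hypothetical parser for a human-readable size such as "10G".
/// Assumes binary (1024-based) units and an optional one-letter suffix.
fn parse_size(s: &str) -> Result<u64, String> {
    let s = s.trim();
    if s.is_empty() {
        return Err("empty size".to_string());
    }
    // Split into the numeric part and an optional trailing unit letter.
    let (digits, unit) = match s.char_indices().last() {
        Some((idx, c)) if c.is_ascii_alphabetic() => (&s[..idx], Some(c.to_ascii_lowercase())),
        _ => (s, None),
    };
    let value: u64 = digits
        .trim()
        .parse()
        .map_err(|_| format!("invalid size: {s}"))?;
    let multiplier: u64 = match unit {
        None => 1,
        Some('k') => 1u64 << 10,
        Some('m') => 1u64 << 20,
        Some('g') => 1u64 << 30,
        Some('t') => 1u64 << 40,
        Some(other) => return Err(format!("unknown unit '{other}' in {s}")),
    };
    // Reject values that do not fit in u64 instead of silently wrapping.
    value
        .checked_mul(multiplier)
        .ok_or_else(|| format!("size overflows u64: {s}"))
}

fn main() {
    // `--disk-limit 10G` would mean 10 GiB under this sketch.
    let limit = parse_size("10G").unwrap();
    assert_eq!(limit, 10u64 << 30);
    println!("--disk-limit 10G -> {limit} bytes");
}
```

Under these assumptions, omitting the flag would leave the limit unset, matching the "not limited by default" behaviour described above.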

A reference implementation of a very similar feature (adding memory pool configuration) is in #7419.

Describe alternatives you've considered

This solution is a bit of a hack. In the long term, the limit should be configurable through the SQL interface, for example:

set datafusion.runtime.disk_limit = 1GB;
set datafusion.runtime.memory_limit = 100MB;

select * from tbl order by c1;
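The SQL-based alternative could route `set datafusion.runtime.*` statements to runtime settings, as in this minimal Rust sketch. `RuntimeSettings`, `parse_set` and `parse_bytes` are hypothetical and are not DataFusion's actual config or SQL parser. The sketch also assumes that `GB`/`MB`/`KB` values use 1024-based units.

```rust
/// Hypothetical runtime settings mirroring the keys proposed in the issue.
#[derive(Debug, Default, PartialEq)]
struct RuntimeSettings {
    disk_limit: Option<u64>,
    memory_limit: Option<u64>,
}

/// Parse values such as "1GB" or "100MB" (assumed 1024-based units).
fn parse_bytes(value: &str) -> Result<u64, String> {
    let v = value.trim().to_ascii_uppercase();
    let (num, mult) = if let Some(n) = v.strip_suffix("GB") {
        (n, 1u64 << 30)
    } else if let Some(n) = v.strip_suffix("MB") {
        (n, 1u64 << 20)
    } else if let Some(n) = v.strip_suffix("KB") {
        (n, 1u64 << 10)
    } else {
        (v.as_str(), 1u64)
    };
    let n: u64 = num.trim().parse().map_err(|_| format!("invalid size: {value}"))?;
    n.checked_mul(mult).ok_or_else(|| format!("size overflows u64: {value}"))
}

/// Tiny parser for `set <key> = <value>;` (illustrative only).
fn parse_set(stmt: &str) -> Option<(&str, &str)> {
    let rest = stmt.trim().trim_end_matches(';');
    let rest = rest.strip_prefix("set ").or_else(|| rest.strip_prefix("SET "))?;
    let (key, value) = rest.split_once('=')?;
    Some((key.trim(), value.trim()))
}

impl RuntimeSettings {
    /// Apply one `SET key = value` pair; unknown keys are rejected.
    fn set(&mut self, key: &str, value: &str) -> Result<(), String> {
        let slot = match key {
            "datafusion.runtime.disk_limit" => &mut self.disk_limit,
            "datafusion.runtime.memory_limit" => &mut self.memory_limit,
            _ => return Err(format!("unknown runtime setting: {key}")),
        };
        *slot = Some(parse_bytes(value)?);
        Ok(())
    }
}

fn main() {
    let mut settings = RuntimeSettings::default();
    for stmt in [
        "set datafusion.runtime.disk_limit = 1GB;",
        "set datafusion.runtime.memory_limit = 100MB;",
    ] {
        let (key, value) = parse_set(stmt).expect("not a SET statement");
        settings.set(key, value).expect("invalid setting");
    }
    assert_eq!(settings.disk_limit, Some(1u64 << 30));
    assert_eq!(settings.memory_limit, Some(100u64 << 20));
    println!("{settings:?}");
}
```

Keeping both limits in one runtime settings object would let the CLI flag and the `set` statement share the same validation.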

The tracking issue is #15552.

Additional context

No response

Labels: enhancement (New feature or request)
