Closed
Description
Background
Currently:
- SpillMetrics (per operator) are updated only at the end of a spill.
- DiskManager tracks used_disk_space (current total) but doesn't expose a structured "progress" view.
Proposed Changes
- Real-time Metric Updates in SpillMetrics: modify InProgressSpillFile to ensure the spilled_bytes and spill_file_count metrics are updated as soon as the data is written to disk.
  - Initial update: In append_batch, when the IPCStreamWriter is first created, immediately call update_disk_usage() on the file and add the size (schema/header) to spilled_bytes.
  - Incremental update: After each writer.write(batch) call, call update_disk_usage() and add the delta size to spilled_bytes.
  - Final update: In finish(), call update_disk_usage() after finishing the writer and add the remaining delta size (footer/metadata) to spilled_bytes.
- Spilling Progress Interface in DiskManager: expose the current global state of the disk manager.
  - New SpillingProgress struct:

        pub struct SpillingProgress {
            /// Total bytes currently used on disk for spilling
            pub current_bytes: u64,
            /// Total number of active spill files
            pub active_files_count: usize,
        }

  - Implement spilling_progress(&self) -> SpillingProgress
- Delegate Interface in RuntimeEnv: provide a convenient entry point for users.
let progress = ctx.runtime_env().spilling_progress();
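
To make the three update points for SpillMetrics (initial, incremental, final) concrete, here is a minimal sketch of the delta accounting. It is not DataFusion code: SpillMetricsSketch and InProgressSpillSketch are hypothetical stand-ins, a Vec<u8> replaces the real spill file, the HEADER/FOOTER bytes are placeholders for the IPC schema and footer, and bumping spill_file_count at writer creation is an assumption.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the per-operator SpillMetrics counters.
#[derive(Default)]
pub struct SpillMetricsSketch {
    pub spilled_bytes: AtomicU64,
    pub spill_file_count: AtomicUsize,
}

/// Hypothetical stand-in for InProgressSpillFile.
pub struct InProgressSpillSketch {
    metrics: Arc<SpillMetricsSketch>,
    /// Simulated file contents; the real code would query the file size on disk.
    file: Vec<u8>,
    /// File size that has already been added to `spilled_bytes`.
    reported_bytes: u64,
    writer_created: bool,
}

impl InProgressSpillSketch {
    pub fn new(metrics: Arc<SpillMetricsSketch>) -> Self {
        Self { metrics, file: Vec::new(), reported_bytes: 0, writer_created: false }
    }

    /// Report only the growth since the previous call, so repeated calls never double count.
    fn update_disk_usage(&mut self) {
        let current = self.file.len() as u64;
        let delta = current.saturating_sub(self.reported_bytes);
        self.metrics.spilled_bytes.fetch_add(delta, Ordering::Relaxed);
        self.reported_bytes = current;
    }

    pub fn append_batch(&mut self, batch: &[u8]) {
        if !self.writer_created {
            // Initial update: the writer emits a schema/header when it is created.
            self.writer_created = true;
            self.file.extend_from_slice(b"HEADER"); // placeholder bytes
            // Assumption: the file is counted as soon as the writer exists.
            self.metrics.spill_file_count.fetch_add(1, Ordering::Relaxed);
            self.update_disk_usage();
        }
        // Incremental update: account for each batch right after it is written.
        self.file.extend_from_slice(batch);
        self.update_disk_usage();
    }

    /// Final update: account for the footer/metadata; returns the total bytes reported.
    pub fn finish(mut self) -> u64 {
        self.file.extend_from_slice(b"FOOTER"); // placeholder bytes
        self.update_disk_usage();
        self.reported_bytes
    }
}
```

Because each call adds only the difference between the current file size and the size already reported, the sum of all updates equals the final file size, and a reader of the metrics sees the value grow while the spill is still in progress.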
Users could then call this API to get the real-time spilling progress. For our use case, we want to call it from the SQL UI to give users real-time feedback about their SQL queries.
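
As a rough illustration of how the progress view and the RuntimeEnv delegate could fit together for a polling UI, here is a self-contained sketch. DiskManagerSketch and RuntimeEnvSketch are hypothetical stand-ins rather than the real DataFusion types, and the bookkeeping methods (file_created, bytes_written, file_dropped) and render_progress are names invented for this example; only SpillingProgress and spilling_progress() come from the proposal.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::Arc;

/// Snapshot type from the proposal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SpillingProgress {
    /// Total bytes currently used on disk for spilling
    pub current_bytes: u64,
    /// Total number of active spill files
    pub active_files_count: usize,
}

/// Hypothetical stand-in for DiskManager with atomic counters.
#[derive(Default)]
pub struct DiskManagerSketch {
    used_disk_space: AtomicU64,
    active_files: AtomicUsize,
}

impl DiskManagerSketch {
    /// Hypothetical hook: a new spill file was created.
    pub fn file_created(&self) {
        self.active_files.fetch_add(1, Ordering::Relaxed);
    }

    /// Hypothetical hook: `delta` more bytes were written to some spill file.
    pub fn bytes_written(&self, delta: u64) {
        self.used_disk_space.fetch_add(delta, Ordering::Relaxed);
    }

    /// Hypothetical hook: a spill file of `size` bytes was deleted.
    pub fn file_dropped(&self, size: u64) {
        self.active_files.fetch_sub(1, Ordering::Relaxed);
        self.used_disk_space.fetch_sub(size, Ordering::Relaxed);
    }

    /// Proposed API. The two loads are independent, so under concurrent
    /// writers the pair is an approximate snapshot, not a consistent one.
    pub fn spilling_progress(&self) -> SpillingProgress {
        SpillingProgress {
            current_bytes: self.used_disk_space.load(Ordering::Relaxed),
            active_files_count: self.active_files.load(Ordering::Relaxed),
        }
    }
}

/// Hypothetical stand-in for RuntimeEnv that only delegates.
pub struct RuntimeEnvSketch {
    pub disk_manager: Arc<DiskManagerSketch>,
}

impl RuntimeEnvSketch {
    pub fn spilling_progress(&self) -> SpillingProgress {
        self.disk_manager.spilling_progress()
    }
}

/// What a UI might do with each polled snapshot.
pub fn render_progress(p: SpillingProgress) -> String {
    format!("{} bytes across {} spill files", p.current_bytes, p.active_files_count)
}
```

A UI thread could call spilling_progress() on a timer and pass the result to something like render_progress. Since the snapshot is a small Copy struct built from two atomic loads, polling it should not contend with the operators that are writing spill files.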