Skip to content

Conversation

zhuqi-lucas
Copy link
Contributor

…ng datafusion-cli

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Add unlimited case for bounded streaming.

Are these changes tested?

Yes

The following result will not be empty, before this PR is empty.

/usr/bin/time -l cargo run --release --   -m 20G --mem-pool-type fair  --maxrows inf -f '/Users/zhuqi/arrow-datafusion/benchmarks/data/external_sort.sql'

Are there any user-facing changes?

Support unlimited case for bounded streaming

@zhuqi-lucas
Copy link
Contributor Author

cc @alamb @2010YOUY01

Found one bug, please help review, thanks!

reservation.try_grow(get_record_batch_memory_size(&batch))?;
results.push(batch);
}
} else if let MaxRows::Unlimited = print_options.maxrows {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good catch, wondering to avoid code duplication would it be better to

                let max_rows = match print_options.maxrows {
                    MaxRows::Unlimited => usize::MAX,
                    MaxRows::Limited(n) => n
                };

           while let Some(batch) = stream.next().await {
                let batch = batch?;
                let curr_num_rows = batch.num_rows();
                if row_count < max_rows + curr_num_rows {
                        // Try to grow the reservation to accommodate the batch in memory
                        reservation.try_grow(get_record_batch_memory_size(&batch))?;
                        results.push(batch);
                    }
                row_count += curr_num_rows;
            }

so we also remove unnecessary branch and literal evaulation every batch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @comphead for review, this is a good suggestion, i have addressed in latest PR!

@zhuqi-lucas zhuqi-lucas requested a review from comphead February 22, 2025 03:49
@alamb alamb merged commit 6bfeb02 into apache:main Feb 22, 2025
24 checks passed
ozankabak pushed a commit to synnada-ai/datafusion-upstream that referenced this pull request Feb 25, 2025
apache#14815)

* fix: we are missing the unlimited case for bounded streaming when using datafusion-cli

* Address comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Datafusion-cli: when the max rows setting inf, we are missing the unlimited case for bounded streaming.
3 participants