Commit ec2402a

2010YOUY01 and alamb authored
feat: Support configurable EXPLAIN ANALYZE detail level (apache#18098)
## Which issue does this PR close?

- Closes #.

## Rationale for this change

`EXPLAIN ANALYZE` can be used for profiling and displays the results alongside the EXPLAIN plan. The issue is that it currently shows too many low-level details. It would provide a better user experience if only the most commonly used metrics were shown by default, with more detailed metrics available through specific configuration options.

### Example

In `datafusion-cli`:

```
> CREATE EXTERNAL TABLE IF NOT EXISTS lineitem
STORED AS parquet
LOCATION '/Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem';
0 row(s) fetched.
Elapsed 0.000 seconds.

> explain analyze select * from lineitem where l_orderkey = 3000000;
```

The parquet reader includes a large number of low-level details:

```
metrics=[output_rows=19813, elapsed_compute=14ns, batches_split=0, bytes_scanned=2147308, file_open_errors=0, file_scan_errors=0, files_ranges_pruned_statistics=18, num_predicate_creation_errors=0, page_index_rows_matched=19813, page_index_rows_pruned=729088, predicate_cache_inner_records=0, predicate_cache_records=0, predicate_evaluation_errors=0, pushdown_rows_matched=0, pushdown_rows_pruned=0, row_groups_matched_bloom_filter=0, row_groups_matched_statistics=1, row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=0, bloom_filter_eval_time=21.997µs, metadata_load_time=273.83µs, page_index_eval_time=29.915µs, row_pushdown_eval_time=42ns, statistics_eval_time=76.248µs, time_elapsed_opening=4.02146ms, time_elapsed_processing=24.787461ms, time_elapsed_scanning_total=24.17671ms, time_elapsed_scanning_until_data=23.103665ms]
```

I believe only a subset of these is commonly used, for example `output_rows`, `metadata_load_time`, and how many files/row groups/pages are pruned, and it would be better to display only the most common ones by default.

### Existing `VERBOSE` keyword

There is an existing verbose keyword in `EXPLAIN ANALYZE VERBOSE`; however, it turns on per-partition metrics instead of controlling the detail level. I think it would be hard to mix this partition control with the detail level introduced in this PR, so they are kept separate: the following config is used for the detail level, and the semantics of `EXPLAIN ANALYZE VERBOSE` remain unchanged.

### This PR: configurable explain analyze level

1. Introduced a new config option `datafusion.explain.analyze_level`. When set to `dev` (the default value), all existing metrics are shown. If set to `summary`, only `BaselineMetrics` are displayed (i.e. `output_rows` and `elapsed_compute`).
   Note that for now only `BaselineMetrics` are included, for simplicity. In follow-up PRs we can figure out which metrics are commonly used for each operator, add them to the `summary` analyze level, and finally make `summary` the default analyze level.
2. Added a `MetricType` field associated with `Metric` for the detail level, or potentially a category in the future. For each configuration, the corresponding set of `MetricType`s is shown.

#### Demo

```
-- continuing the above example
> set datafusion.explain.analyze_level = summary;
0 row(s) fetched.
Elapsed 0.000 seconds.

> explain analyze select * from lineitem where l_orderkey = 3000000;
+-------------------+--------------------------------------------------------------------------------+
| plan_type         | plan |
+-------------------+--------------------------------------------------------------------------------+
| Plan with Metrics | CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=5, elapsed_compute=25.339µs] |
|                   |   FilterExec: l_orderkey@0 = 3000000, metrics=[output_rows=5, elapsed_compute=81.221µs] |
|                   |     DataSourceExec: file_groups={14 groups: [[Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:0..11525426], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:11525426..20311205, Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:0..2739647], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:2739647..14265073], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:14265073..20193593, Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-2.parquet:0..5596906], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-2.parquet:5596906..17122332], ...]}, projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment], file_type=parquet, predicate=l_orderkey@0 = 3000000, pruning_predicate=l_orderkey_null_count@2 != row_count@3 AND l_orderkey_min@0 <= 3000000 AND 3000000 <= l_orderkey_max@1, required_guarantees=[l_orderkey in (3000000)], metrics=[output_rows=19813, elapsed_compute=14ns] |
|                   | |
+-------------------+--------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.025 seconds.
```

Only `BaselineMetrics` are shown.

## What changes are included in this PR?

## Are these changes tested?

Yes, by unit tests (UT).

## Are there any user-facing changes?

No

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
1 parent c956104 commit ec2402a

18 files changed: +251 -18 lines changed
datafusion/common/src/config.rs

Lines changed: 6 additions & 1 deletion
@@ -22,7 +22,7 @@ use arrow_ipc::CompressionType;
 #[cfg(feature = "parquet_encryption")]
 use crate::encryption::{FileDecryptionProperties, FileEncryptionProperties};
 use crate::error::_config_err;
-use crate::format::ExplainFormat;
+use crate::format::{ExplainAnalyzeLevel, ExplainFormat};
 use crate::parsers::CompressionTypeVariant;
 use crate::utils::get_available_parallelism;
 use crate::{DataFusionError, Result};
@@ -991,6 +991,11 @@ config_namespace! {
         /// (format=tree only) Maximum total width of the rendered tree.
         /// When set to 0, the tree will have no width limit.
         pub tree_maximum_render_width: usize, default = 240
+
+        /// Verbosity level for "EXPLAIN ANALYZE". Default is "dev"
+        /// "summary" shows common metrics for high-level insights.
+        /// "dev" provides deep operator-level introspection for developers.
+        pub analyze_level: ExplainAnalyzeLevel, default = ExplainAnalyzeLevel::Dev
     }
 }

datafusion/common/src/format.rs

Lines changed: 45 additions & 0 deletions
@@ -205,3 +205,48 @@ impl ConfigField for ExplainFormat {
         Ok(())
     }
 }
+
+/// Verbosity levels controlling how `EXPLAIN ANALYZE` renders metrics
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
+pub enum ExplainAnalyzeLevel {
+    /// Show a compact view containing high-level metrics
+    Summary,
+    /// Show a developer-focused view with per-operator details
+    Dev,
+    // When adding new enum, update the error message in `from_str()` accordingly.
+}
+
+impl FromStr for ExplainAnalyzeLevel {
+    type Err = DataFusionError;
+
+    fn from_str(level: &str) -> Result<Self, Self::Err> {
+        match level.to_lowercase().as_str() {
+            "summary" => Ok(ExplainAnalyzeLevel::Summary),
+            "dev" => Ok(ExplainAnalyzeLevel::Dev),
+            other => Err(DataFusionError::Configuration(format!(
+                "Invalid explain analyze level. Expected 'summary' or 'dev'. Got '{other}'"
+            ))),
+        }
+    }
+}
+
+impl Display for ExplainAnalyzeLevel {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        let s = match self {
+            ExplainAnalyzeLevel::Summary => "summary",
+            ExplainAnalyzeLevel::Dev => "dev",
+        };
+        write!(f, "{s}")
+    }
+}
+
+impl ConfigField for ExplainAnalyzeLevel {
+    fn visit<V: Visit>(&self, v: &mut V, key: &str, description: &'static str) {
+        v.some(key, self, description)
+    }
+
+    fn set(&mut self, _: &str, value: &str) -> Result<()> {
+        *self = ExplainAnalyzeLevel::from_str(value)?;
+        Ok(())
+    }
+}
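
The parse/display pair added in this file can be exercised in isolation. The following is a minimal std-only sketch, not DataFusion's code: `Level` is a hypothetical stand-in for `ExplainAnalyzeLevel`, a plain `String` replaces `DataFusionError` so the snippet compiles without the crate, and `round_trip` is an illustrative helper that does not exist in the PR.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical stand-in for `ExplainAnalyzeLevel`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Level {
    Summary,
    Dev,
}

impl FromStr for Level {
    // Assumption: a plain String replaces `DataFusionError::Configuration`.
    type Err = String;

    fn from_str(level: &str) -> Result<Self, Self::Err> {
        // Lowercasing first makes the parse case-insensitive.
        match level.to_lowercase().as_str() {
            "summary" => Ok(Level::Summary),
            "dev" => Ok(Level::Dev),
            other => Err(format!(
                "Invalid explain analyze level. Expected 'summary' or 'dev'. Got '{other}'"
            )),
        }
    }
}

impl fmt::Display for Level {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let s = match self {
            Level::Summary => "summary",
            Level::Dev => "dev",
        };
        write!(f, "{s}")
    }
}

/// Illustrative helper: parse a level string and render it back.
pub fn round_trip(input: &str) -> Result<String, String> {
    input.parse::<Level>().map(|level| level.to_string())
}
```

Because the input is lowercased before matching, `SUMMARY`, `Summary`, and `summary` all parse to the same variant, while `Display` always emits the lowercase form, so a parsed value prints back as a string that parses again.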

datafusion/core/src/datasource/physical_plan/parquet.rs

Lines changed: 4 additions & 1 deletion
@@ -64,7 +64,9 @@ mod tests {
     use datafusion_physical_expr::planner::logical2physical;
     use datafusion_physical_plan::analyze::AnalyzeExec;
     use datafusion_physical_plan::collect;
-    use datafusion_physical_plan::metrics::{ExecutionPlanMetricsSet, MetricsSet};
+    use datafusion_physical_plan::metrics::{
+        ExecutionPlanMetricsSet, MetricType, MetricsSet,
+    };
     use datafusion_physical_plan::{ExecutionPlan, ExecutionPlanProperties};
 
     use chrono::{TimeZone, Utc};
@@ -238,6 +240,7 @@ mod tests {
             let analyze_exec = Arc::new(AnalyzeExec::new(
                 false,
                 false,
+                vec![MetricType::SUMMARY, MetricType::DEV],
                 // use a new ParquetSource to avoid sharing execution metrics
                 self.build_parquet_exec(
                     Arc::clone(table_schema),

datafusion/core/src/physical_planner.rs

Lines changed: 8 additions & 0 deletions
@@ -62,6 +62,7 @@ use arrow::compute::SortOptions;
 use arrow::datatypes::Schema;
 use datafusion_catalog::ScanArgs;
 use datafusion_common::display::ToStringifiedPlan;
+use datafusion_common::format::ExplainAnalyzeLevel;
 use datafusion_common::tree_node::{TreeNode, TreeNodeRecursion, TreeNodeVisitor};
 use datafusion_common::TableReference;
 use datafusion_common::{
@@ -90,6 +91,7 @@ use datafusion_physical_expr::{
 use datafusion_physical_optimizer::PhysicalOptimizerRule;
 use datafusion_physical_plan::empty::EmptyExec;
 use datafusion_physical_plan::execution_plan::InvariantLevel;
+use datafusion_physical_plan::metrics::MetricType;
 use datafusion_physical_plan::placeholder_row::PlaceholderRowExec;
 use datafusion_physical_plan::recursive_query::RecursiveQueryExec;
 use datafusion_physical_plan::unnest::ListUnnest;
@@ -2073,9 +2075,15 @@ impl DefaultPhysicalPlanner {
         let input = self.create_physical_plan(&a.input, session_state).await?;
         let schema = Arc::clone(a.schema.inner());
         let show_statistics = session_state.config_options().explain.show_statistics;
+        let analyze_level = session_state.config_options().explain.analyze_level;
+        let metric_types = match analyze_level {
+            ExplainAnalyzeLevel::Summary => vec![MetricType::SUMMARY],
+            ExplainAnalyzeLevel::Dev => vec![MetricType::SUMMARY, MetricType::DEV],
+        };
         Ok(Arc::new(AnalyzeExec::new(
             a.verbose,
             show_statistics,
+            metric_types,
             input,
             schema,
         )))
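
The planner change above translates the configured level into the list of metric types that `AnalyzeExec` should display. A minimal std-only sketch of that mapping follows; `Level` and `MetricKind` are hypothetical stand-ins for `ExplainAnalyzeLevel` and `MetricType`, and `metric_kinds_for` is an illustrative name rather than a function from the PR.

```rust
/// Hypothetical stand-in for `ExplainAnalyzeLevel`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Summary,
    Dev,
}

/// Hypothetical stand-in for `MetricType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MetricKind {
    Summary,
    Dev,
}

/// Maps a detail level to the metric kinds it displays.
/// `Dev` is a superset of `Summary`, mirroring the match in the diff above.
pub fn metric_kinds_for(level: Level) -> Vec<MetricKind> {
    match level {
        Level::Summary => vec![MetricKind::Summary],
        Level::Dev => vec![MetricKind::Summary, MetricKind::Dev],
    }
}
```

Expressing the levels as sets of kinds, rather than as an ordered threshold, leaves room for the category-style metric types that the commit message mentions as a possible future extension.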

datafusion/core/tests/sql/explain_analyze.rs

Lines changed: 35 additions & 0 deletions
@@ -22,6 +22,7 @@ use rstest::rstest;
 use datafusion::config::ConfigOptions;
 use datafusion::physical_plan::display::DisplayableExecutionPlan;
 use datafusion::physical_plan::metrics::Timestamp;
+use datafusion_common::format::ExplainAnalyzeLevel;
 use object_store::path::Path;
 
 #[tokio::test]
@@ -158,6 +159,40 @@ async fn explain_analyze_baseline_metrics() {
 fn nanos_from_timestamp(ts: &Timestamp) -> i64 {
     ts.value().unwrap().timestamp_nanos_opt().unwrap()
 }
+
+// Test different detail level for config `datafusion.explain.analyze_level`
+#[tokio::test]
+async fn explain_analyze_level() {
+    async fn collect_plan(level: ExplainAnalyzeLevel) -> String {
+        let mut config = SessionConfig::new();
+        config.options_mut().explain.analyze_level = level;
+        let ctx = SessionContext::new_with_config(config);
+        let sql = "EXPLAIN ANALYZE \
+                SELECT * \
+                FROM generate_series(10) as t1(v1) \
+                ORDER BY v1 DESC";
+        let dataframe = ctx.sql(sql).await.unwrap();
+        let batches = dataframe.collect().await.unwrap();
+        arrow::util::pretty::pretty_format_batches(&batches)
+            .unwrap()
+            .to_string()
+    }
+
+    for (level, needle, should_contain) in [
+        (ExplainAnalyzeLevel::Summary, "spill_count", false),
+        (ExplainAnalyzeLevel::Summary, "output_rows", true),
+        (ExplainAnalyzeLevel::Dev, "spill_count", true),
+        (ExplainAnalyzeLevel::Dev, "output_rows", true),
+    ] {
+        let plan = collect_plan(level).await;
+        assert_eq!(
+            plan.contains(needle),
+            should_contain,
+            "plan for level {level:?} unexpected content: {plan}"
+        );
+    }
+}
+
 #[tokio::test]
 async fn csv_explain_plans() {
     // This test verify the look of each plan in its full cycle plan creation

datafusion/physical-plan/src/analyze.rs

Lines changed: 18 additions & 1 deletion
@@ -26,6 +26,7 @@ use super::{
     SendableRecordBatchStream,
 };
 use crate::display::DisplayableExecutionPlan;
+use crate::metrics::MetricType;
 use crate::{DisplayFormatType, ExecutionPlan, Partitioning};
 
 use arrow::{array::StringBuilder, datatypes::SchemaRef, record_batch::RecordBatch};
@@ -44,6 +45,8 @@ pub struct AnalyzeExec {
     verbose: bool,
     /// If statistics should be displayed
     show_statistics: bool,
+    /// Which metric categories should be displayed
+    metric_types: Vec<MetricType>,
     /// The input plan (the plan being analyzed)
     pub(crate) input: Arc<dyn ExecutionPlan>,
     /// The output schema for RecordBatches of this exec node
@@ -56,13 +59,15 @@ impl AnalyzeExec {
     pub fn new(
         verbose: bool,
         show_statistics: bool,
+        metric_types: Vec<MetricType>,
         input: Arc<dyn ExecutionPlan>,
         schema: SchemaRef,
     ) -> Self {
         let cache = Self::compute_properties(&input, Arc::clone(&schema));
         AnalyzeExec {
             verbose,
             show_statistics,
+            metric_types,
             input,
             schema,
             cache,
@@ -145,6 +150,7 @@ impl ExecutionPlan for AnalyzeExec {
         Ok(Arc::new(Self::new(
             self.verbose,
             self.show_statistics,
+            self.metric_types.clone(),
             children.pop().unwrap(),
             Arc::clone(&self.schema),
         )))
@@ -182,6 +188,7 @@ impl ExecutionPlan for AnalyzeExec {
         let captured_schema = Arc::clone(&self.schema);
         let verbose = self.verbose;
         let show_statistics = self.show_statistics;
+        let metric_types = self.metric_types.clone();
 
         // future that gathers the results from all the tasks in the
         // JoinSet that computes the overall row count and final
@@ -201,6 +208,7 @@ impl ExecutionPlan for AnalyzeExec {
                 duration,
                 captured_input,
                 captured_schema,
+                &metric_types,
             )
         };
 
@@ -219,6 +227,7 @@ fn create_output_batch(
     duration: std::time::Duration,
     input: Arc<dyn ExecutionPlan>,
     schema: SchemaRef,
+    metric_types: &[MetricType],
 ) -> Result<RecordBatch> {
     let mut type_builder = StringBuilder::with_capacity(1, 1024);
     let mut plan_builder = StringBuilder::with_capacity(1, 1024);
@@ -227,6 +236,7 @@ fn create_output_batch(
     type_builder.append_value("Plan with Metrics");
 
     let annotated_plan = DisplayableExecutionPlan::with_metrics(input.as_ref())
+        .set_metric_types(metric_types.to_vec())
         .set_show_statistics(show_statistics)
         .indent(verbose)
         .to_string();
@@ -238,6 +248,7 @@ fn create_output_batch(
         type_builder.append_value("Plan with Full Metrics");
 
         let annotated_plan = DisplayableExecutionPlan::with_full_metrics(input.as_ref())
+            .set_metric_types(metric_types.to_vec())
             .set_show_statistics(show_statistics)
             .indent(verbose)
             .to_string();
@@ -282,7 +293,13 @@ mod tests {
 
         let blocking_exec = Arc::new(BlockingExec::new(Arc::clone(&schema), 1));
         let refs = blocking_exec.refs();
-        let analyze_exec = Arc::new(AnalyzeExec::new(true, false, blocking_exec, schema));
+        let analyze_exec = Arc::new(AnalyzeExec::new(
+            true,
+            false,
+            vec![MetricType::SUMMARY, MetricType::DEV],
+            blocking_exec,
+            schema,
+        ));
 
         let fut = collect(analyze_exec, task_ctx);
         let mut fut = fut.boxed();
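
The hunks above only thread `metric_types` through to `DisplayableExecutionPlan`; the filtering itself is not visible in this part of the diff. As a rough illustration of the idea, the std-only sketch below keeps only the metrics whose kind is in the allowed list before rendering. `MetricKind`, `Metric`, `render_metrics`, and `sample_metrics` are all hypothetical names, and tagging `bytes_scanned` as a dev-level metric is an assumption made for the example.

```rust
/// Hypothetical stand-in for `MetricType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MetricKind {
    Summary,
    Dev,
}

/// Hypothetical metric record: a name, a pre-formatted value, and its kind.
pub struct Metric {
    pub name: &'static str,
    pub value: String,
    pub kind: MetricKind,
}

/// Renders only the metrics whose kind appears in `shown`,
/// in the `metrics=[a=1, b=2]` style used by the plan output.
pub fn render_metrics(metrics: &[Metric], shown: &[MetricKind]) -> String {
    let parts: Vec<String> = metrics
        .iter()
        .filter(|m| shown.contains(&m.kind))
        .map(|m| format!("{}={}", m.name, m.value))
        .collect();
    format!("metrics=[{}]", parts.join(", "))
}

/// Sample values taken from the commit message's parquet example.
/// Assumption: `bytes_scanned` is treated as a dev-level metric here.
pub fn sample_metrics() -> Vec<Metric> {
    vec![
        Metric { name: "output_rows", value: "19813".to_string(), kind: MetricKind::Summary },
        Metric { name: "elapsed_compute", value: "14ns".to_string(), kind: MetricKind::Summary },
        Metric { name: "bytes_scanned", value: "2147308".to_string(), kind: MetricKind::Dev },
    ]
}
```

With `shown` set to only `MetricKind::Summary`, the sample renders the same two baseline metrics that the demo in the commit message shows for `DataSourceExec`; adding `MetricKind::Dev` brings `bytes_scanned` back.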
