feat: Implement `DFSchema.print_schema_tree()` method #17459

comphead · 2025-09-07T04:55:47Z

Which issue does this PR close?

Closes Create schema print out method #17460.

Rationale for this change

What changes are included in this PR?

This PR adds a new print_schema_tree() method to DFSchema that formats schema information in a tree-like structure similar to Apache Spark's schema display format, with proper nested indentation for complex data types.

Core Implementation
File: datafusion/common/src/dfschema.rs

Added print_schema_tree() method
- Public method that formats the entire schema in a tree structure
- Handles both qualified and unqualified field names
- Returns a formatted string with a "root" header and proper indentation

Added format_field_with_indent() helper function
- Recursive function that handles nested indentation for complex types
- Supports proper tree structure with |-- and | indentation
- Handles all Arrow data types

Example output for an array of maps

root
 |-- array_map_field: list (nullable = false)
 |    |-- item: map (nullable = false)
 |    |    |-- key: string (nullable = false)
 |    |    |-- value: string (nullable = false)
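The recursive formatting described above can be sketched in a self-contained way. This is only an illustration, not the code from the PR: `DataType` and `Field` here are hypothetical, simplified stand-ins for the Arrow types (the real `Map` type, for example, wraps an entries struct), and the functions reuse the names `print_schema_tree` and `format_field_with_indent` from the description purely for readability. Qualified field names are not modeled.

```rust
// Hypothetical, simplified stand-ins for Arrow's `DataType` and `Field`.
// The real method operates on `DFSchema` and the full set of Arrow types.
#[derive(Debug, Clone)]
enum DataType {
    Int32,
    Utf8,
    List(Box<Field>),
    Map { key: Box<Field>, value: Box<Field> },
}

#[derive(Debug, Clone)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }
}

// Short display name for each type, following the example output.
fn type_name(dt: &DataType) -> &'static str {
    match dt {
        DataType::Int32 => "int32",
        DataType::Utf8 => "string",
        DataType::List(_) => "list",
        DataType::Map { .. } => "map",
    }
}

/// Pushes one line for `field`, then recurses into its children one level deeper.
fn format_field_with_indent(field: &Field, depth: usize, out: &mut Vec<String>) {
    // Each nesting level adds one "|    " segment before the "|-- " marker.
    let prefix = format!(" {}|-- ", "|    ".repeat(depth));
    out.push(format!(
        "{}{}: {} (nullable = {})",
        prefix,
        field.name,
        type_name(&field.data_type),
        field.nullable
    ));
    match &field.data_type {
        DataType::List(item) => format_field_with_indent(item, depth + 1, out),
        DataType::Map { key, value } => {
            format_field_with_indent(key, depth + 1, out);
            format_field_with_indent(value, depth + 1, out);
        }
        _ => {}
    }
}

/// Formats all top-level fields under a "root" header, joined by newlines.
fn print_schema_tree(fields: &[Field]) -> String {
    let mut lines = vec!["root".to_string()];
    for field in fields {
        format_field_with_indent(field, 0, &mut lines);
    }
    lines.join("\n")
}
```

Building a non-nullable list field whose item is a map with non-nullable string key and value, and passing it to this sketch, yields the same five lines as the example output above.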

Are these changes tested?

Are there any user-facing changes?

comphead · 2025-09-07T05:20:28Z

datafusion/sql/src/statement.rs

-                .map(|c| self.ident_normalizer.normalize(c))
                .enumerate()
                .map(|(i, c)| {
+                    let c = self.ident_normalizer.normalize(c);


This is an unrelated change, just removing an unnecessary loop.

It seems to me like this change just moves the call to normalize the identifier into the map function (I don't think there are any loops removed 🤔 )

There were two map calls before. I hope rustc is smart enough to merge them at compile time, but in this case it is slightly more readable as well.

I think the change is fine, I just wanted to make sure I understood what was going on. Thank you @comphead
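For readers following this thread, the two shapes under discussion can be compared in isolation. The sketch below is an assumption-laden illustration, not the code in datafusion/sql/src/statement.rs: `normalize` is a hypothetical stand-in for `self.ident_normalizer.normalize`, and the final closure body is invented. Rust iterator adapters are lazy, so both pipelines visit each element exactly once when `collect` drives them. Neither version contains an extra loop, which matches the reviewer's reading.

```rust
// Hypothetical stand-in for `self.ident_normalizer.normalize`.
fn normalize(ident: &str) -> String {
    ident.to_lowercase()
}

// Before: a dedicated `map` adapter placed ahead of `enumerate`.
fn two_maps(cols: &[&str]) -> Vec<String> {
    cols.iter()
        .map(|c| normalize(c))
        .enumerate()
        .map(|(i, c)| format!("{i}:{c}"))
        .collect()
}

// After: normalization moved into the remaining closure.
fn one_map(cols: &[&str]) -> Vec<String> {
    cols.iter()
        .enumerate()
        .map(|(i, c)| {
            let c = normalize(c);
            format!("{i}:{c}")
        })
        .collect()
}
```

Both functions return the same vector for the same input, so the choice between them comes down to readability.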

alamb

Thank you @comphead -- I have some suggestions, but nothing that I think would prevent this PR from merging

BTW I was wondering if this would be a better default Display for DFSchema, but it seems like there is already a default implementation

alamb · 2025-09-08T16:11:00Z

datafusion/common/src/dfschema.rs

+        .unwrap();
+
+        let output = schema.print_schema_tree();
+        let expected = "root\n |-- id: int32 (nullable = false)\n |-- name: string (nullable = true)\n |-- age: int64 (nullable = true)\n |-- active: boolean (nullable = false)";


Using insta might make these cases easier to update and easier to read. Something like this, perhaps:

datafusion/datafusion/core/tests/user_defined/user_defined_aggregates.rs

Lines 214 to 220 in b8bf7c5

    insta::assert_snapshot!(batches_to_string(&actual), @r###"
    +---------------------------------------+
    | sum(arrow_cast(t.time,Utf8("Int64"))) |
    +---------------------------------------+
    | 19000                                 |
    +---------------------------------------+
    "###);

alamb · 2025-09-08T16:11:24Z

datafusion/common/src/dfschema.rs

+        )
+        .unwrap();
+
+        let output = schema.print_schema_tree();


I think an insta snapshot would be better here.

alamb · 2025-09-08T16:12:24Z

datafusion/sql/src/statement.rs

-                .map(|c| self.ident_normalizer.normalize(c))
                .enumerate()
                .map(|(i, c)| {
+                    let c = self.ident_normalizer.normalize(c);


It seems to me like this change just moves the call to normalize the identifier into the map function (I don't think there are any loops removed 🤔 )


comphead · 2025-09-08T16:53:41Z

> BTW I was wondering if this would be a better default Display for DFSchema, but it seems like there is already a default implementation

Display currently provides more information, such as metadata and dict_ordering. I'm not sure this is the right time to replace it, but in the future, why not.

To display the tree schema, we need to ship one more DDL function that shows the schema in the CLI or through DataFusion SQL.
The alternatives for which DDL to choose are listed in #17466; I'm planning to hold a vote on this.


alamb · 2025-09-09T13:35:26Z

datafusion/sql/src/statement.rs

-                .map(|c| self.ident_normalizer.normalize(c))
                .enumerate()
                .map(|(i, c)| {
+                    let c = self.ident_normalizer.normalize(c);


I think the change is fine, I just wanted to make sure I understood what was going on. Thank you @comphead

feat: Implement DFSchema.print_schema() method

ee89a16

github-actions bot added the sql (SQL Planner) and common (Related to common crate) labels Sep 7, 2025

feat: Implement DFSchema.print_schema() method

8e1cdc3

comphead force-pushed the dev2 branch from aaa7906 to 8e1cdc3 Compare September 7, 2025 05:07

comphead commented Sep 7, 2025


comphead changed the title ~~feat: Implement DFSchema.print_schema() method~~ feat: Implement DFSchema.print_schema_tree() method Sep 7, 2025

alamb approved these changes Sep 8, 2025


comphead and others added 4 commits September 8, 2025 10:38

Update datafusion/common/src/dfschema.rs

011305f

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Update datafusion/common/src/dfschema.rs

00521d2

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

feedback

e3280aa

feedback

c6da975

alamb approved these changes Sep 9, 2025


alamb merged commit fcd820e into apache:main Sep 9, 2025
28 checks passed

AdamGS mentioned this pull request Oct 3, 2025

[Branch-50] Backport: Support Decimal32/64 types (#17501) #17907

Closed

comphead mentioned this pull request Oct 4, 2025

chore: rename Schema print_schema_tree to tree_string #17919

Merged

comphead mentioned this pull request Oct 26, 2025

Implement DESCRIBE SELECT to show schema rather than EXPLAIN plan #18238

Merged

