-
Notifications
You must be signed in to change notification settings - Fork 1.7k
feat: Implement DFSchema.print_schema_tree() method
#17459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| .map(|c| self.ident_normalizer.normalize(c)) | ||
| .enumerate() | ||
| .map(|(i, c)| { | ||
| let c = self.ident_normalizer.normalize(c); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is unrelated change, just removing unnecessary loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me like this change just moves the call to normalize the identifier into the map function (I don't think there are any loops removed 🤔 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was 2 map before, I hope rustc is smart enough to merge them in runtime, but in this case it slighlty more readable as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the change is fine, I just wanted to make sure I understood what was going on. Thank you @comphead
DFSchema.print_schema() methodDFSchema.print_schema_tree() method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @comphead -- I have some suggestions, but nothing that I think would prevent this PR from merging
BTW I was wondering if this would be a better default Display for DFSchema, but it seems like there is already a default implementation
datafusion/common/src/dfschema.rs
Outdated
| .unwrap(); | ||
|
|
||
| let output = schema.print_schema_tree(); | ||
| let expected = "root\n |-- id: int32 (nullable = false)\n |-- name: string (nullable = true)\n |-- age: int64 (nullable = true)\n |-- active: boolean (nullable = false)"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using insta might make these cases easier to update / easier to see. Something like this perhaps
datafusion/datafusion/core/tests/user_defined/user_defined_aggregates.rs
Lines 214 to 220 in b8bf7c5
| insta::assert_snapshot!(batches_to_string(&actual), @r###" | |
| +---------------------------------------+ | |
| | sum(arrow_cast(t.time,Utf8("Int64"))) | | |
| +---------------------------------------+ | |
| | 19000 | | |
| +---------------------------------------+ | |
| "###); |
| ) | ||
| .unwrap(); | ||
|
|
||
| let output = schema.print_schema_tree(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think an insta snapshot would be better here
| .map(|c| self.ident_normalizer.normalize(c)) | ||
| .enumerate() | ||
| .map(|(i, c)| { | ||
| let c = self.ident_normalizer.normalize(c); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me like this change just moves the call to normalize the identifier into the map function (I don't think there are any loops removed 🤔 )
To display tree schema we need to ship 1 more DDL function to show the schema in the CLI or by calling DataFusion sql. |
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
| .map(|c| self.ident_normalizer.normalize(c)) | ||
| .enumerate() | ||
| .map(|(i, c)| { | ||
| let c = self.ident_normalizer.normalize(c); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the change is fine, I just wanted to make sure I understood what was going on. Thank you @comphead
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
This PR adds a new
print_schema_tree()method to DFSchema that formats schema information in a tree-like structure similar to Apache Spark's schema display format, with proper nested indentation for complex data types.Core Implementation
File:
datafusion/common/src/dfschema.rsprint_schema_tree()methodPublic method that formats the entire schema in tree structure
Handles both qualified and unqualified field names
Returns formatted string with "root" header and proper indentation
Added
format_field_with_indent()helper functionRecursive function that handles nested indentation for complex types
Supports proper tree structure with |-- and | indentation
Handles all Arrow data types
Example output for map of array
Are these changes tested?
Are there any user-facing changes?