Introduce `Schema::field_names` method #5186
Hi @tustvold @viirya, any plans to review this PR? It will give more context in the error without debugging.
Sorry, it is on my list, but I'm wondering if we want a more general mechanism to walk these fields as opposed to adding lots of utility methods. I will have a play over the coming days.
Thanks @tustvold, I reused 1 inner method instead of 2.
// Recursively concatenate child names
for child in f.nested_fields() {
    nested_field_names_inner(child, format!("{}.", current_name), buffer);
The use of `.` as a separator makes me uncomfortable, there have been a lot of bugs in the past resulting from things incorrectly treating `.` as a nesting delimiter...
This is something I'll try to address when I iterate on the visitor
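To make the concern concrete, here is a minimal sketch of how a dotted path can become ambiguous. The `Field` struct and `dotted_names` function are toy stand-ins written for this illustration; they are not the arrow-rs types or the code in this PR.

```rust
// Toy stand-in for a schema field; NOT the arrow-rs `Field` type.
struct Field {
    name: String,
    children: Vec<Field>,
}

impl Field {
    fn leaf(name: &str) -> Self {
        Field { name: name.to_string(), children: Vec::new() }
    }

    fn parent(name: &str, children: Vec<Field>) -> Self {
        Field { name: name.to_string(), children }
    }
}

/// Collects every field name, joining nesting levels with '.'.
/// Hypothetical helper that mirrors the recursive approach under review.
fn dotted_names(fields: &[Field]) -> Vec<String> {
    fn inner(field: &Field, prefix: &str, buffer: &mut Vec<String>) {
        let current = format!("{}{}", prefix, field.name);
        buffer.push(current.clone());
        // Children are prefixed with the parent's full path plus a dot.
        let child_prefix = format!("{}.", current);
        for child in &field.children {
            inner(child, &child_prefix, buffer);
        }
    }

    let mut buffer = Vec::new();
    for field in fields {
        inner(field, "", &mut buffer);
    }
    buffer
}
```

With this sketch, a top-level column literally named `a.b` and a child `b` nested under a struct `a` both yield the string `a.b`, so a reader of the output cannot tell the two schemas apart.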
Thanks @tustvold, do you prefer another separator or another approach?
I'm leaning towards a different approach, I don't think there is an unambiguous separator
What kind of approach? Perhaps I can help here.
My vague thought was something similar to how we display parquet schema, where it might be something like
struct batch {
  optional int64 column1
  optional struct nested {
    required float64 f64
  }
}
But implemented as some `FieldVisitor` abstraction so people can easily do something different should they wish to.
I want to take some time to get my thoughts on this together, and will then get back to you
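One way such a visitor could look is sketched below. This is only a guess at the shape of the idea: `DataType`, `Field`, `FieldVisitor`, `walk`, `TreePrinter`, and `render_schema` are all hypothetical names defined here for illustration, not arrow-rs or parquet APIs.

```rust
// Toy stand-ins for illustration only; these are NOT the arrow-rs types.
enum DataType {
    Int64,
    Float64,
    Struct(Vec<Field>),
}

struct Field {
    name: String,
    nullable: bool,
    data_type: DataType,
}

impl Field {
    fn new(name: &str, nullable: bool, data_type: DataType) -> Self {
        Field { name: name.to_string(), nullable, data_type }
    }
}

/// Hypothetical visitor: the caller decides what happens at each node.
trait FieldVisitor {
    fn enter(&mut self, field: &Field, depth: usize);
    fn leave(&mut self, field: &Field, depth: usize);
}

/// Depth-first walk that drives a visitor over a field and its children.
fn walk(field: &Field, depth: usize, visitor: &mut dyn FieldVisitor) {
    visitor.enter(field, depth);
    if let DataType::Struct(children) = &field.data_type {
        for child in children {
            walk(child, depth + 1, visitor);
        }
    }
    visitor.leave(field, depth);
}

/// One possible visitor: renders an indented, parquet-like tree.
struct TreePrinter {
    out: String,
}

impl FieldVisitor for TreePrinter {
    fn enter(&mut self, field: &Field, depth: usize) {
        let indent = "  ".repeat(depth);
        let repetition = if field.nullable { "optional" } else { "required" };
        let line = match &field.data_type {
            DataType::Int64 => format!("{}{} int64 {}\n", indent, repetition, field.name),
            DataType::Float64 => format!("{}{} float64 {}\n", indent, repetition, field.name),
            DataType::Struct(_) => format!("{}{} struct {} {{\n", indent, repetition, field.name),
        };
        self.out.push_str(&line);
    }

    fn leave(&mut self, field: &Field, depth: usize) {
        // Only struct fields opened a brace that needs closing.
        if let DataType::Struct(_) = &field.data_type {
            self.out.push_str(&format!("{}}}\n", "  ".repeat(depth)));
        }
    }
}

/// Renders the top-level fields inside a named root group.
fn render_schema(root: &str, fields: &[Field]) -> String {
    let mut printer = TreePrinter { out: format!("struct {} {{\n", root) };
    for field in fields {
        walk(field, 1, &mut printer);
    }
    printer.out.push_str("}\n");
    printer.out
}
```

Because nesting is expressed through indentation and braces rather than a joined path, no separator character is needed, and a different visitor could collect names, count leaves, or build an error message from the same walk.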
Apologies for the delay, coming back to this and honestly a little confused as to the use-case. The updated error message uses the new field_names method, but the RecordBatch method itself is not concerned with nested fields, and so printing the nested fields is actually just confusing? It should only print the first level of field names, as that is all RecordBatch is concerned with?
Thanks @tustvold for taking the time for this. The PR is intended to make debugging easier and to improve the error text, which is currently too unclear.
The idea is to show which fields exist and which are missing in the schema. But if a field is nested, we should display it in a user-friendly way. You are correct about using dots to separate parent and child fields: it is not a very good idea, as dots are also used in catalog/schema/table qualifiers.
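As a rough illustration of the kind of error text being discussed, the sketch below compares only top-level names, in line with the point that RecordBatch is not concerned with nested fields. `describe_mismatch` and the message wording are hypothetical; they are not the message this PR produces.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper, not part of arrow-rs: reports which top-level names
/// the schema expects but the data lacks, and which names are unexpected.
/// Returns `None` when both sides contain the same set of names.
fn describe_mismatch<'a>(schema_names: &[&'a str], data_names: &[&'a str]) -> Option<String> {
    // Sorted sets give a deterministic order in the message.
    let expected: BTreeSet<&str> = schema_names.iter().copied().collect();
    let actual: BTreeSet<&str> = data_names.iter().copied().collect();

    let missing: Vec<&str> = expected.difference(&actual).copied().collect();
    let unexpected: Vec<&str> = actual.difference(&expected).copied().collect();

    if missing.is_empty() && unexpected.is_empty() {
        return None;
    }
    Some(format!(
        "schema mismatch: missing fields [{}], unexpected fields [{}]",
        missing.join(", "),
        unexpected.join(", ")
    ))
}
```

Listing the names inside brackets with a comma separator keeps each name intact, so a column whose name contains a dot is still shown as a single entry.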
Which issue does this PR close?
Closes #.
Rationale for this change
Introducing a `Schema::field_names` method similar to Spark's https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/StructType.html#fieldNames() The method returns a list of column names, including nested fields.
Later we can use this code to make the schema output more user-friendly, or to create a toDDL() method (https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/StructField.html#toDDL--), etc.
What changes are included in this PR?
A bunch of extensions to recursively query fields and schemas. Also changed an error message to include more details when the schema and data do not match.
Are there any user-facing changes?
No