-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Document schema merging. #17249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document schema merging. #17249
Conversation
…uction & modification (e.g. LP optimizers)
datafusion/common/src/dfschema.rs
Outdated
/// - For qualified fields: both qualifier and field name must match | ||
/// - For unqualified fields: only field name needs to match | ||
/// | ||
/// Note: the merging operation prefers the first `self` fields, and the second `other_schema` metadata. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this definition of schema merging behaves a differently in precedence for fields (prefer self), vs metadata (prefer other).
/// Merges two optional `FieldMetadata` instances, overwriting any existing | ||
/// keys in `m` with keys from `n` if present | ||
/// keys in `m` with keys from `n` if present. | ||
/// | ||
/// This function is commonly used in alias operations, particularly for literals | ||
/// with metadata. When creating an alias expression, the metadata from the original | ||
/// expression (such as a literal) is combined with any metadata specified on the alias. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This structure is intended for use with the aliasing of literals.
Since it has the same concept of field metadata merge
, I felt it was useful to add these docs in the same PR.
/// | ||
/// This function merges schemas from multiple logical plan inputs using [`DFSchema::merge`]. | ||
/// Refer to that documentation for details on precedence and metadata handling. | ||
pub fn merge_schema(inputs: &[&LogicalPlan]) -> DFSchema { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This prefers us to the other docs:
https://github.com/apache/datafusion/pull/17249/files#r2286497514
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor wording suggestions but overall looks good
I a running some benchmarks on this PR as a way to test my benchmark running script -- I don't expect that this PR changes the speed at all |
🤖 |
🤖: Benchmark completed Details
|
This comment was marked as outdated.
This comment was marked as outdated.
🤖 |
🤖: Benchmark completed Details
|
🤖 |
🤖 |
🤖: Benchmark completed Details
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which issue does this PR close?
Part of #12736.
Rationale for this change
Docs for how schema gets merge.
Note that we have 2 different definitions of schema "merging".
What changes are included in this PR?
Docs.
Are these changes tested?
N/A
Are there any user-facing changes?
Only docs.