-
Notifications
You must be signed in to change notification settings - Fork 875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: DataType::contains support nested type #4042
Conversation
arrow-schema/src/datatype.rs
Outdated
DataType::List(field) | ||
| DataType::LargeList(field) | ||
| DataType::Map(field, _) | ||
| DataType::FixedSizeList(field, _) => field.data_type().contains(other), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should call through to Field::contains, which in turn should call through to DataType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to further clarify the definition of 'contain' through testing. Are the following tests accurate for this purpose?
#[test]
fn test_contains_nested_field() {
let child_field1 = Field::new("child1", DataType::Float16, false);
let child_field2 = Field::new("child2", DataType::Float16, false);
let field1 = Field::new(
"field1",
DataType::Struct(vec![child_field1.clone()].into()),
true,
);
let field2 = Field::new(
"field1",
DataType::Struct(vec![child_field1, child_field2].into()),
true,
);
assert!(field2.contains(&field1));
assert!(!field1.contains(&field2));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, adding fields is not supported. I agree the name is perhaps unhelpful. It is basically used for testing if the layout of a given set of arrays is compatible
arrow-schema/src/datatype.rs
Outdated
| DataType::Map(field, _) | ||
| DataType::FixedSizeList(field, _) => field.data_type().contains(other), | ||
DataType::Struct(fields) => { | ||
fields.iter().any(|field| field.data_type().contains(other)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should call through to Fields::contains
arrow-schema/src/field.rs
Outdated
}) { | ||
true | ||
} else if self.data_type().is_nested() { | ||
// if self is a nested type, check if self contains other | ||
self.data_type().contains(other) | ||
} else { | ||
false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will now return true if a nested field contains, but even if properties like self.name
are not equal?
Perhaps we could just change self.data_type == other.data_type
to be self.data_type.contains(&other.data_type)
arrow-schema/src/datatype.rs
Outdated
/// | ||
/// If DataType is a nested type, then it will check to see if the nested type is a superset of the other nested type | ||
/// else it will check to see if the DataType is equal to the other DataType | ||
pub fn contains(&self, other: &Field) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub fn contains(&self, other: &Field) -> bool { | |
pub fn contains(&self, other: &DataType) -> bool { |
I would expect this to take a DataType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we take a DataType
, how could we call Field::contains
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
match (self, other) {
(DataType::Struct(f1), DataType::Struct(f2)) => f1.contains(f2),
....
}
7dceab6
to
69120f2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a minor point for union arrays, otherwise this looks good, thank you
arrow-schema/src/datatype.rs
Outdated
(DataType::Map(f1, s1), DataType::Map(f2, s2)) => s1 == s2 && f1.contains(f2), | ||
(DataType::Struct(f1), DataType::Struct(f2)) => f1.contains(f2), | ||
(DataType::Union(f1, s1), DataType::Union(f2, s2)) => { | ||
s1 == s2 && f1.iter().all(|f1| f2.iter().any(|f2| f1.1.contains(f2.1))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should also check the type IDs match?
be3ea75
to
5269654
Compare
Which issue does this PR close?
Closes #4029
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?