Description
I tried testing struct filter pushdown in Iceberg by applying these dependent code changes:
- Spark PR for struct pushdown
- Iceberg writers for Parquet
- Changes to metrics collection to add struct metrics in Iceberg
Iceberg rejects it with this validation error:
Caused by: com.netflix.iceberg.exceptions.ValidationException: Cannot find field 'location.lat' in struct: struct<1: age: optional int, 2: name: optional string, 3: friends: optional map<string, int>, 4: location: optional struct<7: lat: optional double, 8: lon: optional double>>
at com.netflix.iceberg.exceptions.ValidationException.check(ValidationException.java:42)
at com.netflix.iceberg.expressions.UnboundPredicate.bind(UnboundPredicate.java:76)
at com.netflix.iceberg.expressions.Projections$BaseProjectionEvaluator.predicate(Projections.java:138)
at com.netflix.iceberg.expressions.Projections$BaseProjectionEvaluator.predicate(Projections.java:94)
at com.netflix.iceberg.expressions.ExpressionVisitors.visit(ExpressionVisitors.java:147)
at com.netflix.iceberg.expressions.ExpressionVisitors.visit(ExpressionVisitors.java:160)
at com.netflix.iceberg.expressions.Projections$BaseProjectionEvaluator.project(Projections.java:108)
at com.netflix.iceberg.expressions.InclusiveManifestEvaluator.<init>(InclusiveManifestEvaluator.java:57)
at com.netflix.iceberg.BaseTableScan$1.load(BaseTableScan.java:153)
at com.netflix.iceberg.BaseTableScan$1.load(BaseTableScan.java:149)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
Test gist: https://gist.github.com/prodeezy/001cf155ff0675be7d307e9f842e1dac
Based on discussions on the dev mailing list and in Issue#78, we want to support nested struct filtering in Iceberg. For now, though, we want to avoid mixed nesting such as a struct inside a map or a struct inside an array, because that changes the semantics of the expression. For example, a.b = 5 can be run on a: struct<b: int> but can't be run on a: list<struct<b: int>>.
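To make the intended rule concrete, here is a rough sketch of how a dotted name such as location.lat could be resolved: descend only through struct fields, and give up as soon as the path crosses a list or map. The type classes and the resolve helper are hypothetical stand-ins written for this illustration. They are not Iceberg's schema or binding API, and the real fix in UnboundPredicate.bind may look quite different.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Illustration only: a tiny, hypothetical type model (not Iceberg's classes).
final class NestedFieldLookup {
    interface Type {}

    enum Primitive implements Type { INT, DOUBLE, STRING }

    static final class StructType implements Type {
        final Map<String, Type> fields = new LinkedHashMap<>();

        StructType field(String name, Type type) {
            fields.put(name, type);
            return this;
        }
    }

    static final class ListType implements Type {
        final Type element;

        ListType(Type element) {
            this.element = element;
        }
    }

    static final class MapType implements Type {
        final Type key;
        final Type value;

        MapType(Type key, Type value) {
            this.key = key;
            this.value = value;
        }
    }

    // Resolve a dotted name by walking struct fields only.
    // Returns empty if a part is missing or the path crosses a list or map.
    static Optional<Type> resolve(StructType root, String dottedName) {
        Type current = root;
        for (String part : dottedName.split("\\.")) {
            if (!(current instanceof StructType)) {
                return Optional.empty(); // cannot descend through list/map/primitive
            }
            current = ((StructType) current).fields.get(part);
            if (current == null) {
                return Optional.empty(); // no such field at this level
            }
        }
        return Optional.of(current);
    }

    public static void main(String[] args) {
        // Mirrors the schema from the error message above.
        StructType schema = new StructType()
            .field("age", Primitive.INT)
            .field("name", Primitive.STRING)
            .field("friends", new MapType(Primitive.STRING, Primitive.INT))
            .field("location", new StructType()
                .field("lat", Primitive.DOUBLE)
                .field("lon", Primitive.DOUBLE));

        // a: list<struct<b: int>>, the case we want to reject for now.
        StructType listSchema = new StructType()
            .field("a", new ListType(new StructType().field("b", Primitive.INT)));

        System.out.println(resolve(schema, "location.lat").isPresent()); // true
        System.out.println(resolve(listSchema, "a.b").isPresent());      // false
    }
}
```

With this rule, location.lat binds to the nested double field, while a.b on a: list<struct<b: int>> stays unresolved instead of silently changing meaning. One limitation of the sketch: a top-level field whose name itself contains a dot would need separate handling.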
Issue#78 focuses on adding metrics for struct fields in Iceberg; this issue covers the expression handling for those fields once the metrics are available.
/cc @aokolnychyi @rdblue