Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add EXCLUDE to the parser, ast, plan, and plan schema inferencer #1226

Merged
merged 12 commits into from
Sep 29, 2023

Conversation

alancai98
Copy link
Member

Relevant Issues

Description

Experimental addition for the EXCLUDE operator to the

  • parser
  • ast
  • partiql-plan
  • partiql-plan schema inferencer

This operator serves to remove nested attributes and values from binding tuples before they're projected using SELECT/PIVOT.

Other Information

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@alancai98 alancai98 self-assigned this Sep 26, 2023
@github-actions
Copy link

github-actions bot commented Sep 26, 2023

Conformance comparison report

Base (316fe36) 4749525 +/-
% Passing 92.33% 92.33% 0.00%
✅ Passing 5372 5372 0
❌ Failing 446 446 0
🔶 Ignored 0 0 0
Total Tests 5818 5818 0

Number passing in both: 5372

Number failing in both: 446

Number passing in Base (316fe36) but now fail: 0

Number failing in Base (316fe36) but now pass: 0

@codecov-commenter
Copy link

codecov-commenter commented Sep 26, 2023

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (316fe36) 71.26% compared to head (ec6da34) 71.48%.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1226      +/-   ##
============================================
+ Coverage     71.26%   71.48%   +0.21%     
- Complexity     2394     2444      +50     
============================================
  Files           229      229              
  Lines         17455    17595     +140     
  Branches       3215     3244      +29     
============================================
+ Hits          12440    12577     +137     
- Misses         4008     4010       +2     
- Partials       1007     1008       +1     
Flag Coverage Δ
CLI 13.70% <ø> (ø)
EXAMPLES 80.28% <ø> (ø)
LANG 78.62% <92.95%> (+0.17%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
.../visitors/GroupByPathExpressionVisitorTransform.kt 81.48% <100.00%> (+0.46%) ⬆️
...partiql/lang/eval/visitors/VisitorTransformBase.kt 97.56% <100.00%> (+0.12%) ⬆️
...org/partiql/lang/planner/PlanningProblemDetails.kt 74.50% <100.00%> (+2.16%) ⬆️
.../partiql/lang/planner/transforms/plan/PlanUtils.kt 61.53% <100.00%> (+0.75%) ⬆️
...rtiql/lang/planner/transforms/plan/RelConverter.kt 70.04% <100.00%> (+2.57%) ⬆️
...rtiql/lang/planner/transforms/plan/RexConverter.kt 63.15% <100.00%> (ø)
.../org/partiql/lang/syntax/impl/PartiQLPigVisitor.kt 89.14% <94.11%> (+0.12%) ⬆️
.../partiql/lang/planner/transforms/plan/PlanTyper.kt 60.95% <90.00%> (+3.35%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines +341 to +342
typeEnv = typeEnv,
properties = input.getProperties()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was for fixing the ORDER BY schema inference behavior.

Tests with corrected behavior:

SuccessTestCase(
name = "ORDER BY int",
catalog = CATALOG_AWS,
catalogPath = listOf("ddb"),
query = "SELECT * FROM pets ORDER BY id",
expected = TABLE_AWS_DDB_PETS_LIST
),
SuccessTestCase(
name = "ORDER BY str",
catalog = CATALOG_AWS,
catalogPath = listOf("ddb"),
query = "SELECT * FROM pets ORDER BY breed",
expected = TABLE_AWS_DDB_PETS_LIST
),
SuccessTestCase(
name = "ORDER BY str",
catalog = CATALOG_AWS,
catalogPath = listOf("ddb"),
query = "SELECT * FROM pets ORDER BY unknown_col",
expected = TABLE_AWS_DDB_PETS_LIST
),
.

Previously these tests would output a bag rather than a list.

Copy link
Contributor

@am357 am357 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking working on this feature!

I didn't pay too much attention to the algorithm as it's implemented in the PlanTyper. I focused more on the schema inferencer tests and left couple of comment. I also didn't bother with Kotlin style recommendation and alike.

@alancai98 alancai98 requested a review from am357 September 27, 2023 20:12
Copy link
Member

@johnedquinn johnedquinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to finish reviewing some last files, but sending some comments for immediate feedback.

Comment on lines 340 to 341
excludeExpr
: symbolPrimitive excludeExprSteps+; // Require 1 more `excludeExprSteps`. Disallow `EXCLUDE a`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why a step is required? I imagined that the items within the EXCLUDE clause would follow the same resolution rules as the projection items. If ambiguities are encountered, errors are thrown.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose the conservative approach here to make one step required, so we wouldn't have to deal with the ambiguous case. So users must define the fully qualified exclude path expression that they wish to exclude.

We could always allow for 0 excludeExprSteps in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also worth mentioning EXCLUDE is currently modeled as a binding tuple operator (evaluated before the SELECT projection). We are trying to disallow excluding a top-level binding explicitly (e.g. disallow SELECT ... EXCLUDE t FROM s, t) since we don't have a clear use case for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The binding tuple produced in case of SELECT ... FROM s, t may look like:

B = <<  < s : ..., t: ....>, 
              < s: ..., t: ....>, 
              ......
        >>

Not sure why semantically SELECT ... EXCLUDE t FROM s, t should not be allowed.

That said, if this makes the implementation harder, we can add a notes for now since there is no clear use case at the moment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe it made the implementation of EXCLUDE harder. The general idea was that when introducing a binding variable in a FROM source (or group by, let clause, etc.) why would you then exclude the variable itself? That use case wasn't clear to me during this PR. I can make a note of it in the ANTLR grammar.

We can always add binding variable exclusion in the future.

partiql-parser/src/main/antlr/PartiQL.g4 Show resolved Hide resolved
partiql-ast/src/main/pig/partiql.ion Outdated Show resolved Hide resolved
Co-authored-by: John Ed Quinn <40360967+johnedquinn@users.noreply.github.com>
Copy link
Contributor

@yliuuuu yliuuuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Is there any RFC in draft or document that comes along with this feature at the moment?

(product exclude_expr root::identifier steps::(* exclude_step 1))

(sum exclude_step
// `someRoot.someField` which is equivalent to `someRoot['someField']` (case-sensitive)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this case sensitive?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry wording was confusing. Meant to say someRoot['someField'] is equivalent to someRoot."someField". I'll update in next commit.

@@ -494,6 +494,7 @@ expr::[
// The PartiQL `<sfw>` query expression, think SQL `<query specification>`
s_f_w::{
select: select, // oneof SELECT / SELECT VALUE / PIVOT
exclude: optional::exclude,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we allow PIVOT x.v AT x.a EXCLUDE ...?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that will be allowed. There's no tests yet since PIVOT is not yet fully implemented in the plan schema inferencer.

I will add a parser test showing that it's allowed though.

Comment on lines 340 to 341
excludeExpr
: symbolPrimitive excludeExprSteps+; // Require 1 more `excludeExprSteps`. Disallow `EXCLUDE a`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The binding tuple produced in case of SELECT ... FROM s, t may look like:

B = <<  < s : ..., t: ....>, 
              < s: ..., t: ....>, 
              ......
        >>

Not sure why semantically SELECT ... EXCLUDE t FROM s, t should not be allowed.

That said, if this makes the implementation harder, we can add a notes for now since there is no clear use case at the moment.

val rootId = expr.root
if (attr.name == rootId || (expr.rootCase == Case.INSENSITIVE && attr.name.equals(expr.root, ignoreCase = true))) {
if (expr.steps.isEmpty()) {
// no other exclude steps; don't include attr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on this comments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch is casing on whether the excludeExprSteps is empty (e.g. if excludeExpr in EXCLUDE <excludeExpr> was just an identifier without any subsequent excludeExprSteps). This should actually be caught earlier during the parsing phase since we don't allow empty excludeExprSteps. I'll change this to be an illegal state exception.

@@ -91,6 +94,7 @@ internal class RelConverter {
rel = convertHaving(rel, sel.having)
rel = convertOrderBy(rel, sel.order)
rel = convertFetch(rel, sel.limit, sel.offset)
rel = convertExclude(rel, sel.excludeClause)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per my understanding:
SELECT a, b, c EXCLUDE tbl.c FROM tbl is valid, and the plan would look like:

Project: 
   |_ a
   |_ b
   |_ c 
   |_ exclude
      |_ c 
      |_ scan 
        |_ tbl 

And the result would have been: << {a: ..., b:..., c: missing}, .... >>

Not sure if this should be allowed, or if we need a semantic pass to disable such cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that plan looks correct. And yes that's the result assuming there is no other c defined.

Not sure if this should be allowed, or if we need a semantic pass to disable such cases.

That is currently allowed. Perhaps we can add another pass at some point or some check in PlanTyper. Will make note of this case in the EXCLUDE code in PlanTyper.

Comment on lines +244 to +246
if (steps.size > 1) {
elementType = excludeExprSteps(elementType, steps.drop(1), lastStepAsOptional = true, ctx)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sanity check here.
So if we have {'a' : [0, 1, 2]} and exclude a[0], the result will be {'a' : [1,2]}, hence optional, i.e., unionOf(missing, int).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on this?

I would assume that the static type of a would be a ListType(INT). IMO, even if we had:

SELECT * EXCLUDE t.a[0]
FROM << { 'a': [0] } >> AS t

I would assume the static type of t.a would still be ListType(INT) because we can't know how long the list actually is (especially if t is part of the catalog). If t.a started without any elements, I also think it would still be ListType(INT). That being said @alancai98 , do we expect this to be a runtime error when we attempt to exclude an index that doesn't exist (essentially OutOfBoundsException)? How about when we try to exclude an attribute in a tuple that doesn't exist? This one we can sometimes know statically, but what happens at runtime if it's a union type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sanity check here. So if we have {'a' : [0, 1, 2]} and exclude a[0], the result will be {'a' : [1,2]}, hence optional, i.e., unionOf(missing, int).

Not quite. In this case, a's type would be ListType(INT). The lastTypeAsOptional is currently only used for when there's a subsequent field to exclude after the collection index (such as EXCLUDE t.a[0].field -- in this case, field's type would be unionOf(missing, <field's original type>)).

This is demonstrated by the below test case:

SuccessTestCase(
name = "EXCLUDE SELECT star list index and list index field",
query = """SELECT *
EXCLUDE
t.a.b.c[0],
t.a.b.c[1].field
FROM [{
'a': {
'b': {
'c': [
{
'field': 0 -- c[0]
},
{
'field': 1 -- c[1]
},
{
'field': 2 -- c[2]
}
]
}
},
'foo': 'bar'
}] AS t""",
expected = BagType(
StructType(
fields = mapOf(
"a" to StructType(
fields = mapOf(
"b" to StructType(
fields = mapOf(
"c" to ListType(
elementType = StructType(
fields = mapOf(
"field" to AnyOfType(
setOf(
StaticType.INT,
StaticType.MISSING // c[1]'s `field` was excluded
)
)
),
contentClosed = true,
constraints = setOf(TupleConstraint.Open(false), TupleConstraint.UniqueAttrs(true))
)
)
),
contentClosed = true,
constraints = setOf(TupleConstraint.Open(false), TupleConstraint.UniqueAttrs(true))
)
),
contentClosed = true,
constraints = setOf(TupleConstraint.Open(false), TupleConstraint.UniqueAttrs(true))
),
"foo" to StaticType.STRING
),
contentClosed = true,
constraints = setOf(TupleConstraint.Open(false), TupleConstraint.UniqueAttrs(true), TupleConstraint.Ordered)
)
)
),
.

^ In above, c's element type is not a union of StructType(...) and MISSING.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding runtime behavior, this hasn't yet been decided and will be part of the upcoming RFC. The schema inferencer doesn't currently give an error if an attribute doesn't exist. We can choose to add errors in the future. I can add this note to the EXCLUDE code.

It currently gives a data type mismatch if the types doesn't match up:

// assuming a is a list
EXCLUDE t.a.*

^ will give an error

Comment on lines 207 to 218
query = """SELECT * EXCLUDE t."a".b.c
FROM <<
{
'a': {
'B': {
'c': 0,
'C': true,
'd': 'foo'
}
}
}
>> AS t""",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting that we are seemingly excluding ALL attributes that match the exclusion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's the current behavior until an RFC formalizes what to do here. I'll make a note of it in the PlanTyper + test case.

Comment on lines 1969 to 1981
ErrorTestCase(
name = "invalid exclude collection wildcard",
query = """SELECT * EXCLUDE t.a[*]
FROM <<
{
'a': {
'b': {
'c': 0,
'd': 'foo'
}
}
}
>> AS t""",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if you added a similar, but different, test?

            ErrorTestCase(
                name = "invalid exclude collection wildcard",
                query = """SELECT * EXCLUDE t.a[*]
                    FROM <<
                        {
                            'a': {
                                'b': {
                                    'c': 0,
                                    'd': 'foo'
                                }
                            }
                        },
                        {
                            'a': [
                                {
                                    'c': 0,
                                    'd': 'foo'
                                }
                            ]
                        }
                    >> AS t""",

Since t.a is a union of struct and list?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it's a no-op (i.e. nothing in the union gets excluded). This could change in the RFC to possibly exclude on each type in the union. Added a regression test and a note in the PlanTyper.

Copy link
Member

@johnedquinn johnedquinn Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might need to add support for it outright, since null is modeled as a distinct StaticType. AKA, very similar query that represents many flows that other SQL implementations might experience:

SELECT * EXCLUDE t.a[*]
FROM <<
  {
    'a': NULL
  },
  {
    'a': [
      {
        'c': 0,
        'd': 'foo'
      }
    ]
  }
>> AS t

Especially since everything is nullable by default in SQL. Similar experience:

-- I know this isn't real syntax, but it's meant for demonstration.
CREATE TABLE t1 (
  a STRUCT (
    c INT,
    d STRING
  )
);

-- A similar syntax that Redshift supports would be
CREATE TABLE t1 (
  a SUPER
);

a in both queries would be nullable, and therefore the following query would be wrongly typed:

SELECT * EXCLUDE t.a.*
FROM t1 AS t

If I'm not mistaken, the type currently returned would be something like:

BagType(
  StructType(c: IntType(), d: StringType())
)

Since t would be the unionOf(NULL, StructType( ... )). But, I imagine the expected would be:

BagType(
  StructType()
)

EDIT: I updated the first example snippet to use:

  {
    'a': NULL
  },

instead of just NULL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree w/ the point that EXCLUDE over type unions are important after some offline discussion. I still need some clarification on what the desired behavior should be w.r.t. typed unions.

For example, consider

SELECT t
EXCLUDE t.a.b
FROM <<
    {
        'a': 1 {
            'b': 2,
            'c': 3
        }
    },
    {
        'a': NULL
    }
>> AS t

Currently the schema is something like:

Bag(
    Struct(
        a: unionOf(null, struct(b: int, c: int))
    )
)

^ basically ignores the union type for a.

Would the schema you propose then be:

Bag(
    Struct(
        a: unionOf(null, struct(c: int)) -- excluding `b` from the struct in the union
    )
)

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would need to be -- or else we would be accidentally letting data pass through. The end-user wants to make sure that the b attribute isn't shown, and IMO nullability shouldn't hinder that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for confirming. Latest commit d60d08c should cover the union case as well as some other edge cases discussed offline.

PlanningProblemDetails.InvalidExcludeExpr
)
}
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test regarding attributes that do not exist in a tuple? I'm curious about the behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add regression test for this behavior. Currently it won't error. Once we decide on the behavior in the RFC, this may/may not change.

PlanningProblemDetails.InvalidExcludeExpr
)
}
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious about the behavior when an EXCLUDE attribute is removed and referenced later on in the exclude list. Something like:

SELECT * EXCLUDE t.a, t.a.b FROM << { 'a': { 'b': 1 } } >> AS t 

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a regression test. Currently, it won't give any error. This may/may not change depending on what we define in the RFC.

Just for some more info, we may consider adding some normalization pass over these EXCLUDE expressions, such that there won't be "t.a.b" since "t.a" is also excluded.

Comment on lines +244 to +246
if (steps.size > 1) {
elementType = excludeExprSteps(elementType, steps.drop(1), lastStepAsOptional = true, ctx)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on this?

I would assume that the static type of a would be a ListType(INT). IMO, even if we had:

SELECT * EXCLUDE t.a[0]
FROM << { 'a': [0] } >> AS t

I would assume the static type of t.a would still be ListType(INT) because we can't know how long the list actually is (especially if t is part of the catalog). If t.a started without any elements, I also think it would still be ListType(INT). That being said @alancai98 , do we expect this to be a runtime error when we attempt to exclude an index that doesn't exist (essentially OutOfBoundsException)? How about when we try to exclude an attribute in a tuple that doesn't exist? This one we can sometimes know statically, but what happens at runtime if it's a union type?

@alancai98
Copy link
Member Author

Is there any RFC in draft or document that comes along with this feature at the moment?

I have parts of the RFC in a draft. Still working on defining the operational semantics in terms of existing clauses (e.g. PIVOT, UNPIVOT, CASE WHEN) before I'm ready to publish the draft.

Copy link
Contributor

@yliuuuu yliuuuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had some suggestion on putting in a couple additional test cases.

Also, it would be nice to explicitly list out what the subset of behavior that is considered stable.

Copy link
Contributor

@am357 am357 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not mistaken, all the tests in this PR are working based off of literal data. It is great that we are examining the schema inferencing using various literals but perhaps for most of the use-cases we are doing the schema inference based on some base tables (in a catalog)—which will automatically remove most of the edge cases we see in the tests as a serious concern. This is b/c in my view, literal semi-structured data implies partial or no-schema which doesn't require static validation as such.

@alancai98 apologies to not to mention this earlier, does it make sense adding tests using a schema in the catalog? We don't need to go as deep as the current tests, but just to show the four exclude cases.

partiql-ast/src/main/pig/partiql.ion Show resolved Hide resolved
Copy link
Member

@johnedquinn johnedquinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is some pretty solid work. Thanks for addressing my comments thus far. I left a single comment regarding nonexistent bindings. One you resolve that comment plus the others from the other maintainers, I think we can ship this.

Comment on lines 201 to 203
attrs.forEach { attr ->
val rootId = expr.root
if (attr.name == rootId || (expr.rootCase == Case.INSENSITIVE && attr.name.equals(expr.root, ignoreCase = true))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this would be required as part of the RFC, but If we wanted to make sure we don't come into a situation such as:

SELECT * EXCLUDE nonsense.a
FROM t

we could:

val attrsExist = attrs.find { attr -> attr.name == .. } != null
if (attrsExist.not()) { handleUnresolvedDescriptor(..) }
attrs.forEach { ->
  if (attr.name == ..) {
    ..

It should make sure that we don't accidentally break future queries. It's IMO better to error now even though we aren't sure what the RFC will say. Then, if we allow nonexistent bindings after the RFC, we can always remove the error? Then you can add the second test that @yliuuuu mentioned above.

@yliuuuu A couple more test cases that can be beneficial:

LMK what you think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it would surely help users who write:

SELECT a
EXCLUDE a.b
FROM <<
  { 'a': { 'b': 1, 'c': 2 } }
>> AS t

Since we are specifically looking in the bindings coming from the SCAN, when we look for the EXCLUDE a.b (and specifically looking for a), we are silently ignoring the fact that a isn't in the EXCLUDE's input bindings. So, even though the projection list can use a, the EXCLUDE is silently letting this data to pass through. The current implementation, if I'm not mistaken, will return:

<<
  { 'a': { 'b': 1, 'c': 2 } }
>>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since resolution of the EXCLUDE expressions is not defined, I think it's perfectly fine to mandate that users write ... EXCLUDE t.a.b explicitly. Though, if we silently let them write a.b without notifying them that a isn't a binding (and therefore they think that their query will strip b), we'll be leaking data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. I had originally wanted to punt on the erroring behavior to form a consistent mental model for when to give an error in the RFC. The more I think about it though, if the exclude expr root is not in the binding tuples, then we should give some error. I'll adjust to at least give an error in this case (i.e. when the root doesn't exist)

@johnedquinn
Copy link
Member

Responding to @am357:

This is b/c in my view, literal semi-structured data implies partial or no-schema which doesn't require static validation as such.

...

@alancai98 apologies to not to mention this earlier, does it make sense adding tests using a schema in the catalog? We don't need to go as deep as the current tests, but just to show the four exclude cases.

So you know, @am357, we actually do perform static analysis on literals. That being said, I agree about adding just the base 4 exclude cases to use the catalog too, as our users will likely be using this feature the most.

am357
am357 previously approved these changes Sep 29, 2023
johnedquinn
johnedquinn previously approved these changes Sep 29, 2023
Copy link
Member

@johnedquinn johnedquinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this great PR! LGTM.

@alancai98 alancai98 dismissed stale reviews from johnedquinn and am357 via ec6da34 September 29, 2023 22:25
Copy link
Contributor

@yliuuuu yliuuuu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the great work!

@alancai98 alancai98 merged commit c7432aa into main Sep 29, 2023
@alancai98 alancai98 deleted the exclude-binding-tuple-op branch September 29, 2023 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants