You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have personally seen several users of PartiQL with obscenely large queries using SELECT VALUE for the specific use-case of filtering out attributes from the input binding tuple. As an illustrative example, consider a table containing 26 attributes, where each attribute's name is a letter in the English alphabet.
In order to exclude an attribute (say, d) from the table in a PartiQL query, we would need to write something like:
SELECT
a, b, c, e, ...
q, r, s, t, u, v, w, x, y, z
FROM
alphabet_table
Or, another common approach from PartiQL users is:
SELECT VALUE {
'a': a,
'b': b,
...
'y': y,
'z': z
}
FROM
alphabet_table
I've even conversed with some individuals who needed to generate queries to exclude certain attributes from tables that have multiple thousands of attributes.
Possible Solutions
SELECT * EXCEPT (BigQuery)
A possible solution is syntax such as:
SELECT * EXCEPT (d)
FROM alphabet_table
The above is syntax supported by Google's BigQuery. Similarly, they have support for syntax such as:
SELECT * REPLACE (quantity/2 AS quantity)
FROM some_table
We could also accomplish the same as SELECT * EXCEPT by writing:
SELECT * REPLACE (MISSING AS d)
FROM alphabet_table
As MISSING is stripped from the output result in PartiQL, the above is likely valid as a replacement for EXCEPT!
Allowing LET to Modify Binding Tuples
A somewhat similar approach of replacement could be a syntax such as:
SELECT *
FROM some_table AS t
LET MISSING AS t.d
By expanding support for introducing variables in the LET to include replacing attributes of a binding tuple, this approach could be possible. However, there are some immediate cons such as ambiguity in naming for JOINs, projection pushdown, and variables introduced AFTER the LET.
Nested Data (Using both SELECT REPLACE and a function)
While SELECT * EXCEPT could be valuable for removing columns from the final projection, how could we remove attributes from any nested data (struct) in the result projection? As an example, consider the binding tuple:
How could we remove d from b while still outputting the rest of the result?
SELECT * REPLACE (
(SELECT * EXCEPT d FROM src.b) AS b
)
FROM alphabet_nested_table AS src
This could theoretically work if we coerce src.b into a bag, but now we have a bag in the projection list. Perhaps, then, for nested attributes, implementing a function to redact a struct's columns could be easier & worthwhile. I am personally unknowledgeable on the topic of projection pushdown for nested structures, but this custom function could possibly negatively affect our ability to perform the pushdown. However, let's keep the train of thought:
Let's try to remove both y and b.d from the result:
SELECT * REPLACE (
MISSING AS y,
redact_columns(b, ['d']) AS b
)
FROM alphabet_nested_table
Or, with some naming:
SELECT * REPLACE (
MISSING AS y,
redact_columns(b, columns_to_redact_from_b) AS b
)
LET ['d'] AS columns_to_redact_from_b
FROM alphabet_nested_table
Well, that's certainly interesting and readable.
Revisiting Nested Data and Projection Pushdown
Ideally, PartiQL could support filtering of both columns and nested attributes of structs. I'll take this opportunity to explore some possible syntax:
SELECT * REPLACE (
MISSING AS y,
y.* EXCEPT y.b, --- Or maybe just b. Or, even allowing just `y EXCEPT b` (not `y.*`)
z.* REPLACE (
some_nested_attr + 1 AS some_nested_attr,
MISSING AS some_attr_to_be_removed
)
)
FROM alphabet_nested_table
Then, we'd have to define rules for type mismatches, missing, null, etc. But, I'd argue that this is also interesting and readable! Even more interesting is possibly allowing replacing attributes of a struct as a first class expression! So, the following could be a valid query:
SELECT * REPLACE (
-- Option # 1 (Un-nesting to replace)
a REPLACE (
b REPLACE (
c REPLACE (
d + 1 AS d
)))
-- Option # 2 (Unsure about this)
a.b.c REPLACE (
d + 1 AS d
) AS a.b.c
-- Option # 3 (Same as option # 2, but we don't have the nested renaming)
a.b.c REPLACE (
d + 1 AS d
)
) FROM some_highly_nested_table
Description
I have personally seen several users of PartiQL with obscenely large queries using
SELECT VALUE
for the specific use-case of filtering out attributes from the input binding tuple. As an illustrative example, consider a table containing 26 attributes, where each attribute's name is a letter in the English alphabet.In order to exclude an attribute (say,
d
) from the table in a PartiQL query, we would need to write something like:Or, another common approach from PartiQL users is:
I've even conversed with some individuals who needed to generate queries to exclude certain attributes from tables that have multiple thousands of attributes.
Possible Solutions
SELECT * EXCEPT (BigQuery)
A possible solution is syntax such as:
The above is syntax supported by Google's BigQuery. Similarly, they have support for syntax such as:
We could also accomplish the same as
SELECT * EXCEPT
by writing:As MISSING is stripped from the output result in PartiQL, the above is likely valid as a replacement for EXCEPT!
Allowing LET to Modify Binding Tuples
A somewhat similar approach of replacement could be a syntax such as:
By expanding support for introducing variables in the LET to include replacing attributes of a binding tuple, this approach could be possible. However, there are some immediate cons such as ambiguity in naming for JOINs, projection pushdown, and variables introduced AFTER the LET.
Nested Data (Using both SELECT REPLACE and a function)
While
SELECT * EXCEPT
could be valuable for removing columns from the final projection, how could we remove attributes from any nested data (struct) in the result projection? As an example, consider the binding tuple:How could we remove
d
fromb
while still outputting the rest of the result?This could theoretically work if we coerce
src.b
into a bag, but now we have a bag in the projection list. Perhaps, then, for nested attributes, implementing a function to redact a struct's columns could be easier & worthwhile. I am personally unknowledgeable on the topic of projection pushdown for nested structures, but this custom function could possibly negatively affect our ability to perform the pushdown. However, let's keep the train of thought:Let's try to remove both
y
andb.d
from the result:Or, with some naming:
Well, that's certainly interesting and readable.
Revisiting Nested Data and Projection Pushdown
Ideally, PartiQL could support filtering of both columns and nested attributes of structs. I'll take this opportunity to explore some possible syntax:
Then, we'd have to define rules for type mismatches, missing, null, etc. But, I'd argue that this is also interesting and readable! Even more interesting is possibly allowing replacing attributes of a struct as a first class expression! So, the following could be a valid query:
And with highly nested data such as:
Other Links
The text was updated successfully, but these errors were encountered: