-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query: Identifying columns in the case of distinct #15873
Comments
If collection is applying distinct and not projecting required identifying columns then we should throw exception saying that collection.Distinct() is not allowed. |
Consider scenario in #11178 |
Also TVF |
example: #20505 |
Example, GroupBy aggregate in the projection which is not projecting the correlation key column. |
Added validation step for AddCollectionJoin which checks that if subquery contains Distinct or GroupBy, the projection contains all identifying columns needed to correctly bucket the results during materialization. Fixes #15873
Added validation step for AddCollectionJoin which checks that if subquery contains Distinct or GroupBy, the projection contains all identifying columns needed to correctly bucket the results during materialization. Fixes #15873
Added validation step for AddCollectionJoin which checks that if subquery contains Distinct or GroupBy, the projection contains all identifying columns needed to correctly bucket the results during materialization. Fixes #15873
Added validation step for AddCollectionJoin which checks that if subquery contains Distinct or GroupBy, the projection contains all identifying columns needed to correctly bucket the results during materialization. Also making sure that identifying columns can be correctly propagated during pushdown and joining - if they are not we mark them as such (by removing identifying columns altogether), so that we can throw exception when these columns are actually needed. Fixes #15873
Added validation step for AddCollectionJoin which checks that if subquery contains Distinct or GroupBy, the projection contains all identifying columns needed to correctly bucket the results during materialization. Also making sure that identifying columns can be correctly propagated during pushdown and joining - if they are not we mark them as such (by removing identifying columns altogether), so that we can throw exception when these columns are actually needed. Fixes #15873 Fixes #20184
@maumar @smitpatel This issue is marked closed_fixed in rc1, but is still open and doesn't seem to be tracked by a PR. |
Added validation step for AddCollectionJoin which checks that if subquery contains Distinct or GroupBy, the projection contains all identifying columns needed to correctly bucket the results during materialization. Also making sure that identifying columns can be correctly propagated during pushdown and joining - if they are not we mark them as such (by removing identifying columns altogether), so that we can throw exception when these columns are actually needed. Fixes #15873 Fixes #20184
…21990) Added validation step for AddCollectionJoin which checks that if subquery contains Distinct or GroupBy, the projection contains all identifying columns needed to correctly bucket the results during materialization. Also making sure that identifying columns can be correctly propagated during pushdown and joining - if they are not we mark them as such (by removing identifying columns altogether), so that we can throw exception when these columns are actually needed. Fixes #15873 Fixes #20184
…s when projecting a subset of column and adding distinct As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates. Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique. In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key. Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate). Fixes #22049 Fixes #24288
…s when projecting a subset of column and adding distinct As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates. Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique. In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key. Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate). Fixes #22049 Fixes #24288
…s when projecting a subset of column and adding distinct As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates. Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique. In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key. Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate). Fixes #22049 Fixes #24288
…s when projecting a subset of column and adding distinct As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates. Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique. In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key. Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate). Fixes #22049 Fixes #24288
…s when projecting a subset of column and adding distinct As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates. Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique. In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key. Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate). Fixes #22049 Fixes #24288
…s when projecting a subset of column and adding distinct As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates. Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique. In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key. Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate). Fixes #22049 Fixes #24288
…s when projecting a subset of column and adding distinct As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates. Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique. In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key. Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate). Fixes #22049 Fixes #24288
…s when projecting a subset of column and adding distinct As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates. Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique. In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key. Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate). Fixes #22049 Fixes #24288
…s when projecting a subset of column and adding distinct As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates. Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique. In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key. Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate). Fixes #22049 Fixes #24288
…s when projecting a subset of column and adding distinct As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates. Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique. In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key. Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate). Fixes #22049 Fixes #24288
…s when projecting a subset of column and adding distinct As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates. Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique. In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key. Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate). Fixes #22049 Fixes #24288
…s when projecting a subset of column and adding distinct As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates. Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique. In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key. Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate). Fixes #22049 Fixes #24288
…s when projecting a subset of column and adding distinct As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates. Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique. In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key. Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate). Fixes #22049 Fixes #24288
…s when projecting a subset of column and adding distinct As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates. Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique. In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key. Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate). Fixes #22049 Fixes #24288
…s when projecting a subset of column and adding distinct As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates. Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique. In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key. Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate). Fixes #22049 Fixes #24288
…s when projecting a subset of column and adding distinct As fix to #15873 we started blocking some scenarios that used to work (by accident) - when we have a subquery using Distinct or GroupBy that doesn't happen to have any duplicates. Fix is to enable those scenarios (and others) by modifying identifier columns in case of distinct and group by, if the original identifiers are not already present. In case of distinct, the entire projection becomes unique identifier, as distinct guarantees it to be unique. In case of groupby, the grouping key becomes the identifier - since we only support grouping key or group aggregate in the projection, we are also guaranteed to have 1 row per unique grouping key. Also fix to #24288 - Query: add collection join tries to convert correlated collection from APPLY to JOIN for subqueries with Distinct and GroupBy, which is incorrect we would always try to convert subquery with groupby and distinct from apply to join, however we can only do this if the projection already contains the join key. Otherwise, adding the join key to the projection would change the meaning of operation in case of distinct and create invalid query in case of group by (projecting column that is not part of grouping key or aggregate). Fixes #22049 Fixes #24288
No description provided.
The text was updated successfully, but these errors were encountered: