-
Notifications
You must be signed in to change notification settings - Fork 695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Evaluate functions in INSERT..SELECT #1010
Conversation
a5729bd
to
4e0446f
Compare
* subquery RTE that returns no results. | ||
*/ | ||
static void | ||
ConvertRteToSubqueryWithEmptyResult(RangeTblEntry *rte) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved (without change) from multi_router_planner.c as a dependency of UpdateRelationToShardNames.
* | ||
*/ | ||
bool | ||
UpdateRelationToShardNames(Node *node, List *relationShardList) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adapted from UpdateRelationNames multi_router_planner.c
I moved it into its own file because it's now being called both directly from the router planner as well as from the router executor via RebuildQueryStrings, it thus didn't seem very specific to the router planner anymore, but rather to deparsing shard queries in general.
be621fb
to
f6b3067
Compare
f6b3067
to
1a216c6
Compare
Current coverage is 89.79% (diff: 99.41%)@@ master #1010 diff @@
==========================================
Files 72 73 +1
Lines 18573 18645 +72
Methods 1143 1147 +4
Messages 0 0
Branches 0 0
==========================================
+ Hits 16674 16742 +68
- Misses 1899 1903 +4
Partials 0 0
|
bed0024
to
6f4659a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an issue with returning incorrect results
@@ -1475,8 +1475,11 @@ int | |||
GetRTEIdentity(RangeTblEntry *rte) | |||
{ | |||
Assert(rte->rtekind == RTE_RELATION); | |||
Assert(IsA(rte->values_lists, IntList)); | |||
Assert(list_length(rte->values_lists) == 1); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we changing the behavior here. Perhaps we should keep asserts after NULL check.
@@ -293,7 +288,7 @@ CreateInsertSelectRouterPlan(Query *originalQuery, | |||
workerJob->jobQuery = originalQuery; | |||
|
|||
/* for now we do not support any function evaluation */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment is obsolete with the change
@@ -1873,7 +1869,7 @@ RouterSelectTask(Query *originalQuery, RelationRestrictionContext *restrictionCo | |||
*/ | |||
static bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a line about relationShardList in function comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a line.
{ | ||
relationShard = (RelationShard *) lfirst(relationShardCell); | ||
|
||
if (newRte->relid == relationShard->relationId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am finding this difficult to understand. The previous GetRteIdentity call was used to differentiate between multiple usages of the same relation id. Especially when the same relation was used in different levels of nested subqueries.
We have given each relation RTE a unique Id in the beginning and search for that particular id since relation Id by itself was not enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I found an example query: a_table is distributed along integer column a. Values 8 and 24 fall into a different shards lets name it shard_8 and shard_24. And those shards happened to be on the same worker.
select * from a_table where a = 8 UNION select * from a_table where a = 24;
After shard renaming we expect that query to be translated to
select * from a_table_shard_8 where a = 8 UNION select * from a_table_shard_24 where a = 24;
However, with these changes it becomes
select * from a_table_shard_8 where a = 8 UNION select * from a_table_shard_8 where a = 24;
and return incorrect results.
We should either reject this query, or return correct results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another query with the same problem
select * from a_table where a > 8 and a = 8 union select * from a_table where a = 24;
it becomes
select * from a_table_shard_8 where a > 8 and a = 8 union select * from a_table_shard_8 where a = 24;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I had rteIdentity in RelationShard at first, but removed it because values_list doesn't get passed down to the executor and using relation ID seemed sufficient, but I keep forgetting that we allow queries that reference multiple shards of the same table if they happen to be on the same machine (even though I opened #692 myself 😶 ).
I would be very much in favour of erroring out on queries with >1 shards per relation, since it seems better to error out early than to wait until a shard rebalance / tenant isolation / placement invalidation / value changes and allowing it leads to some rather ugly hacks. We now also have the necessary information in the metadata to warn/error when using shards that are not co-located or reference tables, which was missing when we were discussing #692.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an error message. Discussed this with @ozgune and he's fine with erroring out for these queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also run this by @anarazel. He did not think this is a good idea.
767473e
to
7f66a62
Compare
@@ -1478,6 +1478,11 @@ GetRTEIdentity(RangeTblEntry *rte) | |||
Assert(IsA(rte->values_lists, IntList)); | |||
Assert(list_length(rte->values_lists) == 1); | |||
|
|||
if (rte->values_lists == NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check should happen before Asserts. In fact GetRTEIdentity functions is never called. Consider removing the function too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing GetRTEIdentity also requires removing matching function IdentifyRTE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, removed.
{ | ||
RangeTblEntry *rte = (RangeTblEntry *) lfirst(rteCell); | ||
|
||
if (rte->rtekind != RTE_SUBQUERY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we ignore CTEs by choice ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. We only support very trivial CTEs right now, but should still be supported and might otherwise slip through later.
* to prevent concurrent DML statements on those shards. | ||
*/ | ||
void | ||
LockRelationShardListResources(List *relationShardList, LOCKMODE lockMode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actual function name and name in comments are different. Consider using more descriptive name. LockRelationShardListResources made me think it is locking the list, not individual shards in ShardList.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was asked to change LockShardResources
to LockShardListResources
in another PR so it seems slightly inconsistent, but I'm fine with either form.
231baaf
to
2a703fe
Compare
} | ||
} | ||
|
||
foreach(cteCell, query->cteList) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would go for only referenced RTEs but guess it does not matter here.
🚢 |
don't forget to squash |
2a703fe
to
11031bc
Compare
Evaluate functions in INSERT..SELECT
While writing https://www.citusdata.com/blog/2016/11/29/event-aggregation-at-scale-with-postgresql/ I ran into a number of usability issues with INSERT..SELECT. In particular, it errors out when using functions like now(), date_trunc(), the timestamptz type, or even simple string||int concatenation, which can be very inconvenient, especially for doing roll-ups.
The problem with doing function evaluation for INSERT .. SELECT is that we perform different transformation on the query trees in the planner for each shard. While we could pass the entire query tree into the plan for each task and subsequently perform function evaluation on each query tree, this can cause substantial overhead.
This PR introduces a generic list of relation<->shard mappings ("RelationShard") that gets passed down with every (router planner) task, such that the per-shard query tree can be reconstructed from the jobQuery in the router executor and applies function evaluation to the SELECT part of the INSERT .. SELECT. The infrastructure will also allow us to deparse and thus perform function evaluation on more complex queries (e.g. SELECT) in the executor.
The RelationShard list replaces the existing selectShardList used for INSERT..SELECT lcoking, since it contains the same information. This change reduces overhead since selectShardList was a list of ShardInterval structs, whereas RelationShard only contains a relationId and a shardId.
Part of #961