Evaluate restricting router planner to a particular database model #692
Comments
I don't think that's really a problem we can solve via code for now. With CTEs, subquery joins, and the like, as needed by the targeted use cases, we can't insist on a join on a partition column without preventing the use case in the first place. I think all we can do for now is to improve our docs and error messages.
|
An example would be if all tables have a tenant_id column by which the tables are partitioned. We could then require all CTEs, subqueries, etc. to have the same restriction on tenant_id. |
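To make that concrete, here is a minimal sketch of the kind of query shape such a restriction would allow, assuming hypothetical users and orders tables both distributed by tenant_id (table and column names are illustrative, not from this thread):

```sql
-- Every CTE/subquery repeats the same tenant_id filter, so the whole query
-- prunes to a single co-located group of shards regardless of placement.
WITH recent_orders AS (
    SELECT user_id, total
    FROM orders
    WHERE tenant_id = 42
      AND placed_at > now() - interval '30 days'
)
SELECT u.name, sum(o.total) AS total_spent
FROM users u
JOIN recent_orders o ON o.user_id = u.user_id
WHERE u.tenant_id = 42
GROUP BY u.name;
```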
On 2016-08-01 11:23:48 -0700, Marco Slot wrote:
That'll prevent "manually broadcasted"/reference type/1-shard tables |
There's a question of whether the default database model should allow such tables (e.g., Citus Cloud doesn't). In any case, it seems this decision can be made per table, and we don't need to require a filter on single-shard tables. We may need to make replicated single-shard tables and co-located tables explicit in the metadata to properly implement these restrictions. |
It is hard to understand what is supported and what is not supported. For example, these queries work:

SELECT l_orderkey FROM lineitem WHERE l_orderkey = 2 OR l_orderkey = 32

SELECT l_orderkey FROM lineitem WHERE l_orderkey IN
    (SELECT l_orderkey FROM lineitem WHERE l_orderkey = 2)

But this one fails:

SELECT l_orderkey FROM lineitem WHERE l_orderkey IN
    (SELECT l_orderkey FROM lineitem WHERE l_orderkey = 2 OR l_orderkey = 32)

Also, the error message is not very meaningful:
|
Having a join-on-the-partition-column requirement looks very restrictive and difficult to check when you consider all sorts of queries we want to support. |
We discussed this with @ozgune, @anarazel, and @marcocitus; here are the notes.

@mtuncer stated there are 2 outstanding issues with usability. 2 - The user runs a query, makes a slight change in the filter, and the query is not supported anymore, while the error message does not provide any help to suggest what has happened. @metdos's last comment describes this.

@anarazel said he is for better error/warning messages instead of changing (reducing) query coverage.

@marcocitus suggested an alternative: run a heuristic to determine whether the query could still be supported despite an environmental change. We can use the co-location property of the pruned-out shards to determine whether we can reliably support the query. If not, we can display a warning message, something along the lines of "you got lucky this time; this query might not be supported next time."

Next steps: @samay-sharma, @begriffs, @metdos, please feel free to join the discussion here. We will hold another session when all parties are present. |
@marcocitus, you were going to outline the heuristics algorithm, if I recall correctly. Any updates on that? |
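As a rough illustration of the metadata such a heuristic could consult (this is not the proposed algorithm itself; the table names are hypothetical, while pg_dist_shard and pg_dist_shard_placement are standard Citus metadata views):

```sql
-- Show where the shards of two tables currently live, to see whether the
-- shards a given query prunes to merely happen to share a node today.
SELECT s.logicalrelid::regclass AS table_name,
       s.shardid,
       p.nodename,
       p.nodeport
FROM pg_dist_shard s
JOIN pg_dist_shard_placement p USING (shardid)
WHERE s.logicalrelid IN ('users'::regclass, 'orders'::regclass)
ORDER BY s.shardid;
```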
Soon I will be doing some changes related to #692 in the router planner, and those changes require updating ~5/6 tests related to router planning. To make those test files runnable by run_test.py multiple times, we also need to make some other tests (ones they run in parallel with or heavily depend on) ready for run_test.py.
…ed tables via router planner (#6793)

Today we allow planning queries that reference non-colocated tables if the shards the query targets are placed on the same node. However, this may not remain the case, e.g., after rebalancing shards, because those shards are no longer guaranteed to be on the same node. This commit adds a citus.enable_non_colocated_router_query_pushdown GUC that can be used to disallow planning such queries via the router planner when it is set to false. Note that the default value for this GUC will be "true" for 11.3, but we will change it to "false" in 12.0 so as not to introduce a breaking change in a minor release. Closes #692.

Moreover, allowing such queries to go through the router planner also causes an incorrect plan to be generated for DML queries that reference distributed tables sharded with different replication factor settings. For this reason, #6779 can be closed after changing the default value of this GUC to "false", hence not now.

DESCRIPTION: Adds `citus.enable_non_colocated_router_query_pushdown` GUC to ensure generating a consistent distributed plan for the queries that reference non-colocated distributed tables (when set to "false", the default is "true").
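For reference, the GUC introduced above can be toggled like any other PostgreSQL setting; a brief sketch (the GUC name comes from the commit text, the rest is standard configuration syntax):

```sql
-- Per session: refuse router planning for queries whose non-colocated tables
-- merely happen to have their target shards on the same node right now.
SET citus.enable_non_colocated_router_query_pushdown TO off;

-- Or persist it cluster-wide and reload the configuration.
ALTER SYSTEM SET citus.enable_non_colocated_router_query_pushdown TO off;
SELECT pg_reload_conf();
```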
Currently the router planner can plan practically anything that can be executed by changing table names into shard names, provided that the shards are in the same physical location. This is very powerful, but it has significant usability implications.
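To illustrate what "changing table names into shard names" means in practice, here is a hypothetical before/after (the shard suffixes are invented for illustration):

```sql
-- Query as written against the distributed tables:
SELECT l_orderkey, o_custkey
FROM lineitem JOIN orders ON l_orderkey = o_orderkey
WHERE l_orderkey = 2;

-- Roughly what the router planner can send to a single worker, with table
-- names replaced by the shards that cover l_orderkey = 2 (shard IDs made up):
SELECT l_orderkey, o_custkey
FROM lineitem_102010 JOIN orders_102042 ON l_orderkey = o_orderkey
WHERE l_orderkey = 2;
```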
First-time users often start with 2 worker nodes and a replication factor of 2. In this case, the router planner can run practically any SQL query that hits a single shard in each table. When users start expanding their cluster, moving shards, or changing filter values, this may no longer be the case, potentially causing an outage or forcing the user to redesign their application.
Evaluate restricting the router planner to a particular data model by default, such that it only permits queries on shards that are guaranteed to be plannable given the database model (e.g., ensure that all tables in the query are co-located and have the same distribution column filter). This would prevent queries from breaking when the physical layout of shards or the specific values of filters change. We can keep the current, unrestricted behaviour as an option, as we do with subquery pushdown.
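A hedged sketch of what the restriction would mean in practice, assuming two hypothetical co-located tables a and b distributed by dist_col (not an example from the original issue):

```sql
-- Allowed under the restricted model: co-located tables joined on the
-- distribution column, with the same distribution column filter on both.
SELECT *
FROM a JOIN b ON a.dist_col = b.dist_col
WHERE a.dist_col = 7;

-- Might happen to work today if the shards covering values 7 and 8 sit on
-- the same node, but is not guaranteed to keep working after shards move;
-- the restricted model would reject it up front.
SELECT *
FROM a JOIN b ON a.other_col = b.other_col
WHERE a.dist_col = 7 AND b.dist_col = 8;
```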
An alternative approach could be to add some logic to the workers to always allow router queries. For example, if each shard that is not physically located on a worker were accessible there through postgres_fdw instead, then we could effectively remove the restriction that the shards need to be on the same worker. This would have more general benefits, such as supporting broadcast joins, but comes with its own usability and performance implications.
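As a rough illustration of the postgres_fdw idea (all object names are hypothetical; this is not an implemented mechanism, just standard postgres_fdw syntax): a worker could expose a shard that physically lives elsewhere through a foreign table, so a local join against it stays executable.

```sql
-- Hypothetical setup on worker-1: reach a shard stored on worker-2.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER worker2 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'worker-2.example.com', port '5432', dbname 'citus');

CREATE USER MAPPING FOR CURRENT_USER SERVER worker2
    OPTIONS (user 'citus', password 'secret');

-- Foreign table standing in for a remote shard (shard name made up).
CREATE FOREIGN TABLE orders_102042 (
    o_orderkey bigint,
    o_custkey  bigint
) SERVER worker2 OPTIONS (table_name 'orders_102042');
```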
Some examples of confusing behaviour:
The following query requires task-tracker when changing one value because we're joining tables that are not always co-located:
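The original query isn't preserved in this text; a hypothetical stand-in for the kind of query meant here (table and column names invented): a join between two distributed tables that are not co-located, where one filter value happens to prune both sides to shards on the same node and another does not.

```sql
-- Hypothetical: orders and events are distributed but NOT co-located.
-- With this value, both sides may prune to shards on the same worker,
-- so the router planner can handle it...
SELECT o.o_orderkey, e.payload
FROM orders o JOIN events e ON o.o_orderkey = e.order_id
WHERE o.o_orderkey = 2 AND e.order_id = 2;

-- ...but with another value the pruned shards may live on different workers,
-- and the same query shape suddenly requires the task-tracker executor.
SELECT o.o_orderkey, e.payload
FROM orders o JOIN events e ON o.o_orderkey = e.order_id
WHERE o.o_orderkey = 32 AND e.order_id = 32;
```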
Union is simultaneously supported and "unsupported" (we should revise our error messages):
Running the shard rebalancer might cause the same query to suddenly become "unsupported":