-
Notifications
You must be signed in to change notification settings - Fork 671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for schema-based-sharding via a GUC #6866
Conversation
e9365fa
to
a286958
Compare
dab92d7
to
31efdde
Compare
7f02706
to
dd346f6
Compare
a18b16b
to
fc3cccd
Compare
400f58d
to
65d2b72
Compare
fc3cccd
to
9494c6a
Compare
3c7050e
to
0c26df3
Compare
d26084a
to
de2ff1f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good overall, after addressing open comments this seems ready to merge.
* Avoid unnecessarily adding the table into metadata if we will | ||
* distribute it as a tenant table later. | ||
*/ | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still kind of wondering whether we can call ConvertNewTableIfNecessary from here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm maybe we can do but would might mean a somewhat big change in the context of this PR --need to understand why we didn't do so for ConvertToManagedTable before.
"by using colocate_with option."), | ||
errhint("Consider using \"ALTER TABLE %s SET SCHEMA %s;\" " | ||
"to move the table to the same schema with the " | ||
"tenant table.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is that possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not possible today. But a time ago, I was thinking that we would definitely have such a logic for ALTER TABLE SET SCHEMA, in 12.0. Let me improve the hint and then we can adjust it if we decide making ALTER TABLE SET SCHEMA clever in this release.
Codecov Report
@@ Coverage Diff @@
## main #6866 +/- ##
==========================================
- Coverage 93.26% 93.26% -0.01%
==========================================
Files 269 271 +2
Lines 57558 57813 +255
==========================================
+ Hits 53684 53919 +235
- Misses 3874 3894 +20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need to do something about this syntax, since it currently results in a local table existing in a tenant schema:
create schema t2 create table test (x int, y int);
nit about weird syntax: SET citus.enable_schema_based_sharding TO off;
CREATE SCHEMA "CiTuS.TeeN"; This fails like below: NOTICE: issuing SELECT pg_catalog.citus_internal_add_tenant_schema("CiTuS.TeeN"::regnamespace, 1)
DETAIL: on server aykutbozkurt@localhost:9701 connectionId: 1
ERROR: column "CiTuS.TeeN" does not exist We can use |
$$); | ||
|
||
DROP OWNED BY test_other_super_user; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use a CREATE FOREIGN TABLE Test, also we may need to update the error/hint
create foreign table t6.foreign_test (x int, y int) server local_server options (schema_name 'public', table_name 'test');
ERROR: foreign tables cannot be distributed
HINT: Can add foreign table "foreign_test" to metadata by running: SELECT citus_add_local_table_to_metadata($$t6.foreign_test$$);
CREATE SCHEMA tenant_10; | ||
CREATE TABLE tenant_10.tbl_1(a int, b text); | ||
CREATE TABLE tenant_10.tbl_2(a int, b text); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider adding a CREATE TABLE tenant_10.tbl_3 AS SELECT s, s FROM generate_series(1,100) s;
test
This currently results in a local table being created in the schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now we're throwing an error for that
I had 2 questions. First one looks like works and is resolved. The second one is a problem. We can error and defer it to the second phase or see the solution I suggest.
ALTER SCHEMA tenant1 RENAME TO <NEW_SCHEMA_NAME>;
ALTER TABLE tenant1.test1 SET SCHEMA <NEW_SCHEMA_NAME>;
|
Scenarios like 2) will be tackled in a separate PR. |
We mark objects as distributed objects in Citus metadata only if we need to propagate given the command that creates it to worker nodes. For this reason, we were not doing this for the objects that are created while pg_dist_node is empty. One implication of doing so is that we defer the schema propagation to the time when user creates the first distributed table in the schema. However, this doesn't help for schema-based sharding (#6866) because we want to sync pg_dist_tenant_schema to the worker nodes even for empty schemas too. * Support test dependencies for isolation tests without a schedule * Comment out a test due to a known issue (#6901) * Also, reduce the verbosity for some log messages and make some tests compatible with run_test.py.
@marcocitus, I looked into special variants of CREATE TABLE / CREATE SCHEMA commands and made some changes accordingly.
However, we do not allow the following;
|
DESCRIPTION: Adds citus.enable_schema_based_sharding GUC that allows sharding the database based on schemas when enabled.
Refactor the logic that automatically creates Citus managed tables
Refactor CreateSingleShardTable() to allow specifying colocation id instead
Add support for schema-based-sharding via a GUC
What this PR is about:
Add citus.enable_schema_based_sharding GUC to enable schema-based
sharding. Each schema created while this GUC is ON will be considered
as a tenant schema. Later on, regardless of whether the GUC is ON or
OFF, any table created in a tenant schema will be converted to a
single shard distributed table (without a shard key). All the tenant
tables that belong to a particular schema will be co-located with each
other and will have a shard count of 1.
We introduce a new metadata table --pg_dist_tenant_schema-- to do the
bookkeeping for tenant schemas:
Colocation id column of pg_dist_tenant_schema can never be NULL even
for the tenant schemas that don't have a tenant table yet. This is
because, we assign colocation ids to tenant schemas as soon as they
are created. That way, we can keep associating tenant schemas with
particular colocation groups even if all the tenant tables of a tenant
schema are dropped and recreated later on.
When a tenant schema is dropped, we delete the corresponding row from
pg_dist_tenant_schema. In that case, we delete the corresponding
colocation group from pg_dist_colocation as well.
Future work for 12.0 release:
We're building schema-based sharding on top of the infrastructure that
adds support for creating distributed tables without a shard key (#6867).
However, not all the operations that can be done on distributed tables
without a shard key necessarily make sense (in the same way) in the
context of schema-based sharding. For example, we need to think about
what happens if user attempts altering schema of a tenant table. We
will tackle such scenarios in a future PR.
We will also add a new UDF --citus.schema_tenant_set() or such-- to
allow users to use an existing schema as a tenant schema, and another
one --citus.schema_tenant_unset() or such-- to stop using a schema as
a tenant schema in future PRs.