This implements MERGE phase-III #6696
Codecov Report
@@            Coverage Diff             @@
##             main    #6696      +/-   ##
==========================================
- Coverage   93.21%   93.16%   -0.05%
==========================================
  Files         259      260       +1
  Lines       55926    56073     +147
==========================================
+ Hits        52131    52242     +111
- Misses       3795     3831      +36
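The Codecov figures above can be cross-checked by hand, because coverage is hits divided by tracked lines (hits plus misses). The sketch below assumes that Codecov truncates to two decimal places rather than rounding. That assumption is about Codecov, not something this PR states, but it is what these particular numbers are consistent with.

```python
def coverage_hundredths(hits: int, misses: int) -> int:
    """Coverage in hundredths of a percent, truncated (not rounded)."""
    lines = hits + misses  # every tracked line is either a hit or a miss
    return hits * 10_000 // lines


base = coverage_hundredths(52131, 3795)  # main
head = coverage_hundredths(52242, 3831)  # PR #6696
print(f"{base / 100:.2f}% -> {head / 100:.2f}% ({(head - base) / 100:+.2f}%)")
```

Rounding to the nearest hundredth would give 93.17% for the PR branch (52242 / 56073 is about 93.168%). The reported 93.16% therefore suggests truncation.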
Directionally, I think the pushdown planner is the right first step. My current feedback is mostly about cleaning up the PR (e.g., creating a new file, and opening small PRs against the main branch where possible).
Also, we should consider allowing recursive planning on the "source" part of the MERGE command, which would make this a lot more flexible.
src/backend/distributed/planner/relation_restriction_equivalence.c
@tejeswarm, can you also consider getting this in with some tests: #6673
I think we are getting closer. I need to do one more round of review; I have not checked the tests yet.
I think my only remaining major comment is about function evaluation. I'd prefer to be a lot more restrictive (at least in this PR) and focus on that later.
I'd probably squash a few of the commits, for example:
- 1 and 2 into a single commit
- 5 and 6 into the first commit
This implements phase II of MERGE SQL support.

Support routable queries where all the tables in the MERGE SQL are distributed and co-located, and both the source and target relations are joined on the distribution column with a constant qual. This should be a Citus single-task query. Below is an example.

    SELECT create_distributed_table('t1', 'id');
    SELECT create_distributed_table('s1', 'id', colocate_with => 't1');

    MERGE INTO t1
    USING s1 ON t1.id = s1.id AND t1.id = 100
    WHEN MATCHED THEN UPDATE SET val = s1.val + 10
    WHEN MATCHED THEN DELETE
    WHEN NOT MATCHED THEN INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src);

Basically, MERGE checks that:
- There are a minimum of two distributed tables (a source and a target).
- All the distributed tables are co-located.
- The MERGE relations are joined on the distribution column: MERGE .. USING .. ON target.dist_key = source.dist_key
- The query touches only a single shard, i.e., the JOIN is ANDed with a constant qual: MERGE .. USING .. ON target.dist_key = source.dist_key AND target.dist_key = <>

If any of these conditions is not met, it raises an exception.

(cherry picked from commit 44c387b)

This implements MERGE phase III.

Support pushdown queries where all the tables in the MERGE SQL are Citus-distributed and co-located, and both the source and target relations are joined on the distribution column. This generates multiple tasks, which execute independently after pushdown.

    SELECT create_distributed_table('t1', 'id');
    SELECT create_distributed_table('s1', 'id', colocate_with => 't1');

    MERGE INTO t1
    USING s1 ON t1.id = s1.id
    WHEN MATCHED THEN UPDATE SET val = s1.val + 10
    WHEN MATCHED THEN DELETE
    WHEN NOT MATCHED THEN INSERT (id, val, src) VALUES (s1.id, s1.val, s1.src);

*The only exception for both phases II and III is that UPDATEs and INSERTs must be done on the same shard group as the joined key. For example, the scenarios below are NOT supported, because the key value to be inserted/updated is not guaranteed to be on the same node as the distribution column:

    MERGE INTO target t
    USING source s ON (t.customer_id = s.customer_id)
    WHEN NOT MATCHED THEN
        INSERT (customer_id, …) VALUES (<non-local-constant-key-value>, ……);

OR this scenario, where we update the distribution column itself:

    MERGE INTO target t
    USING source s ON (t.customer_id = s.customer_id)
    WHEN MATCHED THEN UPDATE SET customer_id = 100;

(cherry picked from commit fa7b894)
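The eligibility checks listed above can be modeled in a few lines. The following is a simplified Python sketch, not Citus's `MergeQuerySupported()` code. `DistTable` and `classify_merge` are hypothetical, and so is representing the ON clause as a set of (target column, source column) equalities.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class DistTable:
    """Hypothetical stand-in for a Citus table's metadata."""
    name: str
    dist_column: Optional[str]  # None means the table has no distribution column
    colocation_id: Optional[int]


def classify_merge(
    target: DistTable,
    source: DistTable,
    join_equalities: FrozenSet[Tuple[str, str]],
    dist_key_pinned_to_constant: bool,
) -> str:
    """Return 'router' (phase II) or 'pushdown' (phase III); raise otherwise."""
    if target.dist_column is None or source.dist_column is None:
        raise ValueError("MERGE requires distributed source and target tables")
    if target.colocation_id != source.colocation_id:
        raise ValueError("source and target must be co-located")
    # The ON clause must contain target.dist_key = source.dist_key.
    if (target.dist_column, source.dist_column) not in join_equalities:
        raise ValueError("source and target must be joined on the distribution column")
    # A constant qual on the distribution key pins the query to a single shard.
    return "router" if dist_key_pinned_to_constant else "pushdown"
```

With `t1` and `s1` both distributed on `id` in the same colocation group, `ON t1.id = s1.id AND t1.id = 100` classifies as `'router'`. Dropping the constant qual classifies it as `'pushdown'`. The real planner works on the parsed query tree; this sketch only captures the order of the decisions.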
1) Fixes #6672
2) Move all MERGE-related routines to a new file, merge_planner.c
3) Make ConjunctionContainsColumnFilter() static again, and rearrange the code in MergeQuerySupported()
4) Restore the original format in the comments section.
5) Add big serial test.

Implement latest set of comments
In this release, I tried something different. I experimented with adding the PR number and title to the changelog right before each changelog entry. This way, it is easier to track where a particular changelog entry comes from. After reviews are over, I plan to remove those lines with PR numbers and titles.

I went through all the PRs that were merged after the 11.2.0 release and came up with a list of PRs that may need help with changelog entries. You can see details on these PRs, grouped into several sections, below.

## PRs with missing entries

The following PRs do not have a changelog entry. If you think that this is a mistake, please share it in this PR along with a suggestion on what the changelog item should be.

PR #6846 : fix 3 flaky tests in failure schedule
PR #6844 : Add CPU usage to citus_stat_tenants
PR #6833 : Fix citus_stat_tenants period updating bug
PR #6787 : Add more tests for ddl coverage
PR #6842 : Add build-cdc-* temporary directories to .gitignore
PR #6841 : Add build-cdc-* temporary directories to .gitignore
PR #6840 : Bump Citus to 12.0devel
PR #6824 : Fixes flakiness in multi_metadata_sync test
PR #6811 : Backport identity column improvements to v11.2
PR #6830 : In run_test.py actually return worker_count
PR #6825 : Fixes flakiness in multi_cluster_management test
PR #6816 : Refactor run_test.py
PR #6817 : Explicitly disallow local rels when inserting into dist table
PR #6821 : Rename citus stats tenants
PR #6822 : Add some more tests for initial sql support
PR #6819 : Fix flakyness in citus_split_shard_by_split_points_deferred_drop
PR #6814 : Make python-regress based tests runnable with run_test.py
PR #6813 : Fix flaky multi_mx_schema_support test
PR #6720 : Convert columnar tap tests to pytest
PR #6812 : Revoke statistics permissions from public and grant them to pg_monitor
PR #6769 : Citus stats tenants guc
PR #6807 : Fix the incorrect (constant) value passed to pointer-to-bool parameter, pass a NULL as the value is not used
PR #6797 : Attribute local queries and cached plans on local execution
PR #6796 : Parse the annotation string correctly
PR #6762 : Add logs to citus_stats_tenants
PR #6773 : Add initial sql support for distributed tables that don't have a shard key
PR #6792 : Disentangle MERGE planning code from the modify-planning code path
PR #6761 : Citus stats tenants collector view
PR #6791 : Make 8 more tests runnable multiple times via run_test.py
PR #6786 : Refactor some of the planning code to accommodate a new planning path for MERGE SQL
PR #6789 : Rename AllRelations.. functions to AllDistributedRelations..
PR #6788 : Actually skip arbitrary_configs_router & nested_execution for AllNullDistKeyDefaultConfig
PR #6783 : Add a config for arbitrary config tests where all the tables are null-shard-key tables
PR #6784 : Fix attach partition: citus local to null distributed
PR #6782 : Add an arbitrary config test heavily based on multi_router_planner_fast_path.sql
PR #6781 : Decide what to do with router planner error at one place
PR #6778 : Support partitioning for dist tables with null dist keys
PR #6766 : fix pip lock file
PR #6764 : Make workerCount configurable for regression tests
PR #6745 : Add support for creating distributed tables with a null shard key
PR #6696 : This implements MERGE phase-III
PR #6767 : Add pytest depedencies to Pipfile
PR #6760 : Decide core distribution params in CreateCitusTable
PR #6759 : Add multi_create_fdw into minimal_schedule
PR #6743 : Replace CITUS_TABLE_WITH_NO_DIST_KEY checks with HasDistributionKey()
PR #6751 : Stabilize single_node.sql and others that report illegal node removal
PR #6742 : Refactor CreateDistributedTable()
PR #6747 : Remove unused lock functions
PR #6744 : Fix multiple output version arbitrary config tests
PR #6741 : Stabilize single node tests
PR #6740 : Fix string eval bug in migration files check
PR #6736 : Make run_test.py and create_test.py importable without errors
PR #6734 : Don't blanket ignore flake8 E402 error
PR #6737 : Fixes bookworm packaging pipeline problem
PR #6735 : Fix run_test.py on python 3.9
PR #6733 : MERGE: In deparser, add missing check for RETURNING clause.
PR #6714 : Remove auto_explain workaround in citus explain hook for ALTER TABLE
PR #6719 : Fix flaky test
PR #6718 : Add more powerfull dependency tracking to run_test.py
PR #6710 : Install non-vulnerable cryptography package
PR #6711 : Support compilation and run tests on latest PG versions
PR #6700 : Add auto-formatting and linting to our python code
PR #6707 : Allow multi_insert_select to run repeatably
PR #6708 : Fix flakyness in failure_create_distributed_table_non_empty
PR #6698 : Miscellaneous cleanup
PR #6704 : Update README for 11.2
PR #6703 : Fix dubious ownership error from git
PR #6690 : Bump Citus to 11.3devel

## Too long changelog entries

The following PRs have changelog entries that are too long to fit on a single line. I'd expect authors to supply changelog entries in `DESCRIPTION:` lines that are at most 78 characters. If you want to supply multi-line changelog items, you can have multiple lines that start with `DESCRIPTION:` instead.

PR #6837 : fixes update propagation bug when `citus_set_coordinator_host` is called more than once
PR #6738 : Identity column implementation refactorings
PR #6756 : Schedule parallel shard moves in background rebalancer by removing task dependencies between shard moves across colocation groups.
PR #6793 : Add a GUC to disallow planning the queries that reference non-colocated tables via router planner
PR #6726 : fix memory leak during altering distributed table with a lot of partition and shards
PR #6722 : fix memory leak during distribution of a table with a lot of partitions
PR #6693 : prevent memory leak during ConvertTable with a lot of partitions

## Empty changelog entries

The following PRs had an empty `DESCRIPTION:` line. This generates an empty changelog line that needs to be removed manually. Please either provide a short entry, or remove the `DESCRIPTION:` line completely.

PR #6810 : Make CDC decoder an independent extension
PR #6827 : Makefile changes to build CDC in builddir for pgoutput and wal2json.

---------

Co-authored-by: Onur Tirtir <onurcantirtir@gmail.com>
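The three categories above come from simple rules about `DESCRIPTION:` lines. A hypothetical checker might look like the sketch below; it is not the tooling that was actually used to prepare this changelog. Whether the 78-character limit includes the `DESCRIPTION:` prefix is an assumption, and this sketch counts only the text after the prefix.

```python
from typing import List

PREFIX = "DESCRIPTION:"
MAX_ENTRY_LEN = 78  # assumption: limit applies to the text after the prefix


def check_changelog_entries(pr_body: str) -> List[str]:
    """Report missing, empty, or over-long DESCRIPTION: lines in a PR body."""
    entries = [
        line.strip()[len(PREFIX):].strip()
        for line in pr_body.splitlines()
        if line.strip().startswith(PREFIX)
    ]
    if not entries:
        return ["missing DESCRIPTION: line"]
    problems = []
    for entry in entries:
        if not entry:
            problems.append("empty DESCRIPTION: line")
        elif len(entry) > MAX_ENTRY_LEN:
            problems.append(f"entry is {len(entry)} characters (max {MAX_ENTRY_LEN})")
    return problems
```

A multi-line changelog item is simply several `DESCRIPTION:` lines, each checked on its own. That matches the suggestion above for splitting long entries.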
DESCRIPTION: Support MERGE-SQL, if tables are distributed, co-located, joined on dist-column.
This PR primarily has three commits:
44c387b --> Phase-II commit, which was reverted (This commit was already reviewed and pushed)
4fe2a48 --> Phase-III commit to expand it to multi-shard
d12e429 --> Fixes a bug
Fixes: #6672 #6674 #6676
Phase-III
Support pushdown queries where all the tables in the MERGE SQL are Citus-distributed and co-located, and both
the source and target relations are joined on the distribution column. This generates multiple tasks,
which execute independently after pushdown.
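Co-located shards cover the same distribution-key ranges. When the join is `target.dist_key = source.dist_key`, each target shard therefore needs rows only from its matching source shard, and the MERGE can be split into one independent task per shard pair. The toy Python sketch below illustrates that split. The shard names and the query template are hypothetical, and Citus builds its tasks from the query tree rather than by string formatting.

```python
from typing import List


def build_pushdown_tasks(target_shards: List[str], source_shards: List[str]) -> List[str]:
    """Pair the i-th target shard with the i-th source shard, one task each."""
    if len(target_shards) != len(source_shards):
        raise ValueError("co-located tables must have the same number of shards")
    return [
        f"MERGE INTO {t} t USING {s} s ON t.id = s.id "
        "WHEN MATCHED THEN UPDATE SET val = s.val + 10"
        for t, s in zip(target_shards, source_shards)
    ]
```

No task reads another pair's shards, so the tasks can run in parallel on whichever nodes hold each shard pair.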
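*The only exception for both phases II and III is that UPDATEs and INSERTs must be done on the same shard group
as the joined key. For example, the scenarios below are NOT supported, because the key value to be inserted/updated
is not guaranteed to be on the same node as the distribution column:

    MERGE INTO target t
    USING source s ON (t.customer_id = s.customer_id)
    WHEN NOT MATCHED THEN
        INSERT (customer_id, …) VALUES (<non-local-constant-key-value>, ……);

OR this scenario, where we update the distribution column itself:

    MERGE INTO target t
    USING source s ON (t.customer_id = s.customer_id)
    WHEN MATCHED THEN UPDATE SET customer_id = 100;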
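The shard-group restriction reduces to two checks on the WHEN actions, sketched below in Python. `validate_merge_actions` and its parameters are hypothetical. Expressions are plain strings here, whereas a real implementation would compare parsed expressions, not text.

```python
from typing import Mapping, Optional, Set


def validate_merge_actions(
    dist_column: str,
    source_dist_expr: str,
    updated_columns: Set[str],
    insert_values: Optional[Mapping[str, str]],
) -> None:
    """Raise ValueError if an action could place a row in another shard group."""
    # WHEN MATCHED THEN UPDATE must not assign the distribution column.
    if dist_column in updated_columns:
        raise ValueError("updating the distribution column is not supported")
    # WHEN NOT MATCHED THEN INSERT must reuse the source's joined key, so the
    # new row hashes to the same shard group as the source row.
    if insert_values is not None and insert_values.get(dist_column) != source_dist_expr:
        raise ValueError("INSERT must use the source's distribution column value")
```

Both unsupported examples above fail these checks. `UPDATE SET customer_id = 100` assigns the distribution column. `INSERT (customer_id, …) VALUES (<non-local-constant-key-value>, …)` supplies a key other than `s.customer_id`.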
Subquery/CTE support: We have only very limited subquery/CTE support until MERGE gets recursive planner support. Complex subqueries as the source are not supported at this time, for example: