
Not undistribute local table added to metadata when creating reference or distributed table #4692

Closed
onurctirtir opened this issue Feb 11, 2021 · 0 comments · Fixed by #7131

Comments

@onurctirtir
Member

With #4489, we enabled creating reference tables and distributed tables from local tables added to metadata.
When doing that, we first undistribute the local table and then follow the logic in CreateDistributedTable (the user-facing flow is sketched after the list below).

Instead:

  1. When creating a reference table, we could just update the local table's metadata so that Citus identifies the table as a reference table and then call EnsureReferenceTablesExistOnAllNodes.

  2. When creating a distributed table, we could just update the local table's metadata so that Citus identifies the table as a distributed table and then split the local table's single shard into shard_count shards (maybe we could use SplitShardByValue from enterprise?).

  3. Items 1 & 2 are short/mid-term improvements. As a longer-term improvement for item 2, implementing a non-blocking shard split would be even better.
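For context, a minimal sketch of the user-facing conversion path this issue is about (table and column names are illustrative, not taken from the report):

```sql
-- A regular Postgres table becomes a Citus local table once it is
-- added to the Citus metadata.
CREATE TABLE items (id bigint PRIMARY KEY, payload text);
SELECT citus_add_local_table_to_metadata('items');

-- Converting it afterwards is where the undistribute step happens today:
SELECT create_reference_table('items');
-- or, for a distributed table:
-- SELECT create_distributed_table('items', 'id');
```

Either conversion currently undistributes the Citus local table first and then re-runs the regular creation logic, which is what items 1 and 2 above aim to avoid.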

onurctirtir added a commit that referenced this issue Aug 29, 2023
…table / single-shard table (#7131)

Replaces #7120.
Closes #4692.

#7120 added the same functionality by implementing a transactional
version of TransferShards(), scoped to Citus local tables. It passed
all the regression tests but didn't feel like an intuitive approach.

This PR instead adds that functionality via the functions that we
use when creating a distributed table, namely, CreateShardsOnWorkers()
and CopyLocalDataIntoShards().

We insert entries into pg_dist_placement for the new shard placement(s)
and then call CreateShardsOnWorkers() to create those placements on
the workers.
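For reference, the shard and placement metadata mentioned above can be inspected from the coordinator after a conversion; a minimal sketch, assuming the hypothetical items table from the earlier example:

```sql
-- List the shard(s) of the converted table together with the node
-- group each placement was recorded on.
SELECT s.logicalrelid, s.shardid, p.placementid, p.groupid
FROM pg_dist_shard s
JOIN pg_dist_placement p USING (shardid)
WHERE s.logicalrelid = 'items'::regclass;
```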

Then we use CopyFromLocalTableIntoDistTable() to copy the data from
the local shard placement to the new shard placement(s).
CopyFromLocalTableIntoDistTable() is a new function that reuses the
underlying logic of CopyLocalDataIntoShards(), which copies data from
a local table into a distributed table. We tell
CopyLocalDataIntoShards() to read from the local shard placement table
and to write the tuples into the shard placement(s) of the reference /
single-shard table. Before doing this, we temporarily delete the
metadata record for the local placement to avoid duplicating the data
in the local shard placement.

Finally, if we were creating a single-shard table, we drop the local
shard placement. Since the new shard placement has already been
created on the worker, this effectively moves the local shard
placement to the appropriate worker.
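As an illustration (not part of the original PR text), the single-shard conversion described above corresponds to the following user-facing calls in recent Citus versions, where passing a NULL distribution column creates a single-shard table; the table name is hypothetical:

```sql
-- Convert a Citus local table into a single-shard distributed table.
-- After the conversion, its only shard placement resides on a worker
-- node rather than on the coordinator.
SELECT citus_add_local_table_to_metadata('items');
SELECT create_distributed_table('items', null);
```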

While the main motivation behind adding this functionality is to
avoid the limitations that arise when UndistributeTable() is called
for a Citus local table (during table conversion), it also optimizes
how we convert a Citus local table to a reference table /
single-shard table: the prior logic used more disk space because the
data was duplicated during UndistributeTable().

DESCRIPTION: Allow creating reference / distributed-schema tables from
local tables that were added to metadata and that use identity columns
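To illustrate the identity-column case mentioned in the description, a sketch with hypothetical names (not taken from the PR):

```sql
-- A local table that uses an identity column, added to metadata and
-- then converted to a reference table.
CREATE TABLE orders (
    order_id bigint GENERATED ALWAYS AS IDENTITY,
    note text
);
SELECT citus_add_local_table_to_metadata('orders');
SELECT create_reference_table('orders');
```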

- [x] Add tests.
- [x] Test django-tenants.