
debug add-dc [DO NOT MERGE] #324

Closed
wants to merge 18 commits into from

@jsanda commented Jan 26, 2022

A debug PR for a test failure in #262 that I cannot reproduce locally.

jsanda added 18 commits January 25, 2022 11:06
Summary:
* Add k8ssandra.io/rebuild to CassandraDatacenter when a rebuild is required
* Use the Initialized condition to check if a dc is being added to an existing cluster
* Add RBAC annotations for CassandraTasks
* Add integration tests for adding a dc to an existing cluster
    * Add a new set of subtests that use an existing cluster as the test fixture
* Create a CassandraTask for the rebuild job
* Update logic for computing the replication factor
* Add support for working with an arbitrary number of kind clusters
* Update replication of system keyspaces
* Update replication of user keyspaces
    * Use the k8ssandra.io/dc-replication annotation
* Update replication of Stargate auth and Reaper keyspaces

Details:
In Cassandra 4 you cannot declare a non-existent dc in the replication
strategy. If we are creating a K8ssandraCluster with 2 DCs, dc1 and dc2, for
example, we can only declare replicas for dc1 initially. Only after dc2 is
added to the C* cluster can we specify replicas for it.
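To make the Cassandra 4 constraint concrete, here is a minimal Go sketch, assuming NetworkTopologyStrategy and a hypothetical `replicationCQL` helper (not code from this PR), that drops any DC which has not yet joined the C* cluster before rendering the ALTER KEYSPACE statement:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// replicationCQL renders an ALTER KEYSPACE statement that uses
// NetworkTopologyStrategy. Only DCs listed in liveDCs are included, because
// Cassandra 4 rejects replication settings naming a DC the cluster does not
// know about yet. Hypothetical helper for illustration only.
func replicationCQL(keyspace string, desired map[string]int, liveDCs []string) string {
	live := make(map[string]bool, len(liveDCs))
	for _, dc := range liveDCs {
		live[dc] = true
	}

	// Keep only DCs that already exist, sorted so the output is deterministic.
	dcs := make([]string, 0, len(desired))
	for dc := range desired {
		if live[dc] {
			dcs = append(dcs, dc)
		}
	}
	sort.Strings(dcs)

	parts := []string{"'class': 'NetworkTopologyStrategy'"}
	for _, dc := range dcs {
		parts = append(parts, fmt.Sprintf("'%s': %d", dc, desired[dc]))
	}
	return fmt.Sprintf("ALTER KEYSPACE %s WITH replication = {%s}", keyspace, strings.Join(parts, ", "))
}

func main() {
	desired := map[string]int{"dc1": 3, "dc2": 3}
	// dc2 has not joined the C* cluster yet, so only dc1 is declared.
	fmt.Println(replicationCQL("ks1", desired, []string{"dc1"}))
}
```

Once dc2 has joined, calling the same helper with both DCs in `liveDCs` yields a statement that declares replicas for dc1 and dc2.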

The cassandra.system_distributed_replication_dc_names and
cassandra.system_distributed_replication_per_dc Java system properties are
effectively a backdoor, via the management-api, that does allow us to specify
non-existent DCs for system keyspaces, but only on initial cluster creation.
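A rough sketch of how those two properties could be rendered as JVM options. It assumes the dc names property takes a comma-separated list and that the per_dc property is a single replica count applied to every listed DC; `systemReplicationJVMOptions` is a hypothetical helper, and how the options reach the management-api is not shown:

```go
package main

import (
	"fmt"
	"strings"
)

// systemReplicationJVMOptions builds -D options for the two system
// properties. Assumed format: comma-separated DC names plus one replica
// count shared by all listed DCs. Hypothetical helper for illustration.
func systemReplicationJVMOptions(dcNames []string, replicasPerDC int) []string {
	return []string{
		"-Dcassandra.system_distributed_replication_dc_names=" + strings.Join(dcNames, ","),
		fmt.Sprintf("-Dcassandra.system_distributed_replication_per_dc=%d", replicasPerDC),
	}
}

func main() {
	// dc2 may be named here even though it does not exist yet, but this
	// only has an effect when the cluster is first created.
	for _, opt := range systemReplicationJVMOptions([]string{"dc1", "dc2"}, 3) {
		fmt.Println(opt)
	}
}
```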

The GetDatacentersForReplication function is used for system, Stargate, Reaper,
and user keyspaces to determine which DCs should be included in the replication
settings. If the cluster is already initialized, then only the DCs that are
already part of the cluster are included.
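The rule described for GetDatacentersForReplication can be sketched as follows. This is not the function's real signature: `ClusterState` and `datacentersForReplication` are hypothetical stand-ins, and the sketch assumes that an uninitialized cluster includes every DC from the spec:

```go
package main

import "fmt"

// ClusterState is a hypothetical, pared-down view of what the operator knows.
type ClusterState struct {
	Initialized bool     // true once the C* cluster has been created
	SpecDCs     []string // DCs declared in the K8ssandraCluster spec
	MemberDCs   []string // DCs that are already part of the C* cluster
}

// datacentersForReplication mirrors the described behavior: before
// initialization every spec DC is returned; afterwards only the DCs that
// have already joined the cluster are returned, in spec order.
func datacentersForReplication(s ClusterState) []string {
	if !s.Initialized {
		return append([]string(nil), s.SpecDCs...)
	}
	member := make(map[string]bool, len(s.MemberDCs))
	for _, dc := range s.MemberDCs {
		member[dc] = true
	}
	var dcs []string
	for _, dc := range s.SpecDCs {
		if member[dc] {
			dcs = append(dcs, dc)
		}
	}
	return dcs
}

func main() {
	// dc2 was just added to the spec but has not joined the cluster yet.
	s := ClusterState{Initialized: true, SpecDCs: []string{"dc1", "dc2"}, MemberDCs: []string{"dc1"}}
	fmt.Println(datacentersForReplication(s)) // [dc1]
}
```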

When adding a new dc, replication for user keyspaces is specified via the
k8ssandra.io/dc-replication annotation. If it is not specified, no replication
changes are made for user keyspaces. If it is specified, all user keyspaces must
be listed. If you don't want to replicate a particular keyspace, then specify a
value of zero.
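A minimal sketch of validating the annotation under those rules. The JSON shape used here (`{"<dc>": {"<keyspace>": <replicas>}}`) is an assumption for illustration, as is the `parseDCReplication` helper:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseDCReplication validates an assumed dc-replication annotation value
// for newDC and returns the keyspaces whose replication should change.
// An empty annotation means no changes; every user keyspace must be listed;
// a value of zero means "do not replicate to newDC". Hypothetical helper.
func parseDCReplication(annotation, newDC string, userKeyspaces []string) (map[string]int, error) {
	changes := make(map[string]int)
	if annotation == "" {
		return changes, nil
	}
	var parsed map[string]map[string]int
	if err := json.Unmarshal([]byte(annotation), &parsed); err != nil {
		return nil, fmt.Errorf("invalid dc-replication annotation: %w", err)
	}
	perKeyspace, ok := parsed[newDC]
	if !ok {
		return nil, fmt.Errorf("no replication specified for dc %s", newDC)
	}
	for _, ks := range userKeyspaces {
		rf, ok := perKeyspace[ks]
		if !ok {
			return nil, fmt.Errorf("keyspace %s missing from replication for dc %s", ks, newDC)
		}
		if rf < 0 {
			return nil, fmt.Errorf("negative replication for keyspace %s", ks)
		}
		if rf > 0 { // zero: leave this keyspace out of newDC
			changes[ks] = rf
		}
	}
	return changes, nil
}

func main() {
	ann := `{"dc2": {"ks1": 3, "ks2": 0}}`
	changes, err := parseDCReplication(ann, "dc2", []string{"ks1", "ks2"})
	fmt.Println(changes, err) // map[ks1:3] <nil>
}
```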

Reconcile Stargate auth and Reaper keyspaces after reconciling each dc. This
change is needed to handle rebuild and decommission scenarios. See
k8ssandra#262 (comment)
for a detailed explanation of why the changes are necessary.

Previously we only set the default superuser secret name in memory and did not
persist it. The version check patches that I implemented caused a problem with
that: the setting is lost after the first patch is applied. It makes more sense
to just persist the default.
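A minimal sketch of the "persist the default" approach: the default is written onto the spec and the caller is told a patch is needed, instead of the value living only in an in-memory copy. `ClusterSpec`, `ensureSuperuserSecretName`, and the `<cluster>-superuser` naming are assumptions for illustration:

```go
package main

import "fmt"

// ClusterSpec is a hypothetical, pared-down stand-in for the cluster spec.
type ClusterSpec struct {
	Name                string
	SuperuserSecretName string
}

// defaultSuperuserSecretName uses an assumed "<cluster>-superuser" convention.
func defaultSuperuserSecretName(clusterName string) string {
	return fmt.Sprintf("%s-superuser", clusterName)
}

// ensureSuperuserSecretName sets the default on the spec and reports whether
// the object must be patched so that the default is persisted.
func ensureSuperuserSecretName(spec *ClusterSpec) (needsPatch bool) {
	if spec.SuperuserSecretName != "" {
		return false
	}
	spec.SuperuserSecretName = defaultSuperuserSecretName(spec.Name)
	return true
}

func main() {
	spec := &ClusterSpec{Name: "demo"}
	if ensureSuperuserSecretName(spec) {
		// The operator would patch the object in the API server here.
		fmt.Println("patch needed; secret name:", spec.SuperuserSecretName)
	}
}
```

Because the value is persisted, later patches start from a spec that already carries it, so it is no longer lost after the first patch.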

To date we have relied on setting a couple of system properties to configure the
replication of system keyspaces. As I added support for managing replication
of Reaper and Stargate auth keyspaces, I attempted to consolidate how they are
managed, since they basically need to be managed in the same way. I had to undo
those changes, though, until we implement support for running repairs after
replication changes.

If we configure the replication for system_auth with client CQL calls from the
operator instead of the backdoor in the management-api, then we need to run a
repair on system_auth when a second dc is added to the cluster; otherwise,
any queries against nodes in the second dc will fail. This applies when
we are deploying a new cluster as well as when adding a dc to an existing
cluster.
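The decision of when that repair is needed can be sketched by comparing replication before and after the CQL change: any DC that gained replicas starts out without the existing system_auth data. `dcsNeedingRepair` is a hypothetical helper, and how the repair would actually be scheduled is not specified here:

```go
package main

import (
	"fmt"
	"sort"
)

// dcsNeedingRepair returns the DCs whose replica count for a keyspace went
// up between oldRF and newRF. A dc missing from oldRF counts as zero, so a
// newly added dc always shows up. Hypothetical helper for illustration.
func dcsNeedingRepair(oldRF, newRF map[string]int) []string {
	var dcs []string
	for dc, rf := range newRF {
		if rf > oldRF[dc] {
			dcs = append(dcs, dc)
		}
	}
	sort.Strings(dcs)
	return dcs
}

func main() {
	oldRF := map[string]int{"dc1": 3}
	newRF := map[string]int{"dc1": 3, "dc2": 3}
	if dcs := dcsNeedingRepair(oldRF, newRF); len(dcs) > 0 {
		fmt.Printf("system_auth needs a repair; DCs that gained replicas: %v\n", dcs)
	}
}
```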

This commit also updates the version of cass-operator now that the
CassandraTask API has landed in master in cass-operator.

@jsanda requested a review from a team as a code owner January 26, 2022 03:39
@jsanda closed this Jan 26, 2022
@jsanda deleted the debug-add-dc branch January 26, 2022 05:04