
Add datacenter to existing cluster [K8SSAND-615][K8SSAND-784] #262

Merged
jsanda merged 11 commits into main from add-dc on Jan 26, 2022

Conversation

@jsanda (Contributor) commented Dec 17, 2021

What this PR does:

Which issue(s) this PR fixes:
Fixes #21

Checklist

  • Changes manually tested
  • Automated Tests added/updated
  • Documentation added/updated
  • CHANGELOG.md updated (not required for documentation PRs)
  • CLA Signed: DataStax CLA

@jsanda jsanda marked this pull request as ready for review January 6, 2022 05:37
@jsanda jsanda requested a review from a team as a code owner January 6, 2022 05:37
@jsanda (Contributor, Author) commented Jan 6, 2022

I think we need to change when we reconcile the Stargate auth and Reaper schemas. The schemas should be reconciled after each CassandraDatacenter is reconciled.

Here is how things are currently done in K8ssandraReconciler (this is in the main branch as well):

	var actualDcs []*cassdcapi.CassandraDatacenter
	if recResult, dcs := r.reconcileDatacenters(ctx, kc, kcLogger); recResult.Completed() {
		return recResult.Output()
	} else {
		actualDcs = dcs
	}

	kcLogger.Info("All dcs reconciled")

	if recResult := r.reconcileStargateAuthSchema(ctx, kc, actualDcs, kcLogger); recResult.Completed() {
		return recResult.Output()
	}

	if recResult := r.reconcileReaperSchema(ctx, kc, actualDcs, kcLogger); recResult.Completed() {
		return recResult.Output()
	}

	if recResult := r.reconcileStargateAndReaper(ctx, kc, actualDcs, kcLogger); recResult.Completed() {
		return recResult.Output()
	}

We first reconcile all CassandraDatacenters. We will requeue until all DCs are ready. Then we reconcile the Stargate auth and Reaper schemas. These keyspaces are created if necessary. Lastly, we reconcile Stargate and Reaper across all DCs. In other words, reconcileStargateAndReaper iterates through the CassandraDatacenters and adds/updates/removes Stargate and Reaper as necessary for each one.

There are a number of changes to the reconcileDatacenters method in my add-dc branch. I'll try to highlight the relevant parts and leave out the rest:

	for idx, dcTemplate := range kc.Spec.Cassandra.Datacenters {
		...
		if err = remoteClient.Get(ctx, dcKey, actualDc); err == nil {
			...

			logger.Info("The datacenter is ready")

			actualDcs = append(actualDcs, actualDc)

			if recResult := r.updateReplicationOfSystemKeyspaces(ctx, kc, desiredDc, remoteClient, logger); recResult.Completed() {
				return recResult, actualDcs
			}

			if rebuildNeeded {
				// TODO We need to handle the Stargate auth and Reaper keyspaces here.
				if recResult := r.updateUserKeyspacesReplication(ctx, kc, desiredDc, remoteClient, logger); recResult.Completed() {
					return recResult, actualDcs
				}

				if recResult := reconcileDcRebuild(ctx, actualDc, actualDcs, remoteClient, logger); recResult.Completed() {
					return recResult, actualDcs
				}
			}
		} else {
			...
		}
	}

After the current DC is ready, updateReplicationOfSystemKeyspaces is called. It updates replication for system keyspaces, e.g., system_auth, if necessary. For example, suppose the DC has been scaled up from 1 to 3 C* nodes. The replication factor would change to 3.
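
Whether that update is applied through CQL or through the management-api, its effect is equivalent to an ALTER along these lines (illustrative only; other keyspace options elided):

	ALTER KEYSPACE system_auth
	WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};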

If rebuildNeeded is true, we call updateUserKeyspacesReplication to add replicas for the new DC. Lastly, we perform the rebuild by calling reconcileDcRebuild.

Note the earlier TODO about handling the Stargate and Reaper keyspaces. If Stargate and Reaper are already installed we need to update the replication for their keyspaces before doing the rebuild so that the keyspaces are included in the streaming. It is easy enough to add the necessary calls where the TODO is.

Now let's say we have an existing K8ssandraCluster that does not have Stargate and Reaper enabled. Suppose we update the K8ssandraCluster by adding a second DC and enabling Stargate and Reaper in each DC. With the code as is, the keyspaces won't get created until after the rebuild of dc2. I think we are ok in this case.

As a last example, consider removing a DC from the cluster. One of the first steps we perform in this process is to stop replicating to the DC. We can't do that if we reconcile the Stargate and Reaper keyspaces after reconciling all DCs.

Based on my findings I am going to move the calls to reconcile the Stargate and Reaper keyspaces to happen after each DC is ready.
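
For illustration, here is a minimal sketch of that ordering inside the per-DC loop. The reconcileStargateAuthSchema and reconcileReaperSchema calls are the ones shown earlier; moving them here is the proposed change, and the elided parts are the same as in the snippets above:

	for idx, dcTemplate := range kc.Spec.Cassandra.Datacenters {
		...
		// The current DC is ready at this point.
		actualDcs = append(actualDcs, actualDc)

		// Reconcile the Stargate auth and Reaper keyspaces as soon as each DC
		// is ready, so that a subsequent rebuild streams them and a
		// decommission can stop replicating to a DC before it is removed.
		if recResult := r.reconcileStargateAuthSchema(ctx, kc, actualDcs, logger); recResult.Completed() {
			return recResult, actualDcs
		}

		if recResult := r.reconcileReaperSchema(ctx, kc, actualDcs, logger); recResult.Completed() {
			return recResult, actualDcs
		}

		// ... replication updates and rebuild follow as shown above ...
	}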

@adutra any thoughts/concerns?

jsanda added a commit to jsanda/k8ssandra-operator that referenced this pull request Jan 7, 2022
…ate tests

See
k8ssandra#262 (comment)
for a detailed explanation on why the changes are necessary.
@adutra (Contributor) commented Jan 7, 2022

@jsanda sorry for the late reply. The suggested changes make sense and look good to me. I suppose many tests will fail because the order in which they wait for components to show up will change, but other than that, I don't see any red flags.

@jdonenine jdonenine changed the title Add datacenter to existing cluster Add datacenter to existing cluster [K8SSAND-615] Jan 10, 2022
@jdonenine jdonenine changed the title Add datacenter to existing cluster [K8SSAND-615] Add datacenter to existing cluster [K8SSAND-615][K8SSAND-784] Jan 10, 2022
jsanda added a commit to jsanda/k8ssandra-operator that referenced this pull request Jan 11, 2022
jsanda added a commit to jsanda/k8ssandra-operator that referenced this pull request Jan 11, 2022
Summary:
* Add k8ssandra.io/rebuild to CassandraDatacenter when rebuild required
* Use Initialized condition to check if dc is being added to existing cluster
* Add RBAC annotations for CassandraTasks
* Add integration tests for adding dc to existing cluster
    * Add new set of subtests that use an existing cluster as test fixture
* Create CassandraTask for rebuild job
* Update logic for computing replication factor
* Add support for working with arbitrary number of kind clusters
* Update replication of system keyspaces
* Update replication of user keyspaces
    * Use k8ssandra.io/dc-replication annotation
* Update replication of stargate auth and reaper keyspaces

Details:
In Cassandra 4 you cannot declare a non-existent dc in the replication
strategy. If we are creating a K8ssandraCluster with 2 DCs, dc1 and dc2, for
example, we can only declare replicas for dc1 initially. Only after dc2 is
added to the C* cluster can we specify replicas for it.

The cassandra.system_distributed_replication_dc_names and
cassandra.system_distributed_replication_per_dc Java system properties are kind
of a backdoor via the management-api that do allow us to specify non-existent
DCs for system keyspaces, but only on the initial cluster creation.

The GetDatacentersForReplication function is used for system, stargate, reaper,
and user keyspaces to determine which DCs should be included for the replication.
If the cluster is already initialized then only the DCs that are already part of
the cluster are included.

When adding a new dc, replication for user keyspaces is specified via the
k8ssandra.io/dc-replication annotation. If not specified, no replication changes
are made for user keyspaces. If specified, all user keyspaces must be specified.
If you don't want to replicate a particular keyspace, then specify a value of
zero.

Reconcile Stargate auth and Reaper keyspaces after reconciling each dc. This
change is needed to handle rebuild and decommission scenarios. See
k8ssandra#262 (comment)
for a detailed explanation on why the changes are necessary.
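
As a hypothetical illustration of the k8ssandra.io/dc-replication annotation described in that commit message: when adding dc2, the user lists a replica count for every user keyspace, with 0 excluding one. The value format shown here is an assumption based on the description, not confirmed in this thread:

	metadata:
	  annotations:
	    # Hypothetical example: replicate ks_a to dc2 with RF 3 and explicitly
	    # exclude ks_b (RF 0). All user keyspaces must be listed.
	    k8ssandra.io/dc-replication: '{"dc2": {"ks_a": 3, "ks_b": 0}}'
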
jsanda added a commit to jsanda/k8ssandra-operator that referenced this pull request Jan 12, 2022
@jdonenine jdonenine requested review from burmanm and removed request for a team January 12, 2022 15:11

createSuperuserSecret(ctx, t, f, kcKey, kc.Spec.Cassandra.Cluster)

createReplicatedSecret(ctx, t, f, kcKey, "cluster-0")
A reviewer (Contributor) commented:
When adding a new cluster, we don't actually have any replication rule updates. I don't see here whether we actually verify that secrets are even copied to new clusters. It's not strictly necessary per-datacenter of course, since a datacenter could be replicated to an existing cluster in another namespace, but I wonder if these kinds of issues came to mind (at least in documentation).

@jsanda (Author) replied:

verifyReplicatedSecretReconciled is called in each test. It verifies that the ReplicationDone condition gets set.

			}
		}
	} else {
		datacenters = kc.Spec.Cassandra.Datacenters
A reviewer (Contributor) commented:
I'm slightly confused here: why do we want only the Ready ones if Initialized is true, but everything if Initialized is false? It feels unnatural that this path should ever be taken instead of just returning nothing.

The same reviewer added:

Or is this path taken when we initialize all the datacenters the first time the K8ssandraCluster is created?

@jsanda (Author) replied:

Or is this path taken when we initialize all the datacenters the first time the K8ssandraCluster is created?

short answer - yes

long answer...

We configure replication of system keyspaces by setting a couple system properties:

  • cassandra.system_distributed_replication_dc_names
  • cassandra.system_distributed_replication_per_dc

management-api looks for these properties and applies changes to the system_auth keyspace using internal Cassandra APIs, not CQL. If we are creating a cluster with 2 dcs, dc1 and dc2, both will be configured for system_auth. Note that what I am describing is already the current behavior in main.
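
For the 2-dc example, those properties would be set along these lines (illustrative values; how they are wired through to the JVM is not shown in this thread):

	-Dcassandra.system_distributed_replication_dc_names=dc1,dc2
	-Dcassandra.system_distributed_replication_per_dc=3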

In 4.0, Cassandra won't let you specify replication for a datacenter that does not exist (at least through CQL). I had to introduce changes in this PR for configuring the replication of the Stargate auth and Reaper keyspaces, which are configured via CQL calls. So going back to the 2 DCs example: when dc1 becomes ready, we can create the keyspaces and specify replication for dc1, but not for dc2, because Cassandra won't allow replication for a DC that has not yet joined. This resulted in different handling of system keyspaces vs other keyspaces.

Once I added support for handling non-system keyspaces, I attempted (in vain) to handle system keyspaces the same way, to make the code less confusing and more maintainable. There are queries with LOCAL_ONE consistency executed in dc2 that will fail with an auth error because the replica(s) in dc2 need a repair.

In summary, until we implement support for running repairs, at least of system_auth, we have to handle system keyspaces the way we currently do. That means 1) for system keyspaces we need to specify replication for all DCs up front and 2) for non-system keyspaces we cannot specify the replication for all DCs up front.
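
To make those two rules concrete, here is a rough sketch of the DC-selection behavior that GetDatacentersForReplication (mentioned in the commit message above) applies to non-system keyspaces. The signature, types, and helper functions are assumptions; only the selection behavior comes from this discussion:

	// Sketch: which DCs may be named in replication settings right now.
	func GetDatacentersForReplication(kc *api.K8ssandraCluster) []api.CassandraDatacenterTemplate {
		if clusterInitialized(kc) { // hypothetical check of the Initialized condition
			// Cassandra 4 rejects replication that names a DC that has not
			// joined the cluster, so include only DCs that are already members.
			var dcs []api.CassandraDatacenterTemplate
			for _, dc := range kc.Spec.Cassandra.Datacenters {
				if dcJoined(kc, dc.Meta.Name) { // hypothetical membership check
					dcs = append(dcs, dc)
				}
			}
			return dcs
		}
		// Initial cluster creation: every declared DC can be included up front.
		return kc.Spec.Cassandra.Datacenters
	}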

I will rename the method and add some docs to make it more clear that it is only intended for system keyspaces.

jsanda added a commit to jsanda/k8ssandra-operator that referenced this pull request Jan 19, 2022
jsanda added a commit to jsanda/k8ssandra-operator that referenced this pull request Jan 21, 2022
@adutra (Contributor) left a comment:

Second round of remarks :-)

Review threads were opened on:

  • controllers/k8ssandra/datacenters.go
  • controllers/k8ssandra/replication.go
  • pkg/test/testenv.go
  • controllers/k8ssandra/k8ssandracluster_controller_test.go
  • pkg/cassandra/datacenter.go
  • pkg/cassandra/management.go
  • pkg/test/mgmtapi.go
  • controllers/k8ssandra/add_dc_test.go
@adutra (Contributor) commented Jan 25, 2022

Approving this, as the e2e failures seem unrelated. A few minor questions are still in flight but can be revisited later. It would be good to have this merged ASAP.

Previously we only set the default superuser secret name in memory and did not
persist it. The version check patches that I implemented caused a problem with
that. The setting is lost after the first patch is applied. It makes more sense
to just persist the default.
To date we have relied on setting a couple of system properties to configure the
replication of system keyspaces. As I added support for managing replication
of the reaper and stargate auth keyspaces, I attempted to consolidate how they
are managed, since they basically need to be managed in the same way. I had to
undo those changes, though, until we implement support for running repairs
after replication changes.

If we configure the replication for system_auth with client CQL calls from the
operator instead of the backdoor in the management-api, then we need to run a
repair on system_auth when a second dc is added to the cluster; otherwise,
any queries against nodes in the second dc will fail. This applies when
we are deploying a new cluster as well as when adding a dc to an existing
cluster.
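
Such a repair could be run with standard tooling, for example (this PR does not automate it):

	nodetool repair -full system_auth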

This commit also updates the version of cass-operator now that the
CassandraTask API has landed in cass-operator's master branch.
@jsanda jsanda merged commit 143afb1 into k8ssandra:main Jan 26, 2022
@jsanda jsanda deleted the add-dc branch January 26, 2022 21:22