Remote restore support for Medusa #454
Conversation
	return ctrl.Result{RequeueAfter: r.DefaultDelay}, err
}

// If the CassandraBackup object was created by a sync operation, we mustn't trigger a backup.
The issue with this approach is that I don't think the "backup" definition and the "backupJob" should be the same; separating them would solve the mentioned issue. Also, if per-object metadata is necessary, we could use an annotation instead of polluting the CRD with extra fields.
Right, that change is coming in the next PR 👍
if !task.Status.StartTime.IsZero() {
	// If there is anything in progress, simply requeue the request
	if len(task.Status.InProgress) > 0 {
		logger.Info("Tasks already in progress")
For this and the next logger lines: without other info in the log statement, the reader can't tell which restore process this relates to, so we need some key/values in the .Info() call.
Isn't that what line 59 does?
logger := log.FromContext(ctx).WithValues("MedusaTask", req.NamespacedName)
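As a side note on how that interacts with the statements above (assuming the standard logr behavior in controller-runtime): key/values attached with WithValues are emitted by every subsequent call on that logger, so an excerpt like this already carries the task identity.
logger := log.FromContext(ctx).WithValues("MedusaTask", req.NamespacedName)
logger.Info("Tasks already in progress") // automatically includes MedusaTask=<namespace>/<name>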
// Set the finish time
// Note that the time here is not accurate, but that is ok. For now we are just
// using it as a completion marker.
patch := client.MergeFrom(task.DeepCopy())
task is already a DeepCopy of the instance. Do we really need a double copy?
I'm following the conventions we've used throughout the project when patching CRs.
Do you think we're wrong to do so?
I seem to remember that patches wouldn't be applied properly without this, but I could be wrong.
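For context, a minimal sketch of the MergeFrom convention being discussed, pieced together from the calls visible elsewhere in this diff: the copy handed to MergeFrom is the base snapshot that later mutations are diffed against, which is why it is taken even if the object was already copied once before.
patch := client.MergeFrom(task.DeepCopy()) // snapshot the current state as the patch base
task.Status.FinishTime = metav1.Now()      // mutate the live object
if err := r.Status().Patch(ctx, task, patch); err != nil { // only the computed diff is sent
	return ctrl.Result{}, err
}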
}

// If the task is already finished, there is nothing to do.
if taskFinished(task) { |
Shouldn't this be before the previous if statement? StartTime is non-zero for finished jobs too, isn't it?
Yes, it would indeed be better to move it before the previous if 👍
I'll move it.
	return nil, err
}

pods := make([]corev1.Pod, 0)
Suggestion: pass the capacity too, i.e. make([]corev1.Pod, 0, len(podList.Items)).
👍
}

// Invoke the purge operation on all pods, in the background
go func() { |
How do we verify whether this go func is still alive or has failed? What if the k8ssandra-operator crashes while it is running?
Then we lose track of the operation. For better or worse, this is still what we currently have for backups as well (I re-used the same code structure).
We could definitely improve the resiliency by making the calls non-blocking, with a way to check the state asynchronously, like we're about to do with backups.
The tasks we run here should be far more short-lived and easy to rerun in case of failure though, so it's less worrisome than backups and could be dealt with in a future PR.
}

// Invoke the purge operation on all pods, in the background
go func() { |
Same as the next go func() question: liveness check.
Same answer :)
A partial review. More to follow tomorrow...
	Failed []string `json:"failed,omitempty"`
}

type TaskResult struct { |
Are all of the fields in this struct specific to a purge operation? If so maybe rename to PurgeTaskResult or PurgeResult.
It's not the case. The prepare_restore task will generate TaskResult instances with just the PodName filled, leaving the purge-specific fields empty.
I'm not a fan of this. I think it obscures the code. From a maintenance perspective, for example, how am I supposed to know that a particular field is just empty at a given point in time vs. empty because it isn't used for the given task type? How about some comments to explain which fields are used for which task types?
that's fair, I'll add the comments.
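A rough sketch of what those comments could look like; PodName and NbBackupsPurged appear elsewhere in this PR, but the exact field set and JSON tags here are assumptions, not the actual API:
type TaskResult struct {
	// PodName is set for every task type (purge, prepare_restore, ...).
	PodName string `json:"podName,omitempty"`
	// NbBackupsPurged is only populated by purge tasks; prepare_restore tasks
	// leave it at its zero value and fill in PodName only.
	NbBackupsPurged int `json:"nbBackupsPurged,omitempty"`
}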
task.Status.FinishTime = metav1.Now()
if err := r.Status().Patch(ctx, task, patch); err != nil {
	logger.Error(err, "failed to patch status with finish time")
	return ctrl.Result{RequeueAfter: r.DefaultDelay}, err
No need to set RequeueAfter when returning a non-nil error.
return ctrl.Result{}, err will reschedule a reconcile later?
Yup, with rate limiting. Here is the relevant code from the controller:
result, err := c.Reconcile(ctx, req)
switch {
case err != nil:
	c.Queue.AddRateLimited(req)
	ctrlmetrics.ReconcileErrors.WithLabelValues(c.Name).Inc()
	ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, labelError).Inc()
	log.Error(err, "Reconciler error")
case result.RequeueAfter > 0:
	// The result.RequeueAfter request will be lost, if it is returned
	// along with a non-nil error. But this is intended as
	// We need to drive to stable reconcile loops before queuing due
	// to result.RequestAfter
	c.Queue.Forget(obj)
	c.Queue.AddAfter(req, result.RequeueAfter)
	ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, labelRequeueAfter).Inc()
case result.Requeue:
	c.Queue.AddRateLimited(req)
	ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, labelRequeue).Inc()
default:
	// Finally, if no error occurs we Forget this item so it does not
	// get queued again until another change happens.
	c.Queue.Forget(obj)
	ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, labelSuccess).Inc()
}
}
Notice that when a non-nil error is returned the controller calls c.Queue.AddRateLimited(req). See here for the full source.
cool, I'll do the changes 👍
podList := &corev1.PodList{}
labels := client.MatchingLabels{cassdcapi.DatacenterLabel: cassdc.Name}
if err := r.List(ctx, podList, labels); err != nil {
	logger.Error(err, "failed to get pods for cassandradatacenter", "CassandraDatacenter", cassdc.Name)
I saw you wanted us to remove this logging. The problem is that without our own log statement, the error that gets printed is usually more obscure and doesn't give enough detail to know precisely what failed. Technically we know what failed, but it's harder to know what we were trying to do when it failed.
I'm OK with some repetition in the logs if it brings more clarity during troubleshooting.
The error printed out should include your error message. What isn't included though is a stack trace which can be problematic when the same error message is used in multiple places.
Whatever we do or change (if anything) for logging is a separate concern from this PR, so I think we're good to go.
You can use the Wrap function from github.com/pkg/errors to get a stacktrace in the output. Instead of the logger.Error call, you could just do
return nil, errors.Wrap(err, fmt.Sprintf("failed to get pods for cassandradatacenter %s", cassdc.Name))
The output will include a stacktrace from the point where Wrap is called.
if err := r.Status().Patch(ctx, task, patch); err != nil {
	logger.Error(err, "Failed to patch status")
	// We received a stale object, requeue for next processing
	return ctrl.Result{RequeueAfter: r.DefaultDelay}, err
No need to set RequeueAfter when returning a non-nil error.
👍
wg.Wait()
logger.Info("finished task operations")
if err := r.Status().Patch(context.Background(), task, patch); err != nil {
	logger.Error(err, "failed to patch status", "MedusaTask", fmt.Sprintf("%s/%s", task.Spec.Operation, task.Namespace))
fmt.Sprintf("%s/%s", task.Spec.Operation, task.Namespace) is used in multiple places. Maybe implement the Stringer interface?
sounds good 👍
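A minimal sketch of that suggestion; the receiver and fields follow the snippet above, the rest is illustrative:
// String renders the task as "<operation>/<namespace>" for logging.
func (t *MedusaTask) String() string {
	return fmt.Sprintf("%s/%s", t.Spec.Operation, t.Namespace)
}
The log calls could then pass task.String() instead of repeating the fmt.Sprintf everywhere.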
if err := r.Status().Patch(ctx, task, patch); err != nil {
	logger.Error(err, "Failed to patch status")
	// We received a stale object, requeue for next processing
	return ctrl.Result{RequeueAfter: r.DefaultDelay}, err
No need to set RequeueAfter when returning a non-nil error.
👍
localBackups := &medusav1alpha1.CassandraBackupList{}
if err = r.List(ctx, localBackups, client.InNamespace(task.Namespace)); err != nil {
	logger.Error(err, "failed to list backups")
	return ctrl.Result{RequeueAfter: r.DefaultDelay}, err
No need to set RequeueAfter when returning a non-nil error.
👍
I assume we want to support the scenario of restoring from one bucket and then storing subsequent backups in a different bucket. Do the API changes support that? It doesn't look like it.
Not yet, it's part of another upcoming ticket: #453
@@ -439,3 +439,7 @@ install-kuttl:
mocks:
	mockery --dir=./pkg/cassandra --output=./pkg/mocks --name=ManagementApiFacade
	mockery --dir=./pkg/reaper --output=./pkg/mocks --name=Manager --filename=reaper_manager.go --structname=ReaperManager

PHONY: protobuf-code-gen
protobuf-code-gen:
This assumes that the protobuf compiler is installed. It would be really nice if the Makefile automatically installed the binary into the bin dir like it does with some other tools. That could make for a follow-up enhancement. For now, how about a comment pointing to some docs for installing it?
sounds good 👍
I've added a comment on line 443
@@ -18,20 +18,13 @@ package v1alpha1
If we intend to replace this api with medusabackup_types, then I think we need to discuss how we want to handle breaking API changes. That is not a discussion that I want to have in the PR review 😄 Do you think though that some comments should be added to indicate that the API is deprecated?
Very much so. I had looked in the kubebuilder book for an annotation we could use to mark things as deprecated, with no luck.
But I just came across this: kubernetes-sigs/kubebuilder#2116
Turns out the annotation exists but isn't documented: //+kubebuilder:deprecatedversion:warning="example.com/v1alpha1 CronTab is deprecated"
I'll use this.
perfect 🙂
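For reference, a hedged sketch of how that marker might look on the legacy type; the warning text, API group, and the Spec/Status type names are assumptions rather than the actual definitions:
//+kubebuilder:object:root=true
//+kubebuilder:deprecatedversion:warning="medusa.k8ssandra.io/v1alpha1 CassandraBackup is deprecated, use MedusaBackupJob instead"
type CassandraBackup struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   CassandraBackupSpec   `json:"spec,omitempty"`
	Status CassandraBackupStatus `json:"status,omitempty"`
}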
@jsanda, I've made the requested changes. Let me know what you think. Also, I'd like to point out I've removed the backup name in the spec of MedusaBackupJob as I feel it's redundant with the
}

func (r *MedusaBackupJobReconciler) createMedusaBackup(ctx context.Context, backup *medusav1alpha1.MedusaBackupJob, logger logr.Logger) error {
	// Create a prepare_restore medusa task to create the mapping files in each pod.
Looks like this comment is in the wrong place.
good catch 👍
	}
}

func (r *MedusaTaskReconciler) getCassandraDatacenterPods(ctx context.Context, cassdc *cassdcapi.CassandraDatacenter, logger logr.Logger) ([]corev1.Pod, error) { |
Looks like this is identical to the getCassandraDatacenterPods in medusabackupjob_controller.go. Can we refactor this into a common, shared method or func?
sounds good
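One possible shape for the shared helper, sketched from the listing logic visible in this diff; the free-function form and the error wrapping are assumptions, not the final refactoring:
func getCassandraDatacenterPods(ctx context.Context, c client.Client, cassdc *cassdcapi.CassandraDatacenter) ([]corev1.Pod, error) {
	podList := &corev1.PodList{}
	labels := client.MatchingLabels{cassdcapi.DatacenterLabel: cassdc.Name}
	if err := c.List(ctx, podList, labels); err != nil {
		return nil, errors.Wrap(err, fmt.Sprintf("failed to get pods for cassandradatacenter %s", cassdc.Name))
	}
	pods := make([]corev1.Pod, 0, len(podList.Items))
	pods = append(pods, podList.Items...)
	return pods, nil
}
Both controllers could then call this helper instead of keeping their own copies.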
}

func (r *MedusaTaskReconciler) purgeOperation(ctx context.Context, task *medusav1alpha1.MedusaTask, pods []corev1.Pod, logger logr.Logger) (reconcile.Result, error) {
	logger.Info("Starting purge operations")
It would be good to include the datacenter name in the log statement.
OK, I need to move the logging further down in the code so that we can access the info after reading the task object.
What about adding the key/value pair to the logger in the Reconcile method?
Great idea. Done in my upcoming commit 👍
}

func (r *MedusaTaskReconciler) prepareRestoreOperation(ctx context.Context, task *medusav1alpha1.MedusaTask, pods []corev1.Pod, logger logr.Logger) (reconcile.Result, error) {
	logger.Info("Starting prepare restore operations")
It would be good to include the datacenter name in the log statement.
👍
backupMutex := sync.Mutex{}
patch := client.MergeFrom(task.DeepCopy())

for _, p := range pods { |
The control flow in this loop is nearly identical to the control flow in purgeOperation as well as in MedusaBackupJobReconciler.Reconcile. The status updates appear to be the same as well. I realize that in one case we are updating a MedusaTaskStatus and in the other case we are updating a MedusaBackupJobStatus. How about moving the common fields into a shared, common struct? And maybe define an interface through which the status updates are made. Both MedusaTaskStatus and MedusaBackupJobStatus would implement the interface.
I'm sorry but I'm not entirely convinced of the benefits of doing so. The statuses of the two objects are fairly different even if there are common fields.
Also, that would apply to MedusaBackupJob too, which uses the same control flow.
I could see some value if we treated restore jobs and backup jobs as tasks, and used a common reconcile loop for all of them, but that's not what you're suggesting.
Am I missing your point?
I forgot to include my main point 🤦‍♂️ The control logic is duplicated in several places. It would be good to look at deduplicating it. Doing so would mean handling both MedusaTaskStatus and MedusaBackupJobStatus, hence my suggestion for the common struct.
@adejanovski I pushed some initial changes here to illustrate what I had in mind. I introduced an executePodOperations function which encapsulates all of the control flow and goroutines for making calls against individual pods. Some more work is needed to handle the additional status updates in purgeOperation. With the status-related refactoring I mentioned in my earlier comment you should be able to reuse this in the backup controller as well.
To be clear though, my main observation is that the control flow for executing operations against pods is duplicated in multiple places and should be refactored to eliminate the duplication. A lot of that deduplication however could be done without the status-related changes I suggested.
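A very rough sketch of what such a helper could look like, purely to illustrate the suggestion; the name executePodOperations comes from the comment above, while the signature and behavior here are assumptions:
// executePodOperations runs op against every pod concurrently and collects
// the names of the pods that succeeded and failed.
func executePodOperations(pods []corev1.Pod, op func(pod corev1.Pod) error) (succeeded, failed []string) {
	var mu sync.Mutex
	var wg sync.WaitGroup
	for _, pod := range pods {
		wg.Add(1)
		go func(pod corev1.Pod) {
			defer wg.Done()
			err := op(pod)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				failed = append(failed, pod.Name)
			} else {
				succeeded = append(succeeded, pod.Name)
			}
		}(pod)
	}
	wg.Wait()
	return succeeded, failed
}
The purge, prepare_restore, and backup paths would then pass their per-pod call as op and only differ in how they fold the results into their respective statuses.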
Cool, I've refactored the code using your experiment code. Haven't pushed it as far as refactoring the status because I'd like to work on using common code between task and backup operations in a subsequent PR.
There's a lot being done here already IMO and we should scope this out.
I'll create a ticket for this if you agree.
Sorry for the delayed response. Follow up ticket works for me.
Cool, I've created the issue: #541
}

// Prepare the restore by placing a mapping file in the Cassandra data volume.
if !request.RestoreJob.Status.RestorePrepared { |
We don't need this for an in-place restore, do we? I haven't tested, but if I recall correctly it is fine to do; however, it is a good bit of overhead that could otherwise be avoided.
The thing is that at this stage, we don't know yet if it's an in-place restore or not. That distinction was removed from the MedusaRestoreJob CRD as it is computed by Medusa itself... during the prepare restore task.
In our future design, it's the operator that will compute it based on the host names (if they match between the source and target clusters, then it's in place).
> In our future design, it's the operator that will compute it based on the host names
Will these changes be included in this PR or a follow-up?
Follow-up PR; in this PR Medusa computes the mappings, including the in-place/remote nature.
err := f.Client.Create(ctx, kc)
require.NoError(err, "failed to create K8ssandraCluster")

reconcileReplicatedSecret(ctx, t, f, kc)
medusaClientFactory = NewMedusaClientFactory()

err := testEnv.Start(ctx, t, func(controlPlaneMgr manager.Manager, clientCache *clientcache.ClientCache, clusters []cluster.Cluster) error {
err := (&k8ssandractrl.K8ssandraClusterReconciler{ |
I realize we are already creating and deploying the K8ssandraCluster controller in main, but you shouldn't be. The integration tests are structured in a way to facilitate testing controllers in isolation. I created #538 to address this. It can be done in a separate, follow-up PR.
yes, that's a lack of knowledge on my side, sorry. I couldn't figure out how to isolate them properly in the tests.
medusaClientFactory = NewMedusaClientFactory()

err := testEnv.Start(ctx, t, func(controlPlaneMgr manager.Manager, clientCache *clientcache.ClientCache, clusters []cluster.Cluster) error {
err := (&k8ssandractrl.K8ssandraClusterReconciler{ |
As per #538 there shouldn't be any need to deploy the K8ssandraCluster controller for backup tests. You should only deploy the backup controller.
}

for _, env := range testEnv.GetDataPlaneEnvTests() {
	dataPlaneMgr, err := ctrl.NewManager(env.Config, ctrl.Options{Scheme: scheme.Scheme})
As per #538 there shouldn't be any need to deploy any other controller besides the task controller for task integration tests. Considering that the tests are already set up this way in main, I'm fine with addressing this in a follow-up PR.
	return false
}

return !updated.Status.FinishTime.IsZero() && updated.Status.Finished[0].NbBackupsPurged == 2
We should also verify that Finished has 3 elements.
👍
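A sketch of the adjusted assertion, assuming the fixture has three pods as the comment above implies:
return !updated.Status.FinishTime.IsZero() &&
	len(updated.Status.Finished) == 3 &&
	updated.Status.Finished[0].NbBackupsPurged == 2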

// Schedule a sync if the task is a purge
if task.Spec.Operation == medusav1alpha1.OperationTypePurge {
	r.scheduleSyncForPurge(task)
Need to handle the error returned here.
Something else that occurred to me while looking at this part of the code is cleanup. We need an automatic cleanup of completed tasks. This is done for the Cassandra tasks in cass-operator.
If the CassandraDatacenter on which the operation was performed is deleted, should the corresponding tasks be deleted as well? I am inclined to say yes. If we are in agreement on that, then we need either owner references or some new finalizer logic.
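For the owner-reference option, a minimal sketch using controller-runtime's controllerutil helper, assuming the reconciler keeps the usual Scheme field and that the task object is created by the operator; with the reference set, deleting the CassandraDatacenter would garbage-collect its tasks:
// Make the CassandraDatacenter the owner so the task is cleaned up with it.
if err := controllerutil.SetControllerReference(cassdc, task, r.Scheme); err != nil {
	return ctrl.Result{}, err
}
if err := r.Create(ctx, task); err != nil {
	return ctrl.Result{}, err
}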
You're very right. Also, it's not a sync anymore, as I've changed this part to directly create the MedusaBackup object instead.
I'll rework that part.
I was confusing this with some code from the backup job controller. This is still necessary, and I indeed need to handle the errors.
@@ -36,6 +36,22 @@ func createSingleMedusa(t *testing.T, ctx context.Context, namespace string, f *
	verifyRestoreFinished(t, ctx, f, dcKey, backupKey)
}

func createSingleMedusaJob(t *testing.T, ctx context.Context, namespace string, f *framework.E2eFramework) { |
I realize that the setup will be a bit more involved than existing tests, but we should have an e2e test that covers remote restores. Would you create a follow-up ticket for it?
Sure thing
	return nil, &ctrl.Result{RequeueAfter: 10 * time.Second}, err
}

backup := &api.MedusaBackup{}
This part needs to be updated. I hit this error when I tried restoring to a new cluster. I had to manually create a sync task to avoid this.
It's the process actually. Did you expect the sync to be done automatically? Backups won't get synced unless you run a sync MedusaTask; then you'll have them created as MedusaBackup CRs, which can be restored.
I think I misread the code last night and thought that the restore controller created the sync task as a preliminary step. It creates the prepare restore task. I'm good here.
// Create the sync task
prepare = &medusav1alpha1.MedusaTask{
	ObjectMeta: metav1.ObjectMeta{
		Name: request.RestoreJob.Status.RestoreKey,
Can we change the task name to something more descriptive, maybe something like <cluster-name>-<dc-name>-prepare-restore? Seeing task objects with UUIDs for their names isn't very helpful. Alternatively we could update the CRD to include the operation type and cassdc name in the output for kubectl get.
To be fair, it'll be created in the namespace of the cluster that's being restored.
If multiple restores are done, we need a name that distinguishes them, and using the restore key seemed like a good idea.
I'm afraid that we may create names that are too long if we stuff in things like the cluster name 😕
> Alternatively we could update the CRD to include the operation type and cassdc name in the output for kubectl get.
Would you add that in the status of the CRD?
> I'm afraid that we may create names that are too long if we stuff in things like the cluster name
Definitely a valid concern.
> Would you add that in the status of the CRD?
Not in the status. There is a kubebuilder:printcolumn:name annotation. See https://github.com/k8ssandra/k8ssandra-operator/blob/main/apis/stargate/v1alpha1/stargate_types.go#L283 for examples. Would make for a nice follow-up enhancement. I don't think it's needed for this PR.
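For reference, a hedged sketch of what such markers could look like on the task type; the JSON paths are assumptions about the MedusaTask spec and status, not the actual field names:
//+kubebuilder:printcolumn:name="Operation",type=string,JSONPath=".spec.operation"
//+kubebuilder:printcolumn:name="Datacenter",type=string,JSONPath=".spec.cassandraDatacenter"
//+kubebuilder:printcolumn:name="Finished",type=date,JSONPath=".status.finishTime"
type MedusaTask struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   MedusaTaskSpec   `json:"spec,omitempty"`
	Status MedusaTaskStatus `json:"status,omitempty"`
}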
// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.

// MedusaRestoreJobSpec defines the desired state of MedusaRestoreJob
type MedusaRestoreJobSpec struct { |
For remote restores, shouldn't we allow the bucket from which we are restoring to be specified separately instead of just reusing what is specified for backups? I am thinking that if I have some CI/CD pipeline where I provision clusters from a known, good backup, I would probably want to keep their storage buckets separate.
The only thing we could do with the way Medusa is currently designed is to specify a different prefix for the sync and the restore, which would then be specified at sync and restore time. It probably means that the MedusaBackup object should contain that prefix in its spec.
This is part of a follow-up ticket: #453
@@ -66,6 +66,7 @@ jobs:
      - CreateMultiReaper
      - ClusterScoped/MultiDcMultiCluster
      - CreateMultiMedusa
CreateMultiMedusa is matching both CreateMultiMedusaOld and CreateMultiMedusaJob, so both tests are being run and CreateMultiMedusaJob is being run twice. It should be changed to CreateMultiMedusaOld.
Damn, I was pretty sure I fixed this already 🤔
Good catch!
I'm gonna go ahead and approve so we don't have to wait until I am back online. I spent some time investigating the e2e test failure, and I think we are good to go with a small change. @adejanovski well done 👏
… implement remote restore. The PrepareRestore gRPC call will have Medusa compute the mapping between the backup and restore clusters, and store it in the local storage. This will allow the restore init container to know which mapping should be performed, without having Cassandra up. Disable IP address resolving in Medusa to allow restore mappings.
What this PR does:
Allow restoring a backup from another cluster (including backups created with Medusa outside of K8ssandra).
Requires these Medusa PRs to be merged first, and a Medusa release so that we can update the tag for the Medusa image: 448 and 449.
The design is fully documented in docs/medusa/remote-restore-design.md.
Which issue(s) this PR fixes:
Fixes #450
Fixes #451
Fixes #452
Checklist