Add LastSyncTime to volume replication status #232
Changes from all commits
```go
@@ -51,6 +51,7 @@ const (
	pvcDataSource          = "PersistentVolumeClaim"
	volumeReplicationClass = "VolumeReplicationClass"
	volumeReplication      = "VolumeReplication"
	defaultScheduleTime    = time.Hour
)

var (
```
```go
@@ -363,16 +364,64 @@ func (r *VolumeReplicationReconciler) Reconcile(ctx context.Context, req ctrl.Re
	}

	instance.Status.LastCompletionTime = getCurrentTime()

	var requeueForInfo bool

	if instance.Spec.ReplicationState == replicationv1alpha1.Primary {
```

Review thread (resolved):

- @Madhu-1 If an image is in the split-brain state, is there a timestamp that is usable?
- AFAIK no; during split-brain no sync happens, so this is not useful.
- In split-brain, do we not assume one of the images is still marked Primary? If that is the case, the lastSyncTime should perhaps be the time of the last successful sync.
- Since we are updating it at every interval, I guess if we reach the split-brain case it would already have that value. cc @ShyamsundarR
- The question really is what RBD reports, and then what the proto should report. In the case of split-brain we should report a not-synced time (which IMHO is the "0" timestamp), or, if available, the last time it was synced.
- I agree: on a sync failure (like split-brain), no LastSyncTime should be reported, but ideally an error. Either the automatic rescheduling should stop, or it should be done with an exponential backoff.
- When the image is in …
```go
		info, err := r.getVolumeReplicationInfo(vr)
		if err != nil {
			logger.Error(err, "Failed to get volume replication info")
			return ctrl.Result{}, err
		}
		ts := info.GetLastSyncTime()

```

Review comment: unnecessary newline

```go
		lastSyncTime := metav1.NewTime(ts.AsTime())
		instance.Status.LastSyncTime = &lastSyncTime
		requeueForInfo = true
	}
```
```go
	if instance.Spec.ReplicationState == replicationv1alpha1.Secondary {
		instance.Status.LastSyncTime = nil
	}
	err = r.updateReplicationStatus(instance, logger, getReplicationState(instance), msg)
	if err != nil {
		return ctrl.Result{}, err
	}

	logger.Info(msg)

	if requeueForInfo {
		scheduleTime := getScheduleTime(parameters, logger)
		return ctrl.Result{
			Requeue:      true,
			RequeueAfter: scheduleTime,
		}, nil
	}

	return ctrl.Result{}, nil
}
```
```go
// getScheduleTime takes parameters and returns the scheduling interval
// after converting it to time.Duration. If the schedulingInterval is empty
// or there is an error parsing it, the default value is used.
func getScheduleTime(parameters map[string]string, logger logr.Logger) time.Duration {
	// The schedulingInterval is an optional parameter that is part of the
	// VolumeReplicationClass, for example:
	//   parameters:
	//     replication.storage.openshift.io/replication-secret-name: rook-csi-rbd-provisioner
	//     replication.storage.openshift.io/replication-secret-namespace: rook-ceph
	//     schedulingInterval: 1m
	rawScheduleTime := parameters["schedulingInterval"]

	if rawScheduleTime == "" {
		return defaultScheduleTime
	}
	scheduleTime, err := time.ParseDuration(rawScheduleTime)
	if err != nil {
		// logr.Error takes key/value pairs, not a printf format string.
		logger.Error(err, "failed to parse schedulingInterval", "schedulingInterval", rawScheduleTime)
		return defaultScheduleTime
	}
	return scheduleTime
}
```
```go
func (r *VolumeReplicationReconciler) getReplicationClient(driverName string) (grpcClient.VolumeReplication, error) {
	conns := r.Connpool.GetByNodeID(driverName, "")
```

```go
@@ -614,6 +663,30 @@ func (r *VolumeReplicationReconciler) enableReplication(vr *volumeReplicationIns
	return nil
}
```
```go
// getVolumeReplicationInfo gets volume replication info.
func (r *VolumeReplicationReconciler) getVolumeReplicationInfo(vr *volumeReplicationInstance) (*proto.GetVolumeReplicationInfoResponse, error) {
	volumeReplication := replication.Replication{
		Params: vr.commonRequestParameters,
	}

	resp := volumeReplication.GetInfo()
	if resp.Error != nil {
		vr.logger.Error(resp.Error, "failed to get volume replication info")

		return nil, resp.Error
	}

	infoResponse, ok := resp.Response.(*proto.GetVolumeReplicationInfoResponse)
	if !ok {
		err := fmt.Errorf("received response of unexpected type")
		vr.logger.Error(err, "unable to parse response")

		return nil, err
	}

	return infoResponse, nil
}
```
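The comma-ok assertion in `getVolumeReplicationInfo` is the standard Go way to narrow an interface value without panicking on an unexpected concrete type. A generic stdlib-only sketch of the same pattern; the `infoResponse` type and `asInfoResponse` helper here are stand-ins, not the proto types used in the PR:

```go
package main

import (
	"errors"
	"fmt"
)

type infoResponse struct{ lastSync string }

// asInfoResponse narrows an untyped response with the comma-ok form,
// returning an error instead of panicking on an unexpected type.
func asInfoResponse(resp interface{}) (*infoResponse, error) {
	info, ok := resp.(*infoResponse)
	if !ok {
		return nil, errors.New("received response of unexpected type")
	}
	return info, nil
}

func main() {
	if info, err := asInfoResponse(&infoResponse{lastSync: "now"}); err == nil {
		fmt.Println(info.lastSync) // now
	}
	if _, err := asInfoResponse("not-a-response"); err != nil {
		fmt.Println(err) // received response of unexpected type
	}
}
```

A plain assertion without the `, ok` (`resp.Response.(*proto.GetVolumeReplicationInfoResponse)`) would panic at runtime on a mismatched type, which is why the PR uses the guarded form.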
```go
func getReplicationState(instance *replicationv1alpha1.VolumeReplication) replicationv1alpha1.State {
	switch instance.Spec.ReplicationState {
	case replicationv1alpha1.Primary:
```
```go
@@ -0,0 +1,74 @@
/*
Copyright 2022 The Kubernetes-CSI-Addons Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package controllers

import (
	"testing"
	"time"

	"github.com/go-logr/logr/testr"
)

func TestGetScheduledTime(t *testing.T) {
	t.Parallel()
	td, _ := time.ParseDuration("1m")
	const defaultScheduleTime = time.Hour
	logger := testr.New(t)
	testcases := []struct {
		parameters map[string]string
		time       time.Duration
	}{
		{
			parameters: map[string]string{
				"replication.storage.openshift.io/replication-secret-name": "rook-csi-rbd-provisioner",
				"schedulingInterval": "1m",
			},
			time: td,
		},
		{
			parameters: map[string]string{
				"replication.storage.openshift.io/replication-secret-name": "rook-csi-rbd-provisioner",
			},
			time: defaultScheduleTime,
		},
		{
			parameters: map[string]string{},
			time:       defaultScheduleTime,
		},
		{
			parameters: map[string]string{
				"schedulingInterval": "",
			},
			time: defaultScheduleTime,
		},
		{
			parameters: map[string]string{
				"schedulingInterval": "2mm",
			},
			time: defaultScheduleTime,
		},
	}
	for _, tt := range testcases {
		newtt := tt
		t.Run("", func(t *testing.T) {
			t.Parallel()
			if got := getScheduleTime(newtt.parameters, logger); got != newtt.time {
				t.Errorf("getScheduleTime() = %v, want %v", got, newtt.time)
			}
		})
	}
}
```
Review thread:

- Is it required only for the primary state?
- Yes, we need to resync only when it is in the primary state.
- The primary end has the status that it reads from the secondary, so we use that to update the lastSyncTime. In cases where we demoted a Primary and are waiting for both Secondaries to be in sync, the timestamp is still useful: it tells us how far back the sync is before either image can be gracefully promoted. The demoted snapshot sync time is not critical at present; it would just show that the lastSyncTime is in the past compared to the demoted snapshot, and ideally, once the demoted snapshot is synced, the timestamps would be the same. So I am not sure it is useful as such. It needs more thought on how to use that information, hence sticking to the case where the image is primary at one end is useful.