-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add LastSyncTime to volume replication status #232
Conversation
@@ -125,6 +125,8 @@ func (r *VolumeReplicationReconciler) Reconcile(ctx context.Context, req ctrl.Re | |||
} | |||
// remove the prefix keys in volume replication class parameters | |||
parameters := filterPrefixedParameters(replicationParameterPrefix, vrcObj.Spec.Parameters) | |||
schedulingTime := parameters["schedulingInterval"] | |||
scheduleTime, _ := time.ParseDuration(schedulingTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
describe the schedulingInterval
parameter somewhere
8b345f5
to
13f6350
Compare
internal/proto/replication.proto
Outdated
message GetVolumeReplicationInfoResponse{ | ||
// Holds the last sync time. | ||
// This field is REQUIRED. | ||
int64 last_sync_time = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still not implemented,
with this change, we can remove some code for conversion in other functions too.
schedulingTime := parameters["schedulingInterval"] | ||
scheduleTime, _ := time.ParseDuration(schedulingTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it can be empty and it's optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -363,13 +365,30 @@ func (r *VolumeReplicationReconciler) Reconcile(ctx context.Context, req ctrl.Re | |||
} | |||
|
|||
instance.Status.LastCompletionTime = getCurrentTime() | |||
var last_sync_time int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use _ in variable names
@@ -363,13 +365,30 @@ func (r *VolumeReplicationReconciler) Reconcile(ctx context.Context, req ctrl.Re | |||
} | |||
|
|||
instance.Status.LastCompletionTime = getCurrentTime() | |||
var last_sync_time int64 | |||
var requeueForInfo bool | |||
if instance.Spec.ReplicationState == replicationv1alpha1.Primary { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it required only for primary state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need to resync only when it is in primary state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The primary end has the status, that it reads from the secondary, and so we use that to update the lastSyncTime.
In cases where we demoted a Primary and are waiting for both Secondaries to be in sync, the timestamp is still useful. Tells us how far back the sync is, before either image can be gracefully promoted.
The demoted snapshot sync time is not critical at present, it would also just show that the lastSyncTime is in the past as compared to the demoted snapshot, and ideally once the demoted snapshot is synced the timestamps would be the same. So not sure if it is useful as such.
Needs more thought on how to use that information, hence sticking to when image is primary at one end is useful.
@@ -125,6 +125,8 @@ func (r *VolumeReplicationReconciler) Reconcile(ctx context.Context, req ctrl.Re | |||
} | |||
// remove the prefix keys in volume replication class parameters | |||
parameters := filterPrefixedParameters(replicationParameterPrefix, vrcObj.Spec.Parameters) | |||
schedulingTime := parameters["schedulingInterval"] | |||
scheduleTime, _ := time.ParseDuration(schedulingTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whhy we are sending parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to update the last sync time after every interval and that is, scheduling interval and hence, it's required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from VolumeReplication CR schedulingInterval is optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, thanks for notifying, will check on that and update the PR.
@@ -651,3 +698,9 @@ func getCurrentTime() *metav1.Time { | |||
|
|||
return &metav1NowTime | |||
} | |||
|
|||
func getTime(time time.Time) *metav1.Time { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for a function for this one.
13f6350
to
23037e5
Compare
@@ -125,6 +125,14 @@ func (r *VolumeReplicationReconciler) Reconcile(ctx context.Context, req ctrl.Re | |||
} | |||
// remove the prefix keys in volume replication class parameters | |||
parameters := filterPrefixedParameters(replicationParameterPrefix, vrcObj.Spec.Parameters) | |||
schedulingTime := parameters["schedulingInterval"] | |||
if schedulingTime == "" { | |||
schedulingTime = "1m" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
define it as a constant ?
and write elaborate how it will be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still not done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the schedule is not set for this VR, then the pool level schedule (if available) should be used for the reconcile requeue. If that is not available then we can either NOT requeue or requeue for a duration till the last sync time is not empty (as post that there would be no further syncs).
There is a possibility that a pool level schedule is added later, once we make the above not to requeue decision, but that would either require a restart of the plugin (so that all VRs are reconciled), or reconciling based on added pool level schedules, or waiting for the periodic reconcile (24h) at which time this will be picked up.
We possibly want to track the above for the future, for now if there is no schedule we can default to a something (1m/5m), as the current DR stack comes with a schedule and should not break the functionality.
internal/proto/replication.proto
Outdated
message GetVolumeReplicationInfoResponse{ | ||
// Holds the last sync time. | ||
// This field is REQUIRED. | ||
int64 last_sync_time = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still not implemented,
with this change, we can remove some code for conversion in other functions too.
6478936
to
817bea4
Compare
return 0, resp.Error | ||
} | ||
|
||
infoResponse, ok := resp.Response.(*proto.GetVolumeReplicationInfoResponse) | ||
if !ok { | ||
err := fmt.Errorf("received response of unexpected type") | ||
vr.logger.Error(err, "unable to parse response") | ||
|
||
return 0, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return 0, resp.Error | |
} | |
infoResponse, ok := resp.Response.(*proto.GetVolumeReplicationInfoResponse) | |
if !ok { | |
err := fmt.Errorf("received response of unexpected type") | |
vr.logger.Error(err, "unable to parse response") | |
return 0, err | |
return nil, resp.Error | |
} | |
infoResponse, ok := resp.Response.(*proto.GetVolumeReplicationInfoResponse) | |
if !ok { | |
err := fmt.Errorf("received response of unexpected type") | |
vr.logger.Error(err, "unable to parse response") | |
return nil, err |
metav1Time := metav1.NewTime(lastSyncTime1) | ||
instance.Status.LastSyncTime = &metav1Time | ||
requeueForInfo = true | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary newline
scheduleTime, err := time.ParseDuration(schedulingTime) | ||
if err != nil { | ||
logger.Error(err, "failed to parse time: %v", schedulingTime) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look at these lines again.
schedulingTime := parameters["schedulingInterval"] | ||
if schedulingTime == "" { | ||
schedulingTime = "1m" | ||
} | ||
scheduleTime, err := time.ParseDuration(schedulingTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use rawScheduleTime instead schedulingTime,
it is confusing.
@@ -125,6 +125,14 @@ func (r *VolumeReplicationReconciler) Reconcile(ctx context.Context, req ctrl.Re | |||
} | |||
// remove the prefix keys in volume replication class parameters | |||
parameters := filterPrefixedParameters(replicationParameterPrefix, vrcObj.Spec.Parameters) | |||
schedulingTime := parameters["schedulingInterval"] | |||
if schedulingTime == "" { | |||
schedulingTime = "1m" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still not done.
|
||
var requeueForInfo bool | ||
if instance.Spec.ReplicationState == replicationv1alpha1.Primary { | ||
lastSyncTime, err := r.getVolumeReplicationInfo(vr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe call this info
instead of lastSyncTime
@@ -614,6 +644,30 @@ func (r *VolumeReplicationReconciler) enableReplication(vr *volumeReplicationIns | |||
return nil | |||
} | |||
|
|||
// getVolumeReplicationInfo gets volume replication info. | |||
func (r *VolumeReplicationReconciler) getVolumeReplicationInfo(vr *volumeReplicationInstance) (*timestamppb.Timestamp, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this does not return an info
kind of object? return something like what volumeReplication.GetInfo()
does, or rename the function to return a LastSyncTimestamp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah for now it just returns last sync time, but this name is given just with the thought of future implementation, we may like to return other informations along with the last sync time, hence don't want it to be specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current state is confusing, so you will have to decide what this function does. It can be renamed once it gets extended, no need to prepare function names for things in the future. Leave a comment with the possibility if you think it is useful.
if err != nil { | ||
logger.Error(err, "Failed to get volume replication info") | ||
} | ||
lastSyncTime1 := lastSyncTime.AsTime() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this can not fail, just place it in metav1Time := metav1.NewTime(lastSyncTime.AsTime())
.
metav1Time
sounds more like a package name, similar to metav1
. This is just a variable, so why not call it time
?
@@ -363,13 +372,34 @@ func (r *VolumeReplicationReconciler) Reconcile(ctx context.Context, req ctrl.Re | |||
} | |||
|
|||
instance.Status.LastCompletionTime = getCurrentTime() | |||
|
|||
var requeueForInfo bool | |||
if instance.Spec.ReplicationState == replicationv1alpha1.Primary { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Madhu-1 If an image is in the split-brain state, is there a TS that is usable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK no during split-brain no sync will happen this is not useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In split-brain, I do not assume one of the images is marked Primary? If that is the case, the lastSyncTime should be the time of the last successful sync, maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we are updating it in every interval, I guess if we reach split brain case, it would already have that, cc @ShyamsundarR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question really is what does RBD report, and then what should the proto report. In the case of split-brain we should report not-synced time (which IMHO is the "0" timestamp), or if available the last time it was synced...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, on a sync failure (like split-brain), no LastSyncTime should be reported, but ideally an error. Either the automatic rescheduling should stop, or be done with an exponential backoff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the image is in replaying
state with RBD we get the additional description, if stopped or split-brain the additional data is not available, and we will hence default to the "0" time stamp. We should be fine with that.
|
||
var requeueForInfo bool | ||
if instance.Spec.ReplicationState == replicationv1alpha1.Primary { | ||
lastSyncTime, err := r.getVolumeReplicationInfo(vr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a heads up!
The last sync time can be "0" or empty, i.e it was never synced even once, we need to determine how that value would be reflected. One option is to set it to the lower bound of UTC epoch (I am a bit confused on what Timestamp holds, UTC epoch (since 1970), or since 0001-01-01T00:00:00Z
)
e9ad9e1
to
bccc4ef
Compare
bccc4ef
to
7e294fc
Compare
rawScheduleTime := parameters["schedulingInterval"] | ||
if rawScheduleTime == "" { | ||
rawScheduleTime = "1m" | ||
} | ||
scheduleTime, err := time.ParseDuration(rawScheduleTime) | ||
if err != nil { | ||
logger.Error(err, "failed to parse time: %v", rawScheduleTime) | ||
return ctrl.Result{}, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rawScheduleTime := parameters["schedulingInterval"] | |
if rawScheduleTime == "" { | |
rawScheduleTime = "1m" | |
} | |
scheduleTime, err := time.ParseDuration(rawScheduleTime) | |
if err != nil { | |
logger.Error(err, "failed to parse time: %v", rawScheduleTime) | |
return ctrl.Result{}, err | |
} | |
const defaultScheduleTime = time.Minute | |
func getScheduleTime(parameters map[string]string) (time.Duration) { | |
rawScheduleTime := parameters["schedulingInterval"] | |
if rawScheduleTime == "" { | |
return defaultScheduleTime | |
} | |
scheduleTime, err := time.ParseDuration(rawScheduleTime) | |
if err != nil { | |
logger.Error(err, "failed to parse time: %v", rawScheduleTime) | |
return defaultScheduleTime | |
} | |
return scheduleTime | |
} |
Create a constant and a function like above and use it .
ca61302
to
e4bc1d9
Compare
err = r.updateReplicationStatus(instance, logger, getReplicationState(instance), msg) | ||
if err != nil { | ||
return ctrl.Result{}, err | ||
} | ||
|
||
logger.Info(msg) | ||
|
||
scheduleTime := getScheduleTime(parameters, logger) | ||
if requeueForInfo { | ||
logger.Info("volume is primary, requeuing to get volume replication info") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logging is not much helpful. Remove it on add the next queue interval time. and also move line 388 inside the if condition
@@ -614,6 +651,30 @@ func (r *VolumeReplicationReconciler) enableReplication(vr *volumeReplicationIns | |||
return nil | |||
} | |||
|
|||
// getVolumeReplicationInfo gets volume replication info. | |||
func (r *VolumeReplicationReconciler) getLastSyncTime(vr *volumeReplicationInstance) (*timestamppb.Timestamp, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we should not be doing this. Currently, we have only one parameter in the GetInfo
this looks fine, If we add more parameters we need to write a new function for it. My suggestion is to return a complete response infoResponse
and use it in caller
@@ -45,6 +45,8 @@ type VolumeReplication interface { | |||
// ResyncVolume RPC call to resync the volume. | |||
ResyncVolume(volumeID, replicationID string, force bool, secretName, secretNamespace string, parameters map[string]string) (*proto. | |||
ResyncVolumeResponse, error) | |||
// GetVolumeReplicationInfo RPC call to get volume replication info. | |||
GetVolumeReplicationInfo(volumeID, replicationID string, secretName, secretNamespace string) (*proto.GetVolumeReplicationInfoResponse, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetVolumeReplicationInfo(volumeID, replicationID string, secretName, secretNamespace string) (*proto.GetVolumeReplicationInfoResponse, error) | |
GetVolumeReplicationInfo(volumeID, replicationID , secretName, secretNamespace string) (*proto.GetVolumeReplicationInfoResponse, error)``` |
func (rc *replicationClient) GetVolumeReplicationInfo(volumeID, replicationID string, | ||
secretName, secretNamespace string) (*proto.GetVolumeReplicationInfoResponse, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func (rc *replicationClient) GetVolumeReplicationInfo(volumeID, replicationID string, | |
secretName, secretNamespace string) (*proto.GetVolumeReplicationInfoResponse, error) { | |
func (rc *replicationClient) GetVolumeReplicationInfo(volumeID, replicationID, | |
secretName, secretNamespace string) (*proto.GetVolumeReplicationInfoResponse, error) { |
} | ||
|
||
lastsynctime := resp.GetLastSyncTime() | ||
if lastsynctime == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when this can be nil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in case there is no local snapshot timestamp and the csi driver return nil.
d93d7e8
to
7418c6f
Compare
@@ -51,6 +52,7 @@ const ( | |||
pvcDataSource = "PersistentVolumeClaim" | |||
volumeReplicationClass = "VolumeReplicationClass" | |||
volumeReplication = "VolumeReplication" | |||
defaultScheduleTime = time.Minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every minute sounds very often as a default, how about hourly? Consider that there can be many volumes, and the LastSyncTime may not be critical information. In environments with hundred (to name a random number) volume, users should not run into high resource consumption because of the constant reconciling.
For environments that require more strict updates on syncing, the parameter can be used to change it.
@@ -363,13 +372,34 @@ func (r *VolumeReplicationReconciler) Reconcile(ctx context.Context, req ctrl.Re | |||
} | |||
|
|||
instance.Status.LastCompletionTime = getCurrentTime() | |||
|
|||
var requeueForInfo bool | |||
if instance.Spec.ReplicationState == replicationv1alpha1.Primary { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, on a sync failure (like split-brain), no LastSyncTime should be reported, but ideally an error. Either the automatic rescheduling should stop, or be done with an exponential backoff.
} | ||
lastSyncTime := info.GetLastSyncTime() | ||
|
||
lastsynctime := metav1.NewTime(lastSyncTime.AsTime()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very confusing naming, there is lastSyncTime
and lastsynctime
. If you really need both, make sure to call them differently (maybe ts
or timestamp
and lastSyncTime
or just t
).
This commit adds lastsynctime to volume replicationstatus and updates the crds. Signed-off-by: yati1998 <ypadia@redhat.com>
This commit creates internal grpc to get volume replication information. Signed-off-by: yati1998 <ypadia@redhat.com>
this commit updates the vendor to get latest spec. Signed-off-by: yati1998 <ypadia@redhat.com>
42af0cf
to
f0c7f15
Compare
@nixpanic @ShyamsundarR @Rakshith-R @Madhu-1 done with all the changes here as well. |
if instance.Spec.ReplicationState == replicationv1alpha1.Secondary { | ||
instance.Status.LastSyncTime = nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should go at line 382,
otherwise there will be no update to status.
f0c7f15
to
31c190b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just the documentation for the new parameter needs to be added. Otherwise it looks good to me.
This commit adds reconcile logic to update the last sync time. Signed-off-by: yati1998 <ypadia@redhat.com>
31c190b
to
07aff99
Compare
If the RPC is not implemented, this will be a breaking change. we need to address it later |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yati1998 ,can you open an issue to track handling NotImplemented Error ?
We can set lastSyncTime to nil if getvolrepInfo returns this error.
We'll need to fix this before another release of csi-addons operator.
Syncing latest changes from upstream main for kubernetes-csi-addons
This PR adds lastsynctime to volumereplication status and updates the reconcile logic to get the LastSynctime