-
Notifications
You must be signed in to change notification settings - Fork 95
Implement the new locking and notification system for vFile #2001
Conversation
d87e690
to
6dbd2aa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very big change. I finished half for the first pass. I want to have a code walk through with you together to better understand the code.
return 0, "", false | ||
} else { | ||
// if the service is already stopped | ||
return 0, "", true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why here we return "true" to indicate the service is already stopped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the purpose of this function is to stop the service.
When the service is already stopped, we just return true so the event handlers can proceed to next steps.
string(fromState), string(interimState)) | ||
if !succeeded { | ||
// this handler doesn't get the right to start/stop server | ||
).Debug("Watcher on start trigger returns event ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better change the log level to info, then we can trace it without enable debug log.
// Compare the value of start marker, only one watcher will be able to successfully update the value to | ||
// the new value of start trigger | ||
success, err := e.CompareAndPutIfNotEqual(kvstore.VolPrefixStartMarker+volName, string(ev.Kv.Value)) | ||
if err != nil || success == false { | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add log for this error case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a real error. Only one watcher will be able to successfully update the marker.
Other watchers will fail, but they should just return. I will add more comments here.
func (e *EtcdKVS) etcdStopEventHandler(ev *etcdClient.Event) { | ||
log.WithFields( | ||
log.Fields{"type": ev.Type}, | ||
).Debug("Watcher on stop trigger returns event ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change the log level to info.
// the new value of stop trigger | ||
success, err := e.CompareAndPutIfNotEqual(kvstore.VolPrefixStopMarker+volName, string(ev.Kv.Value)) | ||
if err != nil || success == false { | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add log for this error case.
|
||
// TryLock: try to get a ETCD mutex lock | ||
func (e *EtcdLock) TryLock() error { | ||
session, err := concurrency.NewSession(e.lockCli, concurrency.WithTTL(20)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TTL is 20 seconds, right? Better define a const instead of using hard coded value "20" here.
func (e *EtcdLock) BlockingLockWithLease() error { | ||
log.Debugf("BlockingLockWithLease: key=%s", e.Key) | ||
|
||
session, err := concurrency.NewSession(e.lockCli, concurrency.WithTTL(20)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above.
if err != nil { | ||
msg := fmt.Sprintf("Transactional metadata update failed: %v.", err) | ||
if err == context.DeadlineExceeded { | ||
msg += fmt.Sprintf(swarmUnhealthyErrorMsg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why here we add error for "swarmUnhealthy"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be etcdUnhealthy. Will update.
@@ -57,13 +57,36 @@ const ( | |||
VolPrefixGRef = "SVOLS_gref_" | |||
VolPrefixInfo = "SVOLS_info_" | |||
VolPrefixClient = "SVOLS_client_" | |||
VolPrefixStartTrigger = "SVOLS_start_trigger_" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still not quite clear with VolPrefixStartTrigger and VolPrefixStartMarker. Could you explain when those two values are used?
3ed5632
to
d00c4b2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, only have a few comments.
// stop the SMB server | ||
port, servName, succeeded := e.dockerOps.StopSMBServer(volName) | ||
if succeeded { | ||
err = e.updateServerInfo(kvstore.VolPrefixInfo+volName, port, servName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to update server after removing the smb service? I think port and servName should be empty since the service is stopped,right?
kvstore.VolPrefixState+r.Name, err) | ||
log.Error(msg) | ||
grefLock.ReleaseLock() | ||
return volume.Response{Err: msg} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should call stateLock.ClearLock() here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this error is when it's failed to create state lock. Since it's not created, we don't need to clear it.
msg = fmt.Sprintf("Failed to try lock for removing volume %s. Error: %v", | ||
kvstore.VolPrefixState+r.Name, err) | ||
log.Error(msg) | ||
stateLock.ClearLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here should be "stateLock.ReleaseLock()"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we are not successfully get the lock, we do ClearLock instead of ReleaseLock.
lock, err := d.kvStore.CreateLock(kvstore.VolPrefixGRef + name) | ||
if err != nil { | ||
log.Errorf("Failed to create lock for mounting volume %s", kvstore.VolPrefixGRef+name) | ||
return "", err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should call "lock.ClearLock()" before return error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above.
log.Fields{"volume name": name, | ||
"error": err, | ||
}).Error("Failed to get IP address from docker swarm ") | ||
log.Errorf("Failed to create lock for mounting volume %s", kvstore.VolPrefixGRef+name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should call lock.ClearLock() before return error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The PR is to solve issue #1943
The new design has the following changes: