Revisit the error handling for vFile design #1943

luomiao · 2017-10-18T00:18:07Z

The current design has a flaw in handling multiple errors that occur at the same time while updating ETCD.
Consider the following scenario:

1. A worker creates a new volume and starts using it; internally, the global refcount is atomically increased by one.
2. All the managers are triggered to try to update the volume's state from "Ready" to "Mounting". Assume the ETCD cluster suddenly becomes inaccessible, so the state update fails and the volume's state stays "Ready".
3. The worker times out while waiting for the state to become "Mounted". As error handling, it tries to decrease the global refcount. However, the ETCD cluster is still inaccessible, so the worker may fail to decrease the refcount and returns an error.
4. Now the volume's state is "Ready" while its global refcount is 1. From this point on, even after the ETCD cluster recovers, no one can use this volume correctly: the global refcount will never again change from 0 to 1, so no file server can be started for it.
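The sequence above can be reproduced with a small, single-threaded simulation. This is a hypothetical sketch, not vFile code: `FlakyKV` is an in-memory stand-in for ETCD whose availability can be toggled, and the key layout (`vol1/state`, `vol1/refcount`) and the rule that managers react only to a 0 -> 1 refcount transition are assumptions drawn from the description above.

```python
class EtcdUnavailable(Exception):
    """Raised when the simulated key-value store cannot be reached."""


class FlakyKV:
    """Hypothetical in-memory stand-in for ETCD; availability can be toggled."""

    def __init__(self):
        self.data = {}
        self.available = True

    def get(self, key):
        if not self.available:
            raise EtcdUnavailable(key)
        return self.data[key]

    def put(self, key, value):
        if not self.available:
            raise EtcdUnavailable(key)
        self.data[key] = value


def increment_refcount(kv, vol):
    """Bump the global refcount and report whether it went from 0 to 1.

    The simulation is single-threaded, so get + put stands in for the
    atomic update a real ETCD client would perform.
    """
    old = kv.get(f"{vol}/refcount")
    kv.put(f"{vol}/refcount", old + 1)
    return old == 0


def manager_on_first_user(kv, vol):
    """Managers react to the 0 -> 1 transition by moving Ready -> Mounting."""
    try:
        kv.put(f"{vol}/state", "Mounting")
    except EtcdUnavailable:
        pass  # Update lost: the state stays "Ready".


def run_scenario():
    kv = FlakyKV()
    kv.put("vol1/state", "Ready")
    kv.put("vol1/refcount", 0)

    # Step 1: the worker starts using the volume (refcount 0 -> 1).
    first_user = increment_refcount(kv, "vol1")

    # Step 2: ETCD becomes unreachable before the managers update the state.
    kv.available = False
    if first_user:
        manager_on_first_user(kv, "vol1")

    # Step 3: the worker times out waiting for "Mounted" and tries to roll back.
    try:
        kv.put("vol1/refcount", kv.get("vol1/refcount") - 1)
    except EtcdUnavailable:
        pass  # Rollback fails; the worker just returns an error.

    # Step 4: ETCD recovers, but the record is left inconsistent.
    kv.available = True
    stuck = (kv.get("vol1/state"), kv.get("vol1/refcount"))

    # A new user now moves the refcount 1 -> 2, which is not a 0 -> 1
    # transition, so the managers are never triggered again.
    retriggered = increment_refcount(kv, "vol1")
    return stuck, retriggered


print(run_scenario())  # (('Ready', 1), False)
```

The result shows the volume stuck in "Ready" with a leaked reference, and the next user's increment does not fire the manager trigger.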

To handle the above race condition, the design needs to be adjusted to introduce locks and/or helper threads that detect mismatches between a volume's global refcount and its actual users.
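One way to realize the helper-thread idea is a periodic reconciler that compares each volume's stored refcount with the set of workers actually using it. The sketch below is illustrative only and is not the mechanism merged in #2001; the `store` and `live_users` dictionaries, the function names, and the choice to re-fire the "Ready" -> "Mounting" transition are all hypothetical.

```python
import threading


def reconcile_once(store, live_users):
    """One pass of a hypothetical reconciler.

    store: volume name -> {"state": str, "refcount": int}
    live_users: volume name -> set of worker IDs actually using the volume
    Returns the names of volumes whose records were repaired.
    """
    repaired = []
    for vol, rec in store.items():
        actual = len(live_users.get(vol, set()))
        changed = False
        if rec["refcount"] != actual:
            # Drop leaked references (or add missing ones).
            rec["refcount"] = actual
            changed = True
        if actual > 0 and rec["state"] == "Ready":
            # Someone really uses the volume, but the transition was lost: re-fire it.
            rec["state"] = "Mounting"
            changed = True
        if changed:
            repaired.append(vol)
    return repaired


def start_reconciler(store, live_users, lock, interval_s, stop_event):
    """Run reconcile_once every interval_s seconds until stop_event is set."""

    def loop():
        # Event.wait returns True once the event is set, which ends the loop.
        while not stop_event.wait(interval_s):
            with lock:
                reconcile_once(store, live_users)

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

Applied to the stuck volume from the scenario (state "Ready", refcount 1, no live users), a pass resets the refcount to 0, so the next user's increment is again a 0 -> 1 transition that wakes the managers. A real reconciler would also have to tolerate ETCD being unreachable, for example by skipping a pass and retrying on the next tick.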

luomiao · 2017-12-01T01:38:19Z

Closed by #2001

luomiao self-assigned this Oct 18, 2017

luomiao added P0 component/vFile labels Oct 18, 2017

tusharnt added this to the Sprint - Thor milestone Oct 24, 2017

luomiao mentioned this issue Nov 28, 2017

Implement the new locking and notification system for vFile #2001

Merged

tusharnt modified the milestones: Sprint - Thor, Sprint - Kubecon Nov 28, 2017

luomiao closed this as completed Dec 1, 2017
