nsfs | wait for endpoint startup before namespace monitor registration #8474
client: internal_rpc_client,
should_monitor: nsr => Boolean(nsr.nsfs_config),
}));
setTimeout(() => { |
Why not add a retry in the namespace monitor if the error is ENOENT and the endpoint started less than 60 seconds ago? Also, why 60 seconds?
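For illustration, the retry alternative suggested here could look roughly like the sketch below. It is not code from this PR: `should_retry_instead_of_report`, `STARTUP_GRACE_MS`, and the way the endpoint start time is passed in are all hypothetical names chosen for the example.

```javascript
'use strict';

// Hypothetical grace period; the 60 seconds mirrors the value discussed in this thread.
const STARTUP_GRACE_MS = 60 * 1000;

/**
 * Decide whether a monitor failure should be retried rather than reported.
 * Assumption: the monitor receives an error object carrying a Node-style `code`
 * and knows when the endpoint process started (epoch milliseconds).
 */
function should_retry_instead_of_report(err, endpoint_start_ms, now_ms = Date.now()) {
    const is_enoent = Boolean(err) && err.code === 'ENOENT';
    const in_grace_period = now_ms - endpoint_start_ms < STARTUP_GRACE_MS;
    return is_enoent && in_grace_period;
}
```

With this shape, an ENOENT seen shortly after startup is retried, while any other error, or an ENOENT after the grace period, is reported as before.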
That sounds like more code for an equivalent solution.
The 60 seconds allow the pod to stabilize (reach 'ready' or be deleted).
The reason is understood, but how do you know it takes a minute? Is it always a minute? Should it be configurable?
AFAIK, there's no way for Node to know the status of the pod.
I assume we don't want to introduce such a dependency.
60 seconds is more than enough on my minikube and is low enough not to bother deployments in other environments, but I can make it an env variable.
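As a rough sketch of the env-variable option mentioned above: the variable name `NSFS_MONITOR_START_DELAY_MS` and the helper `get_monitor_start_delay_ms` are assumptions made up for this example, not names from the PR or from NooBaa's configuration.

```javascript
'use strict';

// Default taken from the 60 seconds discussed in this thread.
const DEFAULT_MONITOR_START_DELAY_MS = 60 * 1000;

/**
 * Read the monitor start delay from the environment.
 * NSFS_MONITOR_START_DELAY_MS is a hypothetical variable name.
 * Missing, empty, non-numeric, or negative values fall back to the default.
 */
function get_monitor_start_delay_ms(env = process.env) {
    const raw = env.NSFS_MONITOR_START_DELAY_MS;
    if (raw === undefined || raw === '') return DEFAULT_MONITOR_START_DELAY_MS;
    const parsed = Number(raw);
    if (!Number.isFinite(parsed) || parsed < 0) return DEFAULT_MONITOR_START_DELAY_MS;
    return parsed;
}
```

Falling back to the default on bad input keeps a mistyped value from turning the delay into `NaN`, which `setTimeout` would treat as an immediate timeout.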
Signed-off-by: Amit Prinz Setter <alphaprinz@gmail.com>
Explain the changes
Wait for endpoint startup before registering namespace resource monitor.
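A minimal sketch of the idea, assuming the registration is simply deferred with a timer. `schedule_monitor_registration` and its injectable `timers` parameter are hypothetical and only illustrate the shape of the change; the actual diff may differ.

```javascript
'use strict';

/**
 * Defer monitor registration until the endpoint has had time to start.
 * `register_monitor` is a caller-supplied callback (hypothetical here).
 * `timers` is injectable so the behavior can be checked without waiting.
 * Returns a function that cancels the pending registration, e.g. on shutdown.
 */
function schedule_monitor_registration(register_monitor, delay_ms, timers = { setTimeout, clearTimeout }) {
    const handle = timers.setTimeout(register_monitor, delay_ms);
    return () => timers.clearTimeout(handle);
}
```

An endpoint that Kubernetes deletes before the delay elapses never registers the monitor, so it cannot report on a namespace resource that was never mounted on it.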
Issues: Fixed #xxx / Gap #xxx
An nsr can enter "Rejected" status if an endpoint is deleted by Kubernetes before it reaches the "Ready" state.
https://bugzilla.redhat.com/show_bug.cgi?id=2284585.
Testing Instructions:
Reproduction
Sagie details the scenario for ODS in the bz.
I've reproduced it with this scenario on minikube:
- Start with an nsfs nsr on a pvc. A single endpoint A is in "Ready" state.
- Delete the nsr. A new endpoint B is spun up.
- While endpoint B has been created but is NOT ready yet (endpoint A is still in "Ready" state), create the nsr.
- Kubernetes will delete endpoint B and leave endpoint A running.
- Endpoint B loads the nsr from the system store, but the nsr is not mounted on it. Endpoint B issues a NO_SUCH_BUCKET report on the nsr (and is then deleted by Kubernetes).
local_nsfs.yaml.txt