This repository has been archived by the owner on Mar 30, 2023. It is now read-only.

feature: Stuck unmounts on clients should cause a node to be downed where this is not achieved within a desired time #49

Open
sjr20 opened this issue Jan 9, 2019 · 2 comments
Labels
enhancement New feature or request

Comments

@sjr20
Collaborator

sjr20 commented Jan 9, 2019

No description provided.

@sjr20
Collaborator Author

sjr20 commented Jan 9, 2019

Occasionally, Lustre unmounts get stuck on clients, and the only option is then to reboot or reset the client. The DAC should detect and handle this failure condition. For example, if an unmount does not succeed within a set time, SLURM should mark the node down and the associated DAC storage should be treated as still busy. Note that lazy unmounting (umount -l) can conceal this type of problem until later.
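The detect-and-handle behaviour described above could look roughly like the following minimal Python sketch. It assumes a plain (non-lazy) `umount` run with a timeout, and that the DAC is allowed to call Slurm's `scontrol update NodeName=... State=DOWN Reason=...`. The function names, the `UnmountResult` type, the `umount_cmd` parameter and the reason string are all hypothetical and are not taken from the DAC codebase.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class UnmountResult:
    ok: bool            # umount exited 0 within the timeout
    node_downed: bool   # we asked Slurm to mark the node DOWN
    storage_busy: bool  # the DAC should keep treating the storage as in use


def try_unmount(mountpoint: str, timeout_s: float,
                umount_cmd: Sequence[str] = ("umount",)) -> bool:
    """Run a non-lazy unmount; True only if it exits cleanly within timeout_s."""
    proc = subprocess.Popen([*umount_cmd, mountpoint])
    try:
        return proc.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # An unmount blocked in the kernel may not die even on SIGKILL,
        # so send the signal but deliberately do not wait for the child again.
        proc.kill()
        return False


def mark_node_down(node: str, reason: str) -> None:
    """Ask Slurm to take the node out of service (hypothetical DAC hook)."""
    subprocess.run(
        ["scontrol", "update", f"NodeName={node}", "State=DOWN", f"Reason={reason}"],
        check=True,
    )


def unmount_or_down_node(
    node: str,
    mountpoint: str,
    timeout_s: float,
    unmount: Callable[[str, float], bool] = try_unmount,
    down: Callable[[str, str], None] = mark_node_down,
) -> UnmountResult:
    """Unmount, or down the node and keep the storage busy if that fails."""
    if unmount(mountpoint, timeout_s):
        return UnmountResult(ok=True, node_downed=False, storage_busy=False)
    down(node, f"stuck unmount of {mountpoint}")
    return UnmountResult(ok=False, node_downed=True, storage_busy=True)
```

The sketch uses `Popen` plus `wait(timeout=...)` rather than `subprocess.run(..., timeout=...)` because `run` kills the child on timeout and then waits for it to exit, which can block indefinitely if the unmount is stuck in uninterruptible kernel sleep: exactly the failure this issue is about.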

@JohnGarbutt
Collaborator

OK, step one: let's spot this problem. I have removed "-l" from the unmount.

@JohnGarbutt JohnGarbutt changed the title Stuck unmounts on clients should cause a node to be downed where this is not achieved within a desired time feature: Stuck unmounts on clients should cause a node to be downed where this is not achieved within a desired time Aug 6, 2019
@JohnGarbutt JohnGarbutt added the enhancement New feature or request label Aug 6, 2019