Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use force unmount and explicitly unmount bad mount points #183

Conversation

dabradley
Copy link
Collaborator

There have been cases where the logic to cleanup a mount point has caused the driver to get into a bad state. This is most obvious when a subdirectory is mounted to a volume and a parent directory of that subdirectory is deleted. The Lustre driver doesn't handle that case in the way that Kubernetes expects and returns invalid data. To avoid this scenario causing our driver to get into a bad state, leak mount points, etc, we must explicitly check that we can read the necessary information about the mount point, and if not, explicitly unmount that mount point before allowing Kubernetes to clean up the directory. To ensure that we don't end up in a bad state, this change enables force unmounting as well. The force unmount will only occur after a timeout has expired, since force unmounts can cause issues with the Lustre driver. However, in this case, it is better if we are in a bad enough situation to be able to eventually return to a good state rather than require manual intervention.

What type of PR is this?

/kind bug

@dabradley dabradley requested a review from t-mialve October 16, 2024 12:04
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 16, 2024
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 16, 2024
@coveralls
Copy link

coveralls commented Oct 16, 2024

Coverage Status

coverage: 81.844% (-0.9%) from 82.773%
when pulling 028a78b on dabradley:personal/dabradley/cleanupbadmounts
into 4613c77 on kubernetes-sigs:development.

Copy link
Collaborator

@t-mialve t-mialve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 25, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dabradley, t-mialve

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

There have been cases where the logic to cleanup a mount point
has caused the driver to get into a bad state. This is most
obvious when a subdirectory is mounted to a volume and a parent
directory of that subdirectory is deleted. The Lustre driver
doesn't handle that case in the way that Kubernetes expects
and returns invalid data. To avoid this scenario causing our
driver to get into a bad state, leak mount points, etc, we
must explicitly check that we can read the necessary information
about the mount point, and if not, explicitly unmount that
mount point before allowing Kubernetes to clean up the directory.
To ensure that we don't end up in a bad state, this change
enables force unmounting as well. The force unmount will only
occur after a timeout has expired, since force unmounts can
cause issues with the Lustre driver. However, in this case, it
is better if we are in a bad enough situation to be able to
eventually return to a good state rather than require manual
intervention.
@dabradley dabradley force-pushed the personal/dabradley/cleanupbadmounts branch from 186ee0a to 028a78b Compare November 25, 2024 20:28
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 25, 2024
@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@dabradley dabradley marked this pull request as ready for review November 25, 2024 20:28
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 25, 2024
@dabradley dabradley merged commit abc5396 into kubernetes-sigs:development Nov 25, 2024
9 of 10 checks passed
@dabradley dabradley deleted the personal/dabradley/cleanupbadmounts branch November 25, 2024 21:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants