Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use /dev/uesrfaultfd, when it exists, to create the file descriptor #56

Merged
merged 2 commits into from
Sep 19, 2023

Conversation

bchalios
Copy link
Collaborator

Since kernel 6.1, Linux introduced a new interface to create userfault file descriptors, via a new USERFAULTFD_IOC_NEW ioctl() issued to the /dev/userfaultd newly added device. Creating the file descriptor using this interface does not require the process to have the CAP_SYS_PTRACE capability, which is the case now if it does not define the UFFD_USER_MODE_ONLY flag. Access to /dev/userfaultfd is governed by normal file system permissions. Flags passed to the USERFAULTFD_IOC_NEW ioctl() are the same as the ones passed to the system call.

This PR changes the way we create the file descriptor. If /dev/userfaultfd is present, we will try to create the file descriptor issuing the ioctl() to it. Otherwise, it will fall back to calling the system call.

Expose the USERFAULTFD_IOC ioctl number for `/dev/userfaultfd` device.
This device is only present on kernels >= 6.1, but we expose it
unconditionally so that userfaultfd-sys has the same exports for all
kernels.

Signed-off-by: Babis Chalios <bchalios@amazon.es>
Linux kernel 6.1 introduces a /dev/userfaultfd pseudo-device which
allows creating a UFFD objects by issuing USERFAULTFD_IOC_NEW ioctl() to
the device [1]. This way, a process does not need to have the
CAP_SYS_PTRACE capability for creating UFFD objects that can handle
kernel-triggered page faults. UFFDs created through this interface can
always handle kernel-triggered page faults. Access to the device is
granted through normal filesystem permissions to the device file.

This commit changes the way we create Uffd objects, to first try to
create the file descriptor through /dev/userfaultfd. If the file exists
we will try to create the descriptor using ioctl in this device. If the
device does not exist, we will fall back to the syscall.

Signed-off-by: Babis Chalios <bchalios@amazon.es>
@pchickey
Copy link
Collaborator

This implementation looks great, but I'm not familiar with this new kernel interface. I'll leave this PR open for a little bit so that hopefully someone who knows that interface better can take a look.

@bchalios
Copy link
Collaborator Author

Thanks for the response @pchickey!

Just as additional context, kernel documentation about the API can be found here. Also, crosvm implements the same logic but within their project.

For what is worth, I consumed the API in a PR I opened for Firecracker and everything seems to be working.

@bchalios
Copy link
Collaborator Author

Hey @pchickey. Did you maybe have time to take a look into this? Please, let me know if there's anything I can do to help with the review.

Without this, we are observing issues in Firecracker with kernels 6.1, so it would be nice to have a solution upstream.

Copy link

@maggie-lou maggie-lou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're also facing problems with this issue and this solution LGTM 👍

@pchickey pchickey merged commit 19fe4e7 into bytecodealliance:main Sep 19, 2023
pchickey pushed a commit that referenced this pull request Sep 19, 2023
changes to public API in #56
bindgen bumped in -sys in #55
@pchickey
Copy link
Collaborator

@bchalios I sent you an invite to commit to this repo, and publish the crates. Once you accept you can please approve #57, hit merge, checkout the merged main locally, and publish the crates.

@bchalios
Copy link
Collaborator Author

Wow! Thanks I guess :) I see the CI is failing on main. I assume it is because we don't have access to the /dev/userfaultfd device. I'll fix this before making the release. Does that make sense to you?

@pchickey
Copy link
Collaborator

Oh, I didn't see that until now. Yes, lets get the fix in for handling access denied before the release.

@bchalios
Copy link
Collaborator Author

Weird thing is that the test passed in the PR GitHub action run. I'll take a look tomorrow morning.

bchalios pushed a commit that referenced this pull request Sep 21, 2023
changes to public API in #56
bindgen bumped in -sys in #55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants