Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: properly close fd when return-fd op cancelled failed #318

Merged
merged 6 commits into from
Nov 6, 2024

Conversation

ihciah
Copy link
Member

@ihciah ihciah commented Oct 30, 2024

What Problem This PR Fixed

In a readiness-based synchronous non-blocking I/O model, canceling a pending I/O call is side-effect-free because the syscall hasn’t actually been executed. In other words, the syscall’s execution state is definitive: it has either been executed or not, with no intermediate state (such as “in execution”).

However, with asynchronous I/O like io-uring, canceling I/O and executing I/O are no longer mutually exclusive. When canceling an I/O operation, it may either be successfully canceled or already completed.

Previously, we used traits like AsyncReadRent/AsyncWriteRent, which transfer buffer ownership, to address buffer memory safety. When a Future is dropped (indicating I/O cancellation), the buffer is transferred to a global state and only cleaned up when the corresponding CQE returns.

I also designed a CancellableIO trait, which works by first pushing a CancelOp when canceling cancellable I/O and then awaiting the original Op’s CQE, expecting it to return with either a cancellation success or the syscall result within a short timeframe.

For standard I/O interfaces, when cancellation occurs, the runtime will push a CancelOp. However, this alone is insufficient: CancelOp does not guarantee that the Op is canceled. If the Op returns a file descriptor (fd), we cannot ignore the possibility of a failed cancellation. Similar to the previous solutions, we need to await the Op result and close its corresponding fd.

Solution

Once we get fd from the kernel, convert it to a wrapped type instantly, and this wrapped type can guarntee the fd is closed when dropped.

How to know if the result in the CQE is fd? We can only know it with user_data. Since a slab lookup is essential, saving a is_fd mark inside is the cheapest way I can think about. One may think about putting logic inside the op related bottom half, but in this way fd may leak if the future not be polled. We can only wrap it with RAII when calling tick, which means a runtime cost is unavoidable.

Affected Cases

Accept and Open op are affected. If users use these ops with timeout(note the timeout may also allpied to the outer layer future), or with any style which not poll the op until ready, and use io_uring as the backend, then they may be affected, there may be fd leaks when the future cancelled.

epoll/kqueue/iocp backends are not affected.

Further reading

@ethe wrote an article explains this issue in more detail: Async Rust is not safe with io_uring

Additional Notes

This PR also fixed another minor issue.
Some background:

  1. When close fd, with io_uring backend, it will submit a Close Op and leave without waiting.
  2. When Op drops, it will remove its related state. And if the state is unfinished and async-cancel feature is enabled, the runtime will try to cancel the op by pushing a AsyncCancel op.

Problem: The close op will be cancelled because of dropped, which is not what we want.
Solution: So I added a const bool for each op to hint if the cancel is desired(intended leaving without waiting or not want the op executed any more). For Close op, it is false and for others like Read op, it is true.

@ihciah ihciah requested a review from ethe October 30, 2024 13:59
@ihciah ihciah self-assigned this Oct 30, 2024
@ihciah ihciah force-pushed the fix-lifecycle branch 5 times, most recently from 5ed4fd3 to 7f00dd4 Compare October 30, 2024 14:50
@ihciah ihciah force-pushed the fix-lifecycle branch 3 times, most recently from b2e8dbe to 221708e Compare October 31, 2024 08:37
@ihciah ihciah requested a review from Lzzzzzt November 4, 2024 03:25
monoio/src/driver/util.rs Show resolved Hide resolved
monoio/src/driver/op.rs Show resolved Hide resolved
monoio/src/driver/op/connect.rs Show resolved Hide resolved
monoio/src/driver/op/poll.rs Show resolved Hide resolved
Copy link
Collaborator

@Lzzzzzt Lzzzzzt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@har23k har23k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @ethe & @ihciah!

@ihciah ihciah merged commit 704e7d5 into bytedance:master Nov 6, 2024
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants