Implement an ntci::Proactor using io_uring #24
Conversation
{
    const long k_SYSTEM_CALL_SETUP = 425;

    return static_cast<int>(::syscall(k_SYSTEM_CALL_SETUP,
I think it would be a good idea to write a comment with the actual system call function name, like `int io_uring_setup(u32 entries, struct io_uring_params *p)` in this case. Also applicable in other places.
Update: I see now that there are no glibc wrappers yet for the io_uring syscalls, so the comment is not really applicable.
OK
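For illustration, a minimal sketch of what the suggestion would look like, annotating a raw syscall wrapper with the kernel prototype it invokes; the wrapper name and surrounding declarations are assumptions, not the PR's actual code:

```cpp
#include <sys/syscall.h>
#include <unistd.h>

struct io_uring_params;  // defined manually elsewhere, since <linux/io_uring.h> is not used

// Wraps the raw system call whose kernel prototype is:
//   int io_uring_setup(u32 entries, struct io_uring_params *p);
// There is no glibc wrapper for this system call.
int ioRingSetup(unsigned int entries, io_uring_params* params)
{
    const long k_SYSTEM_CALL_SETUP = 425;  // __NR_io_uring_setup on x86_64
    return static_cast<int>(::syscall(k_SYSTEM_CALL_SETUP, entries, params));
}
```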
groups/ntc/ntco/ntco_ioring.cpp (outdated)
                      ring,
                      k_SYSTEM_CALL_REGISTER_PROBE,
                      probe,
                      256));
Better to replace with `IoRingProbe::k_ENTRY_COUNT`, or pass it as a parameter to the function.
OK
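A sketch of the suggested change; the constant's placement on `IoRingProbe` is an assumption:

```cpp
// Hypothetical placement of the named constant on the probe structure.
struct IoRingProbe {
    enum { k_ENTRY_COUNT = 256 };  // number of probe entries registered
    // ... remaining manually-defined fields ...
};
```

The call site then passes `IoRingProbe::k_ENTRY_COUNT` instead of the magic number 256.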
    NTCO_IORING_LOG_CREATED(d_ring);

    rc = ntco::IoRingUtil::probe(d_ring, &d_probe);
Maybe it would be a good idea to validate that operations like SENDMSG (and the others that are 100% required) are supported.
This is a good idea but will require a non-trivial amount of work. I've created #67 for this.

There are multiple structures which are defined manually so that this code can be built on machines with old Linux kernels, but there are no tests which validate that these structures are defined properly. I think it can be done at compile time (maybe in the test driver) when the library is built on a system with a modern kernel.
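One hedged way that compile-time validation could look, assuming the build machine's kernel headers provide `<linux/io_uring.h>`; the guard macro is hypothetical:

```cpp
#if defined(NTCO_IORING_VALIDATE_STRUCTURES)  // hypothetical opt-in guard
#include <linux/io_uring.h>
#include <bslmf_assert.h>

// If the manually-defined layouts drift from the kernel's, these checks
// fail to compile on systems whose headers define the real structures.
BSLMF_ASSERT(sizeof(ntco::IoRingSubmission) == sizeof(struct io_uring_sqe));
BSLMF_ASSERT(sizeof(ntco::IoRingCompletion) == sizeof(struct io_uring_cqe));
#endif
```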
groups/ntc/ntco/ntco_ioring.cpp (outdated)
bool IoRingCompletion::isValid() const
{
    return true;
- This method seems to be unused.
- Maybe you wanted to do something like `return d_userData != 0`?
I have removed this function.
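For the record, the fix the review suggested, had the method been kept:

```cpp
bool IoRingCompletion::isValid() const
{
    // A completion is meaningful only if it carries the user data that
    // associates it with a submitted operation.
    return d_userData != 0;
}
```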
groups/ntc/ntco/ntco_ioring.cpp (outdated)
    d_operation = static_cast<bsl::uint8_t>(ntco::IoRingOperation::e_SENDMSG);
    d_handle    = handle;
    d_event     = reinterpret_cast<__u64>(event);
    d_flags     = 0;
The approach of explicitly zeroing `d_flags` here and in other places does not look consistent. `IoRingSubmission` members are zeroed in its constructor, and despite that fact, `d_flags` is zeroed here one more time while other members aren't. I think it's better to remove the explicit zeroing of `d_flags`.
OK
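A sketch of the consistency the review asks for: zero the whole entry once, in the constructor, so call sites never need per-member zeroing (the layout details are assumed):

```cpp
#include <bsl_cstring.h>

IoRingSubmission::IoRingSubmission()
{
    // Zero every member; the structure mirrors the kernel's plain-data
    // submission queue entry, so memset is safe here.
    bsl::memset(this, 0, sizeof(*this));
}
```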
A comment, probably mainly for myself: I am reading https://kernel.dk/io_uring.pdf and there is such a statement there:

I haven't yet found whether there is some protection implemented here.
    // has previously registered the 'waiter'.
    void wait(ntci::Waiter waiter);

    // Acquire usage of the most suitable proactor selected according to
Don't we use '///' to write contracts?
The '///' is only supposed to be for comments on public declarations. Any '///' that occurs in an implementation is a (harmless) mistake, most of the time caused by the tool previously used to move comments around to conform to the latest documentation strategy.
    NTCO_IORING_LOG_EVENT_STARTING(event);

    ntco::IoRingSubmissionMode::Value mode;
    if (NTCCFG_LIKELY(isWaiter())) {
Could you please elaborate why you consider this condition likely to be true?
This is probably an unjustified usage of the branch prediction annotation. But the observation is that this function is likely to be called by an I/O thread processing a socket, a timer, or a previously deferred function, and that thread wants to defer another function or schedule another timer.
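For context, likely-branch annotations of this kind typically expand to GCC/Clang's `__builtin_expect`; a sketch (the actual `NTCCFG_LIKELY` definition lives in NTC's configuration headers and may differ):

```cpp
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_LIKELY(expression) __builtin_expect(!!(expression), 1)
#else
#define EXAMPLE_LIKELY(expression) (expression)
#endif

// The compiler then lays out code so the annotated branch is the
// fall-through (predicted) path:
//     if (EXAMPLE_LIKELY(isWaiter())) { ... }
```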
groups/ntc/ntco/ntco_ioring.cpp (outdated)
bool IoRingDevice::supportsCancelByHandle() const
{
    return true;
Why does it always return true?
I wanted to wait until we merged standard support for the run-time detection of the operating system kernel version in `ntscfg_platform`. I have now properly implemented detection of this type of operation.
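A sketch of run-time kernel version detection of the sort described, using uname(2); the parsing and the 5.19 threshold (when `IORING_ASYNC_CANCEL_FD`-style cancellation appeared) are illustrative, not NTC's actual `ntscfg_platform` API:

```cpp
#include <sys/utsname.h>
#include <cstdio>

bool kernelSupportsCancelByHandle()
{
    struct utsname info;
    if (::uname(&info) != 0) {
        return false;
    }

    int major = 0;
    int minor = 0;
    if (std::sscanf(info.release, "%d.%d", &major, &minor) != 2) {
        return false;
    }

    // Cancel-by-file-descriptor arrived in Linux 5.19; treat anything
    // older as unsupported.
    return major > 5 || (major == 5 && minor >= 19);
}
```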
    // Cancel all outstanding operations initiated for the specified
    // 'socket'. Return the error.
    ntsa::Error cancel(const bsl::shared_ptr<ntci::ProactorSocket>& socket)
It's not the subject of this review, but it looks like cancellation should be treated as an async operation, right? I mean that ideally `ntci::ProactorSocket` would have a method like `processCancelled(...)` to be notified when all operations are cancelled.
I think you are correct. I've created a new issue for this work: #68.
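A sketch of the notification the review suggests (the method name comes from the comment above; the signature is hypothetical, and the work is tracked in #68):

```cpp
class ProactorSocket
{
  public:
    virtual ~ProactorSocket();

    // Invoked by the proactor once every operation outstanding at the
    // time of the corresponding 'cancel()' call has either completed or
    // been cancelled.
    virtual void processCancelled() = 0;
};
```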
This PR adds initial support for an `ntci::Proactor` implemented using `io_uring` on Linux. `io_uring` is an alternative system call interface that allows user space to initiate certain system calls in batches, to execute asynchronously, and to learn about completed system calls in batches, attempting to reduce the system call overhead required for, e.g., one system call per `sendmsg`, `recvmsg`, etc. Userspace may learn about completed system calls without a system call itself, or it can make a system call to block until a minimum number of previously submitted system calls complete. A more thorough description of `io_uring` can be found in its Introduction.

In the design of NTC, an `ntci::Proactor` is an abstraction of an interface to an operating system whereby a user "proactively" initiates socket operations (e.g. connect, accept, send, receive) and is asynchronously notified when those operations complete. This PR implements the `ntci::Proactor` abstract class using `io_uring`.

The `io_uring` interface to the kernel consists of a set of three system calls: one to create a ring, one to configure/control a ring, and one to submit operations/reap completions. Userspace writes system call submissions to a ring buffer and then performs a system call to notify the kernel to drain the submission queue. The kernel writes the status of completed system calls to a separate ring buffer, which is drained by userspace by a system call. Userspace may submit and reap completions simultaneously. The submission queue ring buffer and the completion queue ring buffer are memory mapped into userspace according to the ring buffer configuration specified by the kernel when the ring is created. Given this design, it is possible for userspace to utilize `io_uring` functionality with nothing but support for `syscall` and `mmap` from libc. This implementation leverages these characteristics so that it may be built on any Linux machine, even machines that do not support `io_uring` (such as those with older kernels). To achieve this, the implementation eschews both the `linux/io_uring.h` header and any support library, such as liburing.
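To make the shape of that interface concrete, here is a hedged sketch of creating a ring and mapping its submission-entry array using only `syscall` and `mmap`; the system call numbers (x86_64) and mmap offsets follow the published `io_uring` ABI, but the helper function and the manually-defined `io_uring_params` layout are elided assumptions:

```cpp
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstddef>

// The three raw system calls (x86_64 numbers):
const long k_SETUP    = 425;  // io_uring_setup(entries, params)
const long k_ENTER    = 426;  // io_uring_enter(fd, to_submit, min_complete, flags, sig)
const long k_REGISTER = 427;  // io_uring_register(fd, opcode, arg, nr_args)

// Fixed mmap offsets published by the io_uring ABI:
const off_t k_OFF_SQ_RING = 0;           // submission queue ring
const off_t k_OFF_CQ_RING = 0x8000000;   // completion queue ring
const off_t k_OFF_SQES    = 0x10000000;  // submission queue entries

struct io_uring_params;  // defined manually, mirroring the kernel layout

// Create a ring and map its submission-entry array (error handling
// elided).  A real implementation reads the actual entry count back
// from 'params', since the kernel may round the request up.
void* mapSubmissionEntries(unsigned int entries, io_uring_params* params, int* fd)
{
    *fd = static_cast<int>(::syscall(k_SETUP, entries, params));

    const size_t size = entries * 64;  // 64 bytes per submission entry

    return ::mmap(0, size, PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_POPULATE, *fd, k_OFF_SQES);
}
```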
`io_uring` is conceptually simple but has many aspects that make it challenging to figure out how to use correctly and efficiently:

- `io_uring` has evolved significantly since its first release, gaining many features over time. Not all kernel versions support all features. Some features can be detected with `io_uring`'s capabilities system, but not all (e.g. support for certain flags when submitting operations). This implementation will probably work on Linux kernel versions >= 5.11 (it has only been tested on 6.2), but future work is necessary to support and test all functionality on older kernels. That is left for a future PR.
- The principal argument for `io_uring` seems to be the efficiency gains from submitting `recvmsg` and `sendmsg` in batches, avoiding the system call overhead per operation. While batching `recvmsg` may make some sense, it is unclear why batching `sendmsg` calls makes sense: it trades increased latency and reduced throughput for greater system call efficiency. Moreover, an `ntci::Proactor` is designed to be safely used by multiple threads concurrently calling `ntci::Proactor::send`, `ntci::Proactor::receive`, and `ntci::Proactor::wait`. If a thread is blocked in `wait` for at least one event to complete, another thread calling `send` or `receive` must submit and enter the ring immediately; otherwise the kernel never learns about the submission and never initiates the operation, which then never completes, and the thread blocked in `wait` remains blocked forever. This implementation chooses to always submit `sendmsg` operations and immediately enter the ring, and only defers entering the ring for other operations when it can be safely determined that only one thread will be calling `ntci::Proactor::wait` and that thread is the one calling `ntci::Proactor::send`, `ntci::Proactor::receive`, etc.
- `io_uring` has evolved support for a number of ways of timing out when waiting for operations to complete. Each is a bit clumsy to use and inconsistent in what it supports: userspace may submit a "timeout" operation with a relative timeout or (on some kernels) an absolute timeout, with (on some kernels) a specified clock type. Such "timers" need to be constantly removed or at least re-submitted. New kernels support a direct specification of a timeout when entering the ring, but it is not clear if absolute timeouts are or will be supported, or any clock type other than `CLOCK_MONOTONIC`.

- Very recent kernels support canceling all pending operations for a file descriptor, but older kernels do not. On older kernels, this implementation must maintain a data structure of all pending operations for a socket, so they can be cancelled when the user wishes to close the socket (see the sketch after this list).
- It is unclear if two threads can enter the ring simultaneously, where both are instructing the kernel to drain the submission queue. This implementation relies on that being safe, which should be reviewed or revisited in later improvements. If two threads cannot enter a ring simultaneously to submit, it is unclear how to use `io_uring` from multiple threads at all: the obvious (but perhaps naive) solution would be to maintain a thread-safe copy of the submission queue that sits in front of the ring's submission queue, where threads initiating operations perform a thread-safe push onto this queue, then "kick" the ring with a "no-op" operation, waking up a thread blocked on entering the ring, so that thread can go right back into entering the ring again to actually submit the operations to the kernel. This seems contradictory to the design and goals of `io_uring`.
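As referenced from the cancellation point above, a hedged sketch of the per-socket bookkeeping an implementation needs on older kernels that cannot cancel by file descriptor; the container choices and names are assumptions:

```cpp
#include <bsl_cstdint.h>
#include <bsl_map.h>
#include <bsl_set.h>
#include <bslmt_lockguard.h>
#include <bslmt_mutex.h>

// Tracks the user-data token of each pending submission per descriptor,
// so closing a socket can cancel each operation individually.
class PendingOperationRegistry
{
    typedef bsl::map<int, bsl::set<bsl::uint64_t> > Map;

    Map          d_pending;  // descriptor -> outstanding operation tokens
    bslmt::Mutex d_mutex;

  public:
    void insert(int handle, bsl::uint64_t token)
    {
        bslmt::LockGuard<bslmt::Mutex> guard(&d_mutex);
        d_pending[handle].insert(token);
    }

    void remove(int handle, bsl::uint64_t token)
    {
        bslmt::LockGuard<bslmt::Mutex> guard(&d_mutex);
        d_pending[handle].erase(token);
    }

    // Move every token still pending for 'handle' into 'result' so the
    // caller can submit one cancel operation per token.
    void drain(bsl::set<bsl::uint64_t>* result, int handle)
    {
        bslmt::LockGuard<bslmt::Mutex> guard(&d_mutex);
        result->clear();
        d_pending[handle].swap(*result);
        d_pending.erase(handle);
    }
};
```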
The implementation of `ntco_ioring` is lightly tested in its component test driver, but it is folded into the NTC proactor framework and more thoroughly tested in the `ntcf_system` test driver.
For now, I consider support for `io_uring` in NTC experimental. As such, it is disabled by default in the build configuration. Builders may enable it by specifying `configure --with-ioring` during the configuration step of the build process. Further support is detected at runtime; if a machine does not support the `io_uring` system calls, the "ioring" proactor will be automatically disabled.
Initial performance tests are underwhelming. The existing `epoll` backend out-performs this `io_uring` backend for most workloads, though a small few do show `io_uring` winning on latency and throughput. Future work should be planned to optimize this implementation.