
[Linux] Support async I/O with uring / liburing #12650

Open
damageboy opened this issue May 7, 2019 · 13 comments

@damageboy
Contributor

It seems that Linux finally has a good story when it comes to async I/O: io_uring

While this is only released as part of Kernel 5.1, it definitely looks like a game changer when it comes to async I/O perf...

There's no point in going over everything io_uring brings to the table in this issue, since that should be clear from the linked PDF. It is worthwhile to mention, though, that it enables some very high-perf scenarios through advanced features such as (a rough liburing sketch follows the list):

  • Pre-registering I/O related file descriptors to avoid expensive kernel-side (!) reference counting
  • Pre-registering fixed buffers (in conjunction with O_DIRECT) to avoid expensive page-table manipulation on the kernel side (!)
  • Using Polled I/O to entirely avoid system calls when reading/writing data
  • Using batched operations
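To make the pre-registration features above concrete, here is a minimal sketch using liburing's C API. The file name, sizes and the single-read flow are placeholders, and error handling is omitted; it is only meant to illustrate the registration calls, not how CoreCLR would use them:

```c
// Sketch only: illustrates liburing's pre-registration calls mentioned above.
// Error handling trimmed; the fd, buffer and offsets are placeholders.
#define _GNU_SOURCE
#include <liburing.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/uio.h>

int main(void)
{
    struct io_uring ring;
    io_uring_queue_init(64, &ring, 0);            // 64-entry SQ/CQ pair

    // Pre-register the file descriptor so the kernel skips per-op fget/fput.
    int fd = open("data.bin", O_RDONLY | O_DIRECT);
    io_uring_register_files(&ring, &fd, 1);

    // Pre-register a fixed buffer so the kernel skips per-op page mapping.
    struct iovec iov = { .iov_base = aligned_alloc(4096, 4096), .iov_len = 4096 };
    io_uring_register_buffers(&ring, &iov, 1);

    // One read that uses both: registered fd index 0 and fixed buffer index 0.
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_read_fixed(sqe, 0 /* registered fd index */, iov.iov_base, 4096, 0, 0);
    sqe->flags |= IOSQE_FIXED_FILE;
    io_uring_submit(&ring);

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);               // wait for the completion
    io_uring_cqe_seen(&ring, cqe);

    io_uring_queue_exit(&ring);
    return 0;
}
```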

Some initial tests for file I/O from Node.js point to very substantial latency reductions; the latency numbers are in the linked comment.

I think that supporting this in CoreCLR could lead to a substantial improvement of async I/O on Linux.
At the same time, it's not clear to me how/if/when CoreCLR should adopt this and at what abstraction level...

@benaadams
Member

/cc @tmds

@omariom
Contributor

omariom commented May 8, 2019

@benaadams Looks similar to Windows RIO?

@damageboy
Contributor Author

@omariom In spirit it definitely is, but there are a few key differences:

  • Not limited to sockets (unlike RIO); it can also do storage and really anything that is an FD
    (it can also read from arbitrary offsets inside the file)
  • Can be much less chatty, all the way to completely syscall-less I/O (a batching sketch follows this list)
  • Not limited to predefined buffers (i.e. registered buffers) like RIO, although when that is used (in io_uring parlance: fixed buffers) it can vastly improve perf and reduce latency.
  • Lastly, if I'm not mistaken, RIO's API is somewhat limiting in the interaction between request queues (RQ) and completion queues (CQ), in the sense that (IIRC) the completion queues have a finite size and the total number of requests in the RQs associated with a given CQ cannot exceed the size of the CQ. I'm less sure about this last one, but as far as I can tell, io_uring is more ad hoc in this respect and somewhat more dynamic in nature...
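On the "less chatty" point: the sketch below batches several positioned reads into a single submission syscall. It's illustrative only (the path, block size and offsets are placeholders, and errors are unchecked):

```c
// Sketch: batching several positioned reads into one io_uring_submit() call.
#include <liburing.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/uio.h>

#define N   8
#define BLK 4096

int main(void)
{
    struct io_uring ring;
    io_uring_queue_init(N, &ring, 0);

    int fd = open("data.bin", O_RDONLY);
    struct iovec iov[N];

    for (int i = 0; i < N; i++) {
        iov[i].iov_base = malloc(BLK);
        iov[i].iov_len  = BLK;
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        // Each read targets an arbitrary offset within the file (i * BLK here).
        io_uring_prep_readv(sqe, fd, &iov[i], 1, (unsigned long long)i * BLK);
        io_uring_sqe_set_data(sqe, (void *)(long)i);   // tag so CQEs can be matched
    }

    io_uring_submit(&ring);                            // one syscall for all N reads

    for (int i = 0; i < N; i++) {
        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);
        // cqe->res is the byte count (or -errno); cqe->user_data is the tag above.
        io_uring_cqe_seen(&ring, cqe);
    }

    io_uring_queue_exit(&ring);
    return 0;
}
```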

@benaadams
Member

Can be much less chatty, all the way to completely syscall-less I/O

That is also one of the options with RIO. It has a flavour of completion ports, e.g. async with a fast-path sync completion if the data is already available, rather than having to do a callback.

Extending the registration to all I/O is good, especially with advancements in throughput, e.g. NVMe.

Definitely interesting!

@omariom
Contributor

omariom commented May 8, 2019

In apps consisting of microservices, large connection buffers are less of an issue, and low latency with low jitter is valued more than raw throughput.
So it would be great to have both io_uring and RIO in Kestrel. Kestrel is used by gRPC, which is the most popular transport for communication between microservices.

@zbjornson

zbjornson commented May 8, 2019

(I'm the one who was playing with io_uring in Node.js/libuv, mentioned in the OP; sharing some notes here.)

io_uring is more like Windows' "overlapped IO" with IOCP than RIO in terms of usability with files.

all the way to completely syscall-less I/O

(Referring to kernel-side polling of the submission queue.) Note that this requires root: torvalds/linux@3ec482d#diff-a196e54ec8b5398427f9df3d2b074478.
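For reference, that kernel-side polling mode is opted into at ring setup time. A minimal sketch with liburing (entry count and idle timeout are arbitrary; note that on these early kernels, ops submitted under SQPOLL generally also have to use registered files):

```c
// Sketch: setting up kernel-side SQ polling (the "syscall-less" mode above).
// At the time of writing this needed root / CAP_SYS_ADMIN.
#include <liburing.h>
#include <string.h>

int setup_sqpoll_ring(struct io_uring *ring)
{
    struct io_uring_params params;
    memset(&params, 0, sizeof(params));
    params.flags = IORING_SETUP_SQPOLL;   // a kernel thread polls the SQ for us
    params.sq_thread_idle = 2000;         // ms before the poll thread goes to sleep

    // Returns -EPERM when unprivileged on kernels where SQPOLL requires root.
    return io_uring_queue_init_params(8, ring, &params);
}
```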

RIO's API is somewhat limiting ... in the sense that (IIRC) the completion queues have a finite size...

io_uring still has fixed-size SQs and CQs. It's immediately safe to reuse an SQ slot once io_uring_enter/submit returns (before the kernel is done processing it). There's a tiny bit of info on what happens with full CQs in http://git.kernel.dk/cgit/liburing/commit/?id=76b61ebf1bd17d3a31c3bf2d8236b9bd50d0f9a8, but I'm still uncertain what happens if you submit more events and e.g. never drain the CQ.

since the sqe lifetime is only that of the actual submission of it, it's possible for the application to drive a higher pending request count than the SQ ring size would indicate. The application must take care not to do so, or it could risk overflowing the CQ ring. By default, the CQ ring is twice the size of the SQ ring. This allows the application some amount of flexibility in managing this aspect, but it doesn't completely remove the need to do so. If the application does violate this restriction, it will be tracked as an overflow condition in the CQ ring. More on that later.

but I can't find the "later" part :). I assume the CQE just gets overwritten.
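For completeness, the usual way to avoid ever getting near that situation is to keep the CQ drained. Something along these lines, as a liburing sketch not tied to any particular application structure:

```c
// Sketch: drain every available CQE without blocking; the kind of loop an
// application would run regularly to keep the CQ ring from filling up.
#include <liburing.h>

static void drain_completions(struct io_uring *ring)
{
    struct io_uring_cqe *cqe;

    // io_uring_peek_cqe() returns 0 while completions are available
    // and -EAGAIN once the CQ ring is empty.
    while (io_uring_peek_cqe(ring, &cqe) == 0) {
        // cqe->res holds the result (byte count or -errno);
        // cqe->user_data holds whatever tag was attached to the SQE.
        io_uring_cqe_seen(ring, cqe);   // mark the CQ slot reusable
    }
}
```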

@zbjornson

And here's the answer on CQ overflow: https://twitter.com/axboe/status/1126203058071826432

CQEs do not get overwritten, cqring.overflow just increments. The app has to be grossly negligent to trigger that, as the CQE ring is twice the SQE ring. If cqring.overflow is ever != 0, the app has failed.

@tmds
Member

tmds commented May 10, 2019

Is there some info on how you use this with sockets? In particular how to deal with blocking calls.
Should you add a blocking read/write to io_uring and then check for its completion?
Or do you need to use io_uring for polling? And then when readable/writable add non-blocking reads/writes to it?
Or something else?

@tmds
Member

tmds commented Jun 27, 2019

Is there some info on how you use this with sockets? In particular how to deal with blocking calls.
Should you add a blocking read/write to io_uring and then check for its completion?
Or do you need to use io_uring for polling? And then when readable/writable add non-blocking reads/writes to it?
Or something else?

To answer my own question: you can use io_uring like epoll in one-shot mode. There is a command to add a poll for an fd, and there is a command to cancel an ongoing poll.
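A minimal sketch of that one-shot pattern with liburing (the socket fd and buffer are placeholders; io_uring_prep_poll_remove would be the cancellation side):

```c
// Sketch of the epoll-like, one-shot usage described above: ask io_uring to
// tell us when a socket is readable, then do a normal non-blocking recv().
#include <liburing.h>
#include <poll.h>
#include <sys/socket.h>

static void wait_readable_then_recv(struct io_uring *ring, int sockfd,
                                    void *buf, size_t len)
{
    // One-shot poll: completes once with the ready mask, like EPOLLONESHOT.
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    io_uring_prep_poll_add(sqe, sockfd, POLLIN);
    io_uring_submit(ring);

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(ring, &cqe);      // cqe->res is the returned poll mask
    io_uring_cqe_seen(ring, cqe);

    // The socket should be readable now; issue the non-blocking read ourselves.
    recv(sockfd, buf, len, MSG_DONTWAIT);
}
```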

@tmds
Member

tmds commented Dec 16, 2019

When looking into io_uring we also need to consider what operations are privileged, and what kernel resources are needed.

To have wide applicability, it should work in a Kubernetes container deployment.
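As a concrete example of that constraint, here is a hypothetical probe a transport could run at startup to decide between io_uring and the existing epoll engine (a sketch assuming liburing; the listed failure modes are typical, not exhaustive):

```c
// Sketch: probe at runtime whether io_uring is usable in this environment
// (old kernel, seccomp-filtered container, etc.). Hypothetical helper.
#include <liburing.h>

// Returns 1 if io_uring can actually be set up here,
// 0 if the caller should fall back to the existing epoll-based engine.
static int io_uring_usable(void)
{
    struct io_uring ring;
    int ret = io_uring_queue_init(8, &ring, 0);
    if (ret < 0) {
        // Typical failures: -ENOSYS on kernels < 5.1, -EPERM/-EACCES when a
        // container's seccomp profile blocks the io_uring_setup syscall.
        return 0;
    }
    io_uring_queue_exit(&ring);
    return 1;
}
```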

CQEs do not get overwritten, cqring.overflow just increments. The app has to be grossly negligent to trigger that, as the CQE ring is twice the SQE ring.

If you're writing to disk, you can control this.
I wonder whether this becomes an issue if you use io_uring for sockets. If you have a lot of polls going on for idle connections, maybe some activity on those sockets can get you into a CQE overflow.

@isilence

isilence commented Jan 12, 2020

CQEs do not get overwritten, cqring.overflow just increments. The app has to be grossly negligent to trigger that, as the CQE ring is twice the SQE ring.

If you're writing to disk, you can control this.
I wonder whether this becomes an issue if you use io_uring for sockets. If you have a lot of polls going on for idle connections, maybe some activity on those sockets can get you into a CQE overflow.

This is handled more gracefully now and nothing gets dropped; since 5.5, I believe.
See 1d7bb1d50fb4dc14 ("io_uring: add support for backlogged CQ ring").
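If it helps, that backlogged-CQ behaviour is advertised through the setup-time feature flags, so an application can check for it. A sketch (assuming liburing and kernel headers that define IORING_FEAT_NODROP):

```c
// Sketch: detect the backlogged-CQ-ring behaviour mentioned above via the
// feature flags returned at setup time (IORING_FEAT_NODROP, kernel 5.5+).
#include <liburing.h>
#include <string.h>
#include <stdbool.h>

static bool cq_never_drops(void)
{
    struct io_uring ring;
    struct io_uring_params params;
    memset(&params, 0, sizeof(params));

    if (io_uring_queue_init_params(8, &ring, &params) < 0)
        return false;

    bool nodrop = params.features & IORING_FEAT_NODROP;   // kernel backlogs CQEs
    io_uring_queue_exit(&ring);
    return nodrop;
}
```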

Cc: @axboe

@msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits added this to the Future milestone Jan 31, 2020
@maryamariyan added the untriaged label Feb 26, 2020
@JeremyKuhne removed the untriaged label Mar 3, 2020
@adamsitnik
Member

At the same time, it's not clear to me how/if/when CoreCLR should adopt this and at what abstraction level

With the recent FileStream refactoring it should now be much easier to implement this for file I/O. I've created a new issue with all the details: #51985

@curiousdev

For reference, it looks like other web frameworks are adopting this. libuv, which powers Node.js, just picked up this support.

https://www.phoronix.com/news/libuv-io-uring
