-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
io_uring to improve throughput in the background thread #549
Comments
Hey thanks for the ideas! it can be easily implemented as a stand alone sink It’s something i was considering to add 1-2 months ago. I put it on the back burner after reading this blog post https://notes.eatonphil.com/2023-10-19-write-file-to-disk-with-io_uring.html I will try to implement it at some point and see if it actually makes any difference |
The results in the post were surprising, so I ran a benchmark of my own using System Config
I have not done additional optimizations like CPU isolation, or disabling io-scheduling, or setting irq affinity Benchmark Config
Results
Summary
These numbers show |
I've opened a draft PR; feel free to take a look and let me know your thoughts. Other things I tried:
Results on my RHEL9 box with the current PR: Without
With
|
Thank you for submitting this pull request. I've conducted testing on Ubuntu 24.04 LTS to evaluate the performance. The io_uring-based implementation shows approximately 6% lower throughput compared to the standard version. I'm currently investigating the cause of this performance discrepancy and will share my findings. Regarding the code structure, I suggest implementing an option in Please let me know if you have any thoughts about this. |
The benchmark numbers above were from before the latest commits on master. With recent improvements I introduced in the backend worker, it seems that the sync IO performance has now caught up with the performance of io uring sink... I created this PR as a proof of concept to demonstrate how it could be implemented and to evaluate if it's worth adding. One reason I’m hesitant to include it directly in |
These are the numbers i am currently getting after syncing this PR with master branch Without
With
i have also made the benchmark more accurate as the backend thread needs to spend a few seconds on the first log message to calculate the rdtsc and that was previously included in the latency. Thats why you see a different number |
Thanks for pointing this out. I hadn't noticed the additional io_uring thread. I assumed that io_uring operates in an OS thread not on the isolated cores. This also explains why the busy-wait polling in io_uring was making things slower. |
I've tried using The change was to add these lines before registering buffers in the constructor
|
We could potentially improve the throughput of the logger by implementing io_uring in the background thread.
Proposed changes:
Expected benefits:
Considerations:
Do you think this is something worth pursuing?
The text was updated successfully, but these errors were encountered: