performance: Update io.DEFAULT_BUFFER_SIZE to make python IO faster? #117151
Comments
Some benchmarking code I used to debug download and write performance:

import io
import os
import platform
import requests
import sys
import time


def download_file(run, url, filepath, chunksize, buffersize):
    # Start from a clean state so every run writes a new file.
    if os.path.exists(filepath):
        os.remove(filepath)
    calls = 0
    start = time.perf_counter()
    write_duration = 0.0
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(filepath, 'wb', buffering=buffersize) as f:
            st_blksize = os.stat(filepath).st_blksize
            for chunk in r.iter_content(chunk_size=chunksize):
                calls = calls + 1
                # Time only the write() call, not the download.
                t1 = time.perf_counter()
                f.write(chunk)
                t2 = time.perf_counter()
                write_duration = write_duration + (t2 - t1)
    end = time.perf_counter()
    function_duration = end - start
    print(
        "run={} filepath={} total_duration={} download_chunksize={} write_duration={} write_buffersize={} calls={} st_blksize={}".format(
            run, filepath, function_duration, chunksize, write_duration, buffersize, calls, st_blksize
        ))


def main():
    print("python {} running on {}".format(sys.version, platform.platform()))
    NUMPY_WHEEL = "https://example.com/numpy/1.21.6/numpy-1.21.6-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl"
    for run in range(0, 10):
        for download_directory in [os.path.abspath(".")]:
            for url in [NUMPY_WHEEL]:
                download_path = os.path.join(download_directory, url.rsplit("/", 1)[1])
                for download_chunksize in [512, 1024, 2048, 4096, 8192,
                                           10000, 16384, 32768, 65536, 131072, 262144, 524488,
                                           1048576, 2097152, 4194304, 8388608, 16777216]:
                    for file_buffersize in [0, 4096, 8192, 65536, 262144, 1048576]:
                        download_file(run, url, download_path, download_chunksize, file_buffersize)


if __name__ == "__main__":
    main()
I think your argument makes sense. Consumer RAM sizes have more than quadrupled in the past 16 years IIRC, so it shouldn't hurt to increase buffer sizes. I cannot champion this though, because I am currently wrapped up in too many things. Sorry.
AFAIK SSDs have 4k to 8k pages. An SSD block contains up to 256 pages. The NVMe capabilities of the drive are also a factor, as an NVM command can generally transfer a multiple of the page size.
… are equal to the buffer size. Avoid extra memory copy. BufferedWriter() was buffering calls that are the exact same size as the buffer. It's a very common case to read/write in blocks of the exact buffer size. It's pointless to copy a full buffer: it costs an extra memory copy, and the full buffer will have to be written in the next call anyway.
…ER_SIZE to 128k, fix open() to use max(st_blksize, io.DEFAULT_BUFFER_SIZE) performance:
I can confirm this improves performance. @morotti Could you open a PR?
…he buffer size (GH-118037) BufferedWriter() was buffering calls that are the exact same size as the buffer. It's a very common case to read/write in blocks of the exact buffer size. It's pointless to copy a full buffer: it costs an extra memory copy, and the full buffer will have to be written in the next call anyway. Co-authored-by: rmorotti <romain.morotti@man.com>
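To see how write sizes interact with the buffer size on a given interpreter, the calls that reach the raw layer can be recorded. The following is a minimal sketch, not the code from the PR: `CountingRaw` and `raw_write_sizes` are hypothetical names introduced here, and the raw stream is an in-memory stand-in for a file. The recorded sizes depend on the Python version and on whether the change above is present, so no particular output is claimed.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical in-memory raw stream that records each write() size."""

    def __init__(self):
        super().__init__()
        self.write_sizes = []

    def writable(self):
        return True

    def write(self, b):
        # b may be bytes or a memoryview handed down by BufferedWriter.
        n = memoryview(b).nbytes
        self.write_sizes.append(n)
        return n  # report that everything was accepted


def raw_write_sizes(chunk_size, buffer_size, n_chunks=8):
    """Write n_chunks chunks through a BufferedWriter and return the
    list of sizes that actually reached the raw stream."""
    raw = CountingRaw()
    with io.BufferedWriter(raw, buffer_size=buffer_size) as writer:
        chunk = b"x" * chunk_size
        for _ in range(n_chunks):
            writer.write(chunk)
    # Closing the writer flushes any remaining buffered bytes.
    return raw.write_sizes


if __name__ == "__main__":
    for chunk_size in (1000, 4096, 8192):
        sizes = raw_write_sizes(chunk_size, buffer_size=4096)
        print("chunk_size={} raw_writes={} sizes={}".format(
            chunk_size, len(sizes), sizes))
```

Whatever the buffering strategy, the total number of bytes reaching the raw stream must equal the number of bytes written; only the number and size of the raw calls should differ.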
@eendebakpt I opened a PR, can you review?
Yes, I'll have a look in a couple of days.
… to 256k. It was set to 16k in the 1990s. It was raised to 64k in 2019. The discussion at the time mentioned another 5% improvement by raising to 128k and settled for a very conservative setting. It's 2024 now; I think it should be revisited to match modern hardware. I am measuring 0-15% performance improvement when raising to 256k on various types of disk. There is no downside as far as I can tell. This function is only intended for sequential copy of full files (or file-like objects). It's the typical use case that benefits from larger operations. For reference, I came across this function while trying to profile pip, which uses it to copy files when installing Python packages.
In what cases
I see it larger than 128kB on NFS network filesystems, like in the benchmark I submitted above. It can be seen on any filesystem where a larger block size was set. It's a free setting when the filesystem is created. I think most filesystems (XFS/ZFS/EXT4) allow setting any block size from 4k to 1M or so. I think more than 128k can be seen on some RAID setups with large enough disks. Microsoft recommends a 64k block size for Windows Server 2019+; a 4k block size is limited to 16 TB volumes and a 64k block size is limited to 256 TB volumes. The block size can be set up to 2M. I think it should be visible as well on S3 filesystems, but I don't have one to test anymore. Basically, anything involving large disks, storage appliances, network and specialized filesystems.
…ER_SIZE to 128k, adjust open() to use max(st_blksize, io.DEFAULT_BUFFER_SIZE)
…at implementations This is especially important for Lustre after disabling the buffering for calls through FUSE in the earlier commit. Without this, we would now have 8K reads to Lustre that has huge latencies and should function more optimally with 4 MiB reads. See also the proposal to increase the default buffer size of 8K: python/cpython#117151
…6k. (GH-119783) * gh-117151: increase default buffer size of shutil.copyfileobj() to 256k. It was set to 16k in the 1990s. It was raised to 64k in 2019. The discussion at the time mentioned another 5% improvement by raising to 128k and settled for a very conservative setting. It's 2024 now; I think it should be revisited to match modern hardware. I am measuring 0-15% performance improvement when raising to 256k on various types of disk. There is no downside as far as I can tell. This function is only intended for sequential copy of full files (or file-like objects). It's the typical use case that benefits from larger operations. For reference, I came across this function while trying to profile pip, which uses it to copy files when installing Python packages. * add news --------- Co-authored-by: rmorotti <romain.morotti@man.com>
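Independently of the library default, callers can already pass an explicit chunk size through the `length` argument of shutil.copyfileobj(). The sketch below is only an illustration of how the sizes mentioned above (16k, 64k, 256k) could be compared; `copy_with_length` is a hypothetical helper, the 8 MiB test file is an arbitrary assumption, and timings on a small file in a temporary directory are dominated by the page cache, so they will not reproduce the 0-15% figure quoted in the commit message.

```python
import os
import shutil
import tempfile
import time


def copy_with_length(src_path, dst_path, length):
    """Copy src_path to dst_path using chunks of `length` bytes.

    Returns the elapsed time in seconds. Hypothetical helper for
    illustration; it only wraps shutil.copyfileobj().
    """
    start = time.perf_counter()
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        shutil.copyfileobj(fsrc, fdst, length)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        # Arbitrary 8 MiB payload; a real benchmark would use larger
        # files on the storage being evaluated.
        with open(src, "wb") as f:
            f.write(os.urandom(8 * 1024 * 1024))
        for length in (16 * 1024, 64 * 1024, 256 * 1024):
            elapsed = copy_with_length(src, dst, length)
            print("length={} elapsed={:.4f}s".format(length, elapsed))
```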
Bug report
Bug description:
Hello,
I was doing some benchmarking of python and package installation.
That got me down a rabbit hole of buffering optimizations between pip, requests, urllib and the cpython interpreter.
TL;DR I would like to discuss updating the value of io.DEFAULT_BUFFER_SIZE. It has been set to 8192 for 16 years.
original commit: https://github.com/python/cpython/blame/main/Lib/_pyio.py#L27
It was a reasonable size given hardware and OS at the time. It's far from optimal today.
Remember, in 2008 you'd run a 32-bit operating system with less than 2 GB of memory available, shared between all running applications.
Buffers had to be small, a few kB. It wasn't conceivable to have buffers measured in entire MB.
I will attach benchmarks in the next messages showing 3 to 5 times write performance improvement when adjusting the buffer size.
I think the python interpreter can adopt a buffer size somewhere between 64k and 256k by default.
I think 64k is the minimum for python and it should be safe to adjust to.
Higher is better for performance in most cases, though there may be some cases where it's unwanted
(seeks and small reads/writes, unwanted triggering of write-ahead, slow devices with throughput measured in kB/s where you don't want to block for long).
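The effect of the buffer size on writes can be checked locally, without the network, using the `buffering` argument of open(). This is a rough sketch under stated assumptions: `time_writes` is a hypothetical helper, the file is created in the default temporary directory, and the 32 MiB total and the list of sizes are arbitrary. Results vary widely with the disk, the filesystem and the page cache, so it is a way to explore the claim rather than a measurement of it.

```python
import os
import tempfile
import time


def time_writes(total_bytes, chunk_size, buffer_size):
    """Write about total_bytes in chunk_size pieces to a temporary file
    opened with the given buffering value; return elapsed seconds.

    buffer_size=0 disables buffering (allowed in binary mode only).
    """
    chunk = b"\0" * chunk_size
    n_chunks = total_bytes // chunk_size
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        start = time.perf_counter()
        with open(path, "wb", buffering=buffer_size) as f:
            for _ in range(n_chunks):
                f.write(chunk)
        # The timing includes the final flush done by close().
        return time.perf_counter() - start
    finally:
        os.remove(path)


if __name__ == "__main__":
    total = 32 * 1024 * 1024  # arbitrary size for a quick run
    for buffer_size in (0, 4096, 8192, 65536, 262144, 1048576):
        elapsed = time_writes(total, chunk_size=1024, buffer_size=buffer_size)
        print("buffer_size={} elapsed={:.4f}s".format(buffer_size, elapsed))
```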
In addition, I think there is a bug in open() on Linux.
open() sets the buffer size to the device block size on Linux when available (st_blksize, 4k on most disks), instead of io.DEFAULT_BUFFER_SIZE=8k.
I believe this is unwanted behavior: the block size is the minimal size for IO operations on the IO device; it is not the optimal size and it should not be preferred.
I think open() on Linux should be corrected to use a default buffer size of max(st_blksize, io.DEFAULT_BUFFER_SIZE) instead of st_blksize?
Related: the doc might be misleading in saying that st_blksize is the preferred size for efficient I/O. https://github.com/python/cpython/blob/main/Doc/library/os.rst#L3181
The GNU doc was updated to clarify: "This is not guaranteed to give optimum performance" https://www.gnu.org/software/gnulib/manual/html_node/stat_002dsize.html
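Until the default changes, the proposed policy can be approximated in user code by passing an explicit `buffering` value to open(). The sketch below assumes two hypothetical helpers, `proposed_buffer_size` and `open_binary_for_write`; it stats the parent directory as a stand-in for the filesystem of the file being created, and falls back to 0 where `st_blksize` is not available (for example on Windows). It illustrates the proposal and is not how open() is implemented.

```python
import io
import os


def proposed_buffer_size(path):
    """Return max(st_blksize, io.DEFAULT_BUFFER_SIZE) for an existing path.

    Hypothetical helper illustrating the policy proposed in this issue.
    """
    st = os.stat(path)
    # st_blksize is not available on every platform.
    blksize = getattr(st, "st_blksize", 0) or 0
    return max(blksize, io.DEFAULT_BUFFER_SIZE)


def open_binary_for_write(path):
    """Open path for binary writing with the proposed buffer size.

    The parent directory is used to query the block size, on the
    assumption that the new file lives on the same filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    return open(path, "wb", buffering=proposed_buffer_size(directory))


if __name__ == "__main__":
    print("io.DEFAULT_BUFFER_SIZE =", io.DEFAULT_BUFFER_SIZE)
    print("st_blksize of '.'      =", getattr(os.stat("."), "st_blksize", None))
    print("proposed buffer size   =", proposed_buffer_size("."))
```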
Thoughts?
Annex: some historical context and technical considerations around buffering.
On the hardware side:
On filesystems:
On network filesystems:
host:path on path type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,acregmin=60,acdirmin=60,hard,proto=tcp,nconnect=8,mountproto=tcp, ...)
On pipes:
on compression code, they probably all need to be adjusted:
On network IO:
on HTTP, a large subset of networking:
note to self: remember to publish code and result in next message
CPython versions tested on:
3.11
Operating systems tested on:
Other