Improve upload_fileobj performance
Changing the file reader buffer from `bytes` to `bytearray` significantly reduces CPU usage. Using `bytes` is inefficient because it's immutable: you hit the classic string-building problem, where every append to an immutable sequence copies the entire buffer, so building a payload of n bytes chunk by chunk takes O(n^2) total work.

From [the python docs](https://docs.python.org/3/library/stdtypes.html#common-sequence-operations):
> if concatenating bytes objects, you can similarly use bytes.join() or io.BytesIO, or you can do in-place concatenation with a bytearray object. bytearray objects are mutable and have an efficient overallocation mechanism

I tried `io.BytesIO` too, but `bytearray` has slightly better performance in my testing.
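The difference is easy to see in isolation. Below is a minimal sketch (not part of this commit; the function names are illustrative) comparing the three approaches the docs mention, each building the same payload from fixed-size chunks the way the multipart reader does:

```python
import io
import timeit

CHUNK = b"x" * 1024  # stand-in for one read from the source file


def append_bytes(n: int) -> bytes:
    # Immutable bytes: each += must copy the whole accumulated buffer,
    # so n appends cost O(n^2) total work.
    buf = b""
    for _ in range(n):
        buf += CHUNK
    return buf


def append_bytearray(n: int) -> bytes:
    # Mutable bytearray: += extends in place using overallocation,
    # giving amortized O(1) per append.
    buf = bytearray()
    for _ in range(n):
        buf += CHUNK
    return bytes(buf)


def append_bytesio(n: int) -> bytes:
    # io.BytesIO: also linear overall, but with per-call overhead
    # from the file-like API.
    buf = io.BytesIO()
    for _ in range(n):
        buf.write(CHUNK)
    return buf.getvalue()


if __name__ == "__main__":
    for fn in (append_bytes, append_bytearray, append_bytesio):
        t = timeit.timeit(lambda: fn(2000), number=3)
        print(f"{fn.__name__}: {t:.3f}s")
```

All three produce identical output; only the accumulation cost differs, which is why the patch below is a two-line change.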
JohnHBrock authored and terrycain committed May 10, 2023
1 parent 6932bef commit 1c9bd60
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions aioboto3/s3/inject.py

@@ -265,7 +265,7 @@ async def file_reader() -> None:
         eof = False
         while not eof:
             part += 1
-            multipart_payload = b''
+            multipart_payload = bytearray()
             loop_counter = 0
             while len(multipart_payload) < multipart_chunksize:
                 try:
@@ -284,7 +284,7 @@ async def file_reader() -> None:

                     # shortcircuit upload logic
                     eof = True
-                    multipart_payload = b''
+                    multipart_payload = bytearray()
                     break

             if data == b'' and loop_counter > 0:  # End of file, handles uploading empty files
