-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
gh-129559: Add bytearray.resize()
#129560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add `bytearray.resize()` which wraps `PyByteArray_Resize`
@vstinner I think this is ready for review if |
Please initialize new bytes to zero. We don't accept undefined behavior in Python. |
Co-authored-by: Victorien <65306057+Viicos@users.noreply.github.com>
I'm not sure that adding this API is needed since there is already a way to truncate a bytearray and to extend a bytearray using the existing API. You should run a benchmark to show that it's worth it, especially to extend a bytearray. |
What is the way to extend the |
What you say. In short, Please run a benchmark to measure this compared to ba.extend(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation mostly looks fine, but my primary concern is that we are indeed exposing an implementation detail on our end. Alternate Python implementations that either don't use an internal buffer, or can't control the size of it, will have a hard time effectively providing resize
.
For example, Brython implements bytearray using JavaScript arrays, which don't have an explicit way to resize. I don't think they'll like this method.
I'd go for one of two options:
- Mark
resize
as a CPython implementation detail. - Make
resize
a hint, allowing the runtime to ignore the request for resizing if it wants to.
I'd personally choose the latter, but I'd also like to hear Victor's thoughts.
@ZeroIntensity: It's already possible to implement extend() with the current There are many ways to "extend" a bytearray. Another example: >>> ba=bytearray(b'abc')
>>> ba += b'def'
>>> ba, len(ba)
(bytearray(b'abcdef'), 6) |
Yeah, but I think that's a bit different than an explicit "resize." You can emulate it with that kind of thing, but it feels different than an actual method saying it will reallocate. |
The method documentation should not promise any performance guarantee in general, it can be implemented in different ways (the implementation doesn't matter). |
Can we clarify that with a |
@ZeroIntensity can definitely add a note of "This is equivalent to (Planning to do next iteration + perf testing follow up today, yesterday got sucked into a |
Co-authored-by: Peter Bierma <zintensitydev@gmail.com> Co-authored-by: Victor Stinner <vstinner@python.org>
Co-authored-by: Victor Stinner <vstinner@python.org>
test_bytes and test_capi are failing, you need to update the exception. |
Working on them; Unfortunate part of "add suggestions", it adds "this person contributed" but doesn't let you do that + local changes before making push visible / alerting reviewers... Should have soon |
Doc/library/stdtypes.rst
Outdated
.. method:: resize(size) | ||
|
||
Resize the :class:`bytearray` to contain *size* bytes. | ||
If :class:`bytearray` needs to grow, all new bytes will be set to null bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also describe the "obvious-to-us" behavior of what happens when it shrinks: the data at the end is truncated, the remaining data should be equivalent to a [:size]
slice.
... and consider if the Python version should also support negative size to mean the same thing it would in a slice notation.
if (size < 0): size = max(0, len(self) + size)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for the truncation.
I don't like negative, because the function operates in absolute buffer size, and requesting a negative buffer size sounds like a bug I'd write and I'd prefer an exception / exit rather than debug the symptom "my buffer shrank".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my original equivalent to is wrong, which make it look like a delta, current doc one I'm validating:
if len(self) > size:
del self[size:]
else:
self += b'\0' * (size - len(self))
I think I've addressed all review comments. Tested performance on an optimized build (
import itertools
import pyperf
runner = pyperf.Runner()
# Aiming for what a file readall does in terms of resizing as get more data.
# Non-linear resizing up to a reaonably large read size.
blocksize = 1024
sizes = [
blocksize,
blocksize * 4,
blocksize * 16,
1_000_000,
100_000_000,
1_000_000_000,
]
def extend_bytes():
"""Extend by appending a temporary byte string"""
ba = bytearray()
for size in sizes:
ba.extend(b'\0' * size)
def extend_range():
"""Resize by an iterator with length_hint"""
ba = bytearray()
for size in sizes:
ba.extend(itertools.repeat(0, size))
def iadd():
"""Resize by using += """
ba = bytearray()
for size in sizes:
ba += b'\0' * size
def resize():
"""Use resize"""
ba = bytearray()
for size in sizes:
ba.resize(size)
runner.bench_func("extend_bytes", extend_bytes)
runner.bench_func("extend_range", extend_range)
runner.bench_func("iadd", iadd)
runner.bench_func("resize", resize) |
Doc/library/stdtypes.rst
Outdated
>>> shrink = bytearray(5) | ||
>>> resize(shrink, 0) | ||
>>> shrink | ||
bytearray(b'') | ||
>>> grow = bytearray(2) | ||
>>> resize(grow, 7) | ||
>>> grow | ||
bytearray(b'\x00\x00\x00\x00\x00\x00\x00') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to use resize in examples:
>>> shrink = bytearray(5) | |
>>> resize(shrink, 0) | |
>>> shrink | |
bytearray(b'') | |
>>> grow = bytearray(2) | |
>>> resize(grow, 7) | |
>>> grow | |
bytearray(b'\x00\x00\x00\x00\x00\x00\x00') | |
Examples: | |
>>> shrink = bytearray(b'abc') | |
>>> shrink.resize(1) | |
>>> (shrink, len(shrink)) | |
(bytearray(b'a'), 1) | |
>>> grow = bytearray(b'abc') | |
>>> grow.resize(5) | |
>>> (grow, len(grow)) | |
(bytearray(b'abc\x00\x00'), 5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Making locally so I can get the doctest passing)
Misc/NEWS.d/next/Library/2025-02-01-14-55-33.gh-issue-129559.hQCeAz.rst
Outdated
Show resolved
Hide resolved
The change mostly LGTM. I just made a new review for some details.
Thanks for the benchmark. So resize is 2.9x faster than the existing fastest method to grow a bytearray (extend_bytes). So it's worth it to add a dedicated method. |
Co-authored-by: Victor Stinner <vstinner@python.org>
Nice, I'm not surprised it is faster. Though what I care more about is that code using this is more readable than the other hacks. :) |
Add bytearray.resize() which wraps PyByteArray_Resize. Make negative size passed to resize exception/error rather than crash in optimized builds.
Add bytearray.resize() which wraps PyByteArray_Resize. Make negative size passed to resize exception/error rather than crash in optimized builds.
Add
bytearray.resize()
which wrapsPyByteArray_Resize
.Make negative size passed to resize exception/error rather than crash in optimized builds.
bytearray.resize()
method #129559📚 Documentation preview 📚: https://cpython-previews--129560.org.readthedocs.build/