gh-129559: Add `bytearray.resize()` #129560

cmaloney · 2025-02-01T23:14:06Z

Add bytearray.resize() which wraps PyByteArray_Resize.

Make a negative size passed to resize raise an exception rather than crash in optimized builds.

Issue: Add bytearray.resize() method #129559

📚 Documentation preview 📚: https://cpython-previews--129560.org.readthedocs.build/


cmaloney · 2025-02-02T03:58:41Z

@vstinner I think this is ready for review if bytearray.resize seems like a reasonable feature add.


vstinner · 2025-02-02T14:57:36Z

Please initialize new bytes to zero. We don't accept undefined behavior in Python.


vstinner · 2025-02-03T11:41:13Z

I'm not sure that adding this API is needed since there is already a way to truncate a bytearray and to extend a bytearray using the existing API. You should run a benchmark to show that it's worth it, especially to extend a bytearray.

cmaloney · 2025-02-03T18:52:28Z

What is the way to extend the bytearray without copying byte by byte or allocating extra memory the size of the extension?

vstinner · 2025-02-04T00:20:52Z

> What is the way to extend the bytearray without copying byte by byte or allocating extra memory the size of the extension?

What you say. In short, `ba[len(ba):] = b'\0' * extend`.

Please run a benchmark to measure this compared to ba.extend().
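For reference, the existing-API approach mentioned above can be sketched as follows. This is a minimal illustration rather than code from the PR; the variable `extra` is an assumption standing in for `extend` in the comment.

```python
# Grow and truncate a bytearray using only the long-standing API.
ba = bytearray(b"abc")
extra = 3  # hypothetical number of bytes to add

# Slice assignment at the end appends `extra` null bytes. Note that
# b"\0" * extra builds a temporary bytes object of that length first.
ba[len(ba):] = b"\0" * extra
assert ba == bytearray(b"abc\x00\x00\x00")

# Deleting a tail slice truncates in place.
del ba[2:]
assert ba == bytearray(b"ab")
```

The temporary bytes object built on the right-hand side is the extra allocation the question above is asking about.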

ZeroIntensity

The implementation mostly looks fine, but my primary concern is that we are indeed exposing an implementation detail on our end. Alternate Python implementations that either don't use an internal buffer, or can't control the size of it, will have a hard time effectively providing resize.

For example, Brython implements bytearray using JavaScript arrays, which don't have an explicit way to resize. I don't think they'll like this method.

I'd go for one of two options:

- Mark resize as a CPython implementation detail.
- Make resize a hint, allowing the runtime to ignore the request for resizing if it wants to.

I'd personally choose the latter, but I'd also like to hear Victor's thoughts.


vstinner · 2025-02-04T14:19:14Z

@ZeroIntensity: It's already possible to implement extend() with the current bytearray API, it's just a convenient helper for: `ba[len(ba):] = b'\0' * extend`. So I don't see why other Python implementations couldn't implement it. They should already be able to resize a bytearray.

There are many ways to "extend" a bytearray. Another example:

>>> ba=bytearray(b'abc')
>>> ba += b'def'
>>> ba, len(ba)
(bytearray(b'abcdef'), 6)

ZeroIntensity · 2025-02-04T14:35:03Z

Yeah, but I think that's a bit different than an explicit "resize." You can emulate it with that kind of thing, but it feels different than an actual method saying it will reallocate.

vstinner · 2025-02-04T14:47:40Z

The method documentation should not promise any performance guarantee in general; it can be implemented in different ways (the implementation doesn't matter).

ZeroIntensity · 2025-02-04T16:53:22Z

Can we clarify that with a CPython implementation detail note?

cmaloney · 2025-02-04T17:39:09Z

@ZeroIntensity can definitely add a note of "This is equivalent to bytearray += b'\0' * size". ~~For more efficiency in JavaScript in particular I think this could map to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/fill~~ -- nevermind, just sets elements to 0, can't append.

(Planning to do next iteration + perf testing follow up today, yesterday got sucked into a BytesIO.readfrom(file, *, limit: int | None = None, expected: int | None = None) experiment)


vstinner · 2025-02-04T19:43:41Z

test_bytes and test_capi are failing, you need to update the exception.

cmaloney · 2025-02-04T19:51:55Z

Working on them; Unfortunate part of "add suggestions", it adds "this person contributed" but doesn't let you do that + local changes before making push visible / alerting reviewers... Should have soon


gpshead · 2025-02-05T06:00:13Z

Doc/library/stdtypes.rst

+   .. method:: resize(size)
+
+      Resize the :class:`bytearray` to contain *size* bytes.
+      If :class:`bytearray` needs to grow, all new bytes will be set to null bytes.


Also describe the "obvious-to-us" behavior of what happens when it shrinks: the data at the end is truncated, the remaining data should be equivalent to a [:size] slice.

... and consider if the Python version should also support negative size to mean the same thing it would in a slice notation.
if (size < 0): size = max(0, len(self) + size) ?
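To make the trade-off concrete, here is a sketch of what slice-style negative sizes would mean in practice. The helper name `resize_slice_style` is hypothetical; it only emulates the proposal with existing bytearray operations and is not what the PR implements.

```python
def resize_slice_style(ba: bytearray, size: int) -> None:
    """Hypothetical: treat a negative size like a negative slice bound."""
    if size < 0:
        size = max(0, len(ba) + size)
    if len(ba) > size:
        del ba[size:]                    # shrink: keep ba[:size]
    else:
        ba += b"\0" * (size - len(ba))   # grow: pad with null bytes


buf = bytearray(b"abcdef")
resize_slice_style(buf, -2)    # keeps the same data as buf[:-2]
print(buf)                     # bytearray(b'abcd')
resize_slice_style(buf, -10)   # clamps to empty, like buf[:-10]
print(buf)                     # bytearray(b'')
```

Under this reading a negative argument silently shrinks the buffer instead of failing, which is the behavior under debate.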

👍 for the truncation.

I don't like negative, because the function operates in absolute buffer size, and requesting a negative buffer size sounds like a bug I'd write and I'd prefer an exception / exit rather than debug the symptom "my buffer shrank".

My original "equivalent to" was wrong: it made *size* look like a delta. The current one in the docs, which I'm validating:

if len(self) > size:
    del self[size:]
else:
    self += b'\0' * (size - len(self))
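The equivalence above can be wrapped in a small function to check the behavior on interpreters that predate the method. This is a sketch: `resize_equivalent` is a hypothetical name, and raising `ValueError` for a negative size is an assumption, since the thread only says that an exception is raised.

```python
def resize_equivalent(ba: bytearray, size: int) -> None:
    """Pure-Python emulation of the documented resize() equivalence."""
    if size < 0:
        # Assumption: the exact exception type is not stated in the thread.
        raise ValueError(f"size must be non-negative, got {size}")
    if len(ba) > size:
        del ba[size:]                    # shrink: keep ba[:size]
    else:
        ba += b"\0" * (size - len(ba))   # grow: pad with null bytes


shrink = bytearray(b"abc")
resize_equivalent(shrink, 1)
print(shrink, len(shrink))   # bytearray(b'a') 1

grow = bytearray(b"abc")
resize_equivalent(grow, 5)
print(grow, len(grow))       # bytearray(b'abc\x00\x00') 5
```

Here *size* is an absolute length, not a delta: resizing a 3-byte array to 5 adds two null bytes.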


cmaloney · 2025-02-05T06:55:27Z

I think I've addressed all review comments.

Tested performance on an optimized build (--with-lto --enable-optimizations)

$ python bm_bytearray_resize.py       
.....................
extend_bytes: Mean +- std dev: 142 ms +- 1 ms
.....................
extend_range: Mean +- std dev: 1.89 sec +- 0.07 sec
.....................
iadd: Mean +- std dev: 143 ms +- 1 ms
.....................
resize: Mean +- std dev: 48.7 ms +- 0.5 ms

import itertools
import pyperf

runner = pyperf.Runner()


# Aiming for what a file readall does in terms of resizing as it gets more data.
# Non-linear resizing up to a reasonably large read size.
blocksize = 1024
sizes = [
    blocksize,
    blocksize * 4,
    blocksize * 16,
    1_000_000,
    100_000_000,
    1_000_000_000,
    ]

def extend_bytes():
    """Extend by appending a temporary byte string"""
    ba = bytearray()
    for size in sizes:
        ba.extend(b'\0' * size)


def extend_range():
    """Resize by an iterator with length_hint"""
    ba = bytearray()
    for size in sizes:
        ba.extend(itertools.repeat(0, size))


def iadd():
    """Resize by using += """
    ba = bytearray()
    for size in sizes:
        ba += b'\0' * size


def resize():
    """Use resize"""
    ba = bytearray()
    for size in sizes:
        ba.resize(size)



runner.bench_func("extend_bytes", extend_bytes)
runner.bench_func("extend_range", extend_range)
runner.bench_func("iadd", iadd)
runner.bench_func("resize", resize)


vstinner · 2025-02-05T09:54:55Z

Doc/library/stdtypes.rst

+      >>> shrink = bytearray(5)
+      >>> resize(shrink, 0)
+      >>> shrink
+      bytearray(b'')
+      >>> grow = bytearray(2)
+      >>> resize(grow, 7)
+      >>> grow
+      bytearray(b'\x00\x00\x00\x00\x00\x00\x00')


I would prefer to use resize in examples:

Suggested change

-      >>> shrink = bytearray(5)
-      >>> resize(shrink, 0)
-      >>> shrink
-      bytearray(b'')
-      >>> grow = bytearray(2)
-      >>> resize(grow, 7)
-      >>> grow
-      bytearray(b'\x00\x00\x00\x00\x00\x00\x00')
+      Examples:
+      >>> shrink = bytearray(b'abc')
+      >>> shrink.resize(1)
+      >>> (shrink, len(shrink))
+      (bytearray(b'a'), 1)
+      >>> grow = bytearray(b'abc')
+      >>> grow.resize(5)
+      >>> (grow, len(grow))
+      (bytearray(b'abc\x00\x00'), 5)

(Making the change locally so I can get the doctest passing.)


vstinner · 2025-02-05T10:02:08Z

The change mostly LGTM. I just made a new review for some details.

extend_bytes: Mean +- std dev: 142 ms +- 1 ms
extend_range: Mean +- std dev: 1.89 sec +- 0.07 sec
iadd: Mean +- std dev: 143 ms +- 1 ms
resize: Mean +- std dev: 48.7 ms +- 0.5 ms

Thanks for the benchmark. So resize is 2.9x faster than the existing fastest method to grow a bytearray (extend_bytes). So it's worth it to add a dedicated method.
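The 2.9x figure follows directly from the reported means. A quick check, using only the numbers quoted above (the dictionary layout is just for illustration):

```python
# Mean timings reported by the benchmark, in milliseconds.
means_ms = {
    "extend_bytes": 142.0,
    "extend_range": 1890.0,  # reported as 1.89 sec
    "iadd": 143.0,
    "resize": 48.7,
}

# Ratio of each existing approach to resize().
speedups = {
    name: mean / means_ms["resize"]
    for name, mean in means_ms.items()
    if name != "resize"
}

for name, ratio in speedups.items():
    print(f"resize is {ratio:.1f}x faster than {name}")
```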


gpshead · 2025-02-05T19:28:36Z

Nice, I'm not surprised it is faster. Though what I care more about is that code using this is more readable than the other hacks. :)


pythongh-129559: Add bytearray.resize()

e240327


bedevere-app bot added the awaiting review label Feb 1, 2025

bedevere-app bot mentioned this pull request Feb 1, 2025

Add bytearray.resize() method #129559

Closed

cmaloney mentioned this pull request Feb 1, 2025

gh-129005: Fix buffer expansion in _pyio.FileIO.readall #129541

Closed

cmaloney added 9 commits February 1, 2025 15:46

Add versionchanged note to c-api around len behavior

925fcbd

Add argument formatting to stdtypes.rst

29c04ad

Add buffer error to c api test

d15ef67

Remove case that no longer crashes

d199032

Fix versionchanged

18829a2

Fix doc warnings

df779e2

Fix requested size check range

facc91f

0 is a fine size

d8b9faf

Fix grammar

7429cf4

vstinner reviewed Feb 2, 2025

Lib/test/test_bytes.py

vstinner reviewed Feb 2, 2025

Doc/library/stdtypes.rst

Add NULL byte tests, include set bytes in docs

c183116

cmaloney commented Feb 2, 2025

Objects/bytearrayobject.c

Viicos reviewed Feb 2, 2025

Doc/library/stdtypes.rst

cmaloney and others added 2 commits February 2, 2025 09:55

Update Doc/library/stdtypes.rst

dd46a85

Co-authored-by: Victorien <65306057+Viicos@users.noreply.github.com>

Always null new bytes

69bbdc1

vstinner reviewed Feb 2, 2025


ZeroIntensity reviewed Feb 4, 2025

Objects/bytearrayobject.c

Objects/bytearrayobject.c

Doc/c-api/bytearray.rst

cmaloney and others added 2 commits February 4, 2025 10:23

Apply suggestions from code review

ec4aa3d

Co-authored-by: Peter Bierma <zintensitydev@gmail.com> Co-authored-by: Victor Stinner <vstinner@python.org>

Update Objects/bytearrayobject.c

336299a

Co-authored-by: Victor Stinner <vstinner@python.org>

cmaloney added 2 commits February 4, 2025 13:03

tweak docs, fixup return exception

1a0e157

Update test_resize_forbidden to use bytearray.resize

cf89ec1

gpshead reviewed Feb 5, 2025


doc tweaks

ddc7b09

cmaloney added 2 commits February 4, 2025 22:58

Fix indentation of equivalent code

a49374b

Fix doctest

298d052

vstinner reviewed Feb 5, 2025


cmaloney and others added 4 commits February 5, 2025 09:42

Apply suggestions from code review

2f6b0a3

Co-authored-by: Victor Stinner <vstinner@python.org>

Update doctest per review

bf00f33

Fix whitespace in test_bytes

8df7b02

Merge branch 'main' into resize_pr

43e46dd

gpshead approved these changes Feb 5, 2025


bedevere-app bot added awaiting merge and removed awaiting review labels Feb 5, 2025

gpshead merged commit 5fb019f into python:main Feb 5, 2025
46 checks passed

bedevere-app bot removed the awaiting merge label Feb 5, 2025

cmaloney deleted the resize_pr branch February 5, 2025 20:03

cmaloney mentioned this pull request Feb 5, 2025

Convert change detection to a Python script #129627

Merged
