gh-81322: support multiple separators in Stream.readuntil #16429

bmerry · 2019-09-26T16:02:51Z

Allow Stream.readuntil to take an iterable of separators and match any
of them. The earliest match endpoint wins (which ensures that results
are independent of the chunking) and on ties the shortest separator wins
(which only matters if the user has supplied a redundant set like
[b'\r\n', b'\n'] and the limit is reached).
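
To make the matching rule concrete, here is a minimal sketch of "earliest endpoint wins, shortest separator wins ties" on a plain bytes buffer. It is an illustration only, not the code in Lib/asyncio/streams.py; the name `find_earliest_match` is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def find_earliest_match(
    buffer: bytes, separators: Sequence[bytes]
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the match whose end comes first, or None.

    Hypothetical helper for illustration; not part of asyncio.
    """
    best: Optional[Tuple[int, int]] = None
    # Visiting shorter separators first, combined with the strict "<" below,
    # means the shortest separator wins when two matches end at the same place.
    for sep in sorted(separators, key=len):
        start = buffer.find(sep)
        if start == -1:
            continue
        end = start + len(sep)
        if best is None or end < best[1]:
            best = (start, end)
    return best


# Both separators end at index 3, so the shorter one (b"\n") is reported.
print(find_earliest_match(b"a\r\nb", [b"\r\n", b"\n"]))  # (2, 3)
```

Picking the earliest endpoint is what makes the result independent of chunking: however the bytes arrive, the first match that can complete is the one whose last byte arrives first.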

It's also implemented for the deprecated StreamReader, just because the
code for the two implementations was the same except for one line and it
seemed easier to keep them in sync than leaving two different versions
to maintain.

https://bugs.python.org/issue37141

Issue: Allow multiple separators in Stream.readuntil #81322

bmerry · 2019-09-26T16:08:30Z

A few questions about this:

- I've implemented it for both StreamReader and Stream, because having two identical (apart from _ensure_can_read) implementations seemed better for maintainability than two implementations that have drifted out of sync. But if you would prefer not to make any changes to StreamReader I can revert that. I suggest doing review discussion on the Stream implementation.
- Because the existing functionality is now implemented using the general case, there might be a very small loss of performance. Are there any benchmarks I should run to test that?
- For matches that end in the same position I decided that the shortest should win for the purpose of LimitOverrunError, but that was an arbitrary choice. It could just as easily be longest-wins or first-in-list-wins.

bmerry · 2019-09-26T16:13:59Z

Lib/asyncio/streams.py

                    break

                # see upper comment for explanation.
-                offset = buflen + 1 - seplen
+                offset = buflen + 1 - max_seplen


Hmm, I see this can make offset negative if max_seplen > min_seplen. I'll add a test to catch it and add a fix.
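
A small sketch of why a negative offset matters and one possible fix (clamping at zero). The helper name `next_scan_offset` is hypothetical, and the clamp is an assumption about how the fix could look, not a quote from the patch.

```python
def next_scan_offset(buflen: int, max_seplen: int) -> int:
    """Where to resume searching after a scan of `buflen` bytes found nothing.

    The last max_seplen - 1 bytes may hold the start of a separator, so the
    next search has to begin before them. Clamp at 0 for short buffers.
    """
    return max(0, buflen + 1 - max_seplen)


# One byte buffered, longest separator is 5 bytes: unclamped this would be -3.
print(next_scan_offset(1, 5))  # 0

# A negative start passed to find() counts from the end of the buffer,
# so the search would begin at index 2 and miss a separator starting at 0.
sep = b"\r\n\r\n!"
print(sep.find(sep, -3))  # -1
print(sep.find(sep, next_scan_offset(1, 5)))  # 0
```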

brandtbucher

Thanks for taking the time to draft this PR @bmerry, and welcome to CPython! 😎

I'm not super familiar with this module, but I found a couple of things that could be tidied up. Other than that, the code looks good! This will definitely need updated docs, though.

Lib/asyncio/streams.py

brandtbucher · 2019-09-26T18:19:22Z

Lib/asyncio/streams.py

+            # Makes sure shortest matches wins, and supports arbitrary iterables
+            separator = sorted(separator, key=lambda sep: len(sep))
+        if not separator:
+            raise ValueError('Separator list should contain at least one element')


See above.

Suggested change

-            raise ValueError('Separator list should contain at least one element')
+            raise ValueError('Separator should contain at least one element')

brandtbucher · 2019-09-26T18:19:42Z

Lib/asyncio/streams.py

@@ -1672,26 +1709,35 @@ def _feed_data(self, data):
        #   messages :)

        # `offset` is the number of bytes from the beginning of the buffer
-        # where there is no occurrence of `separator`.
+        # where there is no occurrence of any separator.


See above.

Suggested change

-        # where there is no occurrence of any separator.
+        # where there is no occurrence of any `separator`.

brandtbucher · 2019-09-26T18:19:58Z

Lib/asyncio/streams.py

        offset = 0

-        # Loop until we find `separator` in the buffer, exceed the buffer size,
+        # Loop until we find a separator in the buffer, exceed the buffer size,


See above.

Suggested change

-        # Loop until we find a separator in the buffer, exceed the buffer size,
+        # Loop until we find a `separator` in the buffer, exceed the buffer size,

brandtbucher · 2019-09-26T18:51:54Z

Lib/asyncio/streams.py

+            separator = [separator]
+        else:
+            # Makes sure shortest matches wins, and supports arbitrary iterables
+            separator = sorted(separator, key=lambda sep: len(sep))


Sorry, missed one. Here too!

Suggested change

-            separator = sorted(separator, key=lambda sep: len(sep))
+            separator = sorted(separator, key=len)

1st1

Unfortunately it was decided to revert the current streams implementation from 3.8. See https://bugs.python.org/issue38242 for more details. I'm really sorry, but we'll need to rebase this work on the new API we later add to 3.9 :(

bedevere-bot · 2019-09-26T19:43:29Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be poked with soft cushions!

bmerry · 2019-09-26T20:47:53Z

@1st1 not a problem - definitely worth getting things right before they go into stdlib. Do you think the revert of the new stream API will conflict much with my changes to the old StreamReader class? If not, perhaps the way forward is for me to remove my changes to Stream, address the PR comments on StreamReader and then hopefully it shouldn't need much work to rebase after Stream is removed?

1st1 · 2019-09-26T21:25:03Z

@bmerry I'm not yet sure. I'll be working on a revert later today or tomorrow and will probably know more. I'd keep everything as is as it's not going to be an easy revert anyways.

bmerry · 2019-10-03T17:37:58Z

@1st1 it looks like you've done the Streams reversion now, and I've merged that into my branch. So hopefully this is ready for review now.

@brandtbucher thanks for the suggestions. I've applied all the ones that GitHub was still showing after the merge from master - let me know if anything got lost along the way.

brandtbucher

One more tiny clarification from me, otherwise this looks good. Still needs updated docs though!

Misc/NEWS.d/next/Library/2019-09-26-17-52-52.bpo-37141.onYY2-.rst

bmerry · 2019-10-07T07:21:50Z

> Still needs updated docs though!

@brandtbucher which docs need updating?

brandtbucher · 2019-10-08T18:17:57Z

Sorry about the delay @bmerry. The docs I'm referring to are located in Doc/library/asyncio-stream.rst. Specifically, here:

   .. coroutinemethod:: readuntil(separator=b'\\n')

      Read data from the stream until *separator* is found.

      On success, the data and separator will be removed from the
      internal buffer (consumed). Returned data will include the
      separator at the end.

      If the amount of data read exceeds the configured stream limit, a
      :exc:`LimitOverrunError` exception is raised, and the data
      is left in the internal buffer and can be read again.

      If EOF is reached before the complete separator is found,
      an :exc:`IncompleteReadError` exception is raised, and the internal
      buffer is reset.  The :attr:`IncompleteReadError.partial` attribute
      may contain a portion of the separator.

      .. versionadded:: 3.5.2

The signature and description should be updated. Also, something like this should be included below the .. versionadded:: 3.5.2 bit:

      .. versionchanged:: 3.9

         The *separator* parameter may now be an :term:`iterable` of separators.
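
For readers who want to see the documented behavior in use, here is a hedged usage sketch. It assumes the feature as it was eventually released according to later entries in this thread (Python 3.13, with the separators passed as a tuple); on older versions passing a tuple to `readuntil` is expected to fail. The function name `read_records` is hypothetical.

```python
import asyncio
import sys


async def read_records(data: bytes) -> list:
    """Split `data` on either b"\\r\\n" or b"\\n" using StreamReader.readuntil."""
    reader = asyncio.StreamReader()
    reader.feed_data(data)
    reader.feed_eof()
    records = []
    while not reader.at_eof():
        try:
            # A tuple of separators: whichever match ends first is used.
            records.append(await reader.readuntil((b"\r\n", b"\n")))
        except asyncio.IncompleteReadError as exc:
            # EOF arrived before any separator; keep the trailing bytes.
            if exc.partial:
                records.append(exc.partial)
            break
    return records


if __name__ == "__main__" and sys.version_info >= (3, 13):
    print(asyncio.run(read_records(b"a\r\nb\nc")))
```

Each returned record includes its separator, as the existing documentation describes for the single-separator case.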

brandtbucher

One tiny change to the NEWS entry due to the API revert:

Misc/NEWS.d/next/Library/2019-09-26-17-52-52.bpo-37141.onYY2-.rst

bmerry · 2019-10-09T06:17:47Z

> The docs I'm referring to are located in Doc/library/asyncio-stream.rst

Thanks. I'd assumed the library docs were generated from the docstring, which is why I missed this. I'll update it.

brandtbucher · 2019-10-09T18:31:16Z

Looks good to me! @1st1?

1st1 · 2019-10-09T18:33:39Z

We will soon start a discussion about the new streaming API for 3.9. I'll update this issue with a link when that happens; please give us some time before making a decision on this one -- Python 3.9 is still relatively far away.

gvanrossum · 2024-04-07T15:44:58Z

Thanks, I will review in the coming week!

Misc/NEWS.d/next/Library/2019-09-26-17-52-52.bpo-37141.onYY2-.rst

gvanrossum

One comment nit, I'll merge that myself and then land it. Thanks for your contribution, and on behalf of the asyncio maintainers I apologize for the delay!

Lib/asyncio/streams.py

len(separator) would be the number of separators, which makes no sense.

PR python#16429 introduced support for an iterable of separators in Stream.readuntil. Since bytes-like types are themselves iterable, this makes it ambiguous whether the argument is an iterable of separators or a single separator. In python#16429, only 'bytes' was treated as a single separator, which breaks code that passes other buffer object types. The Python library docs don't indicate which separator types were permitted in Python <=3.12, but comments in typeshed indicate that it would work with types that implement the buffer protocol and provide a len(). To keep those cases working the way they did before, I've changed the detection logic to treat any instance of collections.abc.Buffer as a single separator. There may still be corner cases where this doesn't do what the user wants (e.g. a numpy array of byte strings implements the buffer protocol and hence is treated as a single separator), but at least those corner cases should behave the same in 3.13 as they did in 3.12. Relates to python#81322.

PR python#16429 introduced support for an iterable of separators in Stream.readuntil. Since bytes-like types are themselves iterable, this makes it ambiguous whether the argument is an iterable of separators or a single separator. In python#16429, only 'bytes' was treated as a single separator, which breaks code that passes other buffer object types. Fix it by only supporting tuples rather than arbitrary iterables. Closes python#117722.
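
The tuple-only rule described above can be sketched as follows. This is an illustration under stated assumptions: the function name `normalize_separators` is hypothetical, the first error message is the one suggested in review earlier in this thread, and the second message is made up for the example.

```python
def normalize_separators(separator):
    """Return a list of separators sorted shortest first.

    Only a tuple is treated as "several separators"; anything else
    (bytes, bytearray, memoryview, ...) is taken as one separator.
    """
    if isinstance(separator, tuple):
        separators = sorted(separator, key=len)
    else:
        separators = [separator]
    if not separators:
        raise ValueError('Separator should contain at least one element')
    if len(separators[0]) == 0:
        # Illustrative message, not taken from the patch.
        raise ValueError('Separator should be at least one byte long')
    return separators


print(normalize_separators(bytearray(b"\n")))  # [bytearray(b'\n')]
print(normalize_separators((b"\r\n", b"\n")))  # [b'\n', b'\r\n']
```

Checking for `tuple` sidesteps the ambiguity entirely: a `bytearray` is iterable, but it is never mistaken for a collection of separators.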

bmerry requested review from 1st1 and asvetlov as code owners September 26, 2019 16:02

the-knights-who-say-ni added the CLA signed label Sep 26, 2019

bedevere-bot added the awaiting review label Sep 26, 2019

bmerry commented Sep 26, 2019

brandtbucher requested changes Sep 26, 2019

bedevere-bot added awaiting core review and removed awaiting review labels Sep 26, 2019

brandtbucher added topic-asyncio type-feature A feature request or enhancement labels Sep 26, 2019

brandtbucher reviewed Sep 26, 2019

1st1 requested changes Sep 26, 2019

bedevere-bot removed the awaiting core review label Sep 26, 2019

bedevere-bot added the awaiting changes label Sep 26, 2019

brandtbucher requested changes Oct 3, 2019

bmerry requested a review from 1st1 October 7, 2019 15:51

brandtbucher requested changes Oct 8, 2019

brandtbucher reviewed Oct 8, 2019

bmerry requested a review from brandtbucher October 9, 2019 18:09

brandtbucher approved these changes Oct 9, 2019

ezio-melotti removed the CLA signed label Jul 13, 2022

bmerry mannequin mentioned this pull request Apr 10, 2022

Allow multiple separators in Stream.readuntil #81322

Closed

bmerry added 2 commits April 7, 2024 16:56

Merge remote-tracking branch 'origin/main' into multi-separator

d6e3508

Update version at which readuntil with separate list is supported

99e4fe2

bmerry requested review from gvanrossum, kumaraditya303 and willingc as code owners April 7, 2024 15:09

gvanrossum removed request for 1st1, asvetlov, aeros and kumaraditya303 April 7, 2024 15:42

gvanrossum changed the title ~~bpo-37141: support multiple separators in Stream.readuntil~~ gh-81322: support multiple separators in Stream.readuntil Apr 7, 2024

bmerry commented Apr 8, 2024

gvanrossum approved these changes Apr 8, 2024

bedevere-app bot added awaiting merge and removed awaiting changes labels Apr 8, 2024

Fix comment

bee5ebe

len(separator) would be the number of separators, which makes no sense.

gvanrossum merged commit 775912a into python:main Apr 8, 2024

bedevere-app bot removed the awaiting merge label Apr 8, 2024

bmerry mentioned this pull request Apr 8, 2024

Fix Stream.readuntil with non-bytes buffer objects #117653

Closed

bmerry deleted the multi-separator branch April 8, 2024 19:09

bmerry mentioned this pull request Apr 10, 2024

3.13.0a6 breaks asyncio.Stream.readuntil with bytearray separator #117722

Closed

bmerry mentioned this pull request Apr 10, 2024

gh-117722: Fix Stream.readuntil with non-bytes buffer objects #117723

Merged

bmerry mentioned this pull request Apr 14, 2024

Update asyncio.Stream.readuntil python/typeshed#11755

Merged

diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 17, 2024

pythongh-81322: support multiple separators in StreamReader.readuntil (…

23d0d4d

…python#16429)
