
StopIteration is not raised correctly with _iter=True #273

Closed
ghost opened this issue Sep 12, 2015 · 7 comments

@ghost

ghost commented Sep 12, 2015

I found an odd case in which the return value of something like sh.ls('-a', _iter=True) does not behave like a regular iterator. If I keep calling next on an expression with _iter=True, everything looks okay. However, when I try to iterate over chunks using itertools.izip_longest(*([ iter(p) ] * 10)) (iterating in chunks of 10 elements), at some point next stops raising StopIteration and instead hangs the Python REPL.

(sh-env)➜  test  touch {1..200}.txt
(sh-env)➜  test  python
>>> import sh
>>> import itertools
>>> p = sh.ls('-a', _iter=True)
>>> s = itertools.izip_longest(*([ iter(p) ] * 10))
>>> next(s)
(u'.\t 11.txt   140.txt  161.txt  182.txt  21.txt  42.txt  63.txt  84.txt\n', u'..\t 120.txt  141.txt  162.txt  183.txt  22.txt  43.txt  64.txt  85.txt\n', u'100.txt  121.txt  142.txt  163.txt  184.txt  23.txt  44.txt  65.txt  86.txt\n', u'101.txt  122.txt  143.txt  164.txt  185.txt  24.txt  45.txt  66.txt  87.txt\n', u'102.txt  123.txt  144.txt  165.txt  186.txt  25.txt  46.txt  67.txt  88.txt\n', u'103.txt  124.txt  145.txt  166.txt  187.txt  26.txt  47.txt  68.txt  89.txt\n', u'104.txt  125.txt  146.txt  167.txt  188.txt  27.txt  48.txt  69.txt  8.txt\n', u'105.txt  126.txt  147.txt  168.txt  189.txt  28.txt  49.txt  6.txt   90.txt\n', u'106.txt  127.txt  148.txt  169.txt  18.txt   29.txt  4.txt   70.txt  91.txt\n', u'107.txt  128.txt  149.txt  16.txt   190.txt  2.txt   50.txt  71.txt  92.txt\n')
>>> next(s)
(u'108.txt  129.txt  14.txt   170.txt  191.txt  30.txt  51.txt  72.txt  93.txt\n', u'109.txt  12.txt   150.txt  171.txt  192.txt  31.txt  52.txt  73.txt  94.txt\n', u'10.txt\t 130.txt  151.txt  172.txt  193.txt  32.txt  53.txt  74.txt  95.txt\n', u'110.txt  131.txt  152.txt  173.txt  194.txt  33.txt  54.txt  75.txt  96.txt\n', u'111.txt  132.txt  153.txt  174.txt  195.txt  34.txt  55.txt  76.txt  97.txt\n', u'112.txt  133.txt  154.txt  175.txt  196.txt  35.txt  56.txt  77.txt  98.txt\n', u'113.txt  134.txt  155.txt  176.txt  197.txt  36.txt  57.txt  78.txt  99.txt\n', u'114.txt  135.txt  156.txt  177.txt  198.txt  37.txt  58.txt  79.txt  9.txt\n', u'115.txt  136.txt  157.txt  178.txt  199.txt  38.txt  59.txt  7.txt\n', u'116.txt  137.txt  158.txt  179.txt  19.txt   39.txt  5.txt   80.txt\n')
>>> next(s)
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/anaconda/envs/sh-env/lib/python2.7/site-packages/sh.py", line 559, in next
    chunk = self.process._pipe_queue.get(True, 0.001)
  File "/home/user/anaconda/envs/sh-env/lib/python2.7/Queue.py", line 177, in get
    self.not_empty.wait(remaining)
  File "/home/user/anaconda/envs/sh-env/lib/python2.7/threading.py", line 352, in wait
    gotit = waiter.acquire(0)
KeyboardInterrupt
>>> next(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

The interesting thing is that after interrupting the last next operation, if I call next again, then I get a StopIteration.
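The hang is consistent with an iterator that consumes its end-of-stream sentinel only once. A minimal sketch (hypothetical, not sh's actual implementation) of a queue-backed line iterator that stays exhausted, thanks to a `finished` flag:

```python
import queue

_DONE = object()  # end-of-output sentinel

class QueueLineIterator:
    """Queue-backed line iterator (illustrative only). Without the
    `finished` flag, a second next() after exhaustion would block on
    the empty queue instead of re-raising StopIteration."""

    def __init__(self, lines):
        self.finished = False
        self._q = queue.Queue()
        for line in lines:
            self._q.put(line)
        self._q.put(_DONE)

    def __iter__(self):
        return self

    def __next__(self):
        if self.finished:       # stay exhausted on repeated calls
            raise StopIteration
        chunk = self._q.get()
        if chunk is _DONE:
            self.finished = True
            raise StopIteration
        return chunk

it = QueueLineIterator(["a\n", "b\n"])
print(list(it))    # ['a\n', 'b\n']
next(it, None)     # safe: repeated next() keeps raising StopIteration
```

Dropping the `finished` check reproduces the reported symptom: the second post-exhaustion next() blocks on `Queue.get()` forever.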

This is how I think it should work:

>>> p = iter(range(24))
>>> s = itertools.izip_longest(*([ iter(p) ] * 10))
>>> next(s)
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
>>> next(s)
(10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
>>> next(s)
(20, 21, 22, 23, None, None, None, None, None, None)
>>> next(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
@amoffat amoffat added the bug label Sep 12, 2015
@amoffat
Owner

amoffat commented Sep 12, 2015

Your process is already an iterator, by virtue of passing _iter=True when you launched it, so there's no need to wrap it again in iter(). Also, the docs for itertools.izip_longest say this is the function's behavior:

izip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-

It's designed to take all the args as (presumably?) unique iterators, pull one value from each, and zip them up. The fact that you're using the same iterator 10 times is strange. It means izip_longest will attempt to pull a value from the first iterator and get one, then try to pull a value from the second iterator (which is also the first iterator!) and get the second value of the first iterator... etc. For each iterator index N, you're getting the Nth value of the single iterator that you've duplicated 10 times. Is this a standard practice for chunking iterators?

Thinking about it this way, it makes sense why the code is locking up. Your resulting iterator (the one returned from izip_longest) can only yield a value once it has collected 10 values from the single iterator you passed in 10 times. Say your process would return 6 values but you're asking for 10. When izip_longest gets to iterator 7, the process raises StopIteration. To izip_longest this is normal, since any iterator it receives could raise StopIteration when it's done. So izip_longest merrily moves on to iterator 8 and asks it to yield a value, but since you're using the same iterator for each argument, that iterator is already done... the process has ended and StopIteration was already raised.

So the bug here seems to be that calling next() on a process iterator that has already stopped will block instead of raising StopIteration again. Granted, you've found a very strange way of uncovering it :) It can be shown like this as well:

>>> p = sh.ls(_iter=True)
>>> next(p) # repeat this until StopIteration, then do it one more time
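The iterator protocol expects an exhausted iterator to keep raising StopIteration on every subsequent next() call, never to block. A quick sketch with a plain generator standing in for the process output:

```python
def lines():
    # stand-in for a process iterator that produces two lines of output
    yield "file1\n"
    yield "file2\n"

g = lines()
print(next(g))  # file1
print(next(g))  # file2

# Once exhausted, every further next() must raise StopIteration
# immediately rather than hang:
for _ in range(3):
    try:
        next(g)
    except StopIteration:
        print("StopIteration")
```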

In the meantime, this should accomplish the same thing you're trying to do, and it works correctly with sh because it stops once the process says it's done.

def chunk_iterator(it, times):
    chunks = []
    for piece in it:
        chunks.append(piece)
        if len(chunks) == times:
            yield chunks
            chunks = []
    if chunks:  # don't yield an empty trailing chunk
        yield chunks
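For reference, this is how the helper chunks a plain iterator (repeated here so the snippet runs standalone; the trailing partial chunk is yielded only when non-empty):

```python
def chunk_iterator(it, times):
    """Yield lists of up to `times` items pulled from `it`."""
    chunks = []
    for piece in it:
        chunks.append(piece)
        if len(chunks) == times:
            yield chunks
            chunks = []
    if chunks:  # only yield the remainder if there is one
        yield chunks

print(list(chunk_iterator(iter(range(10)), 4)))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```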

@ghost
Author

ghost commented Sep 12, 2015

Thank you for your response.

Is this a standard practice for chunking iterators?

I believe so. A similar piece of code is included in the recipes section of the itertools documentation (the grouper function).
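The itertools recipe in question reads roughly as follows (shown with the Python 3 spelling zip_longest; the thread uses Python 2's izip_longest):

```python
from itertools import zip_longest  # izip_longest in Python 2

def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks or blocks.
    grouper('ABCDEFG', 3, 'x') -> ABC DEF Gxx
    """
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

print(list(grouper('ABCDEFG', 3, 'x')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```

It relies on exactly the duplicated-iterator trick discussed above, which is why it only works with iterators that stay exhausted.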

Yes, repeatedly calling next(p) in your simpler example definitely produces the same behavior, but there at least the first call after exhaustion raises StopIteration. Here is another example where, instead of returning a tuple with some elements and the rest filled with None, it just freezes (in this example, I think I created 25 files, which sh.ls returns in 3 or 4 chunks):

>>> p = sh.ls(_iter=True)
>>> s = itertools.izip_longest(*([ iter(p) ] * 100))
>>> next(s)
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/anaconda/envs/sh-env/lib/python2.7/site-packages/sh.py", line 559, in next
    chunk = self.process._pipe_queue.get(True, 0.001)
  File "/home/user/anaconda/envs/sh-env/lib/python2.7/Queue.py", line 177, in get
    self.not_empty.wait(remaining)
  File "/home/user/anaconda/envs/sh-env/lib/python2.7/threading.py", line 359, in wait
    _sleep(delay)
KeyboardInterrupt

A regular iterator follows this behavior:

>>> p = iter(range(5))
>>> s = itertools.izip_longest(*([ p ] * 100))
>>> next(s)
(0, 1, 2, 3, 4, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None)
>>> next(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Are you sure chunk_iterator is working? I'm getting this:

>>> p = sh.ls(_iter=True)
>>> next(chunk_iterator(p, 2))
[u'file1\tfile12\tfile15\tfile18\tfile20\tfile23\tfile3  file6  file9\n', u'file10\tfile13\tfile16\tfile19\tfile21\tfile24\tfile4  file7\n']
>>> next(chunk_iterator(p, 2))
[u'file11\tfile14\tfile17\tfile2\tfile22\tfile25\tfile5  file8\n']
>>> next(chunk_iterator(p, 2))
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in chunk_iterator
  File "/home/user/anaconda/envs/sh-env/lib/python2.7/site-packages/sh.py", line 559, in next
    chunk = self.process._pipe_queue.get(True, 0.001)
  File "/home/user/anaconda/envs/sh-env/lib/python2.7/Queue.py", line 177, in get
    self.not_empty.wait(remaining)
  File "/home/user/anaconda/envs/sh-env/lib/python2.7/threading.py", line 359, in wait
    _sleep(delay)
KeyboardInterrupt

If I set a variable x = chunk_iterator(p, 2), then it works:

>>> p = sh.ls(_iter=True)
>>> x = chunk_iterator(p, 2)
>>> next(x)
[u'file1\tfile12\tfile15\tfile18\tfile20\tfile23\tfile3  file6  file9\n', u'file10\tfile13\tfile16\tfile19\tfile21\tfile24\tfile4  file7\n']
>>> next(x)
[u'file11\tfile14\tfile17\tfile2\tfile22\tfile25\tfile5  file8\n']
>>> next(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

@amoffat
Owner

amoffat commented Sep 12, 2015

>>> p = sh.ls(_iter=True)
>>> next(chunk_iterator(p, 2))
>>> next(chunk_iterator(p, 2))

Don't do that. It creates multiple chunked iterators over the same underlying iterator, which reintroduces the problem of calling next() on a completed process iterator.

>>> output_iterator = sh.ls(_iter=True)
>>> chunks = chunk_iterator(output_iterator, 2)
>>> next(chunks)
>>> next(chunks) # etc

That is the correct usage.

@ghost
Author

ghost commented Sep 12, 2015

It's causing the problem of calling a completed process iterator more than once again.

Yes, that's what I thought. Thanks.

@amoffat
Owner

amoffat commented Sep 12, 2015

Sure thing, thanks for reporting the underlying bug. I'll leave this open until I fix it.

@amoffat amoffat added the 1.2 label Nov 4, 2015
@amoffat
Owner

amoffat commented Oct 10, 2016

This is on the release-1.2 branch and will go out with it; closing this issue.

@amoffat amoffat closed this as completed Oct 10, 2016
@ghost
Author

ghost commented Oct 14, 2016

Great. Thank you!

0-wiz-0 added a commit to NetBSD/pkgsrc-wip that referenced this issue Dec 12, 2016
*   added `_out` and `_out_bufsize` validator [#346](amoffat/sh#346)
*   bugfix for internal stdout thread running when it shouldn't [#346](amoffat/sh#346)

*   regression bugfix on timeout [#344](amoffat/sh#344)
*   regression bugfix on `_ok_code=None`

*   further improvements on cpu usage

*   regression in cpu usage [#339](amoffat/sh#339)

*   fd leak regression and fix for flawed fd leak detection test [#337](amoffat/sh#337)

*   support for `io.StringIO` in python2

*   added support for using raw file descriptors for `_in`, `_out`, and `_err`
*   removed `.close()`ing `_out` handler if FIFO detected

*   composed commands no longer propagate `_bg`
*   better support for using `sys.stdin` and `sys.stdout` for `_in` and `_out`
*   bugfix where `which()` would not stop searching at the first valid executable found in PATH
*   added `_long_prefix` for programs whose long arguments start with something other than `--` [#278](amoffat/sh#278)
*   added `_log_msg` for advanced configuration of log message [#311](amoffat/sh#311)
*   added `sh.contrib.sudo`
*   added `_arg_preprocess` for advanced command wrapping
*   alter callable `_in` arguments to signify completion with falsy chunk
*   bugfix where pipes passed into `_out` or `_err` were not flushed on process end [#252](amoffat/sh#252)
*   deprecated `with sh.args(**kwargs)` in favor of `sh2 = sh(**kwargs)`
*   made `sh.pushd` thread safe
*   added `.kill_group()` and `.signal_group()` methods for better process control [#237](amoffat/sh#237)
*   added `new_session` special keyword argument for controlling spawned process session [#266](amoffat/sh#266)
*   bugfix better handling for EINTR on system calls [#292](amoffat/sh#292)
*   bugfix where with-contexts were not threadsafe [#247](amoffat/sh#195)
*   `_uid` new special keyword param for specifying the user id of the process [#133](amoffat/sh#133)
*   bugfix where exceptions were swallowed by processes that weren't waited on [#309](amoffat/sh#309)
*   bugfix where processes that dupd their stdout/stderr to a long running child process would cause sh to hang [#310](amoffat/sh#310)
*   improved logging output [#323](amoffat/sh#323)
*   bugfix for python3+ where binary data was passed into a process's stdin [#325](amoffat/sh#325)
*   Introduced execution contexts which allow baking of common special keyword arguments into all commands [#269](amoffat/sh#269)
*   `Command` and `which` now can take an optional `paths` parameter which specifies the search paths [#226](amoffat/sh#226)
*   `_preexec_fn` option for executing a function after the child process forks but before it execs [#260](amoffat/sh#260)
*   `_fg` reintroduced, with limited functionality.  hurrah! [#92](amoffat/sh#92)
*   bugfix where a command would block if passed a fd for stdin that wasn't yet ready to read [#253](amoffat/sh#253)
*   `_long_sep` can now take `None` which splits the long form arguments into individual arguments [#258](amoffat/sh#258)
*   making `_piped` perform "direct" piping by default (linking fds together).  this fixes memory problems [#270](amoffat/sh#270)
*   bugfix where calling `next()` on an iterable process that has raised `StopIteration`, hangs [#273](amoffat/sh#273)
*   `sh.cd` called with no arguments now changes into the user's home directory, like native `cd` [#275](amoffat/sh#275)
*   `sh.glob` removed entirely.  the rationale is correctness over hand-holding. [#279](amoffat/sh#279)
*   added `_truncate_exc`, defaulting to `True`, which tells our exceptions to truncate output.
*   bugfix for exceptions whose messages contained unicode
*   `_done` callback no longer assumes you want your command put in the background.
*   `_done` callback is now called asynchronously in a separate thread.
*   `_done` callback is called regardless of exception, which is necessary in order to release held resources, for example a process pool