Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StopIteration from first() caught by a generator #465

Open
mrkrd opened this issue Sep 21, 2019 · 16 comments · May be fixed by #473
Open

StopIteration from first() caught by a generator #465

mrkrd opened this issue Sep 21, 2019 · 16 comments · May be fixed by #473

Comments

@mrkrd
Copy link

mrkrd commented Sep 21, 2019

Hello, I've noticed a surprising behavior of first() which can be well illustrated by:

print(list(map(
    first,
    [[1,2,3], [], [1,2,3], [1,2,3]]
)))

I would expect this statement to raise an exception, because first([]) normally raises an exception on an empty sequence. However, the exception is StopIteration which seems to interfere with map and stops its iteration.

What do you think about catching StopIteration in first() and re-raising a different exception, so it can propagate until explicitly caught? Which exception would be appropriate?

@groutr
Copy link
Contributor

groutr commented Oct 9, 2019

As a matter of record,

x = list(map(first, [[1,2,3], [], [1,2,3], [1,2,3]]))  # x == [1]

The question is should first and friends handle StopIteration? Raising a different exception would break backwards compatibility and potentially break existing code.

Faced with a situation like this, I would probably suggest something like:

from toolz import pluck
x = list(pluck(0, [[1, 2, 3], [], [1, 2, 3], [1, 2, 3]]))  # Raises IndexError
# or
from toolz.curried import get
x = list(map(get(0), [[1, 2, 3], [], [1, 2, 3], [1, 2, 3]])) # Raises IndexError

Note both of these will only work on indexed sequences.

@mrkrd
Copy link
Author

mrkrd commented Oct 9, 2019 via email

@groutr
Copy link
Contributor

groutr commented Oct 10, 2019

@mrkrd, any function that manipulates the iterator directly could potentially raise a StopIteration. Such a functions would be interpose, peek, nth and anything that use them.

You make a very good point and I would like to add that there is precedent in itertoolz already for catching and handling StopIteration.
Examples of functions that handle StopIteration: accumulate, partition_all
I don't think the functions that use itertools (except for itertools.islice) will raise StopIteration either.

I agree that the behavior is subtle. map is consistent with for and zip which all terminate on a StopIteration.

@eriknw what are your thoughts on this?

@groutr
Copy link
Contributor

groutr commented Oct 11, 2019

This might be useful to the issue: https://www.python.org/dev/peps/pep-0479/

@mrkrd are you using Python 3? If so, does adding from __future__ import generator_stop at the top of your module do what you expect?

EDIT: Just remembered that first is not a generator. The generator_stop functionality won't work here.

@eriknw
Copy link
Member

eriknw commented Oct 11, 2019

I agree this is non-ideal behavior, so thanks for the thoughtful discussion already. My initial reaction is we could perhaps follow in the spirit of https://www.python.org/dev/peps/pep-0479/ and raise RuntimeError. In this case, first and second could be implemented as:

def first(seq):
    for rv in seq:
        return rv
    raise RuntimeError()

def second(seq):
    for rv in itertools.islice(seq, 1, None):
        return rv
    raise RuntimeError()

I'll want to give careful consideration to other functions. Returning IndexError may be okay (and nth may sometimes raise this), but doesn't seem entirely appropriate since we're not necessarily indexing into a collection. Perhaps we could raise an error we define.

@groutr
Copy link
Contributor

groutr commented Oct 11, 2019

@eriknw, I think returning an IndexError is incorrect since we are targeting iterators in itertoolz.

Perhaps raising a custom IterationStop error? RuntimeError would also be fine.

BTW: @eriknw, your for-loop implemenation of first is about 2x faster on my machine than the current implementation of first.

@mrkrd
Copy link
Author

mrkrd commented Oct 13, 2019 via email

@groutr groutr linked a pull request Oct 18, 2019 that will close this issue
@groutr
Copy link
Contributor

groutr commented Oct 18, 2019

@mrkrd @eriknw submitted a PR to test raising RuntimeError. If we decide to raise a custom error instead, it will be trivial to update.

@mrkrd
Copy link
Author

mrkrd commented Oct 19, 2019 via email

@wrobell
Copy link

wrobell commented Jan 3, 2020

Unfortunately, this bug shows that Toolz API shall not be locked yet.

IMHO, the best would be introduction a Maybe monad. first should return Nothing on empty input sequence and as consequence map(first, [[1,2,3], [], [1,2,3], [1,2,3]]) shall return Nothing as well (using design from the Wikipedia example here). The Toolz documentation emphasizes a lot of functional programming concepts and it seems it needs one more to not fall short of its promises.

If that's not possible, the IterationError looks like the next best alternative. The documentation shall be improved of first function to suggest its use together with excepts to avoid side effects in form of an exception (you basically implement your own Maybe monad every time).

The documentation update can be done before #473 is merged and reference StopIteration for now. Would such documentation change have a chance to be merged any time soon? Asking as the project seems to be stalled a bit (this bug, #470, #466, #477, etc).

@mrkrd
Copy link
Author

mrkrd commented Jan 3, 2020 via email

@groutr
Copy link
Contributor

groutr commented Jan 4, 2020

@wrobell that is a good suggestion. I'm curious to know what that would look like for toolz API to support monads. I'm open to experimenting with that in the future.

@mrkrd is correct in that the aim of toolz is to extend parts of the standard library and bring useful utility functions to python. It seeks to complement pythonic paradigms. From the documentation, " The toolz API is also strongly affected by the principles of the Python language itself, and often makes deviations in order to be more approachable to that community." (https://toolz.readthedocs.io/en/latest/heritage.html). Depending on toolz should be a very easy thing and not require substantial changes to possibly production code.

I think the right course of action right now is the iteration error route. I'd like to see this issue resolved as well.

@wrobell
Copy link

wrobell commented Jan 6, 2020

If more Pythonic, then what about raising the exception and allow setting a default value like next?

@groutr
Copy link
Contributor

groutr commented Jan 6, 2020

@wrobell one can certainly pass a default value to next. Adding a default argument to first would break existing code. Making it a keyword argument could be doable (similar to get).

Taking a step back, do you have a use case where setting a default is required?

@wrobell
Copy link

wrobell commented Jan 6, 2020

Please consider

first = excepts(StopIteration, itz.first, lambda v: None)

vs.

first = partial(itz.first, default=None)

@igorhub
Copy link

igorhub commented Jan 25, 2020

Coming from Clojure, it is very counterintuitive that first raises exception on empty collection.

I’ve been bitten by this behavior right now. I wanted to open an issue, then found this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants