Support lazy iteration #131

Closed
chuckwondo opened this issue Mar 1, 2024 · 4 comments · Fixed by #136

Comments

@chuckwondo

As currently implemented, Iter eagerly evaluates every iterator it is given (see https://github.com/MartinBernstorff/iterpy/blob/main/iterpy/iter.py#L16), so it cannot support infinite iterators. Even a very large finite iterator could easily exhaust memory. Smaller iterators can cause problems too if you build a long enough chain of calls to Iter.map, Iter.filter, etc., since each call in the chain creates a new list, consuming more and more memory along the way.
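
To make the difference concrete, here is a minimal sketch in plain Python. The helpers `eager_map` and `lazy_map` are hypothetical names made up for illustration and are not part of iterpy; the eager one builds a full list per call, while the lazy one returns a generator that produces values only on demand.

```python
from itertools import count, islice


def eager_map(func, iterable):
    # Builds the whole result list up front, so the input must be finite.
    return [func(x) for x in iterable]


def lazy_map(func, iterable):
    # Returns a generator; nothing is computed until values are requested.
    return (func(x) for x in iterable)


# The lazy version works on an infinite source because only five
# values are ever pulled through the pipeline.
squares = lazy_map(lambda x: x * x, count())
first_five = list(islice(squares, 5))
print(first_five)  # [0, 1, 4, 9, 16]

# The eager version is fine for finite input, but
# eager_map(lambda x: x * x, count()) would never return.
finite_squares = eager_map(lambda x: x * x, range(5))
print(finite_squares)  # [0, 1, 4, 9, 16]
```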

Do you have any plans for implementing lazy evaluation to avoid these problems?

@MartinBernstorff
Owner

MartinBernstorff commented Mar 3, 2024

Hi Chuck! Thanks for the interest in the project!

I've been pondering this for a while, with two considerations:

  1. Implement a consumable Iter (e.g. CIter or similar) that uses generators through the entire pipeline. It would support the same methods but be focused on performance. It wouldn't be stateless, but it would be memory efficient (see issue #61, "feat: add lazy and consumable subtype").

  2. I don't know enough about Python's garbage collection to know whether it can collect the intermediate results of a method chain. If it can, I think the issue is less severe than stated above.
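
One way to probe the second point is a small experiment. The sketch below uses a hypothetical `EagerIter` class (not iterpy's actual implementation) that records its name when it is finalized. It assumes CPython, where reference counting frees an object as soon as the last reference to it disappears; other interpreters may finalize objects later.

```python
deleted = []


class EagerIter:
    """Hypothetical eager wrapper, used only to observe object lifetimes."""

    def __init__(self, iterable, name):
        self._items = list(iterable)
        self._name = name

    def map(self, func, name):
        # A new list is built while self._items is still alive.
        return EagerIter([func(x) for x in self._items], name)

    def filter(self, predicate, name):
        return EagerIter([x for x in self._items if predicate(x)], name)

    def __del__(self):
        # Record which wrapper (and therefore which list) was released.
        deleted.append(self._name)


result = (
    EagerIter(range(5), "source")
    .map(lambda x: x * 2, "mapped")
    .filter(lambda x: x > 2, "filtered")
)

# On CPython, the intermediate wrappers are already gone here,
# while the final result is still referenced.
print(deleted)  # ['source', 'mapped'] on CPython
```

If this holds, a chain does not keep every intermediate list alive at once. The peak is roughly two lists (the input and the output of the step currently running), which still rules out infinite or very large inputs.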

Any thoughts on the above?

Do you have a use-case where you're running into issues? 😊

@chuckwondo
Author

Given that this is a library intended to provide a "fluent" interface for using iterators, I would say that by eagerly evaluating everything under the covers, you undo one of the main reasons for using iterators to begin with: laziness.

Here's an example of a "competing" library to this one, where everything remains lazy, as anybody using iterators would reasonably expect: https://github.com/olirice/flupy/tree/master. I would imagine there are other such Python libraries around. Automatic eager evaluation is likely to surprise anybody who specifically chooses to use an iterator. The built-in map, filter, zip, and similar functions are all lazy.
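
The laziness of the built-ins is easy to observe. In this short sketch, `record` is a hypothetical helper that logs each call, which shows that `map` does no work until values are actually requested.

```python
calls = []


def record(x):
    # Log every invocation so we can see when work actually happens.
    calls.append(x)
    return x * 2


mapped = map(record, [1, 2, 3])
calls_before = list(calls)      # nothing has been evaluated yet

first = next(mapped)            # pulls exactly one value
calls_after_one = list(calls)

rest = list(mapped)             # consumes the remaining values
calls_after_all = list(calls)

print(calls_before)     # []
print(calls_after_one)  # [1]
print(calls_after_all)  # [1, 2, 3]
```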

Given your stated desire to have this library included as part of the rustedpy group of packages, I would imagine that you might wish to align this library with Rust's Iterator trait, which you will also find to be completely lazy, until you invoke a method known to trigger evaluation, such as collect. Of course, you wouldn't necessarily have to implement every single method defined by Rust's Iterator trait, but if the intent is to provide a Rust-like Iterator for Python, then Rust's Iterator trait should certainly be your guide.
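
As a rough illustration of that design, here is a minimal sketch of a fluent wrapper that stays lazy until `collect` is called. The class name `LazyIter` and its methods are assumptions modeled loosely on Rust's `Iterator` trait; this is not iterpy's API, and it makes no claim about how #136 was implemented.

```python
from itertools import count, islice
from typing import Callable, Generic, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class LazyIter(Generic[T]):
    """Hypothetical lazy, consumable wrapper around any iterable."""

    def __init__(self, iterable: Iterable[T]) -> None:
        self._iterator: Iterator[T] = iter(iterable)

    def map(self, func: Callable[[T], U]) -> "LazyIter[U]":
        # The built-in map is lazy, so no element is touched here.
        return LazyIter(map(func, self._iterator))

    def filter(self, predicate: Callable[[T], bool]) -> "LazyIter[T]":
        return LazyIter(filter(predicate, self._iterator))

    def take(self, n: int) -> "LazyIter[T]":
        # Limits the pipeline to the first n elements, still lazily.
        return LazyIter(islice(self._iterator, n))

    def collect(self) -> List[T]:
        # The only method here that triggers evaluation.
        return list(self._iterator)


even_squares = (
    LazyIter(count())                 # infinite source: 0, 1, 2, ...
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x * x)
    .take(3)
    .collect()
)
print(even_squares)  # [0, 4, 16]
```

Because each step wraps the previous iterator instead of building a list, memory use stays constant per step regardless of how long the chain is. The trade-off is that the wrapper is consumable: once `collect` has run, the underlying iterator is exhausted.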

@MartinBernstorff
Owner

MartinBernstorff commented Mar 7, 2024

Thanks a ton for your interest, Chuck!

I just want to highlight that this is a hobby project, and like most open source contributors, I'm doing this for fun!

With that said, I completely agree with your technical points! I even mention flupy and std::iter in the readme 👍

I've implemented lazy, consumable evaluation as the default in #136.

@MartinBernstorff
Owner

Just to add to this, I personally find non-consumable iterables much easier to work with for debugging. To that end, I've added an Arr[ay] in #139. Would love to hear what you think!
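
The debugging difference can be shown with plain Python, without relying on iterpy itself. In this sketch a generator stands in for a consumable iterator and a list stands in for a non-consumable one; that Arr behaves like the list here is an assumption, not something confirmed in this thread.

```python
numbers = [1, 2, 3]

# Consumable: a generator can be iterated exactly once.
lazy = (x * 10 for x in numbers)
first_pass = list(lazy)
second_pass = list(lazy)   # already exhausted, so this is empty

# Non-consumable: a materialized list can be inspected repeatedly.
materialized = [x * 10 for x in numbers]
inspect_once = list(materialized)
inspect_twice = list(materialized)

print(first_pass)     # [10, 20, 30]
print(second_pass)    # []
print(inspect_twice)  # [10, 20, 30]
```

Printing or inspecting a consumable iterator mid-pipeline drains it, which is why a materialized container can be more convenient while debugging.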
