Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make uniqued() lazy by default #71

Merged
merged 3 commits into from
Apr 7, 2021
Merged

Conversation

timvermeulen
Copy link
Contributor

Adds the Uniqued sequence that lazily produces the unique elements of a sequence. uniqued() and lazy.uniqued(on:) now return a Uniqued instance, the eager uniqued(on:) still produces an array.

Checklist

  • I've added at least one test that validates that my change is working, if appropriate
  • I've followed the code style of the rest of the project
  • I've read the Contribution Guidelines
  • I've updated the documentation if necessary

Copy link

@kylemacomber kylemacomber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me.

My only question is how much folks are attached to uniqued() returning an Array. I think that we probably want to take uniqued() in this lazy direction (as opposed to, say, adding lazy.uniqued()), because it's more consistent with the rest of Algorithms and stdlib, but whenever I see this in the wild it's returning an Array.

Sources/Algorithms/Unique.swift Outdated Show resolved Hide resolved
}
```

### Complexity

The `uniqued` methods are O(_n_) in both time and space complexity.
The eager `uniqued(on:)` method is O(_n_) in both time and space complexity.
The lazy versions are O(_1_).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a very salient consideration mentioned by @kylemacomber that most uses seem to expect an Array, in which case a .lazy.uniqued design would make more sense.

Additionally, this discrepancy here, where uniqued(on:) isn't lazy by default but uniqued would be, seems like it invites confusion. I certainly would not expect that supplying a custom predicate would change the complexity or behavior so fundamentally, and I can see that issue arising when people make changes to their code and encounter this difference. Therefore, if it makes sense to make uniqued lazy, then I think the same change should be applied to uniqued(on:).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do people expect uniqued() to return an array, or do implementations usually simply return an array because it's easier to implement that way? I genuinely don't know the answer to this.

I share your concern about the discrepancy between uniqued() and uniqued(on:), but I do think that viewing them in isolation, this change is consistent with other sequence operations in the standard library. Operations that can be lazy without having to compromise (except perhaps on the return type) typically are, even when called on a sequence not conforming to LazySequenceProtocol. Collection's joined() and reversed() fall into this category, and I argue that uniqued() does as well. uniqued(on:) does not, because making it lazy would require the closure to be escaping and non-throwing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do any other algorithms in this library or the standard library differ in laziness depending on the presence of a custom predicate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the pair of operations that comes closest is joined() being lazy and flatMap { $0 } being eager while effectively doing the same thing. Of course they don't have similar names like uniqued() and uniqued(on:) do.

This is a fairly unique situation because other potential pairs lack one of the variants. chunked(by: ==) and compactMap { $0 } have no corresponding closure-less version, and operations like zip, reversed, and combinations have no versions that do take a closure.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya I think the joined()/flatMap { $0 } and compacted()/compactMap { $0 } (see #112) serve as good precedent of similar of lazy/eager pairs.

To try to articulate the philosophy:

  1. Anything that can efficiently be lazy should be lazy, because (i) it can avoid extra work (e.g. unnecessary computation or allocation) and (ii) it's easy to go from lazy to eager by constructing an Array, but it's impossible to go the other way.
  2. Require an explicit .lazy for algorithms that take a closure to emphasize that (i) the closure will not run immediately and (ii) may run more than once per element in the collection, which can introduce a surprising vector for error.
  3. An algorithm shouldn't be lazy if its being lazy would pessimize its runtime complexity. For example, a lazy reversed adapter over a plain Collection would be absurd because looping over all of its indices would be O(n²).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have OrderedSet via the Swift Collections packages, I think it's even clearer to me that uniqued should be lazy... the ability to do partial computation is really the only thing (other than method call syntax) distinguishing this algorithm from just creating an OrderedSet

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we have OrderedSet via the Swift Collections packages, I think it's even clearer to me that uniqued should be lazy... the ability to do partial computation is really the only thing (other than method call syntax) distinguishing this algorithm from just creating an OrderedSet

+1

@natecook1000 natecook1000 added the source breaking This change affects existing source code label Feb 9, 2021
@kylemacomber
Copy link

@swift-ci please test

@natecook1000
Copy link
Member

:shipit:

@natecook1000 natecook1000 merged commit 718220d into apple:main Apr 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
source breaking This change affects existing source code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants