Evenly divide a collection into chunks #96

Merged
merged 9 commits into from
Jul 26, 2023

Conversation

timvermeulen
Contributor

@timvermeulen timvermeulen commented Mar 11, 2021

Divide a collection into a given number of chunks as evenly as possible, with larger chunks at the start.

for chunk in Array(0..<10).evenlyChunked(in: 4) {
    print(chunk)  // [0, 1, 2], [3, 4, 5], [6, 7], [8, 9]
}
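
The "as evenly as possible, with larger chunks at the start" rule can be sketched with plain integer arithmetic. This is only an illustration under assumptions: `evenChunkSizes(count:chunks:)` is a hypothetical helper and not part of the package, and it assumes a positive number of chunks.

```swift
/// Hypothetical helper (not part of the package): the sizes of `n` even
/// chunks for a collection of `count` elements, larger chunks first.
func evenChunkSizes(count: Int, chunks n: Int) -> [Int] {
  // Assumption for this sketch: at least one chunk is requested.
  precondition(count >= 0 && n > 0, "count must be non-negative and n positive")
  let small = count / n   // size of the smaller chunks
  let larger = count % n  // number of chunks that get one extra element
  return (0..<n).map { $0 < larger ? small + 1 : small }
}

let sizes = evenChunkSizes(count: 10, chunks: 4)
print(sizes)  // [3, 3, 2, 2]
```

The sizes match the chunks printed above: two chunks of three elements followed by two chunks of two.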

EvenChunks<Base>.SubSequence is set to be EvenChunks<Base.SubSequence> which works out rather nicely. Other collections in this package that could benefit from this as well are LazyChunked, ChunkedByCount, and Windows.

Checklist

  • I've added at least one test that validates that my change is working, if appropriate
  • I've followed the code style of the rest of the project
  • I've read the Contribution Guidelines
  • I've updated the documentation if necessary

@timvermeulen timvermeulen force-pushed the even-chunks branch 2 times, most recently from 4c03fb7 to 88d0450 Compare April 22, 2021 17:59
@timvermeulen timvermeulen marked this pull request as ready for review April 22, 2021 19:48
@timvermeulen
Contributor Author

@swift-ci please test

Tim Vermeulen added 2 commits April 22, 2021 21:56
@timvermeulen
Contributor Author

I tried for a bit to make EvenChunks<Base>.SubSequence equal to EvenChunks<Base.SubSequence>, but it doesn't really seem to be possible (with how EvenChunks.Index currently works)!

It is important that Index only uses its offset property when testing for equality, because two unequal indices can correspond to the same empty slice of the base collection when the number of chunks exceeds the size of the collection:

print(Array((0..<3).evenlyChunked(into: 5)))
// [0..<1, 1..<2, 2..<3, 3..<3, 3..<3]
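
A minimal sketch of why equality has to look at the offset only. `ChunkIndex` and `chunkIndices(count:chunks:)` are hypothetical stand-ins written for this illustration, using integer ranges over a base of `0..<count`; they are not the actual `EvenChunks.Index` implementation.

```swift
/// Hypothetical stand-in for an even-chunks index: it stores the base range
/// of its chunk and the chunk's offset, and compares by offset alone.
struct ChunkIndex: Equatable {
  var baseRange: Range<Int>
  var offset: Int

  static func == (lhs: ChunkIndex, rhs: ChunkIndex) -> Bool {
    lhs.offset == rhs.offset
  }
}

/// Builds one index per chunk for a base collection `0..<count`.
func chunkIndices(count: Int, chunks n: Int) -> [ChunkIndex] {
  precondition(count >= 0 && n > 0)
  let small = count / n
  let larger = count % n
  var result: [ChunkIndex] = []
  var lower = 0
  for offset in 0..<n {
    let size = offset < larger ? small + 1 : small
    result.append(ChunkIndex(baseRange: lower..<(lower + size), offset: offset))
    lower += size
  }
  return result
}

let indices = chunkIndices(count: 3, chunks: 5)
// The last two chunks cover the same empty base range...
print(indices[3].baseRange == indices[4].baseRange)  // true
// ...but they are different positions in the chunked collection.
print(indices[3] == indices[4])  // false
```

If equality compared base ranges instead, the fourth and fifth indices would be indistinguishable and iteration could not advance past the first empty chunk.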

At the same time, every collection subsequence needs to share its indices with the collection it came from, in particular:

let chunks = (0..<6).evenlyChunked(into: 2)
let index = chunks.index(after: chunks.startIndex)
let slice = chunks[index...]
print(index == slice.startIndex)  // should print "true"

This is problematic because the slice doesn't know it's a slice, and so it assigns its startIndex an offset of 0, not 1. EvenChunks would need an extra stored property to make this work, which is a line I didn't want to cross.
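
For comparison, here is how the index-sharing requirement looks with a standard library collection. The sketch uses `Array` and `ArraySlice` rather than `EvenChunks`, so it only illustrates the requirement itself.

```swift
let numbers = Array(0..<6)
let index = numbers.index(numbers.startIndex, offsetBy: 2)
let slice = numbers[index...]

// The slice does not restart at zero: it shares indices with its base.
print(slice.startIndex == index)       // true
print(slice[index] == numbers[index])  // true
```

An `EvenChunks<Base.SubSequence>` created from a slice of the base would have no record of the offset at which it started, which is the gap the extra stored property would have had to fill.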

@timvermeulen
Contributor Author

@swift-ci please test

/// Returns the base distance between two `EvenChunks` indices from the end
/// of one to the start of the other, when given their offsets.
func baseDistance(from offsetA: Int, to offsetB: Int) -> Int {
  let smallChunkSize = baseCount / numberOfChunks
Contributor

Hey @timvermeulen :)
Should we safeguard against a value of 0 for numberOfChunks? The following code, for example,

    let ec =  "".evenlyChunked(into: 0)
    let d = ec.index(ec.startIndex, offsetBy: 1)

could lead to a division by zero.

Given that it should fail anyway because we cannot advance the index, should we add a precondition instead of failing with a division by zero? WDYT?
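
A rough sketch of what such a precondition could look like. `smallChunkSize(baseCount:numberOfChunks:)` is a hypothetical free function written for this illustration; it mirrors the division in the diff above but is not the package's code.

```swift
/// Hypothetical guard, not the package's code: trap with a descriptive
/// message instead of relying on the integer division-by-zero trap.
func smallChunkSize(baseCount: Int, numberOfChunks: Int) -> Int {
  precondition(numberOfChunks > 0, "Can't compute a chunk size for zero chunks")
  return baseCount / numberOfChunks
}

print(smallChunkSize(baseCount: 10, numberOfChunks: 4))  // 2
```

Either way the program stops on a zero chunk count; the only difference is the message the programmer sees.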

Contributor Author

As you pointed out, it's fine (or even desired) for the program to crash in that scenario since it's a programmer error to advance past the end. Division by zero doesn't have the most descriptive error message, but other than that it's a totally reasonable way to crash.

Advancing past the end (or before the start) isn't required to crash in any particular way; in fact, it isn't required to crash at all. Array is a common example of a collection that is totally fine with you moving an index outside the bounds of the collection, as long as you don't use it to index the array. In the Algorithms package we try to be a lot more vigilant about making sure the program crashes when an invalid index is used for subscripting than about whatever happens when you try to move out of bounds (usually deferring to however the base collection handles it).
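
The Array behavior mentioned here can be seen in a few lines. This sketch uses only standard library calls; the variable names are arbitrary.

```swift
let array = [1, 2, 3]

// Moving an Array index out of bounds does not trap by itself.
let beyond = array.index(array.startIndex, offsetBy: 10)
print(beyond)  // 10

// Subscripting with it would trap, so this line stays commented out:
// print(array[beyond])

// The limited variant reports the overshoot instead of producing an index.
let limited = array.index(array.startIndex, offsetBy: 10, limitedBy: array.endIndex)
print(limited == nil)  // true
```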

Note that in this particular case the division by zero only happens when the number of chunks is 0 — when moving past the end of, say, [1, 2, 3].evenlyChunked(into: 2) using index(_:offsetBy:), no crash happens at all. index(_:offsetBy:) is probably where the change should be made if we wanted to be more strict about this behavior.

Contributor

Ah ok! Thanks :)

Division by zero doesn't have the most descriptive error message, but other than that it's a totally reasonable way to crash.

That is what I was thinking with the precondition suggestion: if we are going to crash, a precondition seems like a smoother way to do it from the user's perspective, since we can give a better error message. But I agree, it is totally fine.

internal var firstUpperBound: Base.Index

@inlinable
internal init(
Contributor

Nit: Is this init method being used?

Contributor Author

Not anymore, good catch!

@kylemacomber

I think I like evenlyChunked(in: 4) more than evenlyChunked(into: 4), since into has connotations of the argument being passed inout, for example:

  • hash(into:)
  • reduce(into:_:)
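
For reference, both of these standard library APIs use the `into` label for a parameter that is mutated in place. The sketch below is only an illustration; the `Point` type and the sample string are made up for it.

```swift
// hash(into:) receives the hasher as an inout parameter.
struct Point: Hashable {
  var x: Int
  var y: Int

  func hash(into hasher: inout Hasher) {
    hasher.combine(x)
    hasher.combine(y)
  }
}

// reduce(into:_:) hands the accumulator to the closure as inout.
let letterCounts = "banana".reduce(into: [Character: Int]()) { counts, letter in
  counts[letter, default: 0] += 1
}
print(letterCounts["a"] ?? 0)  // 3
```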

@timvermeulen
Contributor Author

@swift-ci please test

Member

@natecook1000 natecook1000 left a comment

LGTM! 🚢

    let start = startOfChunk(endingAt: end, offset: offset)
    return Index(start..<end, offset: offset)
  }
}
Member

These helpers are great! 👏🏻

@natecook1000
Member

@swift-ci Please test

@natecook1000 natecook1000 added this to the Swift Algorithms 1.1 milestone Jul 22, 2023
@natecook1000 natecook1000 merged commit de3efbc into apple:main Jul 26, 2023
@timvermeulen timvermeulen deleted the even-chunks branch August 27, 2023 22:56
4 participants