Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: slices: add Divide (split into equally sized chunks) #65523

Closed
earthboundkid opened this issue Feb 5, 2024 · 20 comments
Closed

proposal: slices: add Divide (split into equally sized chunks) #65523

earthboundkid opened this issue Feb 5, 2024 · 20 comments
Labels
Milestone

Comments

@earthboundkid
Copy link
Contributor

Proposal Details

Follow up to #53987. This is a proposal to add an iterator to the slices package that divides a slice into N equally sized subslices.

// Divide returns an iterator that divides a slice into n sub-slices of roughly equal length.
// Description of special cases goes here.
func Divide[S ~[]E, E any](slice S, n int) iter.Seq[S]

Bikeshed topics: What should the name of this function be? How does it handle special cases?

I think it should always iterate over N items, but if the slice is too short, it may return subslices of length 0. So

  • Divide(slice of len(5), n of 4) → len 1, len 1, len 1, len 0
  • Divide(0, 2) → 0, 0
  • Divide(1, 3) → 1, 0, 0
  • Divide(5, 4) → 2, 1, 1, 1

Should it be:

  • Divide(6, 4) → 2, 2, 1, 1 OR
  • Divide(6, 4) → 3, 1, 1, 1 ?
@avamsi
Copy link

avamsi commented Feb 5, 2024

Should it be:

Divide(6, 4) → 2, 2, 1, 1 OR
Divide(6, 4) → 3, 1, 1, 1 ?

Do you have any use cases in mind for Divide(6, 4) → 3, 1, 1, 1? Divide(6, 4) → 2, 2, 1, 1 seems better from a fairness POV and IMO, a more intuitive expectation from "sub-slices of roughly equal length".

@earthboundkid
Copy link
Contributor Author

earthboundkid commented Feb 5, 2024

The veggiemonk version of Divide does Divide(6, 4) → 1, 2, 1, 2, which I find sort of weird.

I like the avamsi version best, where it distributes the remainder starting from the first in the batch until there aren't any remainders left.

An alternative would be to just pile all of the remainder onto the first or last slice, but I agree that you usually don't want that.

@earthboundkid
Copy link
Contributor Author

earthboundkid commented Feb 5, 2024

Here's a working version with a test.

// Divide returns an iterator that divides a slice into n sub-slices of roughly equal length.
// If there is a remainder from dividing the slice,
// it will be distributed evenly starting from the first sub-slice.
// Sub-slices may be empty if n is greater than the length of the slice.
// Divide panics if n is less than one.
// All sub-slices are clipped to have no capacity beyond their length.
func Divide[S ~[]E, E any](slice S, n int) iter.Seq[S] {
	if n < 1 {
		panic(fmt.Sprintf("attempted to divide a slice by n < 1: n = %d", n))
	}
	return func(yield func(S) bool) {
		size := len(slice) / n
		remainder := len(slice) % n
		var start, end int
		for range n {
			start, end = end, end+size
			if remainder > 0 {
				remainder--
				end++
			}
			if !yield(slice[start:end:end]) {
				return
			}
		}
	}
}

@earthboundkid earthboundkid changed the title slilces: Add Divide slices: Add Divide Feb 5, 2024
@robpike
Copy link
Contributor

robpike commented Feb 5, 2024

At least for those of us with a Unix background, Split might be the better name.

But also: Is this worth putting in the standard library? It's not a common operation in my experience. And as you demonstrate, it's not hard.

@earthboundkid
Copy link
Contributor Author

I think it's tricky enough to warrant being in the standard library. I forget where the last time I needed it was, but I've definitely needed it in the last couple of years and just wrote a worse version that used a slice of slices and didn't handle the edge cases correctly.

@leaxoy
Copy link

leaxoy commented Feb 6, 2024

Should this live in iter or xiter package? And naming it to chunk? So signature can change to:

func Chunk[E any](s Seq[E], n int) Seq[Seq[E]] {
    // implementation
}

@avamsi
Copy link

avamsi commented Feb 6, 2024

@leaxoy there's a different proposal (#53987) for slices.Chunk (the int argument it takes is the chunk size) and there might be an xiter version if the slices version is accepted. slices.Divide (the int argument it takes is the number of chunks) OTOH needs len upfront to calculate the chunk size, so I don't think an iter / xiter version is possible.

@randall77
Copy link
Contributor

I think it would be very confusing to have an iterator chunker that takes the max size of the chunk, and a slice chunker that takes the number of chunks desired. Then slices.Chunk(s, n) and iter.Chunk(iter.FromSlice(s), n) do entirely different things.

@seankhliao seankhliao changed the title slices: Add Divide proposal: slices: add Divide (split into equally sized chunks) Feb 6, 2024
@gopherbot gopherbot added this to the Proposal milestone Feb 6, 2024
@apparentlymart
Copy link

apparentlymart commented Feb 6, 2024

On the topic of naming, it might be of interest that when I first read this proposal I misunderstood (based on the name and the short summary) that this was "divide a slice into chunks of size N (with a possible remainder)" rather than "divide a slice into N chunks of (roughly) the same length". I've seen the former more often in other languages and so I suppose that's why my mind went there first.

Unless I'm the only one with that confusion, it seems to me like the name should try to communicate whether the int parameter is the length of the iterator or the length of the slices in the iterator.

Sadly, my only idea for that so far was to put an N directly in the name to illustrate which of the operands of "division" is being specified as an argument, which seems overly quirky:

NChunks(slice, chunkCount)
ChunksOfN(slice, chunkLength)

Hopefully either I'm the only one with this confusion (and thus it isn't a problem that needs solving at all) or someone else has a better imagination than me and suggests something more normal-looking.

Edit: Sorry, I had this comment written in my browser and didn't submit it immediately due to a distraction, and then when I submitted it I found it was now a little redundant with earlier comments. I hope it's still at least a little useful.

@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Feb 6, 2024
@earthboundkid
Copy link
Contributor Author

I agree that there's a question of which is Chunk and which is Divide and it's probable that you could get confused about which does what, but I'm not sure if there's a better solution than just having both under those names. ChunkIntoN (=Divide) and ChunksOfN (=Chunk) are a little more clear I guess, but at the cost of being a lot more ugly.

@rsc
Copy link
Contributor

rsc commented Feb 7, 2024

If we accept Chunk, then Divide is:

func Divide[S ~[]E, E any](slice S, n int) iter.Seq[S] {
    return Chunk(slice, (len(slice)+n-1)/n)
}

Do I have that right?

@avamsi
Copy link

avamsi commented Feb 7, 2024

Divide(slice len 10, n 4) should ideally yield sub-slices of lens 3, 3, 2, 2.
Using Chunk would yield sub-slices of lens 3, 3, 3, 1.

https://github.com/avamsi/ergo/blob/bc8d6f210e71d95a8e63c513c49ed5f18d8e9710/slices/slices_test.go#L51-L62

@robpike
Copy link
Contributor

robpike commented Feb 7, 2024

Is this facility important enough to commit to the standard library? The discussion shows the problem itself isn't even well understood.

Beware the endless addition of "things one can do with a slice" (or map) to the library. Not every operation needs formal support, and the more things you put in the library, the more things need support from an already busy group.

@veggiemonk
Copy link

Indeed, I think it would lead to more confusion to know which one does what.
I'm pretty sure 99% of people will be fine with just Chunk and move on with their live. Those who do care will easily find the solution or write one.

I didn't think of all the possible ramifications when I restarted the discussion in #53987. (my first time participating on this repo).

I appreciate the discussion but I would leave Divide out of standard library.

Thank you all for your time and attention.

@rsc rsc moved this from Incoming to Active in Proposals Feb 9, 2024
@rsc
Copy link
Contributor

rsc commented Feb 9, 2024

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@Merovius
Copy link
Contributor

Merovius commented Feb 9, 2024

I think it's tricky enough to warrant being in the standard library.

To me, most of the trickiness seems to be not in the implementation, but in deciding what the implementation should do. ISTM that mainly, there are cases (I wouldn't even call them "edge" cases - numbers not being divisible is the norm) where there are multiple reasonable answers and it's ambiguous what a user might want.

That, to me, speaks against inclusion in the standard library. Inclusion in the standard library would be a canonical policy decision, yet there doesn't seem to be a canonical policy on what users want.

@deefdragon
Copy link

I have implemented splitting of arrays several times, but I think only once or twice have I explicitly re-used the same method to do the splitting, almost always re-writing it based on the surrounding context. I am with Merovius, that this proposal is going to be down to what the implementation should do.

Expanding their comment with examples, Even if we decide to go with one particular "packing" over another (IE 3, 3, 2, 2 over 3, 3, 3, 1) we still have to decide the how of that packing. Are we going to "deal out" the items as they come in? Perhaps we take everything in, and then just "cut the deck" into equal chunks? IE:

"Deal Out"
0: a,e,i,m
1: b,f,j,n
2: c,g,k,o
3: d,h,l,p


"Cut Deck"
0: a,b,c,d
1: e,f,g,h
2: i,j,k,l
3: m,n,o,p

Or perhaps we do none of these, explicitly shuffling the items in the same vein as how maps are "shuffled" to prevent predicting and packing the map?

I feel that this is a situation where we need to examine what code publicly exists already. That would show us
1: That there is sufficient demand for this to begin with
2: What the most common implementation for dividing is
3: How often the implementation actually even matters

Unless/Until we get that examination, I cant see this proposal being accepted.

@rsc
Copy link
Contributor

rsc commented Feb 14, 2024

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Decline in Proposals Feb 14, 2024
@rsc
Copy link
Contributor

rsc commented Mar 1, 2024

No change in consensus, so declined.
— rsc for the proposal review group

@rsc rsc moved this from Likely Decline to Declined in Proposals Mar 1, 2024
@rsc rsc closed this as completed Mar 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Declined
Development

No branches or pull requests