Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

colexec: implement vectorized ntile window function #64977

Merged
merged 1 commit into from
May 18, 2021

Conversation

DrewKimball
Copy link
Collaborator

@DrewKimball DrewKimball commented May 11, 2021

This patch implements the ntile window function in the vectorized
engine. ntile takes in an integer argument num_buckets and then
distributes all rows in a partition equally between the buckets,
outputting the bucket number for each row.

In the vectorized implementation, batches are buffered until the end
of a partition is reached, at which point the ntile bucket values can
be calculated. The batches are emitted in a streaming fashion; as soon
as a batch is fully processed, it is returned and work is paused until
the next call to Next().

See #37035

Release note (sql change): the vectorized engine now supports the ntile window function.

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented May 11, 2021

CLA assistant check
All committers have signed the CLA.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@DrewKimball DrewKimball force-pushed the ntile branch 2 times, most recently from b7cabf9 to 577461d Compare May 13, 2021 00:38
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work! I have some suggestions and a bunch of nit comments, and I will definitely want to take another close look to follow the logic, but I think the PR is in a very good shape.

nit: in the commit title, exec -> either colexec or colexecwindow.

Reviewed 11 of 11 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball)


Makefile, line 913 at r1 (raw file):

  pkg/sql/colexec/colexecwindow/row_number.eg.go \
  pkg/sql/colexec/colexecwindow/window_peer_grouper.eg.go \
  pkg/sql/colexec/colexecwindow/ntile.eg.go

nit: keep these lexicographically sorted as other execgen targets.


pkg/sql/colexec/colexecwindow/BUILD.bazel, line 72 at r1 (raw file):

    ("row_number.eg.go", "row_number_tmpl.go"),
    ("window_peer_grouper.eg.go", "window_peer_grouper_tmpl.go"),
    ("ntile.eg.go", "ntile_tmpl.go"),

ditto


pkg/sql/colexec/colexecwindow/ntile.eg.go, line 72 at r1 (raw file):

	// nTileLoading is the state in which ntile operators load an additional
	// input batch into the currentBatch field. If necessary, the old value of
	// currentBatch will be pushed to the buffer queue.

It would be helpful to mention under which events one state is changed into another.


pkg/sql/colexec/colexecwindow/ntile.eg.go, line 188 at r1 (raw file):

		&colexecutils.NewSpillingQueueArgs{
			UnlimitedAllocator: r.allocator,
			Types:              append(r.inputTypes, types.Int),

nit: it might be worth extracting this out and performing a deep copy of


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 85 at r1 (raw file):

	nTileLoading nTileState = iota
	// nTileSeeking is the state in which ntile operators seek to the index of the
	// next partition. And retrieve the number of ntile buckets for the current

super nit: two sentences read a bit weird for me, maybe combine them into one?


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 192 at r1 (raw file):

	// {{end}}
	if partitionCol[i] {
		// Don't break for the start of the current partition.

I'm a bit confused by this comment, could you expand a bit?

Also the comment above _GET_NUM_BUCKETS says that if a new partition begins at index i, this code snippet records the size of the partition, how is that done? Is the comment stale.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 193 at r1 (raw file):

	if partitionCol[i] {
		// Don't break for the start of the current partition.
		// {{if $.HasSel}}

Seems like the code in the if case is the same as in else case.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 204 at r1 (raw file):

	}
	// {{end}}
	if !argNulls.NullAt(i) {

nit: we usually template out "definitely no nulls" vs "maybe has nulls" cases, I'm not sure how important will be to do it here, possibly leave a TODO.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 267 at r1 (raw file):

} // */}}

var errInvalidArgumentForNtile = pgerror.Newf(

nit: we could export the error in window_builtins.go and reuse it here.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 287 at r1 (raw file):

		&colexecutils.NewSpillingQueueArgs{
			UnlimitedAllocator: r.allocator,
			Types:              append(r.inputTypes, types.Int),

It might be a bit safer to allocate a deep copy of inputTypes and append types.Int to it. It's possible that the caller's input types have extra capacity, so there will be no new allocation when we append a single types.Int, yet the caller might pass the same input types to a next operator which might modify it. I don't think that this would happen in practice at the moment because of our limited support of window functions, but in the future it could.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 329 at r1 (raw file):

			n := batch.Length()
			sel := batch.Selection()
			r.allocator.PerformOperation(r.currentBatch.ColVecs(), func() {

I just realized that here we're performing a deep copy of outputIdx vector because it was appended to the input batch by the vectorTypeEnforcer (we have the same issue in other window functions). However, copying the output column seems pointless, it might be worth leaving a TODO about this.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 350 at r1 (raw file):

			argVec := r.currentBatch.ColVec(r.argIdx)
			argNulls := argVec.Nulls()
			argCol := argVec.Int64()

It would be great to add logic tests where the argument is of INT2 and INT4 types to make sure this works ok. (I think the unit test in colexecwindow package would crash, but that's probably ok.)


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 356 at r1 (raw file):

			sel := r.currentBatch.Selection()
			if sel != nil {

Hm, this if has quite a bit of code duplication. From what I can tell the crux is that we want i to serve the same meaning (the index of the row we're looking at) in sel and no-sel cases, yet we have different ways to get it; also in the latter case i serves the same goal as selIdx. I think we could use the same for loop in both cases like

for selIdx := r.seekIdx; selIdx < n; selIdx++

and introduce a bit more templating to get the index with something like

	// {{if .HasSel}}
	i = sel[selIdx]
	// {{else}}
	i = selIdx
	// {{end}}

(take a look at vec_to_datum_tmpl.go for inspiration).

This should allows us to remove the code duplication here.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 362 at r1 (raw file):

					// We must first find the first row for which num_buckets is not null.
					for selIdx, i = range sel[selIdx:n] {
						_GET_NUM_BUCKETS(true)

I think I would get rid off _GET_NUM_BUCKETS and _COMPUTE_PARTITION_SIZE entirely and inline their logic if we are templating out this whole nTileSeeking block.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 376 at r1 (raw file):

				r.nextPartitionIdx = selIdx
				// {{else}}
				r.partitionSize += len(sel[:n])

nit: this is just + n, right?


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 387 at r1 (raw file):

					// We must first find the first row for which num_buckets is not null.
					// {{if .HasPartition}}
					_ = partitionCol[i]

Is this line needed? I would assume given the line above it is not.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 399 at r1 (raw file):

				// partition.
				if i < n {
					_ = partitionCol[i]

ditto


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 486 at r1 (raw file):

	r.state = nTileProcessing
	if !r.hasArg {
		return

nit: a quick comment why we're short-circuiting here would be helpful.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 532 at r1 (raw file):

	}
	var lastErr error
	if err := r.bufferQueue.Close(ctx); err != nil {

nit: you can just return the result of Close.


pkg/sql/colexec/execgen/cmd/execgen/ntile_gen.go, line 1 at r1 (raw file):

// Copyright 2019 The Cockroach Authors.

nit: s/2019/2021/.


pkg/sql/logictest/testdata/logic_test/window, line 1061 at r1 (raw file):


query IIII
SELECT k, w, v, ntile(v) OVER (PARTITION BY w ORDER BY v, k) FROM kv ORDER BY 1

nit: maybe order the output by w, v, k so that it was easier to check the results visually.

Copy link
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look! Your comments have been very helpful.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @yuzefovich)


Makefile, line 913 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: keep these lexicographically sorted as other execgen targets.

Done.


pkg/sql/colexec/colexecwindow/BUILD.bazel, line 72 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

ditto

Done


pkg/sql/colexec/colexecwindow/ntile.eg.go, line 72 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

It would be helpful to mention under which events one state is changed into another.

Done.


pkg/sql/colexec/colexecwindow/ntile.eg.go, line 188 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: it might be worth extracting this out and performing a deep copy of

Done. (I think, the comment got cut off.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 85 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

super nit: two sentences read a bit weird for me, maybe combine them into one?

Oops, done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 192 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I'm a bit confused by this comment, could you expand a bit?

Also the comment above _GET_NUM_BUCKETS says that if a new partition begins at index i, this code snippet records the size of the partition, how is that done? Is the comment stale.

The comment's a bit stale, yeah. I'll change it.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 193 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Seems like the code in the if case is the same as in else case.

Ah, that first i is supposed to be selIdx. Fixed it.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 204 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: we usually template out "definitely no nulls" vs "maybe has nulls" cases, I'm not sure how important will be to do it here, possibly leave a TODO.

I decided not to do that because the loop exits immediately upon the first non-null anyway, so it probably wouldn't make things much more efficient in the "no nulls" case. Do you still want me to add that TODO?


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 267 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: we could export the error in window_builtins.go and reuse it here.

Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 287 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

It might be a bit safer to allocate a deep copy of inputTypes and append types.Int to it. It's possible that the caller's input types have extra capacity, so there will be no new allocation when we append a single types.Int, yet the caller might pass the same input types to a next operator which might modify it. I don't think that this would happen in practice at the moment because of our limited support of window functions, but in the future it could.

Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 329 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I just realized that here we're performing a deep copy of outputIdx vector because it was appended to the input batch by the vectorTypeEnforcer (we have the same issue in other window functions). However, copying the output column seems pointless, it might be worth leaving a TODO about this.

Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 350 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

It would be great to add logic tests where the argument is of INT2 and INT4 types to make sure this works ok. (I think the unit test in colexecwindow package would crash, but that's probably ok.)

Hm, it crashes. I guess this means I'll have to template out the go type?


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 356 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Hm, this if has quite a bit of code duplication. From what I can tell the crux is that we want i to serve the same meaning (the index of the row we're looking at) in sel and no-sel cases, yet we have different ways to get it; also in the latter case i serves the same goal as selIdx. I think we could use the same for loop in both cases like

for selIdx := r.seekIdx; selIdx < n; selIdx++

and introduce a bit more templating to get the index with something like

	// {{if .HasSel}}
	i = sel[selIdx]
	// {{else}}
	i = selIdx
	// {{end}}

(take a look at vec_to_datum_tmpl.go for inspiration).

This should allows us to remove the code duplication here.

This is a really good idea. I'll push up the changes I've made for your other comments now, and then get to work on this (and the overloads for INT2 and INT4) today or tomorrow.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 376 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: this is just + n, right?

Right, done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 387 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Is this line needed? I would assume given the line above it is not.

When the iterated variable isn't assigned a constant initial value, I think it's necessary. Interestingly, just doing _ = argCol[i] seems to allow the partitionCol bounds check to be removed too... weird.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 399 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

ditto

Same sort of deal as the other one. You have to remind the compiler that i is still in-bounds


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 486 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: a quick comment why we're short-circuiting here would be helpful.

Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 532 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: you can just return the result of Close.

Done.


pkg/sql/colexec/execgen/cmd/execgen/ntile_gen.go, line 1 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: s/2019/2021/.

Done.


pkg/sql/logictest/testdata/logic_test/window, line 1061 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: maybe order the output by w, v, k so that it was easier to check the results visually.

Done.

@DrewKimball DrewKimball changed the title exec: implement vectorized ntile window function [WIP] colexec: implement vectorized ntile window function [WIP] May 13, 2021
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushing the comments I have so far, will look later today.

Reviewed 3 of 8 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @yuzefovich)


pkg/sql/colexec/colexecwindow/BUILD.bazel, line 22 at r2 (raw file):

        "//pkg/sql/colmem",  # keep
        "//pkg/sql/execinfrapb",  # keep
        "//pkg/sql/pgwire/pgcode",  # keep

I think these are no longer needed given that we're importing the error from sem/builtins.


pkg/sql/colexec/colexecwindow/ntile.eg.go, line 188 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Done. (I think, the comment got cut off.

Yeah, this was about types - I realized I was typing on the generated code and not on the template but forgot to delete this leftover.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 193 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Ah, that first i is supposed to be selIdx. Fixed it.

It'd be great if you could come up with a unit test case when this bug would have been caught.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 204 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I decided not to do that because the loop exits immediately upon the first non-null anyway, so it probably wouldn't make things much more efficient in the "no nulls" case. Do you still want me to add that TODO?

Makes sense, let's ignore that thought.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 287 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Done.

We should extract the append outside of this NewSpillingQueue constructor and reuse the deep-copied types below for currentBatch constructor.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 350 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Hm, it crashes. I guess this means I'll have to template out the go type?

Yeah, we'll need to template the type out - we'll need to generate separate nTileInt64 etc operators which would "encode" which method to use on argVec to get the slice of integers.

Alternatively, we could plan a cast operator on top of input to the nTile in execplan since we know the types then. It might be an easier and cleaner choice.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 84 at r2 (raw file):

	//
	// nTileLoading transitions to nTileSeeking unless the end of the input has
	// been reached, in which the next state is nTileProcessing.

nit: s/in which the/in which case the/.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 148 at r2 (raw file):

	resetCurrentBatch bool

	// bufferQueue stores batches that are waiting to be fully processed and

nit: I think all batches in the queue belong to the same partition, it'd be great to mention that.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 153 at r2 (raw file):

	// nextPartitionIdx is the index of the start of the next partition within the
	// current batch. If the next partition does not start in the current batch,

The second sentence here indicates that we're using nextPartitionIdx for two purposes, so I wonder whether it would be better to introduce a special sentinel value (e.g. -1) to indicate that the next partition doesn't start in the current batch. I imagine this would come with some perf hit, but I think it'd be acceptable if the resulting code is more clear, wdyt?

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some more comments, but I think the code makes sense to me. I'll wait for you to figure out the templating of nTileSeeking state, but looks great overall.

Reviewed 5 of 8 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @yuzefovich)


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 312 at r2 (raw file):

		switch r.state {
		case nTileLoading:
			if r.resetCurrentBatch {

Hm, I wonder whether we can remove resetCurrentBatch boolean entirely and always reset the current batch after enqueueing it into the buffer queue if its length is greater than 0 (if it is zero, then the batch must be in reset state already). (I'm just trying to think how we can reduce the amount of state we're keeping track of.)


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 439 at r2 (raw file):

				}
				if r.processingIdx >= output.Length() {
					// This batch has already been fully processed.

When can this occur?


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 473 at r2 (raw file):

				break
			}
			// There are no tuples. This can happen if the first batch returned by the

nit: I think it might be a bit cleaner to detect this scenario in nTileLoading state and transitioning to nTileFinished state right away. My reasoning is that nTileProcessing state is more complicated, so it's better to move edge cases elsewhere.


pkg/sql/logictest/testdata/logic_test/window, line 1061 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Done.

I meant that it might be worth ordering the output on PARTITION BY followed by ORDER BY columns of window functions (but probably not in all cases). I.e. in this case it would be ORDER BY w, v, k - this way it'll be very clear where the partition boundaries are and then what was the internal ordering for window function processing. Feel free to ignore this idea though.


pkg/sql/sem/builtins/window_builtins.go, line 593 at r2 (raw file):

}

var ErrInvalidArgumentForNtile = pgerror.Newf(

nit: this will need a comment now to satisfy the linter.

Copy link
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball and @yuzefovich)


pkg/sql/colexec/colexecwindow/BUILD.bazel, line 22 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I think these are no longer needed given that we're importing the error from sem/builtins.

Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 193 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

It'd be great if you could come up with a unit test case when this bug would have been caught.

Hm, it doesn't seem like the sel != nil branch is ever being reached, even for the other window functions. I'll have to investigate this.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 287 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

We should extract the append outside of this NewSpillingQueue constructor and reuse the deep-copied types below for currentBatch constructor.

Ah, didn't even notice that use of inputTypes. Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 350 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Yeah, we'll need to template the type out - we'll need to generate separate nTileInt64 etc operators which would "encode" which method to use on argVec to get the slice of integers.

Alternatively, we could plan a cast operator on top of input to the nTile in execplan since we know the types then. It might be an easier and cleaner choice.

I don't think templating should be too hard in this case since I think argCol := argVec.Int64() is the only line that needs to change. It is a bit unfortunate to have that much extra generated code for just a single line though... Do you think the performance gains will be significant from templating this / is it worth worrying about the INT2 and INT4 cases that much?


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 84 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: s/in which the/in which case the/.

Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 148 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: I think all batches in the queue belong to the same partition, it'd be great to mention that.

Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 153 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

The second sentence here indicates that we're using nextPartitionIdx for two purposes, so I wonder whether it would be better to introduce a special sentinel value (e.g. -1) to indicate that the next partition doesn't start in the current batch. I imagine this would come with some perf hit, but I think it'd be acceptable if the resulting code is more clear, wdyt?

Yeah, I could do that if you think it'd be more clear. I do kind of like how it is, since you just have to set nextPartitionIdx to i after the conclusion of the loop to get this behavior. It would be pretty easy to use the sentinel value as well, though.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 312 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Hm, I wonder whether we can remove resetCurrentBatch boolean entirely and always reset the current batch after enqueueing it into the buffer queue if its length is greater than 0 (if it is zero, then the batch must be in reset state already). (I'm just trying to think how we can reduce the amount of state we're keeping track of.)

The issue is that we don't always want to push to the queue as things are right now. When the argument is null up to the end of a batch, we want to emit the batch immediately, which means we can't reset it until the next call to Next.

I guess other options are to copy the batch before emitting or to not emit immediately and just increment processingIdx accordingly. This second option would definitely require this bit of logic you asked about earlier:

if r.processingIdx >= output.Length() {
    // This batch has already been fully processed.
    r.processingIdx -= output.Length()
        return output
}

That's because processingIdx would then refer to a point beyond the end of the current batch.

I think I'll try and go for this second option. Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 439 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

When can this occur?

If the queue ever split up batches after they were put in, something like this could happen - say, the first row of a batch is a fully processed partition, and the next partition continues from the next row onto the next batch. If the first batch popped from of the queue only contained that first row, r.processing would refer to an index beyond the length of the dequeued batch.
Can I rely on the queue never doing something like this?

Edit: Since I decided to just push batches that are null up to the end onto the queue, this branch can be hit when one of those batches is dequeued.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 473 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: I think it might be a bit cleaner to detect this scenario in nTileLoading state and transitioning to nTileFinished state right away. My reasoning is that nTileProcessing state is more complicated, so it's better to move edge cases elsewhere.

Done. I've added an assertion in that place instead.


pkg/sql/logictest/testdata/logic_test/window, line 1061 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I meant that it might be worth ordering the output on PARTITION BY followed by ORDER BY columns of window functions (but probably not in all cases). I.e. in this case it would be ORDER BY w, v, k - this way it'll be very clear where the partition boundaries are and then what was the internal ordering for window function processing. Feel free to ignore this idea though.

Oh, not sure why I read that the way I did. Done.


pkg/sql/sem/builtins/window_builtins.go, line 593 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: this will need a comment now to satisfy the linter.

Done.

@DrewKimball DrewKimball force-pushed the ntile branch 3 times, most recently from 1f30b15 to 56ba11d Compare May 17, 2021 20:04
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have the last round of comments, the only concerning is the question in nTileProcessing state, others seem minor or nits.

Reviewed 6 of 7 files at r3, 4 of 4 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball)


pkg/sql/colexec/colbuilder/execplan.go, line 1181 at r4 (raw file):

								"failed to cast window function argument to type %v", expectedType))
						}
						typs = typs[:castIdx+1]

nit: these two lines could be replaced with typs = append(typs, expectedType) which looks more Go-natural to me.


pkg/sql/colexec/colexecwindow/BUILD.bazel, line 52 at r4 (raw file):

        "//pkg/sql/execinfra",
        "//pkg/sql/execinfrapb",
        "//pkg/sql/sem/builtins",

nit: I think the linter tells us that this line is not necessary.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 193 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Hm, it doesn't seem like the sel != nil branch is ever being reached, even for the other window functions. I'll have to investigate this.

Indeed, very nice find!


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 350 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I don't think templating should be too hard in this case since I think argCol := argVec.Int64() is the only line that needs to change. It is a bit unfortunate to have that much extra generated code for just a single line though... Do you think the performance gains will be significant from templating this / is it worth worrying about the INT2 and INT4 cases that much?

I don't think it'll be worth it to template the type out - I imagine that INT2 and INT4 are edge cases, so it's ok if our performance is slightly suboptimal (still, I imagine the vectorized cast + vectorized ntile will be much faster than the row execution, so overall it'll be a nice speedup). Plus the cast operation between integers of different widths (especially when upcasting to larger width) should be very quick so that it could be considered negligible.

So I think the cleanliness of the ntile template and its maintainability is more important in this case.

Update: I see that you went with the cast, LGTM.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 84 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Done.

nit: I think in this line there is one more instance of this.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 153 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Yeah, I could do that if you think it'd be more clear. I do kind of like how it is, since you just have to set nextPartitionIdx to i after the conclusion of the loop to get this behavior. It would be pretty easy to use the sentinel value as well, though.

Great, let's keep it as is.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 80 at r3 (raw file):

// Transition graph for nTileState:
//     ┌──────────────────┬───────────┐
//     │                  │TileLoading├────────────────┐

Awesome diagram!

nit: s/TileLoading/nTileLoading/


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 176 at r3 (raw file):

	// processingIdx is an index into the tuples buffered in the queue, or into
	// currentBatch if the queue is empty). It is used to preserve state when the

nit: missing opening parenthesis.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 222 at r4 (raw file):

var _ colexecop.Operator = &_NTILE_STRINGOp{}

func (r *_NTILE_STRINGOp) Init(ctx context.Context) {

nit: in order to reduce the amount of generated code, we can define all methods that don't care about the type of nTile operator (i.e. w/ or w/o partition) with nTileBase receiver. I think only Next method needs to be operator-specific. However, if you prefer the current way for clarity, feel free to ignore this idea.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 241 at r4 (raw file):

		},
	)
	r.currentBatch = r.allocator.NewMemBatchWithFixedCapacity(outputTypes, coldata.BatchSize())

An idea on how to improve a bit on the initialization is to not allocate a current batch of maximum size originally (this will matter if the input data set is very small). We could refactor the code so that nothing is done to currentBatch here in Init and modify nTileLoading state with something like:

// Load the next batch into currentBatch. If currentBatch still has data,
// move it into the queue.
if r.currentBatch != nil && r.currentBatch.Length() > 0 {
	r.bufferQueue.Enqueue(r.Ctx, r.currentBatch)
}
r.currentBatch, _ = r.allocator.ResetMaybeReallocate(r.outputTypes, r.currentBatch, batch.Length(), math.MaxInt64)

This will allow us to allocate the current batch lazily of just the right size. However, this might not matter that much in the grand scheme of things, so feel free to ignore this idea.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 286 at r4 (raw file):

				r.currentBatch.SetLength(n)
			})
			if r.currentBatch.Selection() != nil {

It actually makes perfect sense that the current batch never has a selection vector - above, we're copying one vector at a time with deselection step - and we never set the selection on the current batch ourselves, so I think I would actually make sense to not have the assertion (it seems a bit like dead code).


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 379 at r4 (raw file):

				// Set all the ntile bucket values that remain unset, then emit this
				// batch. Note that if a batch is in r.bufferQueue, the end of the
				// current partition must be located beyond it.

nit: "beyond it" - is it "beyond the batch" or "beyond the buffer queue"?


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 383 at r4 (raw file):

				r.processingIdx -= output.Length()
				if r.processingIdx < 0 {
					// The first of the tuples to be processed is located in this batch.

I'm a bit confused by this comment - does it mean that the next partition begins in output batch? Does it mean that still unprocessed tuples from the current partition start in output batch? Do we need to deep-copy the batch then?


pkg/sql/colexec/colexecwindow/window_functions_util.go, line 17 at r4 (raw file):

	"github.com/cockroachdb/cockroach/pkg/sql/execinfrapb"
	"github.com/cockroachdb/errors"
	"github.com/cockroachdb/cockroach/pkg/sql/types"

nit: ordering.

Copy link
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/colexec/colbuilder/execplan.go, line 1181 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: these two lines could be replaced with typs = append(typs, expectedType) which looks more Go-natural to me.

Done.


pkg/sql/colexec/colexecwindow/BUILD.bazel, line 52 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: I think the linter tells us that this line is not necessary.

Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 193 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Indeed, very nice find!

Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 350 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I don't think it'll be worth it to template the type out - I imagine that INT2 and INT4 are edge cases, so it's ok if our performance is slightly suboptimal (still, I imagine the vectorized cast + vectorized ntile will be much faster than the row execution, so overall it'll be a nice speedup). Plus the cast operation between integers of different widths (especially when upcasting to larger width) should be very quick so that it could be considered negligible.

So I think the cleanliness of the ntile template and its maintainability is more important in this case.

Update: I see that you went with the cast, LGTM.

Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 84 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: I think in this line there is one more instance of this.

Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 153 at r2 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Great, let's keep it as is.

Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 80 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Awesome diagram!

nit: s/TileLoading/nTileLoading/

Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 176 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: missing opening parenthesis.

Done. Changed the comment a bit for clarity as well.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 222 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: in order to reduce the amount of generated code, we can define all methods that don't care about the type of nTile operator (i.e. w/ or w/o partition) with nTileBase receiver. I think only Next method needs to be operator-specific. However, if you prefer the current way for clarity, feel free to ignore this idea.

I like this idea, I'll do it. Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 241 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

An idea on how to improve a bit on the initialization is to not allocate a current batch of maximum size originally (this will matter if the input data set is very small). We could refactor the code so that nothing is done to currentBatch here in Init and modify nTileLoading state with something like:

// Load the next batch into currentBatch. If currentBatch still has data,
// move it into the queue.
if r.currentBatch != nil && r.currentBatch.Length() > 0 {
	r.bufferQueue.Enqueue(r.Ctx, r.currentBatch)
}
r.currentBatch, _ = r.allocator.ResetMaybeReallocate(r.outputTypes, r.currentBatch, batch.Length(), math.MaxInt64)

This will allow us to allocate the current batch lazily of just the right size. However, this might not matter that much in the grand scheme of things, so feel free to ignore this idea.

Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 286 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

It actually makes perfect sense that the current batch never has a selection vector - above, we're copying one vector at a time with deselection step - and we never set the selection on the current batch ourselves, so I think I would actually make sense to not have the assertion (it seems a bit like dead code).

Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 379 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: "beyond it" - is it "beyond the batch" or "beyond the buffer queue"?

I've tried to clarify the comment. Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 383 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I'm a bit confused by this comment - does it mean that the next partition begins in output batch? Does it mean that still unprocessed tuples from the current partition start in output batch? Do we need to deep-copy the batch then?

Yes to the second one, sort of. I need to make the comment past-tense - the first of the unprocessed tuples was located in this batch, but all tuples are now processed, so no copying necessary.


pkg/sql/colexec/colexecwindow/window_functions_util.go, line 17 at r4 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: ordering.

Done.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! :lgtm:

Make sure to update the commit message (to remove "WIP"), and it's probably worth including a release note (that the vectorized engine now supports ntile window function).

Also, please update the PR description and PR title accordingly. It also would be good to link #37035 in the PR description too.

Reviewed 6 of 6 files at r5.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 379 at r4 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I've tried to clarify the comment. Done.

Makes sense, thanks!


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 100 at r5 (raw file):

const (
	// nTileLoading is the state in which case the ntile operators load an

nit: sorry, I think my previous nit comments might have been a bit confusing. I think it makes sense that we're saying "the state in which the ntile operators ..." (in the first paragraphs of the state comments) and something like "unless the end of the input has been reached, in which case the next state ..." (in the second paragraphs of the state comments). Currently, nTileLoading is the state in which case the ntile operators load sounds a bit weird to me. But English is not my native language, feel free to ignore.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 132 at r5 (raw file):

)

// nTileBase extracts common initializations of two variations of nTile

nit: s/nTileBase/nTileInitFields/.

Copy link
Collaborator Author

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 100 at r5 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: sorry, I think my previous nit comments might have been a bit confusing. I think it makes sense that we're saying "the state in which the ntile operators ..." (in the first paragraphs of the state comments) and something like "unless the end of the input has been reached, in which case the next state ..." (in the second paragraphs of the state comments). Currently, nTileLoading is the state in which case the ntile operators load sounds a bit weird to me. But English is not my native language, feel free to ignore.

Ah, I see what you mean. Done.


pkg/sql/colexec/colexecwindow/ntile_tmpl.go, line 132 at r5 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: s/nTileBase/nTileInitFields/.

Done.

@DrewKimball DrewKimball marked this pull request as ready for review May 18, 2021 01:59
@DrewKimball DrewKimball changed the title colexec: implement vectorized ntile window function [WIP] colexec: implement vectorized ntile window function May 18, 2021
This patch implements the `ntile` window function in the vectorized
engine. `ntile` takes in an integer argument `num_buckets` and then
distributes all rows in a partition equally between the buckets,
outputting the bucket number for each row.

In the vectorized implementation, batches are buffered until the end
of a partition is reached, at which point the `ntile` bucket values can
be calculated. The batches are emitted in a streaming fashion; as soon
as a batch is fully processed, it is returned and work is paused until
the next call to `Next()`.

See cockroachdb#37035

Release note (sql change): the vectorized engine now supports the ntile
window function.
@DrewKimball
Copy link
Collaborator Author

Thanks for all the reviewing Yahor!

@DrewKimball
Copy link
Collaborator Author

bors r+

@craig
Copy link
Contributor

craig bot commented May 18, 2021

Build succeeded:

@craig craig bot merged commit 64dad5f into cockroachdb:master May 18, 2021
@DrewKimball DrewKimball deleted the ntile branch May 18, 2021 20:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants