Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: expose mvcc timestamps to SQL #50519

Closed
wants to merge 2 commits into from
Closed

Conversation

rohany
Copy link
Contributor

@rohany rohany commented Jun 23, 2020

Fixes #50102.

This PR adds introspection into the KV layer's concurrency control from
the SQL level. In particular, we now expose the MVCC HLC timestamp of a
row as a special system column on every table. This system column is
exposed as a decimal, and is computed as wall time * 10^10 + logical time.

To accomplish this, this PR adds planning and execution infrastructure
for implicit system columns that can be produced by the execution layer
for a particular row.

Release note (sql change): Expose the MVCC timestamp of each row as a
system column on tables.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rohany
Copy link
Contributor Author

rohany commented Jun 23, 2020

I'm looking for some early feedback on this PR, on whether this approach looks sane, or if there is another way. All that works right now is just doing:

select mvcc_timestamp, cols... from t

I need help/discussion on:

  • Should this notion of a "system column" escape the optimizer or not? Adding the system column to the optTable works fine throughout the optimizer codepaths, but breaks down when leaving the optimizer (I've "handled" the scan case, but I'm running into some problems now with mutations). Throughout physical planning and processor setup, we constantly look at the column set on the raw descriptor. There are variety of places where we won't find the system column ID that was requested, or there will be an off by one index due to planning saying a system column will be present, while the descriptor doesn't think so. To get around some of this, I'm synthesizing column descriptors (which is quite ugly). Edit: after hacking on this for a while, I'm getting more convinced that these system columns should actually live on the TableDescriptor, but then have special handling in some other places to ensure that they don't get written to or display in certain contexts. There's also a problem with migrating existing tables to having this column.
  • I think that the timestamp column should never be part of columns fetched or written as part of mutations (and should be banned from being explicitly written to). I'm not sure what's the best place in the optimizer to do this check.
  • What ID should we assign to this system column? I've been using desc.NextColumnID, but this feels like a hack, and violates the assumption that all of the column ID's are less than this. One possibility that is somewhat related is using MaxInt as the ID. However, this has the problem of causing all of the FastIntSets we use everywhere to fall back to the slow implementation.
  • What is the type that we want to expose for this column? There is an existing function that converts the hlc to a DDecimal by shifting up the timestamp portion and adding the logical portion.

@rohany
Copy link
Contributor Author

rohany commented Jun 23, 2020

What is the type that we want to expose for this column? There is an existing function that converts the hlc to a DDecimal by shifting up the timestamp portion and adding the logical portion.

@ajwerner if you have opinions on this one

@ajwerner
Copy link
Contributor

What is the type that we want to expose for this column? There is an existing function that converts the hlc to a DDecimal by shifting up the timestamp portion and adding the logical portion.

@ajwerner if you have opinions on this one

Yeah I think we've converged on the decimal as our full-fidelity SQL representation of HLCs.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I have any input on the questions you posed. I'll let others share their perspectives.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @yuzefovich)

@RaduBerinde
Copy link
Member

Some interesting issues here. For the optimizer perhaps it would be better to expose a special API MVCCTimestampColumn() Column (it would return nil for virtual tables). We would only use it in Metadata.AddTable and in the optbuilder. That way we won't need to do anything special for mutations. But note that lookup joins must be able to produce this column when scanning the primary index.

Actually, this brings up a question - is this column really only available in the primary index? What if we have duplicate indexes? Wouldn't we be able to get it from any index that stores all columns?

@rohany
Copy link
Contributor Author

rohany commented Jun 24, 2020

Hmm, I'll give that a try. Is there a hook in the optbuilder to do this like Metadata.AddTable, or is that it?

But note that lookup joins must be able to produce this column when scanning the primary index.

I have logic for this right now, so hopefully it transfers over.

Actually, this brings up a question - is this column really only available in the primary index? What if we have duplicate indexes? Wouldn't we be able to get it from any index that stores all columns?

I think technically we could get it from any covering index, but for a v0 we'll just start with this.

@RaduBerinde
Copy link
Member

For the optbuilder, look at buildScan, probably just adding the column to outScope.cols.

I think technically we could get it from any covering index, but for a v0 we'll just start with this.
Sure, we should just make sure the cat API is extensible. I think it would be, we'd just add a bool function for indexes to indicate if we can read that column.

@rohany
Copy link
Contributor Author

rohany commented Jun 24, 2020

For the optbuilder, look at buildScan, probably just adding the column to outScope.cols.

Yup, thats what I did, and it looks like most of the out of bounds error in mutations have gone away :D

@rohany
Copy link
Contributor Author

rohany commented Jun 24, 2020

@RaduBerinde can you take another quick peek and tell me what you think so far? There are still some places where I have to fabricate column descriptors. To avoid this, I think it might be reasonable to move these system columns to the ImmutableTableDescriptor on a new field, and expose some utilities for accessing them.

One thing I don't like now is that there are some extra column ordinals that are "valid", but aren't visible to the output of cat.Table.DeleteableColumnCount() etc.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @rohany)


pkg/sql/opt/metadata.go, line 341 at r1 (raw file):

	colCount := tab.DeletableColumnCount()
	if md.cols == nil {
		md.cols = make([]ColumnMeta, 0, colCount)

This should be the total column count.


pkg/sql/opt/cat/table.go, line 73 at r1 (raw file):

	Column(i int) Column

	SystemColumns() []Column

If you look at the rest of the interface, we are avoiding having to create slices just to return things. Everything has a count and an accessor for a given element.

I would just add a DeletableAndSystemColumnCount() instead (and still use Column() to access the details of the column).

@rohany rohany force-pushed the mvcc-col branch 4 times, most recently from c490abd to a3b7492 Compare June 27, 2020 02:50
@rohany
Copy link
Contributor Author

rohany commented Jun 27, 2020

Alright, I'm feeling much better about this PR. I've set this up so that it is easy to add new system columns. For example, we could also add an approximate timestamp column that just uses the physical portion of the HLC to give users a free "updated at" column. Care to take another look @RaduBerinde?

Things remaining (off of the top of my head):

  • Add support for the cfetcher to do this timestamp tracking
  • Add more tests
  • Ensure that the planning into distsql specs directly can perform this logic too.
  • Disable new column creation of system column names
  • Don't add the system column if a colliding column name already exists in the table
  • Add it to sqlsmith
  • Ensure that requests for the mvcc column don't get split across families

@rohany rohany force-pushed the mvcc-col branch 6 times, most recently from 83b2b1b to a075e14 Compare June 28, 2020 03:16
@rohany rohany changed the title wip: expose mvcc timestamps to SQL sql: expose mvcc timestamps to SQL Jun 28, 2020
@rohany rohany force-pushed the mvcc-col branch 2 times, most recently from 0fd6841 to 8d418c5 Compare July 9, 2020 20:11
@rohany
Copy link
Contributor Author

rohany commented Jul 9, 2020

Ah it is impossible to keep this getting conflicts :(

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimizer part :lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, @rohany, and @yuzefovich)


pkg/sql/opt/memo/statistics_builder.go, line 2267 at r13 (raw file):

	colSet opt.ColSet, withScan *WithScanExpr,
) *props.ColumnStatistic {

[nit] don't add this random blank line

Fixes cockroachdb#50102.

This PR adds introspection into the KV layer's concurrency control from
the SQL level. In particular, we now expose the MVCC HLC timestamp of a
row as a special system column on every table. This system column is
exposed as a decimal, and is computed as `wall time * 10^10 + logical time`.

To accomplish this, this PR adds planning and execution infrastructure
for implicit system columns that can be produced by the execution layer
for a particular row.

Release note (sql change): Expose the MVCC timestamp of each row as a
system column on tables.
This commit updates the opt catalog to include the MVCC system column in
all tables, and updates the existing opt tests after the inclusion.

Release note: None
@rohany
Copy link
Contributor Author

rohany commented Jul 10, 2020

Thanks all for the reviews and guidance.

I'm going to attempt to merge this, and then we can discuss the name (which will be easier to adjust later).

I'll also create issues for related things, like adding to SQLSmith, and a builtin that extracts certain time components from this decimal.

bors r=raduberinde

@craig
Copy link
Contributor

craig bot commented Jul 10, 2020

Build failed

@rohany
Copy link
Contributor Author

rohany commented Jul 10, 2020

I think its a flake, retrying

bors r=raduberinde

@andy-kimball
Copy link
Contributor

I saw this PR mentioned in the milestone doc - sorry I didn't take a look earlier. I'm concerned that this approach doesn't allow indexing. How sure are we that a column that cannot be indexed will be useful in real-world applications? Did we verify with customers who have asked for this feature that it's still useful to them with that limitation?

@rohany
Copy link
Contributor Author

rohany commented Jul 10, 2020

bors r-

@craig
Copy link
Contributor

craig bot commented Jul 10, 2020

Canceled

@rohany
Copy link
Contributor Author

rohany commented Jul 10, 2020

@andy-kimball you're right that it doesn't allow for indexing, but discussions around this feature indicate that it would be useful even if we couldn't index it (just to get a free updated_at column for display purposes, or to use when finding a good time to backup from). There is some more context on the issue #50102 and the slack thread it links.

I'll hold off on the merge until you get a chance to respond

Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis, @rohany, and @yuzefovich)


pkg/sql/opt/cat/table.go, line 62 at r14 (raw file):

	// DeletableAndSystemColumnCount returns the number of DeletableColumns as
	// well as the extra system columns, such as the MVCC timestamp column.
	DeletableAndSystemColumnCount() int

I'm concerned about adding yet another kind of weird column. Those darn mutation columns are already the source of so many bugs, and have caused us to reach crazy levels of complexity in the mutation code. I want to be 100% sure that this is a compelling feature that is worth significant maintenance pain, probably forever. Because that is what we'll be getting with this design, unfortunately. Every new developer will struggle with these complex concepts, or even worse, not know about them, and sprinkle bugs all over the place where they don't properly consider the question of, "how does my code work with MVCC timestamp columns"?

@rohany
Copy link
Contributor Author

rohany commented Jul 10, 2020

Those darn mutation columns are already the source of so many bugs, and have caused us to reach crazy levels of complexity in the mutation code.

You have more insight into this than I do, but I imagine that these system columns can be mostly forgotten by future developers -- we only include them in the scope in certain places, and disallow them from being written in this PR.

I want to be 100% sure that this is a compelling feature that is worth significant maintenance pain, probably forever.

I was working on this because I found it interesting so I think that it would be useful. Maybe @ajwerner or @solongordon could give some more context on how much we want it or not.

@andy-kimball
Copy link
Contributor

OK, this is a super-massive PR that will take time to think about. My main concern is that we're creating another "interleaved" feature that has its uses, and is a neat feature from a customer perspective, but that also makes every thing that touches it more complex. I think @RaduBerinde has had more contact with this PR - I can talk with him to get more details about optimizer impact, and long-term maintenance burden.

@andy-kimball
Copy link
Contributor

andy-kimball commented Jul 10, 2020

Strange, I can't seem to find this branch in your repo... I'm trying to look at it locally, b/c the code review tools are nearly impossible with this size PR. Maybe if all the test diffs were in a separate commit? I think with this size, that'd make it way easier to review.

@rohany
Copy link
Contributor Author

rohany commented Jul 10, 2020

@andy-kimball
Copy link
Contributor

Whoops, I used cockroachdb rather than cockroach. Thanks.

@andy-kimball
Copy link
Contributor

andy-kimball commented Jul 10, 2020

Did you consider the approach of making this column a "hidden" column like rowid? With perhaps an extra accessor on the Column interface called IsSystem() bool, where both IsHidden() bool and IsSystem() bool return true? That'd have been my instinct, since adding extra columns after the mutation columns sets off alarm bells for me. Those mutation columns were always meant to be the last columns, and I wouldn't be surprised if there are subtle reasons why it's not safe to add columns after them.

Also, was there any builder performance regression that came from adding a new system column to every table? Especially on super simple KV and TPCC queries in opt/bench? Something like:

GOMAXPROCS=1 make bench PKG=./pkg/sql/opt/bench TESTTIMEOUT=9999s TESTFLAGS='-count 10 -logtostderr=false -benchmem' BENCHES=Benchmark//OptBuild | tee out && bin/benchstat out && rm out

@rohany
Copy link
Contributor Author

rohany commented Jul 10, 2020

It is hidden like rowid, but i don’t think we want real column descriptors for it — things break down outside of the optimizer where the interfaces aren’t as clean. Additionally, migrations aren’t something I wanted to deal with.

Where would you have put these columns, if not after the mutation columns? It felt like the right place to put them, given that we only will use them in very specific contexts.

Re benchmarks, no I didn’t run any, but I can do so. I don’t foresee a perf impact at the execution layer, since we don’t do anything if system columns aren’t requested.

@andy-kimball
Copy link
Contributor


pkg/sql/opt/norm/prune_cols_funcs.go, line 187 at r6 (raw file):

Previously, RaduBerinde wrote…

That's fine, I would just add optbuilder tests which verify that we error when the column is used.

I'm still confused on this. If we never allow these columns to be part of the FetchCols and/or ReturnCols, then why do we need to remove them here? Can you show the test case that hits the cols.Remove logic below so I can see the reason?

Copy link
Contributor Author

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @jordanlewis, @RaduBerinde, @rohany, and @yuzefovich)


pkg/sql/opt/norm/prune_cols_funcs.go, line 187 at r6 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I'm still confused on this. If we never allow these columns to be part of the FetchCols and/or ReturnCols, then why do we need to remove them here? Can you show the test case that hits the cols.Remove logic below so I can see the reason?

I don't have a test case that causes it, but this rule adds in the system columns into the set of needed columns via line 152. If this rule fires, then we'd add back in the system columns which would be problematic.

@andy-kimball
Copy link
Contributor


pkg/sql/opt/norm/prune_cols_funcs.go, line 187 at r6 (raw file):

Previously, rohany (Rohan Yadav) wrote…

I don't have a test case that causes it, but this rule adds in the system columns into the set of needed columns via line 152. If this rule fires, then we'd add back in the system columns which would be problematic.

So are system columns part of the table descriptor or the index descriptor, or is there some other way of storing their metadata? Is there a comment (or comments) I can read to understand how these things work?

@rohany
Copy link
Contributor Author

rohany commented Jul 21, 2020

Closing this in favor of #51494

@rohany rohany closed this Jul 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: expose MVCC timestamps to SQL
8 participants