Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql, *: make CockroachDB understand schema names #22371

Closed
wants to merge 1 commit into from

Conversation

knz
Copy link
Contributor

@knz knz commented Feb 5, 2018

Fixes #22700.

"Why":

Prior to this patch, CockroachDB only recognized names of the form
"db.tbl", like MySQL, whereas PostgreSQL compatibility mandates that
"db.public.tbl" be also valid. This needed a change.

"What"

The visible tip of this iceberg patch is as follows:

  • new name resolution functions which are more correct and more easy
    to use than the previous code!

    Use Normalize() on a TableName, then ResolveExistingObject()
    or ResolveTargetObject().

  • this in turn relies on generic, well-abstracted name resolution
    algorithms in sem/tree/name_resolution.go, the definition
    of which closely follows the specs in the accompanying RFC (rfcs: proposal to make schemas more pg-compatible #21456).

    This handles the pg syntax for catalog and schema names, together
    with compatibility rules with CockroachDB v1.x.

  • a new schema accessor interface that truly encapsulates schema access!
    Check out the gorgeous documentation in sql/schema_accessors.go.

    In particular:

    • planner now implements a new interface SchemaResolver. Use it.

    • if you are not happy with SchemaResolver directly, use
      SchemaAccessor or its various consistuent interfaces.

      Depending on whether virtual schemas are to be considered, consider
      alternatively planner.LogicalSchemaAccessor() and
      planner.PhysicalSchemaAccessor().

One may be surprised to see this schema accessor work happen in the
same patch. This was, unfortunately, a requirement to successfully
plug catalog and schema name resolution in all the right places.
Also, it incidentally answers a long-standing demand for a better
interface to access descriptors.

"How"

The itinerary to achieve this brought me through the following steps:

  • the various name resolution algorithms are concentrated in a new
    file sql/sem/tree/name_resolution.go. They use abstract
    interfaces to interface with the "name provider": database, table,
    column things coming from the database.

  • in the process of writing the name resolution algorithms, I found
    that our previous understanding of name resolution was problematic:

    • the previous code assumed it was always possible to "qualify a
      name using the current database" by just looking at the name
      itself and the current database, without any additional
      information.

      In contrast, the correct qualification algorithms requires both
      the current database, the search path, and descriptor
      lookups.

      This is why this patch fuses all occurrences of separate
      "qualification" and "resolution" phases together.

      Before: tn = tn.QualifyWithDatabase(tcurDb); desc = MustGetDesc(tn)

      After: desc = ResolveExistingObject(p, tn)

    • the resolution code was pushing a VirtualTabler very deep in the
      lookup routines, with the mistaken understanding that
      VirtualTabler is able to respond to requests for database names.

      In contrast, VirtualTabler really has nothing to do with database
      names, and the corresponding code had to get rid of it.

      This was the first motivation for implementing new schema accessor
      interfaces.

    • the path to decide whether to use cached or non-cached descriptors
      was very hard to read; in many instances the code was using
      uncached descriptors where cached descriptors would be needed
      instead. The various APIs were a mess, and information needed to
      decide whether a lookup was possible or not was not available at
      the right level(s) of abstraction.

      This was the second motivation for implementing new schema accessor
      interfaces.

  • Once this all was said done, the various consumers of name
    resolution had to implement the interfaces. They are:

    • resolution of zone specifiers;
    • resolution of target names for CCL statements;
    • resolution of names (both existing and targets) in the sql package;
    • some testing code in sql/opt and sql/opt/build.

Details and suggestions to the reviewers

The following items are fairly self-contained and can be reviewed in
parallel, and probably should be reviewed first:

  1. the changes in ccl/sqlccl. This achieves the following two things:

    • the search path is plumbed where needed.
    • the check for valid table names does not use
      QualifyWithDatabase any more (because that was removed);
      instead they use the new name resolution interface with a shim
      that just checks the schema name is valid.
  2. the changes in sql/opt and sql/opt/build. This merely
    interfaces with the name resolution algorithm for column items. I
    noticed that the underlying data structures there only know about
    unqualified table names, so I did not attempt to plumb
    catalog/schema name resolution logic in there. No functionality is
    lost.

  3. the changes in pkg/config for zone specifiers. These are
    straightforward and merely deal with the correspondance between the
    schema part in the CLI zone specifier syntax and the catalog part of
    the internal ZoneSpecifier struct.

  4. the new file resolver.go, which contains the entry-point
    functions for name resolution using a SchemaResolver.

  5. the changes to server/admin.go. These simply get rid of the
    virtual database checks (we have virtual schemas, not databases,
    now.)

  6. the name resolution algorithms in sem/tree/name_resolution.go
    and the corresponding tests in sem/tree/name_resolution_test.go.

    This is where it all started and should be compared to the
    specifications in the accompanying RFC (rfcs: proposal to make schemas more pg-compatible #21456).

  7. the interface definitions in schema_accessors.go and their
    implementations in phy_schema_accessors.go and
    logical_schema_accessors.go.

Once the last two items in the list above have been reviewed,
it becomes possible to proceed as follows:

  1. check that the various planNode constructors access the new APIs
    consistently, with no surprises.

  2. check that no harm was done in table.go (table cache) and
    database.go (database cache).

  3. check that the construction of the main accessors in
    session.go make sense.

Suggested review attributions:

Release note (sql change): CockroachDB now recognizes the syntax
db.public.tbl in addition to db.tbl, for better compatibility with
PostgreSQL.

@knz knz requested review from a team February 5, 2018 05:46
@knz knz requested a review from a team as a code owner February 5, 2018 05:46
@knz knz requested review from a team February 5, 2018 05:46
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz
Copy link
Contributor Author

knz commented Feb 5, 2018

Note especially the section "Details and suggestions to the reviewers" in the PR description.

cc:

@vivekmenezes: items 7 and 9.
@petermattis, @rytaft, @andy-kimball : items 2 and 6.
@jordanlewis: items 4 and 7.
@benesch: items 1 and 3.
@m-schneider: item 8.
@justinj: items 6.
@BramGruneir, @a-robinson: item 5.
@andreimatei: item 10.

@RaduBerinde
Copy link
Member

sql/opt changes (item 2) LGTM


Review status: 0 of 90 files reviewed at latest revision, 1 unresolved discussion, some commit checks broke.


pkg/sql/opt/scope.go, line 53 at r1 (raw file):

}

// FindSourceMatchingName implements the tree.ColumnItemResolver interface.

[nit] incorrect name.

Also "is part of" is better than "implements" (which really only makes sense if this is the only method in the interface).


Comments from Reviewable

@RaduBerinde
Copy link
Member

Review status: 0 of 90 files reviewed at latest revision, 5 unresolved discussions, some commit checks broke.


pkg/sql/sem/tree/name_part.go, line 159 at r1 (raw file):

	// allocation for the slice every time we construct an
	// UnresolvedName. It does imply however that Parts does not have
	// a meaningful "length"; its actually length (the number of parts

[nit, not this change] actual length


pkg/sql/sem/tree/name_resolution.go, line 32 at r1 (raw file):

// - resolution algorithms. These are used to map an unresolved name
//   or pattern to something looked up from the database.
//

[nit] extra lines


pkg/sql/sem/tree/name_resolution.go, line 58 at r1 (raw file):

	}

	return TableName{tblName{

This construction code is repeated a bunch of times, maybe it should be a function.


pkg/sql/sem/tree/name_resolution.go, line 115 at r1 (raw file):

	}

	// Check that all the parts specified are not empty.

This empty checking code appears in many places, it should probably be a separate function.


Comments from Reviewable

@jordanlewis
Copy link
Member

Is there any way to break this up? It will be very difficult for your reviewers to successfully determine whether a +3,301 −1,574 patch over 90 files looks good.

@knz
Copy link
Contributor Author

knz commented Feb 5, 2018 via email

@jordanlewis
Copy link
Member

Yes! I should have started by saying thanks for the excellent executive summary and directions for review. I do think that will help.

However, I worry that it'll be hard for any single reviewer to get a big picture of what's happening here. The whole is more than the sum of the parts and all that.

@andreimatei
Copy link
Contributor

A silly question - the new functionality (as opposed to refactoring) in this PR is for more than supporting one schema literally called "public", right?


Review status: 0 of 90 files reviewed at latest revision, 5 unresolved discussions, some commit checks broke.


Comments from Reviewable

@petermattis
Copy link
Collaborator

Splitting the review responsibility seems like a good way for something to slip through the cracks. I imagine there was a way to break this into smaller chunks, but what is done is done.

My main anxiety is that name resolution can affect performance. @jordanlewis can you either show Raphael how to run kv and tpcc or take this PR for a spin? I'm less concerned about correctness given the test cases and the lack of changes to the logic tests. The one exception is that the addition of .public to a few random tests (noted in comments below).


Review status: 0 of 90 files reviewed at latest revision, 32 unresolved discussions, some commit checks broke.


pkg/ccl/sqlccl/targets_test.go, line 70 at r1 (raw file):

		{"data", "TABLE foo, baz", nil, nil, `table "foo" does not exist`},

		{"", "TABLE system.public.foo", []string{"system", "foo"}, nil, ``},

Does TABLE system.foo no longer work? That seems like a backward incompatible change. I thought we were trying to make that work, though, which means we should be adding new test cases instead of modifying existing ones.


pkg/config/zone.go, line 135 at r1 (raw file):

	// Table prefixes in CLI zone specifiers always have the form
	// "table" or "db.table" and do not know about schemas. They never
	// refer to virtual schemas. So migrate the db part in catalog position.

s/part in/part to the/g?


pkg/sql/create_index.go, line 42 at r1 (raw file):

	}

	defer p.useNewDescriptors()()

Note to self: what is this? Why is it not needed before ResolveTargetObject.


pkg/sql/create_stats.go, line 38 at r1 (raw file):

	}

	tableDesc, err := ResolveExistingObject(ctx, p, tn, true /*required*/, requireTableDesc)

No call to defer p.useNewDescriptors()(). I'm not sure what it does, but you're using it elsewhere that ResolveExistingObject is called.


pkg/sql/data_source.go, line 384 at r1 (raw file):

		}
		// Usual case: a table.
		log.VEventf(ctx, 2, "before table src: %q", t)

Did you intend to keep these? log.VEventf calls are not free and this is a low-level routine.


pkg/sql/data_source.go, line 587 at r1 (raw file):

	ctx context.Context, tn *tree.TableName,
) (*sqlbase.TableDescriptor, error) {
	return ResolveExistingObject(ctx, p, tn, true /*required*/, anyDescType)

Do you need a defer p.useNewDescriptors()() line?


pkg/sql/data_source.go, line 806 at r1 (raw file):

// for the set of sources and if it is, qualifies the missing database name.
func (sources multiSourceInfo) checkDatabaseName(tn tree.TableName) (tree.TableName, error) {
	// FIXME XXX

Reminder to address this. And below too.


pkg/sql/database.go, line 107 at r1 (raw file):

// DatabaseDescEditor provides helper methods for using SQL database descriptors.
type DatabaseDescEditor interface {

nit: The previous name wasn't accurate, but this new name isn't accurate either. How about DatabaseStorage? Hmm, that is a bit odd too as this is only the namespace for databases.


pkg/sql/phy_schema_accessors.go, line 1 at r1 (raw file):

// Copyright 2018 The Cockroach Authors.

nit: Do we use the abbreviation phy_ elsewhere? I'd prefer to spell this out as physical_schema_accessors.go for consistency with logical_schema_accessors.


pkg/sql/phy_schema_accessors.go, line 169 at r1 (raw file):

		return nil, nil, newInvalidSchemaError(name.Catalog(), name.Schema())
	}
	dbName, tblName := name.Catalog(), name.Table()

It is unfortunate that we're using Database in some places and Catalog in others and this is leaking throughout the code. Not for this PR: I think we should settle on one name and then leave the other isolated to some small piece of code.


pkg/sql/planner.go, line 216 at r1 (raw file):

	ctx := log.WithLogTagStr(context.Background(), opName, "")

	curDb := "" // FIXME XXX: change this to system later

Reminder to address this.


pkg/sql/resolver.go, line 110 at r1 (raw file):

// resolutions will be able to observe newly added descriptors still
// in the ADD state.
// The caller should use this as follows: defer p.useNewDescriptors()()

Can you expand on when this needs to be called? You're often calling it before calling ResolveExistingObject, but not always. It isn't clear to me what the rule is.


pkg/sql/resolver.go, line 127 at r1 (raw file):

)

var requiredTypeNames = map[requiredType]string{

nit: This can be [...]string. The initialization otherwise stays the same.


pkg/sql/resolver.go, line 162 at r1 (raw file):

func (p *planner) commonLookupFlags(ctx context.Context) CommonLookupFlags {
	return CommonLookupFlags{ctx: ctx, txn: p.txn, required: false, avoidCached: p.avoidCachedDescriptors}

nit: This would be more readable as a line per field.


pkg/sql/schema_accessors.go, line 51 at r1 (raw file):

type (
	// ObjectName is provided for convenience and to make the interface definitions below more intuitive.

I'm not sure I buy the convenience argument. Type aliases were mainly added to ease refactoring when you can't do so "atomically" (i.e. the refactor is spread across repos). On the other hand, I don't particularly mind this.

nit: These comments exceed 100 columns.


pkg/sql/sequence.go, line 263 at r1 (raw file):

			return nil, err
		}
		// FIXME XXX: uses nil resolver

Reminder to address this (or convert to TODO(knz)).


pkg/sql/set_zone_config.go, line 76 at r1 (raw file):

	}

	table, err := params.p.resolveTableForzone(params.ctx, &n.zoneSpecifier)

s/Forzone/ForZone/g


pkg/sql/zone_test.go, line 68 at r1 (raw file):

	jobsRow := sqlutils.ZoneRow{
		ID:           keys.JobsTableID,
		CLISpecifier: "system.public.jobs",

Shouldn't there be test cases for both system.jobs and system.public.jobs?


pkg/sql/pgwire/conn_test.go, line 176 at r1 (raw file):

		}
		query := exec.Stmt.String()
		t.Logf("received query: %s", query)

Did you intend to leave this in?


pkg/sql/sem/tree/name_resolution.go, line 103 at r1 (raw file):

		return &AllTablesSelector{prefix}, nil
	}
	return &TableName{tblName{TableName: Name(n.Parts[0]), TableNamePrefix: prefix}}, nil

Heh, TableName, tblName, TableName, and TableNamePrefix.


pkg/sql/sem/tree/name_resolution.go, line 252 at r1 (raw file):

		}
		// Two parts: D.T.
		// Try use the current database, and be satisfied if it's sufficient to find the object.

s/Try use/Try to use/g.

Should this be protected by if curDB != ""?


pkg/sql/sem/tree/name_resolution.go, line 295 at r1 (raw file):

		}
		// Two parts: D.T.
		// Try use the currebt database, and be satisfied if it's sufficient to find the object.

s/Try use/Try to use/g, s/currebt/current/g


pkg/sql/sem/tree/name_resolution_test.go, line 589 at r1 (raw file):

}

type spath = sessiondata.SearchPath

I believe you can define this inside TestResolveTablePatternOrName.


pkg/sql/sem/tree/name_resolution_test.go, line 591 at r1 (raw file):

type spath = sessiondata.SearchPath

var mpath = sessiondata.MakeSearchPath

You can define this inside TestResolveTablePatternOrName.


pkg/sql/sem/tree/name_resolution_test.go, line 593 at r1 (raw file):

var mpath = sessiondata.MakeSearchPath

func TestResolveTablePatternOrName(t *testing.T) {

Nice tests.


pkg/sql/sem/tree/name_resolution_test.go, line 604 at r1 (raw file):

		expanded string // The prefix after resolution, with hidden fields revealed.
		scName   string // The schema name after resolution.
		err      string // Trror, if expected.

s/Trror/Error/g


pkg/sql/sem/tree/name_resolution_test.go, line 612 at r1 (raw file):

		// Names of length 1.

		{`kv`, `db1`, mpath([]string{"public", "pg_catalog"}), true, `kv`, `db1.public.kv`, `db1.public[1]`, ``},

You could get rid of the []string by changing MakeSearchPath take a ...string argument.


Comments from Reviewable

@andreimatei
Copy link
Contributor

Here's my main area of concern: historically we haven't been principled about when we can use "cached" descriptors and when we can't. Our understanding has improved significantly with the last iteration on the schema change RFC, but I'm not sure to what extent the code has caught up. I believe we can always use cached descriptors, as long as the cache is aware of the timestamps involved. I always thought that the allowCached in the Executor can probably go away, and also most of the session.tables structure. So I'd love it if you discussed the current state of caching here and check together that it makes any sense.

Other than that, for point #10 - the changes in session.go - LGTM with the comments I've sprinkled elsewhere - I suspect that we have too many types and interfaces now, but if you tell me we need them, I believe you.


Review status: 0 of 90 files reviewed at latest revision, 36 unresolved discussions, some commit checks broke.


pkg/sql/schema_accessors.go, line 25 at r1 (raw file):

)

// This file proposes to abstract access to the physical schema (the descriptors).

nit: "this file proposes" is a weird phrasing for a code file. "This file contains interfaces for accessing descriptors (physical schema)". It'd also be nice if a hint to what is the opposite of physical - what's the "logical" schema?


pkg/sql/schema_accessors.go, line 28 at r1 (raw file):

//
// The following interfaces are defined:
// - DatabaseAccessor, which provides access to database descriptors.

Unless there's good reason for so many interfaces and without knowing anything about anything, they look too many for my taste. If possible to have less...


pkg/sql/session.go, line 675 at r1 (raw file):

		StatusServer:          statusServer,
		MemMetrics:            s.memMetrics,
		Tables:                &s.tables,

can this guy go away now with the new schemaAccessors?


pkg/sql/session.go, line 681 at r1 (raw file):

		TxnModesSetter:        &s.TxnState,
		SchemaChangers:        &s.TxnState.schemaChangers,
		schemaAccessors:       scInterface,

I'd make this exported like everything else unless you have reason not to


Comments from Reviewable

@rytaft
Copy link
Collaborator

rytaft commented Feb 5, 2018

items 2 and 6 LGTM


Reviewed 7 of 90 files at r1.
Review status: 7 of 90 files reviewed at latest revision, 40 unresolved discussions, some commit checks broke.


pkg/sql/opt/build/scope.go, line 67 at r1 (raw file):

var _ tree.Visitor = &scope{}

// FindSourceMatchingName implements the tree.ColumnItemResolver interface.

[nit] fix name.


pkg/sql/opt/build/scope.go, line 76 at r1 (raw file):

			col := &s.cols[i]
			if col.matches("", colName) {
				// TODO(whomever): source names in a FROM clause also have a

Do you have a pointer we can look at for where this is done correctly (to fix this issue)?


pkg/sql/opt/build/scope.go, line 96 at r1 (raw file):

			if col.table == tblName {
				// TODO(whomever): this improperly fails to recognize when a source table
				// is ambiguous, e.g. SELECT kv.k FROM db1.kv, db2.kv

Ditto - do you have a pointer we can look at for where this is done correctly (to fix these issues)?


pkg/sql/sem/tree/name_resolution.go, line 124 at r1 (raw file):

	// `select "".crdb_internal.tables.table_id from "".crdb_internal.tables`.
	lastCheck := n.NumParts
	if lastCheck > 3 {

There are a lot of bare numbers here. Maybe make an enum for the different name parts?


Comments from Reviewable

@BramGruneir
Copy link
Member

Item 5, won't this fail admin_test.go\TestAdminAPIDatabaseVirtual() ?


Reviewed 1 of 90 files at r1.
Review status: 8 of 90 files reviewed at latest revision, 40 unresolved discussions, some commit checks broke.


Comments from Reviewable

@knz
Copy link
Contributor Author

knz commented Feb 5, 2018

Review status: 8 of 90 files reviewed at latest revision, 40 unresolved discussions, some commit checks broke.


pkg/sql/opt/build/scope.go, line 76 at r1 (raw file):

Previously, rytaft wrote…

Do you have a pointer we can look at for where this is done correctly (to fix this issue)?

pkg/sql/data_source.go, func (sources multiSourceInfo) findColumn

As I am currently rewriting this, I'll make an attempt at documenting it better.


Comments from Reviewable

@vivekmenezes
Copy link
Contributor

Item 10 LGTM


Review status: 8 of 90 files reviewed at latest revision, 40 unresolved discussions, some commit checks broke.


Comments from Reviewable

@vivekmenezes
Copy link
Contributor

Item 7 LGTM


Review status: 8 of 90 files reviewed at latest revision, 40 unresolved discussions, some commit checks broke.


Comments from Reviewable

@knz
Copy link
Contributor Author

knz commented Feb 7, 2018

Bram yes indeed it broke that test. I had to remove it.

Peter, Rebecca, Vivek thank you!

Andrei regarding desc caching it would be great if we could confirm that indeed we can always use the cache, it would simplify the code very greatly. I am not fond of the hoops I had to jump through to keep that conditional. Let's talk soon.

I am still working on the PR, as CI revealed my last amend broke several things and Nathan reminded me I still had to update the virtual table generators. Will ping when I have news.


Review status: 0 of 88 files reviewed at latest revision, 40 unresolved discussions, all commit checks successful.


pkg/ccl/sqlccl/targets_test.go, line 70 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Does TABLE system.foo no longer work? That seems like a backward incompatible change. I thought we were trying to make that work, though, which means we should be adding new test cases instead of modifying existing ones.

It still works, I was just a bit too trigger-happy on the global search-and-replace. Added the issing tests.


pkg/config/zone.go, line 135 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

s/part in/part to the/g?

Done.


pkg/sql/create_index.go, line 42 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Note to self: what is this? Why is it not needed before ResolveTargetObject.

It exposes non-public descriptors to the resolution algorithm. This is because CREATE INDEX can target newly created tables.


pkg/sql/create_stats.go, line 38 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

No call to defer p.useNewDescriptors()(). I'm not sure what it does, but you're using it elsewhere that ResolveExistingObject is called.

CREATE STATS cannot target newly created tables inside the same txn (as indicated by the allowAdding=false argument to MustGetTableDesc in the previous code)


pkg/sql/data_source.go, line 384 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Did you intend to keep these? log.VEventf calls are not free and this is a low-level routine.

woops, oversight.


pkg/sql/data_source.go, line 587 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Do you need a defer p.useNewDescriptors()() line?

no, in here we're conditional on what the caller has set (idem as the previous code)


pkg/sql/data_source.go, line 806 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Reminder to address this. And below too.

yes thanks for noticing!


pkg/sql/database.go, line 107 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

nit: The previous name wasn't accurate, but this new name isn't accurate either. How about DatabaseStorage? Hmm, that is a bit odd too as this is only the namespace for databases.

I'll welcome any suggestion. No skin in this game.


pkg/sql/planner.go, line 216 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Reminder to address this.

yep thanks


pkg/sql/resolver.go, line 110 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Can you expand on when this needs to be called? You're often calling it before calling ResolveExistingObject, but not always. It isn't clear to me what the rule is.

Done.


pkg/sql/resolver.go, line 127 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

nit: This can be [...]string. The initialization otherwise stays the same.

oh cool TIL. Thanks!


pkg/sql/resolver.go, line 162 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

nit: This would be more readable as a line per field.

Done.


pkg/sql/schema_accessors.go, line 25 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: "this file proposes" is a weird phrasing for a code file. "This file contains interfaces for accessing descriptors (physical schema)". It'd also be nice if a hint to what is the opposite of physical - what's the "logical" schema?

Done.


pkg/sql/schema_accessors.go, line 28 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Unless there's good reason for so many interfaces and without knowing anything about anything, they look too many for my taste. If possible to have less...

I'd like to discuss this with you and jordan. Fewer interfaces make it harder to layer the functionality cleanly.


pkg/sql/schema_accessors.go, line 51 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I'm not sure I buy the convenience argument. Type aliases were mainly added to ease refactoring when you can't do so "atomically" (i.e. the refactor is spread across repos). On the other hand, I don't particularly mind this.

nit: These comments exceed 100 columns.

Fixed the width.

The utility came in by enabling me to write less code, and making the remaining declarations much more compact. It accelerated my work; for me this is a net win.


pkg/sql/sequence.go, line 263 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Reminder to address this (or convert to TODO(knz)).

no I need to fix this really before this can merge. looking into it.


pkg/sql/session.go, line 675 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

can this guy go away now with the new schemaAccessors?

huh hmm maybe I haven't thought about it. If there are cleanup opportunities I'll look into it at the end.


pkg/sql/session.go, line 681 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'd make this exported like everything else unless you have reason not to

Yes I have a reason not to, that's to ensure the user clearly decides to either call LogicalSchemaAccessor() or PhysicalSchemaAccessor() as defined below.


pkg/sql/set_zone_config.go, line 76 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

s/Forzone/ForZone/g

Done.


pkg/sql/opt/scope.go, line 53 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] incorrect name.

Also "is part of" is better than "implements" (which really only makes sense if this is the only method in the interface).

Interesting, TIL. Modified.


pkg/sql/opt/build/scope.go, line 67 at r1 (raw file):

Previously, rytaft wrote…

[nit] fix name.

Doneprevious


pkg/sql/sem/tree/name_part.go, line 159 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit, not this change] actual length

Done.


pkg/sql/sem/tree/name_resolution.go, line 32 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] extra lines

Done.


pkg/sql/sem/tree/name_resolution.go, line 58 at r1 (raw file):

Previously, RaduBerinde wrote…

This construction code is repeated a bunch of times, maybe it should be a function.

Done.


pkg/sql/sem/tree/name_resolution.go, line 103 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Heh, TableName, tblName, TableName, and TableNamePrefix.

Simplified.


pkg/sql/sem/tree/name_resolution.go, line 115 at r1 (raw file):

Previously, RaduBerinde wrote…

This empty checking code appears in many places, it should probably be a separate function.

The 3 occurrences are oh so slightly different from each other. One doesn't use the same start value, and the 2 others don't use the same error message. Suggestions welcome.


pkg/sql/sem/tree/name_resolution.go, line 124 at r1 (raw file):

Previously, rytaft wrote…

There are a lot of bare numbers here. Maybe make an enum for the different name parts?

How do you suggest to call them?


pkg/sql/sem/tree/name_resolution.go, line 252 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

s/Try use/Try to use/g.

Should this be protected by if curDB != ""?

I had that initially, but the tests showed it was wrong :)

Added a comment to clarify.


pkg/sql/sem/tree/name_resolution.go, line 295 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

s/Try use/Try to use/g, s/currebt/current/g

Done.


pkg/sql/sem/tree/name_resolution_test.go, line 589 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I believe you can define this inside TestResolveTablePatternOrName.

TIL, done


pkg/sql/sem/tree/name_resolution_test.go, line 591 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

You can define this inside TestResolveTablePatternOrName.

Done


pkg/sql/sem/tree/name_resolution_test.go, line 604 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

s/Trror/Error/g

Done.


pkg/sql/sem/tree/name_resolution_test.go, line 612 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

You could get rid of the []string by changing MakeSearchPath take a ...string argument.

Yay! done.


pkg/sql/zone_test.go, line 68 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Shouldn't there be test cases for both system.jobs and system.public.jobs?

no my changes here were mistaken. the CLI specifier syntax (which will soon disappear, btw) is not the same as SQL syntax.


pkg/sql/pgwire/conn_test.go, line 176 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Did you intend to leave this in?

Nope!


pkg/sql/phy_schema_accessors.go, line 1 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

nit: Do we use the abbreviation phy_ elsewhere? I'd prefer to spell this out as physical_schema_accessors.go for consistency with logical_schema_accessors.

Ok done


pkg/sql/phy_schema_accessors.go, line 169 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

It is unfortunate that we're using Database in some places and Catalog in others and this is leaking throughout the code. Not for this PR: I think we should settle on one name and then leave the other isolated to some small piece of code.

Agreed


Comments from Reviewable

@vivekmenezes
Copy link
Contributor

please do not make the change to have AS OF SYSTEM TIME queries use the descripotor cache in this change. I tried doing that this week and ran into some trouble. The scope of this change is large enough that we shouldn't do that here. Thanks!

nvanbenschoten added a commit to cockroachdb/examples-orms that referenced this pull request Feb 15, 2018
The information_schema introspection queries were incorrect after the
change made in cockroachdb/cockroach#22371.
@knz
Copy link
Contributor Author

knz commented Feb 15, 2018

Merged the 6 schema accessor interfaces into just 2 as requested by Andrei.
Was also needed for correctness.


Review status: 98 of 233 files reviewed at latest revision, 34 unresolved discussions, some commit checks pending.


pkg/sql/logictest/testdata/logic_test/information_schema, line 284 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why are the other schemas hidden though? Shouldn't we see pg_catalog tables here?

Yes we'll see the pg_catalog tables here but only with table_catalog = 'test' (and table_schema = 'pg_catalog'). There is no row where table_catalog = 'other_db' because of the filtering.


pkg/sql/logictest/testdata/logic_test/information_schema, line 293 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Remove ORDER BY 1, use rowsort.

Done.


pkg/sql/logictest/testdata/logic_test/information_schema, line 1203 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Don't fully qualify these targets in the next two queries, since we just set the database.

Done.


pkg/sql/opt/optbuilder/builder.go, line 770 at r6 (raw file):

Previously, justinj (Justin Jaffray) wrote…

nit: I always thought these names were less about responsibility and more about giving the person who eventually tackles it a clue as to who to ask for help - I think it might be more valuable to have your name in here?

Oh interesting. TIL.
Nevertheless what is written here still matches your definition.


pkg/sql/sem/tree/name_resolution.go, line 211 at r5 (raw file):

Previously, RaduBerinde wrote…

It should be documented what the interface{} is (or that it's passed through opaquely), both here and for the Resolve functions. Actually, we should make some interfaces with dummy marker functions and use those instead of interface{}. That makes it clearer what's going on, and much easier to find out the possible types for scMeta/objMeta across the codebase.

Ok will do before this merges.


Comments from Reviewable

nvanbenschoten added a commit to cockroachdb/examples-orms that referenced this pull request Feb 15, 2018
The information_schema introspection queries were incorrect after the
change made in cockroachdb/cockroach#22371.
nvanbenschoten added a commit to cockroachdb/examples-orms that referenced this pull request Feb 15, 2018
The information_schema introspection queries were incorrect after the
change made in cockroachdb/cockroach#22371.
@knz knz force-pushed the 20180202-schema-proto branch 4 times, most recently from fe8c690 to 310277e Compare February 16, 2018 00:31
@knz knz force-pushed the 20180202-schema-proto branch 3 times, most recently from 528adaf to b0d2d32 Compare February 16, 2018 01:55
@knz
Copy link
Contributor Author

knz commented Feb 16, 2018

Implemented the interface abstractions suggested by Radu.


Review status: 92 of 308 files reviewed at latest revision, 34 unresolved discussions, some commit checks pending.


Comments from Reviewable

@knz knz force-pushed the 20180202-schema-proto branch 2 times, most recently from da47831 to fa6797e Compare February 16, 2018 02:26
"Why":

Prior to this patch, CockroachDB only recognized names of the form
"db.tbl", like MySQL, whereas PostgreSQL compatibility mandates that
"db.public.tbl" be also valid. This needed a change.

**What:**

The visible tip of this iceberg patch is as follows:

- new name resolution functions which are more correct and more easy
  to use than the previous code!

  Use `Normalize()` on a `TableName`, then `ResolveExistingObject()`
  or `ResolveTargetObject()`.

- this in turn relies on generic, well-abstracted name resolution
  algorithms in `sem/tree/name_resolution.go`, the definition
  of which closely follows the specs in the accompanying RFC.

  This handles the pg syntax for catalog and schema names, together
  with compatibility rules with CockroachDB v1.x.

- a new schema accessor interface that truly encapsulates schema access!
  Check out the gorgeous documentation in `sql/schema_accessors.go`.

  In particular:

  - `planner` now implements a new interface `SchemaResolver`. Use it.
  - if you are not happy with `SchemaResolver` directly, use
    `SchemaAccessor` or its various consistuent interfaces.

    Depending on whether virtual schemas are to be considered, consider
    alternatively `planner.LogicalSchemaAccessor()` and
    `planner.PhysicalSchemaAccessor()`.

One may be surprised to see this schema accessor work happen in the
same patch. This was, unfortunately, a requirement to successfully
plug catalog and schema name resolution in all the right places.
Also, it incidentally answers a long-standing demand for a better
interface to access descriptors.

**How:**

The itinerary to achieve this brought me through the following steps:

- the various name resolution algorithms are concentrated in a new
  file `sql/sem/tree/name_resolution.go`. They use abstract
  interfaces to interface with the "name provider": database, table,
  column things coming from the database.

- in the process of writing the name resolution algorithms, I found
  that our previous understanding of name resolution was problematic:

  - the previous code assumed it was always possible to "qualify a
    name using the current database" by just looking at the name
    itself and the current database, without any additional
    information.

    In contrast, the correct qualification algorithms requires both
    the current database, the search path, and descriptor
    lookups.

    This is why this patch fuses all occurrences of separate
    "qualification" and "resolution" phases together.

    Before: `tn = tn.QualifyWithDatabase(tcurDb); desc = MustGetDesc(tn)`

    After: `desc = ResolveExistingObject(p, tn)`

  - the resolution code was pushing a `VirtualTabler` very deep in the
    lookup routines, with the mistaken understanding that
    VirtualTabler is able to respond to requests for database names.

    In contrast, VirtualTabler really has nothing to do with database
    names, and the corresponding code had to get rid of it.

    This was the first motivation for implementing new schema accessor
    interfaces.

  - the path to decide whether to use cached or non-cached descriptors
    was very hard to read; in many instances the code was using
    uncached descriptors where cached descriptors would be needed
    instead. The various APIs were a mess, and information needed to
    decide whether a lookup was possible or not was not available at
    the right level(s) of abstraction.

    This was the second motivation for implementing new schema accessor
    interfaces.

- Once this all was said done, the various consumers of name
  resolution had to implement the interfaces. They are:

  - resolution of zone specifiers;
  - resolution of target names for CCL statements;
  - resolution of names (both existing and targets) in the `sql` package;
  - some testing code in `sql/opt` and `sql/opt/build`.
  - the virtual schemas and tables.

Release note (sql change): CockroachDB now recognizes the syntax
`db.public.tbl` in addition to `db.tbl`, for better compatibility with
PostgreSQL. The handling of the session variable `search_path`, as
well as that of the built-in functions `current_schemas()` and
`current_schema()`, is now closer to that of PostgreSQL.

Release note (sql change): SHOW TABLES FROM can now inspect the tables
of a specific shema, for example `SHOW TABLES FROM db.public` or `SHOW
TABLES FROM db.pg_catalog`.

Release note (sql change): SHOW GRANTS now also shows the schema of
the databases and tables.
@knz
Copy link
Contributor Author

knz commented Feb 16, 2018

Replaced by #22753.

@knz knz closed this Feb 16, 2018
@vivekmenezes
Copy link
Contributor

vivekmenezes commented Feb 16, 2018 via email

@knz
Copy link
Contributor Author

knz commented Feb 16, 2018

I had based this PR off another branch than master, where I had accumulated some unrelated troubleshooting changes that were aiding development. Unfortunately github does not allow me to change the base branch before merging.

I could have merged this PR, and then created another PR to merge the base branch into master; but in the end there would still be two PRs.

benesch added a commit to benesch/cockroach that referenced this pull request Mar 22, 2018
The name resolution overhaul (cockroachdb#22371) caused TestDockerCSharp to start
failing. Specifically, I suspect that Npgsql performs some query on a
table in the pg_catalog schema, which now requires the session database
to exist. There seems to be no way to prevent Npgsql from performing
this query, so simply instruct it to connect to the system database,
which is guaranteed to exist.

Also delete a test that checked the oid of the pg_catalog namespace
entry. That's changed since the test has been skipped, and seems like a
moving target. I'm not sure why we cared about the oid's value in the
first palce.

Fix cockroachdb#22769.

Release note: None
@knz knz deleted the 20180202-schema-proto branch April 27, 2018 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.