sql: 100x performance degradation in pgjdbc query to fetch columns #55140

rafiss · 2020-10-01T20:09:21Z

Describe the problem

I was running the liquibase integration tests and encountered a panic:

* ERROR: [n1,client=[::1]:65462,hostnossl,user=root] a panic has occurred!
* txn: no bytes in account to release, current 47738, free 47764
* (1) attached stack trace
*   -- stack trace:
*   | github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn.func1
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:508
*   | runtime.gopanic
*   | 	/usr/local/opt/go@1.13/libexec/src/runtime/panic.go:679
*   | github.com/cockroachdb/cockroach/pkg/util/log.ReportOrPanic
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/util/log/crash_reporting.go:340
*   | github.com/cockroachdb/cockroach/pkg/util/mon.(*BoundAccount).Shrink
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/util/mon/bytes_usage.go:583
*   | github.com/cockroachdb/cockroach/pkg/sql/rowcontainer.(*RowContainer).PopFirst
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowcontainer/datum_row_container.go:298
*   | github.com/cockroachdb/cockroach/pkg/sql.(*vTableLookupJoinNode).Next
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/virtual_table.go:275
*   | github.com/cockroachdb/cockroach/pkg/sql.(*planNodeToRowSource).Next
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/plan_node_to_row_source.go:178
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*hashJoiner).receiveNext
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/hashjoiner.go:656
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*hashJoiner).build
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/hashjoiner.go:303
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*hashJoiner).Next
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/hashjoiner.go:237
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*windower).accumulateRows
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/windower.go:282
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*windower).Next
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/windower.go:234
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*sortAllProcessor).fill
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/sorter.go:291
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*sortAllProcessor).Start
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/sorter.go:272
*   | github.com/cockroachdb/cockroach/pkg/sql/execinfra.(*ProcessorBase).Run
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:774
*   | github.com/cockroachdb/cockroach/pkg/sql/flowinfra.(*FlowBase).Run
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:392
*   | github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).Run
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:422
*   | github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).PlanAndRun
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:991
*   | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execWithDistSQLEngine
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1002
*   | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).dispatchToExecutionEngine
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:873
*   | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:639
*   | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmt
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:114
*   | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execCmd
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1472
*   | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1401
*   | github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:510
*   | github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).processCommandsAsync.func1
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:627
* Wraps: (2) while executing: SELECT * FROM (SELECT _._, _._, _._, _._, _._ OR ((_._ = _) AND _._) AS _, _._, _._, _._, row_number() OVER (PARTITION BY _._ ORDER BY _._) AS _, _ AS _, pg_get_expr(_._, _._) AS _, _._, _._, _._ FROM _._ AS _ JOIN _._ AS _ ON (_._ = _._) JOIN _._ AS _ ON (_._ = _._) JOIN _._ AS _ ON (_._ = _._) LEFT JOIN _._ AS _ ON ((_._ = _._) AND (_._ = _._)) LEFT JOIN _._ AS _ ON ((_._ = _._) AND (_._ = _._)) LEFT JOIN _._ AS _ ON ((_._ = _._) AND (_._ = _)) LEFT JOIN _._ AS _ ON ((_._ = _._) AND (_._ = _)) WHERE ((((_._ IN (_, _, __more3__)) AND (_._ > _)) AND (NOT _._)) AND (_._ LIKE _)) AND (_._ LIKE _)) AS _ WHERE _ AND (_ LIKE _) ORDER BY _, _._, _
* Wraps: (3) attached stack trace
*   -- stack trace:
*   | github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn.func1
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:508
*   | runtime.gopanic
*   | 	/usr/local/opt/go@1.13/libexec/src/runtime/panic.go:679
*   | [...repeated from below...]
* Wraps: (4) attached stack trace
*   -- stack trace:
*   | github.com/cockroachdb/cockroach/pkg/util/log.ReportOrPanic
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/util/log/crash_reporting.go:338
*   | github.com/cockroachdb/cockroach/pkg/util/mon.(*BoundAccount).Shrink
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/util/mon/bytes_usage.go:583
*   | github.com/cockroachdb/cockroach/pkg/sql/rowcontainer.(*RowContainer).PopFirst
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowcontainer/datum_row_container.go:298
*   | github.com/cockroachdb/cockroach/pkg/sql.(*vTableLookupJoinNode).Next
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/virtual_table.go:275
*   | github.com/cockroachdb/cockroach/pkg/sql.(*planNodeToRowSource).Next
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/plan_node_to_row_source.go:178
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*hashJoiner).receiveNext
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/hashjoiner.go:656
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*hashJoiner).build
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/hashjoiner.go:303
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*hashJoiner).Next
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/hashjoiner.go:237
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*windower).accumulateRows
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/windower.go:282
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*windower).Next
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/windower.go:234
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*sortAllProcessor).fill
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/sorter.go:291
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*sortAllProcessor).Start
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/sorter.go:272
*   | github.com/cockroachdb/cockroach/pkg/sql/execinfra.(*ProcessorBase).Run
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:774
*   | github.com/cockroachdb/cockroach/pkg/sql/flowinfra.(*FlowBase).Run
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:392
*   | github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).Run
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:422
*   | github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).PlanAndRun
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:991
*   | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execWithDistSQLEngine
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1002
*   | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).dispatchToExecutionEngine
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:873
*   | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:639
*   | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmt
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:114
*   | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execCmd
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1472
*   | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1401
*   | github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:510
*   | github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).processCommandsAsync.func1
*   | 	/Users/rafiss/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:627
*   | runtime.goexit
*   | 	/usr/local/opt/go@1.13/libexec/src/runtime/asm_amd64.s:1357
* Wraps: (5) txn: no bytes in account to release, current 47738, free 47764
* Error types: (1) *withstack.withStack (2) *safedetails.withSafeDetails (3) *withstack.withStack (4) *withstack.withStack (5) *errutil.leafError

To Reproduce

Ran the Liquibase integration tests on a fresh cluster.

This is the query that caused the panic:

SELECT
  *
FROM
  (
    SELECT
      n.nspname,
      c.relname,
      a.attname,
      a.atttypid,
      a.attnotnull
      OR ((t.typtype = 'd') AND t.typnotnull)
        AS attnotnull,
      a.atttypmod,
      a.attlen,
      t.typtypmod,
      row_number() OVER (
        PARTITION BY a.attrelid ORDER BY a.attnum
      )
        AS attnum,
      NULL AS attidentity,
      pg_get_expr(def.adbin, def.adrelid) AS adsrc,
      dsc.description,
      t.typbasetype,
      t.typtype
    FROM
      pg_catalog.pg_namespace AS n
      JOIN pg_catalog.pg_class AS c ON (c.relnamespace = n.oid)
      JOIN pg_catalog.pg_attribute AS a ON (a.attrelid = c.oid)
      JOIN pg_catalog.pg_type AS t ON (a.atttypid = t.oid)
      LEFT JOIN pg_catalog.pg_attrdef AS def ON
          (
            (a.attrelid = def.adrelid)
            AND (a.attnum = def.adnum)
          )
      LEFT JOIN pg_catalog.pg_description AS dsc ON
          ((c.oid = dsc.objoid) AND (a.attnum = dsc.objsubid))
      LEFT JOIN pg_catalog.pg_class AS dc ON
          (
            (dc.oid = dsc.classoid)
            AND (dc.relname = 'pg_class')
          )
      LEFT JOIN pg_catalog.pg_namespace AS dn ON
          (
            (dc.relnamespace = dn.oid)
            AND (dn.nspname = 'pg_catalog')
          )
    WHERE
      (
        (
          (
            (c.relkind IN ('r', 'p', 'v', 'f', 'm'))
            AND (a.attnum > 0)
          )
          AND (NOT a.attisdropped)
        )
        AND (n.nspname LIKE 'public')
      )
      AND (c.relname LIKE '%')
  )
    AS c
WHERE
  true AND (attname LIKE '%')
ORDER BY
  nspname, c.relname, attnum;

But running just the query on a fresh cluster is fine. I haven't figured out what preceding steps would make this panic.

Environment:
Using CockroachDB master (sha 33c1bb6)

The text was updated successfully, but these errors were encountered:

yuzefovich · 2020-10-01T20:19:58Z

I think it's the same as #53416.

yuzefovich · 2020-10-01T20:21:49Z

Yeah, the query matches the signature of #53416 (comment), thanks for the report.

rafiss · 2020-10-01T20:26:05Z

No problem. I still have the cluster so I'll attach the debug zip here at least until I can figure out all the repro steps.

Nice catch recognizing that query string signature!

rafiss · 2020-10-01T20:28:16Z

debug55140.zip

rafiss · 2020-10-02T18:27:16Z

Actually it looks like the query is from PGJDBC's getColumns, meaning that any Java ORM/app could hit this. I'm still working on figuring out what initial conditions cause this query to fail.

rafiss · 2020-10-02T19:20:28Z

OK I found a reliable repro. Basically, just creating a lot of columns will cause the issue. master and 20.2 seem affected. v20.2.0-beta.3 does not have the panic (since log.ReportOrPanic is used here, and will not trigger a panic for official releases), but it does have incorrect results. In the repro below, there are supposed to be 1375 rows returned, but only 1100 rows are returned.

Steps:

Run the following Python program:

f = open('creates.sql', 'w')
for i in range(275):
    f.write('create table t{} (a int primary key, b string, c date, e interval);\n'.format(i))

Start a fresh cluster
Run ./cockroach sql --insecure -f creates.sql
Open a Cockroach SQL shell and run the query. Here it is in one line.

SELECT * FROM (SELECT n.nspname, c.relname, a.attname, a.atttypid, a.attnotnull OR ((t.typtype = 'd') AND t.typnotnull) AS attnotnull, a.atttypmod, a.attlen, t.typtypmod, row_number() OVER (PARTITION BY a.attrelid ORDER BY a.attnum) AS attnum, NULL AS attidentity, pg_get_expr(def.adbin, def.adrelid) AS adsrc, dsc.description, t.typbasetype, t.typtype FROM pg_catalog.pg_namespace AS n JOIN pg_catalog.pg_class AS c ON (c.relnamespace = n.oid) JOIN pg_catalog.pg_attribute AS a ON (a.attrelid = c.oid) JOIN pg_catalog.pg_type AS t ON (a.atttypid = t.oid) LEFT JOIN pg_catalog.pg_attrdef AS def ON ((a.attrelid = def.adrelid) AND (a.attnum = def.adnum)) LEFT JOIN pg_catalog.pg_description AS dsc ON ((c.oid = dsc.objoid) AND (a.attnum = dsc.objsubid)) LEFT JOIN pg_catalog.pg_class AS dc ON ((dc.oid = dsc.classoid) AND (dc.relname = 'pg_class')) LEFT JOIN pg_catalog.pg_namespace AS dn ON ((dc.relnamespace = dn.oid) AND (dn.nspname = 'pg_catalog')) WHERE ((((c.relkind IN ('r', 'p', 'v', 'f', 'm')) AND (a.attnum > 0)) AND (NOT a.attisdropped)) AND (n.nspname LIKE 'public')) AND (c.relname LIKE '%')) AS c WHERE true AND (attname LIKE '%') ORDER BY nspname, c.relname, attnum;

1100 rows returned but 1375 are expected.

Here is the explain output of the problematic query.

---------------------------------------------------------------------------------+--------------+----------------------------------------------------------------
                                                                                 | distribution | local
                                                                                 | vectorized   | false
  sort                                                                           |              |
   │                                                                             | order        | +relname,+row_number
   └── render                                                                    |              |
        └── filter                                                               |              |
             │                                                                   | filter       | attname LIKE '%'
             └── window                                                          |              |
                  └── hash join (left outer)                                     |              |
                       │                                                         | equality     | (relnamespace) = (oid)
                       ├── virtual table lookup join                             |              |
                       │    │                                                    | table        | pg_type@pg_type_oid_idx
                       │    │                                                    | equality     | (atttypid) = (oid)
                       │    └── virtual table lookup join (left outer)           |              |
                       │         │                                               | table        | pg_class@pg_class_oid_idx
                       │         │                                               | equality     | (classoid) = (oid)
                       │         │                                               | pred         | relname = 'pg_class'
                       │         └── hash join (right outer)                     |              |
                       │              │                                          | equality     | (objoid, objsubid) = (oid, attnum)
                       │              ├── filter                                 |              |
                       │              │    │                                     | filter       | objsubid > 0
                       │              │    └── virtual table                     |              |
                       │              │                                          | table        | pg_description@primary
                       │              └── virtual table lookup join (left outer) |              |
                       │                   │                                     | table        | pg_attrdef@pg_attrdef_adrelid_idx
                       │                   │                                     | equality     | (attrelid) = (adrelid)
                       │                   │                                     | pred         | (attnum = adnum) AND (adnum > 0)
                       │                   └── virtual table lookup join         |              |
                       │                        │                                | table        | pg_attribute@pg_attribute_attrelid_idx
                       │                        │                                | equality     | (oid) = (attrelid)
                       │                        │                                | pred         | (attnum > 0) AND (NOT attisdropped)
                       │                        └── hash join                    |              |
                       │                             │                           | equality     | (relnamespace) = (oid)
                       │                             ├── filter                  |              |
                       │                             │    │                      | filter       | (relkind IN ('f', 'm', 'p', 'r', 'v')) AND (relname LIKE '%')
                       │                             │    └── virtual table      |              |
                       │                             │                           | table        | pg_class@primary
                       │                             └── filter                  |              |
                       │                                  │                      | filter       | nspname LIKE 'public'
                       │                                  └── virtual table      |              |
                       │                                                         | table        | pg_namespace@primary
                       └── filter                                                |              |
                            │                                                    | filter       | nspname = 'pg_catalog'
                            └── virtual table                                    |              |
                                                                                 | table        | pg_namespace@primary
(45 rows)

jordanlewis · 2020-10-02T19:23:39Z

Nice repro, works for me

rafiss · 2020-10-02T19:28:39Z

It seems like v20.1.3 also has incorrect results. It also returns 1100 rows instead of 1375 rows.

Interestingly, even though v20.1 and v20.2 both have the wrong results, v20.1 completes the getColumns query in 69ms while v20.2 takes ~7 seconds.

jordanlewis · 2020-10-02T19:33:00Z

20.1 didn't have the vtableLookupJoinNode stuff I don't think, but I'm also noticing that the plan in 20.2 doesn't even use that. The purpose of that is to avoid fully materializing all of the descriptors in memory.

But, the query you pasted also doesn't have a constraint on the name of the table - is that what you're testing in 20.2 and 20.1 both?

jordanlewis · 2020-10-02T19:33:50Z

#48226 is the PR that added vtable lookup join

rafiss · 2020-10-02T19:36:05Z

That's right -- none of the queries should have a constraint on the table name.

Hmmm Is my math off? Postgres is also returning 1100 rows. whoops.

rafiss · 2020-10-02T19:41:38Z

OK, so I guess there's not a correctness issue in v20.1 or v20.2.

But v20.2 does have the memory accounting issue in the logs:

W201002 19:08:26.295868 1885 util/log/crash_reporting.go:342 ⋮ [n1,client=‹[::1]:62043›,hostnossl,user=root] ‹txn›: no bytes in account to release, current 25392, free 25393
E201002 19:08:26.302761 1885 util/log/crash_reporting.go:321 ⋮ [n1,client=‹[::1]:62043›,hostnossl,user=root] Queued as error ‹0286cbcfac3644ebb05d261e36db399a›

jordanlewis · 2020-10-02T19:42:42Z

Makes sense - release binaries don't crash on things like this, merely reporting to sentry. Development binaries crash instead, to make it easier to notice.

jordanlewis · 2020-10-02T19:48:11Z

This is looking like a weirder than normal memory account issue. We are trying to shrink by 25418 bytes, but only have 25416 bytes available. The small difference makes nervous that there was some kind of mutation to the data being accounted for, rather than a double close or whatever, in which situation we'd see a much larger delta.

rafiss · 2020-10-02T19:49:01Z

Yeah, I noted the thing about not crashing on official releases above.

And just to close the loop, my math was wrong because I thought I was making 5 columns per table, but I was actually making 4 columns. Confusingly, I named them a, b, c, and e.

So the goals of this issue should be:

figure out why the query takes 100 times as long starting in v20.2
figure out why the memory accounting is off starting in v20.2

I'm guessing those two could be related?

jordanlewis · 2020-10-02T20:40:08Z

I'm starting to believe my theory about mutated rows. Here is the log of the sizes of all row additions and subtractions from the lookup join node. I used the following modified query that uses only the vtable lookup join that crashes:

SELECT
	*
FROM
	(
		SELECT
			n.nspname,
			c.relname,
			a.attname,
			a.atttypid,
			a.attnotnull
			OR ((t.typtype = 'd') AND t.typnotnull)
				AS attnotnull,
			a.atttypmod,
			a.attlen,
			t.typtypmod,
			row_number() OVER (
				PARTITION BY a.attrelid ORDER BY a.attnum
			)
				AS attnum,
			NULL AS attidentity,
			pg_get_expr(def.adbin, def.adrelid) AS adsrc,
			dsc.description,
			t.typbasetype,
			t.typtype
		FROM
			pg_catalog.pg_namespace AS n
			JOIN pg_catalog.pg_class AS c ON
					(c.relnamespace = n.oid)
			JOIN pg_catalog.pg_attribute AS a ON
					(a.attrelid = c.oid)
			inner hash JOIN pg_catalog.pg_type AS t ON
					(a.atttypid = t.oid)
			LEFT JOIN pg_catalog.pg_attrdef AS def ON
					(
						(a.attrelid = def.adrelid)
						AND (a.attnum = def.adnum)
					)
			LEFT JOIN pg_catalog.pg_description AS dsc ON
					(
						(c.oid = dsc.objoid)
						AND (a.attnum = dsc.objsubid)
					)
			LEFT JOIN pg_catalog.pg_class AS dc ON
					(
						(dc.oid = dsc.classoid)
						AND (dc.relname = 'pg_class')
					)
			LEFT JOIN pg_catalog.pg_namespace AS dn ON
					(
						(dc.relnamespace = dn.oid)
						AND (dn.nspname = 'public')
					)
		WHERE
			(
				(
					(
						(
							c.relkind
							IN ('r', 'p', 'v', 'f', 'm')
						)
						AND (a.attnum > 0)
					)
					AND (NOT a.attisdropped)
				)
				AND (n.nspname LIKE 'public')
			)
			AND (c.relname LIKE '%')
	)
		AS c
WHERE
	true AND (attname LIKE '%')
ORDER BY
	nspname, c.relname, attnum;

0xc000139a20 +171
0xc000139a20 +171
0xc000139a20 -171
0xc000139a20 -171
0xc000139a20 +184
0xc000139a20 +188
0xc000139a20 -184
0xc000139a20 -188
0xc000139a20 +174
0xc000139a20 -174
0xc000139a20 +171
0xc000139a20 +174
0xc000139a20 +175
0xc000139a20 -171
0xc000139a20 -174
0xc000139a20 -175
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 +172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 -172
0xc000139a20 +173
0xc000139a20 +173
0xc000139a20 +173
0xc000139a20 +173
0xc000139a20 -173
0xc000139a20 -173
0xc000139a20 -173
0xc000139a20 -173
0xc000139a20 +173
0xc000139a20 +173
0xc000139a20 +173
0xc000139a20 +173
0xc000139a20 -173
0xc000139a20 -173
0xc000139a20 -173
0xc000139a20 -173
0xc000139a20 +173
0xc000139a20 +173
0xc000139a20 +173
0xc000139a20 +173
0xc000139a20 -173
0xc000139a20 -173
0xc000139a20 -173
0xc000139a20 -173
0xc000139a20 +173
0xc000139a20 +173
0xc000139a20 +173
0xc000139a20 +173
0xc000139a20 -173
0xc000139a20 -173
0xc000139a20 -173
0xc000139a20 -173

The numbers in this log sum to zero, as expected. But, I also captured the row sizes from the perspective of the row container, which only actually edits the memory account every 64th removed row due to chunking. Here is the log of those row sizes:

Summing up the subtractions in the second log with the additions in the first, we get -2 bytes, the same as the discrepancy in the error.

This means that, in between some calls to At(0)/PopFirst and the time that PopFirst is called, one of the rows' sizes changes, which should never happen. Or there's another bug in RowContainer, which I kind of doubt because this infrastructure is quite old.

jordanlewis · 2020-10-02T20:55:28Z

Here is the log of the rows removed. The first half ("fake" at the end of the line) is the output of At(0) of the row container. The second half is the output of what PopFirst actually deletes. Notice the discrepancy - the first half removes two rows with "b" first, and then removes 4 rows with "t13". The second half removes 1 row with "b" first, and then removes 5 rows with "t13". The first "t13" row in the second half appears to be identical to the last "t13" row in the first half in the last half. I am starting to think there is a bug in RowContainer...

jordanlewis · 2020-10-02T20:55:39Z

Forgot to include the log.

[487 'b' 1330834471 'r' 1330834471 'public' 487 'a' 2950 16 1 -1 true false] 171 fake
[487 'b' 1330834471 'r' 1330834471 'public' 487 'b' 2950 16 2 -1 true false] 171 fake
[500 'i suck. badly)' 1330834471 'r' 1330834471 'public' 500 'a' 20 8 1 -1 false false] 184 fake
[500 'i suck. badly)' 1330834471 'r' 1330834471 'public' 500 'rowid' 20 8 2 -1 true false] 188 fake
[516 'derp' 1330834471 'r' 1330834471 'public' 516 'a' 2950 16 1 -1 true false] 174 fake
[517 'a' 1330834471 'r' 1330834471 'public' 517 'a' 20 8 1 -1 false false] 171 fake
[517 'a' 1330834471 'r' 1330834471 'public' 517 'blob' 25 -1 2 -1 false false] 174 fake
[517 'a' 1330834471 'r' 1330834471 'public' 517 'rowid' 20 8 3 -1 true false] 175 fake
[518 't0' 1330834471 'r' 1330834471 'public' 518 'a' 20 8 1 -1 true false] 172 fake
[518 't0' 1330834471 'r' 1330834471 'public' 518 'b' 25 -1 2 -1 false false] 172 fake
[518 't0' 1330834471 'r' 1330834471 'public' 518 'c' 1082 16 3 -1 false false] 172 fake
[518 't0' 1330834471 'r' 1330834471 'public' 518 'e' 1186 24 4 -1 false false] 172 fake
[519 't1' 1330834471 'r' 1330834471 'public' 519 'a' 20 8 1 -1 true false] 172 fake
[519 't1' 1330834471 'r' 1330834471 'public' 519 'b' 25 -1 2 -1 false false] 172 fake
[519 't1' 1330834471 'r' 1330834471 'public' 519 'c' 1082 16 3 -1 false false] 172 fake
[519 't1' 1330834471 'r' 1330834471 'public' 519 'e' 1186 24 4 -1 false false] 172 fake
[520 't2' 1330834471 'r' 1330834471 'public' 520 'a' 20 8 1 -1 true false] 172 fake
[520 't2' 1330834471 'r' 1330834471 'public' 520 'b' 25 -1 2 -1 false false] 172 fake
[520 't2' 1330834471 'r' 1330834471 'public' 520 'c' 1082 16 3 -1 false false] 172 fake
[520 't2' 1330834471 'r' 1330834471 'public' 520 'e' 1186 24 4 -1 false false] 172 fake
[521 't3' 1330834471 'r' 1330834471 'public' 521 'a' 20 8 1 -1 true false] 172 fake
[521 't3' 1330834471 'r' 1330834471 'public' 521 'b' 25 -1 2 -1 false false] 172 fake
[521 't3' 1330834471 'r' 1330834471 'public' 521 'c' 1082 16 3 -1 false false] 172 fake
[521 't3' 1330834471 'r' 1330834471 'public' 521 'e' 1186 24 4 -1 false false] 172 fake
[522 't4' 1330834471 'r' 1330834471 'public' 522 'a' 20 8 1 -1 true false] 172 fake
[522 't4' 1330834471 'r' 1330834471 'public' 522 'b' 25 -1 2 -1 false false] 172 fake
[522 't4' 1330834471 'r' 1330834471 'public' 522 'c' 1082 16 3 -1 false false] 172 fake
[522 't4' 1330834471 'r' 1330834471 'public' 522 'e' 1186 24 4 -1 false false] 172 fake
[523 't5' 1330834471 'r' 1330834471 'public' 523 'a' 20 8 1 -1 true false] 172 fake
[523 't5' 1330834471 'r' 1330834471 'public' 523 'b' 25 -1 2 -1 false false] 172 fake
[523 't5' 1330834471 'r' 1330834471 'public' 523 'c' 1082 16 3 -1 false false] 172 fake
[523 't5' 1330834471 'r' 1330834471 'public' 523 'e' 1186 24 4 -1 false false] 172 fake
[524 't6' 1330834471 'r' 1330834471 'public' 524 'a' 20 8 1 -1 true false] 172 fake
[524 't6' 1330834471 'r' 1330834471 'public' 524 'b' 25 -1 2 -1 false false] 172 fake
[524 't6' 1330834471 'r' 1330834471 'public' 524 'c' 1082 16 3 -1 false false] 172 fake
[524 't6' 1330834471 'r' 1330834471 'public' 524 'e' 1186 24 4 -1 false false] 172 fake
[525 't7' 1330834471 'r' 1330834471 'public' 525 'a' 20 8 1 -1 true false] 172 fake
[525 't7' 1330834471 'r' 1330834471 'public' 525 'b' 25 -1 2 -1 false false] 172 fake
[525 't7' 1330834471 'r' 1330834471 'public' 525 'c' 1082 16 3 -1 false false] 172 fake
[525 't7' 1330834471 'r' 1330834471 'public' 525 'e' 1186 24 4 -1 false false] 172 fake
[526 't8' 1330834471 'r' 1330834471 'public' 526 'a' 20 8 1 -1 true false] 172 fake
[526 't8' 1330834471 'r' 1330834471 'public' 526 'b' 25 -1 2 -1 false false] 172 fake
[526 't8' 1330834471 'r' 1330834471 'public' 526 'c' 1082 16 3 -1 false false] 172 fake
[526 't8' 1330834471 'r' 1330834471 'public' 526 'e' 1186 24 4 -1 false false] 172 fake
[527 't9' 1330834471 'r' 1330834471 'public' 527 'a' 20 8 1 -1 true false] 172 fake
[527 't9' 1330834471 'r' 1330834471 'public' 527 'b' 25 -1 2 -1 false false] 172 fake
[527 't9' 1330834471 'r' 1330834471 'public' 527 'c' 1082 16 3 -1 false false] 172 fake
[527 't9' 1330834471 'r' 1330834471 'public' 527 'e' 1186 24 4 -1 false false] 172 fake
[528 't10' 1330834471 'r' 1330834471 'public' 528 'a' 20 8 1 -1 true false] 173 fake
[528 't10' 1330834471 'r' 1330834471 'public' 528 'b' 25 -1 2 -1 false false] 173 fake
[528 't10' 1330834471 'r' 1330834471 'public' 528 'c' 1082 16 3 -1 false false] 173 fake
[528 't10' 1330834471 'r' 1330834471 'public' 528 'e' 1186 24 4 -1 false false] 173 fake
[529 't11' 1330834471 'r' 1330834471 'public' 529 'a' 20 8 1 -1 true false] 173 fake
[529 't11' 1330834471 'r' 1330834471 'public' 529 'b' 25 -1 2 -1 false false] 173 fake
[529 't11' 1330834471 'r' 1330834471 'public' 529 'c' 1082 16 3 -1 false false] 173 fake
[529 't11' 1330834471 'r' 1330834471 'public' 529 'e' 1186 24 4 -1 false false] 173 fake
[530 't12' 1330834471 'r' 1330834471 'public' 530 'a' 20 8 1 -1 true false] 173 fake
[530 't12' 1330834471 'r' 1330834471 'public' 530 'b' 25 -1 2 -1 false false] 173 fake
[530 't12' 1330834471 'r' 1330834471 'public' 530 'c' 1082 16 3 -1 false false] 173 fake
[530 't12' 1330834471 'r' 1330834471 'public' 530 'e' 1186 24 4 -1 false false] 173 fake
[531 't13' 1330834471 'r' 1330834471 'public' 531 'a' 20 8 1 -1 true false] 173 fake
[531 't13' 1330834471 'r' 1330834471 'public' 531 'b' 25 -1 2 -1 false false] 173 fake
[531 't13' 1330834471 'r' 1330834471 'public' 531 'c' 1082 16 3 -1 false false] 173 fake
[531 't13' 1330834471 'r' 1330834471 'public' 531 'e' 1186 24 4 -1 false false] 173 fake

[487 'b' 1330834471 'r' 1330834471 'public' 487 'a' 2950 16 1 -1 true false] 171 real
[500 'i suck. badly)' 1330834471 'r' 1330834471 'public' 500 'rowid' 20 8 2 -1 true false] 188 real
[500 'i suck. badly)' 1330834471 'r' 1330834471 'public' 500 'a' 20 8 1 -1 false false] 184 real
[516 'derp' 1330834471 'r' 1330834471 'public' 516 'a' 2950 16 1 -1 true false] 174 real
[517 'a' 1330834471 'r' 1330834471 'public' 517 'rowid' 20 8 3 -1 true false] 175 real
[517 'a' 1330834471 'r' 1330834471 'public' 517 'a' 20 8 1 -1 false false] 171 real
[517 'a' 1330834471 'r' 1330834471 'public' 517 'blob' 25 -1 2 -1 false false] 174 real
[518 't0' 1330834471 'r' 1330834471 'public' 518 'e' 1186 24 4 -1 false false] 172 real
[518 't0' 1330834471 'r' 1330834471 'public' 518 'a' 20 8 1 -1 true false] 172 real
[518 't0' 1330834471 'r' 1330834471 'public' 518 'b' 25 -1 2 -1 false false] 172 real
[518 't0' 1330834471 'r' 1330834471 'public' 518 'c' 1082 16 3 -1 false false] 172 real
[519 't1' 1330834471 'r' 1330834471 'public' 519 'e' 1186 24 4 -1 false false] 172 real
[519 't1' 1330834471 'r' 1330834471 'public' 519 'a' 20 8 1 -1 true false] 172 real
[519 't1' 1330834471 'r' 1330834471 'public' 519 'b' 25 -1 2 -1 false false] 172 real
[519 't1' 1330834471 'r' 1330834471 'public' 519 'c' 1082 16 3 -1 false false] 172 real
[520 't2' 1330834471 'r' 1330834471 'public' 520 'e' 1186 24 4 -1 false false] 172 real
[520 't2' 1330834471 'r' 1330834471 'public' 520 'a' 20 8 1 -1 true false] 172 real
[520 't2' 1330834471 'r' 1330834471 'public' 520 'b' 25 -1 2 -1 false false] 172 real
[520 't2' 1330834471 'r' 1330834471 'public' 520 'c' 1082 16 3 -1 false false] 172 real
[521 't3' 1330834471 'r' 1330834471 'public' 521 'e' 1186 24 4 -1 false false] 172 real
[521 't3' 1330834471 'r' 1330834471 'public' 521 'a' 20 8 1 -1 true false] 172 real
[521 't3' 1330834471 'r' 1330834471 'public' 521 'b' 25 -1 2 -1 false false] 172 real
[521 't3' 1330834471 'r' 1330834471 'public' 521 'c' 1082 16 3 -1 false false] 172 real
[522 't4' 1330834471 'r' 1330834471 'public' 522 'e' 1186 24 4 -1 false false] 172 real
[522 't4' 1330834471 'r' 1330834471 'public' 522 'a' 20 8 1 -1 true false] 172 real
[522 't4' 1330834471 'r' 1330834471 'public' 522 'b' 25 -1 2 -1 false false] 172 real
[522 't4' 1330834471 'r' 1330834471 'public' 522 'c' 1082 16 3 -1 false false] 172 real
[523 't5' 1330834471 'r' 1330834471 'public' 523 'e' 1186 24 4 -1 false false] 172 real
[523 't5' 1330834471 'r' 1330834471 'public' 523 'a' 20 8 1 -1 true false] 172 real
[523 't5' 1330834471 'r' 1330834471 'public' 523 'b' 25 -1 2 -1 false false] 172 real
[523 't5' 1330834471 'r' 1330834471 'public' 523 'c' 1082 16 3 -1 false false] 172 real
[524 't6' 1330834471 'r' 1330834471 'public' 524 'e' 1186 24 4 -1 false false] 172 real
[524 't6' 1330834471 'r' 1330834471 'public' 524 'a' 20 8 1 -1 true false] 172 real
[524 't6' 1330834471 'r' 1330834471 'public' 524 'b' 25 -1 2 -1 false false] 172 real
[524 't6' 1330834471 'r' 1330834471 'public' 524 'c' 1082 16 3 -1 false false] 172 real
[525 't7' 1330834471 'r' 1330834471 'public' 525 'e' 1186 24 4 -1 false false] 172 real
[525 't7' 1330834471 'r' 1330834471 'public' 525 'a' 20 8 1 -1 true false] 172 real
[525 't7' 1330834471 'r' 1330834471 'public' 525 'b' 25 -1 2 -1 false false] 172 real
[525 't7' 1330834471 'r' 1330834471 'public' 525 'c' 1082 16 3 -1 false false] 172 real
[526 't8' 1330834471 'r' 1330834471 'public' 526 'e' 1186 24 4 -1 false false] 172 real
[526 't8' 1330834471 'r' 1330834471 'public' 526 'a' 20 8 1 -1 true false] 172 real
[526 't8' 1330834471 'r' 1330834471 'public' 526 'b' 25 -1 2 -1 false false] 172 real
[526 't8' 1330834471 'r' 1330834471 'public' 526 'c' 1082 16 3 -1 false false] 172 real
[527 't9' 1330834471 'r' 1330834471 'public' 527 'e' 1186 24 4 -1 false false] 172 real
[527 't9' 1330834471 'r' 1330834471 'public' 527 'a' 20 8 1 -1 true false] 172 real
[527 't9' 1330834471 'r' 1330834471 'public' 527 'b' 25 -1 2 -1 false false] 172 real
[527 't9' 1330834471 'r' 1330834471 'public' 527 'c' 1082 16 3 -1 false false] 172 real
[528 't10' 1330834471 'r' 1330834471 'public' 528 'e' 1186 24 4 -1 false false] 173 real
[528 't10' 1330834471 'r' 1330834471 'public' 528 'a' 20 8 1 -1 true false] 173 real
[528 't10' 1330834471 'r' 1330834471 'public' 528 'b' 25 -1 2 -1 false false] 173 real
[528 't10' 1330834471 'r' 1330834471 'public' 528 'c' 1082 16 3 -1 false false] 173 real
[529 't11' 1330834471 'r' 1330834471 'public' 529 'e' 1186 24 4 -1 false false] 173 real
[529 't11' 1330834471 'r' 1330834471 'public' 529 'a' 20 8 1 -1 true false] 173 real
[529 't11' 1330834471 'r' 1330834471 'public' 529 'b' 25 -1 2 -1 false false] 173 real
[529 't11' 1330834471 'r' 1330834471 'public' 529 'c' 1082 16 3 -1 false false] 173 real
[530 't12' 1330834471 'r' 1330834471 'public' 530 'e' 1186 24 4 -1 false false] 173 real
[530 't12' 1330834471 'r' 1330834471 'public' 530 'a' 20 8 1 -1 true false] 173 real
[530 't12' 1330834471 'r' 1330834471 'public' 530 'b' 25 -1 2 -1 false false] 173 real
[530 't12' 1330834471 'r' 1330834471 'public' 530 'c' 1082 16 3 -1 false false] 173 real
[531 't13' 1330834471 'r' 1330834471 'public' 531 'e' 1186 24 4 -1 false false] 173 real
[531 't13' 1330834471 'r' 1330834471 'public' 531 'a' 20 8 1 -1 true false] 173 real
[531 't13' 1330834471 'r' 1330834471 'public' 531 'b' 25 -1 2 -1 false false] 173 real
[531 't13' 1330834471 'r' 1330834471 'public' 531 'c' 1082 16 3 -1 false false] 173 real
[531 't13' 1330834471 'r' 1330834471 'public' 531 'e' 1186 24 4 -1 false false] 173 real

jordanlewis · 2020-10-02T21:07:31Z

The rows on the output side are also out of order... something seems really messed up

jordanlewis · 2020-10-03T16:05:56Z

It was a memory aliasing issue, a bug in the vTableLookupJoinNode. Here's a diff that fixes it:

@@ -271,7 +277,9 @@ func (v *vTableLookupJoinNode) Next(params runParams) (bool, error) {
        for {
                // Check if there are any rows left to emit from the last input row.
                if v.run.rows.Len() > 0 {
-                       v.run.row = v.run.rows.At(0)
+                       copy(v.run.row, v.run.rows.At(0))

this only explains the crash, not the slowdown, though.

This commit adds a new function called ConstantWithTestValue that is designed to be used to initialize "magic constants", like batch sizes, that cause edge conditions in the code. ConstantWithTestValue should be used to initialize "magic constants" that should be varied during test scenarios to check for bugs at boundary conditions. When built with the test_constants build tag, the test value will be used. In all other cases, the production value will be used. An example of a "magic constant" that behaves this way is a batch size. Batch sizes tend to present testing problems, because often the logic that deals with what to do when a batch is finished is less likely to be exercised by simple unit tests that don't use enough data to fill up a batch. For example, instead of writing: ``` const batchSize = 64 ``` you should write: ``` var batchSize = util.ConstantWithTestValue(64, 1) ``` This will give your code a batch size of 1 in the test_constants build configuration, increasing the amount of exercise the edge conditions get. This commit also adds several uses of `ConstantWithTestValue` in the SQL package. One of them in particular (the datum row container chunk size) was tailored to catch cockroachdb#55140. Release note: None

This commit adds a cost equal to 10*randIOCostFactor for each virtual scan in order to represent the cost of fetching table descriptors. This cost is especially important when perfoming lookup joins, because the descriptors may need to be fetched on each lookup. As a result of this change, the optimizer is much less likely to plan a lookup join into a virtual table. This commit also includes some fixes to the test catalog to provide better support for testing virtual tables with indexes. Fixes cockroachdb#55140 Release note (performance improvement): Adjusted the cost model in the optimizer so that the optimizer is less likely to plan a lookup join into a virtual table. Performing a lookup join into a virtual table is expensive, so this change will generally result in better performance for queries involving joins with virtual tables.

55833: opt: add cost for table descriptor fetch during virtual scan r=rytaft a=rytaft This commit adds a cost equal to `10*randIOCostFactor` for each virtual scan in order to represent the cost of fetching table descriptors. This cost is especially important when perfoming lookup joins, because the descriptors may need to be fetched on each lookup. As a result of this change, the optimizer is much less likely to plan a lookup join into a virtual table. Fixes #55140 Release note (performance improvement): Adjusted the cost model in the optimizer so that the optimizer is less likely to plan a lookup join into a virtual table. Performing a lookup join into a virtual table is expensive, so this change will generally result in better performance for queries involving joins with virtual tables. Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>

This commit adds a cost equal to 10*randIOCostFactor for each virtual scan in order to represent the cost of fetching table descriptors. This cost is especially important when perfoming lookup joins, because the descriptors may need to be fetched on each lookup. As a result of this change, the optimizer is much less likely to plan a lookup join into a virtual table. This commit also includes some fixes to the test catalog to provide better support for testing virtual tables with indexes. Fixes cockroachdb#55140 Release note (performance improvement): Adjusted the cost model in the optimizer so that the optimizer is less likely to plan a lookup join into a virtual table. Performing a lookup join into a virtual table is expensive, so this change will generally result in better performance for queries involving joins with virtual tables.

rafiss · 2020-10-29T15:27:59Z

@rytaft apologies for noticing this late, but I just tried running my reproduction above (#55140 (comment)) and I found that the query is still taking around 6 seconds when I test with the latest version from master (commit f945cae), as opposed to ~50ms when I run with v20.1.3 locally.

I'm going to re-open and hopefully we can diagnose this further.

rytaft · 2020-10-29T16:06:30Z

Hmm -- I did run those reproduction steps when I was working on this PR and confirmed that the new plan was significantly faster than the old one (at least an order of magnitude).

Are you sure that what you're seeing is plan-related?

rafiss · 2020-10-30T16:11:59Z

Here is the plan I am getting now (taking 6 seconds to execute)

                                                         info
-----------------------------------------------------------------------------------------------------------------------
  distribution: local
  vectorized: false

  • sort
  │ order: +relname,+row_number
  │
  └── • render
      │
      └── • filter
          │ filter: attname LIKE '%'
          │
          └── • window
              │
              └── • merge join
                  │ equality: (oid) = (atttypid)
                  │
                  ├── • virtual table
                  │     table: pg_type@pg_type_oid_idx
                  │
                  └── • sort
                      │ order: +atttypid
                      │
                      └── • hash join (left outer)
                          │ equality: (classoid) = (oid)
                          │
                          ├── • hash join (right outer)
                          │   │ equality: (adrelid, adnum) = (attrelid, attnum)
                          │   │
                          │   ├── • filter
                          │   │   │ filter: adnum > 0
                          │   │   │
                          │   │   └── • virtual table
                          │   │         table: pg_attrdef@primary
                          │   │
                          │   └── • hash join (right outer)
                          │       │ equality: (objoid, objsubid) = (oid, attnum)
                          │       │
                          │       ├── • filter
                          │       │   │ filter: objsubid > 0
                          │       │   │
                          │       │   └── • virtual table
                          │       │         table: pg_description@primary
                          │       │
                          │       └── • virtual table lookup join
                          │           │ table: pg_attribute@pg_attribute_attrelid_idx
                          │           │ equality: (oid) = (attrelid)
                          │           │ pred: (attnum > 0) AND (NOT attisdropped)
                          │           │
                          │           └── • hash join
                          │               │ equality: (relnamespace) = (oid)
                          │               │
                          │               ├── • filter
                          │               │   │ filter: (relkind IN ('f', 'm', 'p', 'r', 'v')) AND (relname LIKE '%')
                          │               │   │
                          │               │   └── • virtual table
                          │               │         table: pg_class@primary
                          │               │
                          │               └── • filter
                          │                   │ filter: nspname LIKE 'public'
                          │                   │
                          │                   └── • virtual table
                          │                         table: pg_namespace@primary
                          │
                          └── • hash join (left outer)
                              │ equality: (relnamespace) = (oid)
                              │
                              ├── • filter
                              │   │ filter: relname = 'pg_class'
                              │   │
                              │   └── • virtual table
                              │         table: pg_class@primary
                              │
                              └── • filter
                                  │ filter: nspname = 'pg_catalog'
                                  │
                                  └── • virtual table
                                        table: pg_namespace@primary

Here is explain analyze output.

It looks like the EXPLAIN format itself has changed, but it also does seem like this is different plan... There is only one virtual table lookup join now, but it seems to be taking 5.6 seconds. Not sure what's going on (or what is expected) yet.

This commit adds a cost equal to 10*randIOCostFactor for each virtual scan in order to represent the cost of fetching table descriptors. This cost is especially important when perfoming lookup joins, because the descriptors may need to be fetched on each lookup. As a result of this change, the optimizer is much less likely to plan a lookup join into a virtual table. This commit also includes some fixes to the test catalog to provide better support for testing virtual tables with indexes. Fixes cockroachdb#55140 Release note (performance improvement): Adjusted the cost model in the optimizer so that the optimizer is less likely to plan a lookup join into a virtual table. Performing a lookup join into a virtual table is expensive, so this change will generally result in better performance for queries involving joins with virtual tables.

This commit bumps the cost of each virtual scan to 25*randIOCostFactor from its previous value of 10*randIOCostFactor. This new value threads the needle so that a lookup join will still be chosen if the predicate is very selective, but the plan for the PGJDBC query identified in cockroachdb#55140 no longer includes lookup joins. Fixes cockroachdb#55140 Release note (performance improvement): Adjusted the cost model in the optimizer so that the optimizer is less likely to plan a lookup join into a virtual table. Performing a lookup join into a virtual table is expensive, so this change will generally result in better performance for queries involving joins with virtual tables.

55808: kvserver: skip non-live nodes when considering candidates for transfers r=tbg a=knz (This PR is forked off #55460 to simplify the discussion. I believe there's no discussion left here? Maybe I can merge it directly?) Fixes #55440. Prior to this patch, 3 components could attempt to transfer a replica to a node currently being drained: - the store rebalancer, which rebalances replicas based on disk usage and QPS. - the allocator, to place new replicas. - the allocator, to rebalance replicas depending on load. This commit introduces a consideration for node liveness when building the list of candidates, to detect whether a target node is acceptable. Any node that is not LIVE according to its liveness status is not considered for a transfer. Release note (bug fix): In some cases CockroachDB would attempt to transfer ranges to nodes in the process of being decommissioned or being shut down; this could cause disruption the moment the node did actually terminate. This bug has been fixed. It had been introduced some time before v2.0. 56334: kvserver: use messages on NotLeaseholderErrors everywhere r=andreimatei a=andreimatei NLHE permits custom messages in it, but the field was rarely used. This patch makes every instance where we instantiate the error provide a message, since this error comes from a wide variety of conditions. Release note: None 56345: opt: increase cost for table descriptor fetch during virtual scan r=rytaft a=rytaft This commit bumps the cost of each virtual scan to `25*randIOCostFactor` from its previous value of `10*randIOCostFactor`. This new value threads the needle so that a lookup join will still be chosen if the predicate is very selective, but the plan for the PGJDBC query identified in #55140 no longer includes lookup joins. Fixes #55140 Release note (performance improvement): Adjusted the cost model in the optimizer so that the optimizer is less likely to plan a lookup join into a virtual table. Performing a lookup join into a virtual table is expensive, so this change will generally result in better performance for queries involving joins with virtual tables. 56525: bazel: Move third party repositories to c-deps/REPOSITORIES.bzl r=otan a=alan-mas bazel: Move third party repositories to c-deps/REPOSITORIES.bzl This is one of the Bazel re-factoring that we are working on and it is about to move third party repositories out of root WORKSPACE. fixes #56053 Best practices is to separate external dependencies and it also hides the repo WORKSPACE from being used by other directories. We are creating a new .bzl file inside c-deps with all the external dependencies and then load then inside our root WORKSPACE Release note: None 56589: sql: resolve error due to drop table after schema change in same txn r=ajwerner a=jayshrivastava Previously, if a drop table statement was executed in a transaction following other schema changes to the table in the same transaction, an error would occur. This error was due to the drop table statement marking previous jobs as succeeded and then proceeding to modify them. This change ensures that drop table statement will delete all existing jobs from the job cache so that it does not interfere with previous jobs. Release note (sql change): A table can successfully be dropped in a transaction following other schema changes to the table in the same transaction. This resolves one of the issues in #56235 56597: colflow: fix recent misuse of two slices in the flow setup r=yuzefovich a=yuzefovich We've recently added the reusing of metadataSourcesQueue and toClose slices in order to reduce some allocations. However, the components that are using those slices don't make a deep copy, and as a result, we introduced a bug in which we were breaking the current contract. This commit fixes the issue by going back to the old method (with slight difference in that we currently delay any allocations unlike previously when we allocated a slice with capacity of 1). Release note: None (no release with this bug) 56598: tree: introduce concept of "default" collation r=rafiss a=otan Resolves #54989 Release note (sql change): Introduced a pg_collation of "default". Strings now return the "default" collation OID in the pg_attribute table (this was previously en_US). The "default" collation is also visible on the pg_collation virtual table. 56602: roachpb: remove various `(gogoproto.equal)` options r=nvanbenschoten a=tbg The first commit explains why some cleanup was necessary, the others are the result of spending a little extra time cleaning up "unnecessarily". There are plenty of Equals left to clean up, but the returns were diminishing. The important part is that when additions to the KV API are made, nobody will be forced to add the `equal` option any more. - roachpb: don't generate Equal() on Error - roachpb: remove more `Equal` methods - roachpb: remove (gogoproto.equal) from api.proto - roachpb: mostly remove (gogoproto.equal) from data.proto - roachpb: remove Value.Equal - kvserverpb: remove Equal from ReplicatedEvalResult Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net> Co-authored-by: Andrei Matei <andrei@cockroachlabs.com> Co-authored-by: Rebecca Taft <becca@cockroachlabs.com> Co-authored-by: Alanmas <acostas.alan@gmail.com> Co-authored-by: Jayant Shrivastava <jayants@cockroachlabs.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Oliver Tan <otan@cockroachlabs.com> Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>

rafiss added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-memmon SQL memory monitoring labels Oct 1, 2020

yuzefovich mentioned this issue Oct 1, 2020

sql: v20.2.0-alpha.2: panic with "no bytes in account to release" #53416

Closed

rafiss changed the title ~~sql: panic in memory accounting caused by liquibase query to fetch columns~~ sql: panic in memory accounting caused by pgjdbc query to fetch columns Oct 2, 2020

jordanlewis mentioned this issue Oct 3, 2020

sql: bugfix to vtable lookup joins #55212

Merged

jordanlewis mentioned this issue Oct 5, 2020

util: introduce ConstantWithTestValue; use in sql #55223

Merged

craig bot closed this as completed in dcec6ee Oct 24, 2020

rafiss mentioned this issue Oct 26, 2020

release-20.2: opt: add cost for table descriptor fetch during virtual scan #56000

Closed

rafiss reopened this Oct 29, 2020

rytaft mentioned this issue Nov 5, 2020

opt: increase cost for table descriptor fetch during virtual scan #56345

Merged

craig bot closed this as completed in 0830131 Nov 12, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: 100x performance degradation in pgjdbc query to fetch columns #55140

sql: 100x performance degradation in pgjdbc query to fetch columns #55140

rafiss commented Oct 1, 2020 •

edited

Loading

yuzefovich commented Oct 1, 2020

yuzefovich commented Oct 1, 2020

rafiss commented Oct 1, 2020 •

edited

Loading

rafiss commented Oct 1, 2020

rafiss commented Oct 2, 2020

rafiss commented Oct 2, 2020

jordanlewis commented Oct 2, 2020

rafiss commented Oct 2, 2020 •

edited

Loading

jordanlewis commented Oct 2, 2020

jordanlewis commented Oct 2, 2020

rafiss commented Oct 2, 2020

rafiss commented Oct 2, 2020

jordanlewis commented Oct 2, 2020

jordanlewis commented Oct 2, 2020

rafiss commented Oct 2, 2020 •

edited

Loading

jordanlewis commented Oct 2, 2020

jordanlewis commented Oct 2, 2020

jordanlewis commented Oct 2, 2020

jordanlewis commented Oct 2, 2020

jordanlewis commented Oct 3, 2020

rafiss commented Oct 29, 2020

rytaft commented Oct 29, 2020

rafiss commented Oct 30, 2020

sql: 100x performance degradation in pgjdbc query to fetch columns #55140

sql: 100x performance degradation in pgjdbc query to fetch columns #55140

Comments

rafiss commented Oct 1, 2020 • edited Loading

yuzefovich commented Oct 1, 2020

yuzefovich commented Oct 1, 2020

rafiss commented Oct 1, 2020 • edited Loading

rafiss commented Oct 1, 2020

rafiss commented Oct 2, 2020

rafiss commented Oct 2, 2020

Steps:

jordanlewis commented Oct 2, 2020

rafiss commented Oct 2, 2020 • edited Loading

jordanlewis commented Oct 2, 2020

jordanlewis commented Oct 2, 2020

rafiss commented Oct 2, 2020

rafiss commented Oct 2, 2020

jordanlewis commented Oct 2, 2020

jordanlewis commented Oct 2, 2020

rafiss commented Oct 2, 2020 • edited Loading

jordanlewis commented Oct 2, 2020

jordanlewis commented Oct 2, 2020

jordanlewis commented Oct 2, 2020

jordanlewis commented Oct 2, 2020

jordanlewis commented Oct 3, 2020

rafiss commented Oct 29, 2020

rytaft commented Oct 29, 2020

rafiss commented Oct 30, 2020

rafiss commented Oct 1, 2020 •

edited

Loading

rafiss commented Oct 1, 2020 •

edited

Loading

rafiss commented Oct 2, 2020 •

edited

Loading

rafiss commented Oct 2, 2020 •

edited

Loading