Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

row: fix crash during tracing given un-decodable datum #55783

Merged
merged 1 commit into from
Oct 22, 2020

Conversation

jordanlewis
Copy link
Member

Previously, if tracing was enabled on a query that read bytes that
weren't decodable into datum, the database would crash.

Release note (bug fix): fix rare crash during tracing when reading
un-decodable data.

Previously, if tracing was enabled on a query that read bytes that
weren't decodable into datum, the database would crash.

Release note (bug fix): fix rare crash during tracing when reading
un-decodable data.
@jordanlewis jordanlewis requested a review from a team October 20, 2020 21:08
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jordanlewis
Copy link
Member Author

@rytaft I ran into this crash when experimenting with some of the inverted joins, which seems a little suspicious but maybe expected. Repro instructions:

demo@127.0.0.1:54106/defaultdb> create table a (a int[], inverted index(a));
CREATE TABLE

Time: 3ms total (execution 3ms / network 0ms)

demo@127.0.0.1:54106/defaultdb> create table b (b int[], inverted index(b));
CREATE TABLE

Time: 3ms total (execution 3ms / network 0ms)

demo@127.0.0.1:54106/defaultdb> insert into a values(array[1,2,3,4]);
INSERT 1

Time: 3ms total (execution 3ms / network 0ms)

demo@127.0.0.1:54106/defaultdb> insert into b values(array[1,2,3,4]);
INSERT 1

Time: 3ms total (execution 3ms / network 0ms)

demo@127.0.0.1:54106/defaultdb> select * from a@a_a_idx join b on a @> b;
      a     |     b
------------+------------
  {1,2,3,4} | {1,2,3,4}
(1 row)

Time: 1ms total (execution 1ms / network 0ms)

demo@127.0.0.1:54106/defaultdb> \set auto_trace=on,kv
demo@127.0.0.1:54106/defaultdb> select * from a@a_a_idx join b on a @> b;
*
* ERROR: [n1,client=127.0.0.1:54107,hostssl,user=demo] a SQL panic has occurred while executing the following statement:
* SELECT * FROM a@a_a_idx JOIN b ON a @> b
*
*
* ERROR: [n1,client=127.0.0.1:54107,hostssl,user=demo] a panic has occurred!
* runtime error: invalid memory address or nil pointer dereference
* (1) attached stack trace
*   -- stack trace:
*   | github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn.func1
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:479
*   | [...repeated from below...]
* Wraps: (2) while executing: SELECT * FROM _@_ JOIN _ ON _ @> _
* Wraps: (3) attached stack trace
*   -- stack trace:
*   | github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn.func1
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:479
*   | runtime.gopanic
*   | 	/usr/local/Cellar/go@1.13/1.13.15/libexec/src/runtime/panic.go:679
*   | runtime.panicmem
*   | 	/usr/local/Cellar/go@1.13/1.13.15/libexec/src/runtime/panic.go:199
*   | runtime.sigpanic
*   | 	/usr/local/Cellar/go@1.13/1.13.15/libexec/src/runtime/signal_unix.go:394
*   | github.com/cockroachdb/cockroach/pkg/sql/row.(*Fetcher).prettyEncDatums
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/fetcher.go:811
*   | github.com/cockroachdb/cockroach/pkg/sql/row.(*Fetcher).processKV
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/fetcher.go:926
*   | github.com/cockroachdb/cockroach/pkg/sql/row.(*Fetcher).NextRow
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/row/fetcher.go:1250
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*rowFetcherStatCollector).NextRow
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/stats.go:143
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*invertedJoiner).performScan
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/inverted_joiner.go:484
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*invertedJoiner).Next
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/inverted_joiner.go:377
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*inputStatCollector).Next
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/stats.go:66
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*joinReader).readInput
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/joinreader.go:483
*   | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*joinReader).Next
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/rowexec/joinreader.go:438
*   | github.com/cockroachdb/cockroach/pkg/sql/execinfra.Run
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/base.go:170
*   | github.com/cockroachdb/cockroach/pkg/sql/execinfra.(*ProcessorBase).Run
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/execinfra/processorsbase.go:775
*   | github.com/cockroachdb/cockroach/pkg/sql/flowinfra.(*FlowBase).Run
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow.go:392
*   | github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).Run
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:422
*   | github.com/cockroachdb/cockroach/pkg/sql.(*DistSQLPlanner).PlanAndRun
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:996
*   | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execWithDistSQLEngine
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1002
*   | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).dispatchToExecutionEngine
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:873
*   | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:639
*   | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmt
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:114
*   | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execCmd.func1
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1443
*   | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execCmd
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1445
*   | github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1369
*   | github.com/cockroachdb/cockroach/pkg/sql.(*Server).ServeConn
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:481
*   | github.com/cockroachdb/cockroach/pkg/sql/pgwire.(*conn).processCommandsAsync.func1
*   | 	/Users/jordan/go/src/github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:627
*   | runtime.goexit
*   | 	/usr/local/Cellar/go@1.13/1.13.15/libexec/src/runtime/asm_amd64.s:1357

After this patch, the trace looks like the following. Does this make sense to you?

               timestamp              |       age       |                                                   message                                                    |                            tag                             |                location                 |    operation     | span
--------------------------------------+-----------------+--------------------------------------------------------------------------------------------------------------+------------------------------------------------------------+-----------------------------------------+------------------+-------
  2020-10-20 21:03:20.019292+00:00:00 | 00:00:00.000832 | querying next range at /Table/60/1                                                                           | [n1,client=127.0.0.1:54170,hostssl,user=demo]              | kv/kvclient/kvcoord/range_iter.go:159   | exec stmt        |    4
  2020-10-20 21:03:20.019492+00:00:00 | 00:00:00.001032 | Scan /Table/60/{1-2}                                                                                         | [n1,client=127.0.0.1:54170,hostssl,user=demo]              | sql/row/kv_batch_fetcher.go:337         | table reader     |    7
  2020-10-20 21:03:20.019521+00:00:00 | 00:00:00.001061 | querying next range at /Table/60/1                                                                           | [n1,client=127.0.0.1:54170,hostssl,user=demo,txn=13e15291] | kv/kvclient/kvcoord/range_iter.go:159   | dist sender send |    9
  2020-10-20 21:03:20.019558+00:00:00 | 00:00:00.001098 | r67: sending batch 1 Scan to (n1,s1):1                                                                       | [n1,client=127.0.0.1:54170,hostssl,user=demo,txn=13e15291] | kv/kvclient/kvcoord/dist_sender.go:1850 | dist sender send |    9
  2020-10-20 21:03:20.019803+00:00:00 | 00:00:00.001343 | fetched: /b/primary/600170157219414017/b -> /ARRAY[1,2,3,4]                                                  | [n1,client=127.0.0.1:54170,hostssl,user=demo]              | sql/row/fetcher.go:1256                 | table reader     |    7
  2020-10-20 21:03:20.019901+00:00:00 | 00:00:00.001441 | Scan /Table/59/2/{1-5}                                                                                       | [n1,client=127.0.0.1:54170,hostssl,user=demo]              | sql/row/kv_batch_fetcher.go:337         | inverted joiner  |   11
  2020-10-20 21:03:20.019919+00:00:00 | 00:00:00.001459 | querying next range at /Table/59/2/1                                                                         | [n1,client=127.0.0.1:54170,hostssl,user=demo,txn=13e15291] | kv/kvclient/kvcoord/range_iter.go:159   | dist sender send |   14
  2020-10-20 21:03:20.019954+00:00:00 | 00:00:00.001494 | r66: sending batch 1 Scan to (n1,s1):1                                                                       | [n1,client=127.0.0.1:54170,hostssl,user=demo,txn=13e15291] | kv/kvclient/kvcoord/dist_sender.go:1850 | dist sender send |   14
  2020-10-20 21:03:20.020164+00:00:00 | 00:00:00.001704 | fetched: /a/a_a_idxerror decoding: error decoding 1 bytes: invalid type found Int/600170152326430721 -> NULL | [n1,client=127.0.0.1:54170,hostssl,user=demo]              | sql/row/fetcher.go:1256                 | inverted joiner  |   11
  2020-10-20 21:03:20.020221+00:00:00 | 00:00:00.001761 | fetched: /a/a_a_idxerror decoding: error decoding 1 bytes: invalid type found Int/600170152326430721 -> NULL | [n1,client=127.0.0.1:54170,hostssl,user=demo]              | sql/row/fetcher.go:1256                 | inverted joiner  |   11
  2020-10-20 21:03:20.020254+00:00:00 | 00:00:00.001794 | fetched: /a/a_a_idxerror decoding: error decoding 1 bytes: invalid type found Int/600170152326430721 -> NULL | [n1,client=127.0.0.1:54170,hostssl,user=demo]              | sql/row/fetcher.go:1256                 | inverted joiner  |   11
  2020-10-20 21:03:20.020284+00:00:00 | 00:00:00.001824 | fetched: /a/a_a_idxerror decoding: error decoding 1 bytes: invalid type found Int/600170152326430721 -> NULL | [n1,client=127.0.0.1:54170,hostssl,user=demo]              | sql/row/fetcher.go:1256                 | inverted joiner  |   11
  2020-10-20 21:03:20.020408+00:00:00 | 00:00:00.001948 | Scan /Table/59/1/600170152326430721{-/#}                                                                     | [n1,client=127.0.0.1:54170,hostssl,user=demo]              | sql/row/kv_batch_fetcher.go:337         | join reader      |   12
  2020-10-20 21:03:20.020428+00:00:00 | 00:00:00.001968 | querying next range at /Table/59/1/600170152326430721                                                        | [n1,client=127.0.0.1:54170,hostssl,user=demo,txn=13e15291] | kv/kvclient/kvcoord/range_iter.go:159   | dist sender send |   17
  2020-10-20 21:03:20.02045+00:00:00  | 00:00:00.00199  | r66: sending batch 1 Scan to (n1,s1):1                                                                       | [n1,client=127.0.0.1:54170,hostssl,user=demo,txn=13e15291] | kv/kvclient/kvcoord/dist_sender.go:1850 | dist sender send |   17
  2020-10-20 21:03:20.020578+00:00:00 | 00:00:00.002118 | fetched: /a/primary/600170152326430721/a -> /ARRAY[1,2,3,4]                                                  | [n1,client=127.0.0.1:54170,hostssl,user=demo]              | sql/row/fetcher.go:1256                 | join reader      |   12
  2020-10-20 21:03:20.020676+00:00:00 | 00:00:00.002216 | rows affected: 1                                                                                             | [n1,client=127.0.0.1:54170,hostssl,user=demo]              | sql/conn_executor_exec.go:639           | exec stmt        |    4
(17 rows)

Time: 4ms total (execution 3ms / network 1ms)

@rytaft
Copy link
Collaborator

rytaft commented Oct 20, 2020

Hey Jordan -- yea, that looks like what I would expect. The inverted joiner has to fetch the four keys in ARRAY[1,2,3,4] in the inverted index on a in order to evaluate the predicate. Then it has to join back with the primary index of a to get the array column.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how we would behave if cfetcher is used: maybe there won't be an issue because we are doing a poor job of collecting traces in the vectorized engine, or maybe it'll be fine.

By the way, which sha is this on? I tried to answer my question myself, but I'm getting ERROR: index "a_a_idx" is inverted and cannot be used for this query.

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@rytaft
Copy link
Collaborator

rytaft commented Oct 22, 2020

I assume he's testing with #55679

@jordanlewis
Copy link
Member Author

Yes, exactly @rytaft.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

FYI if we have a vectorized flow, we're simply catching the null pointer exception, so there wouldn't be a crash.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

@jordanlewis
Copy link
Member Author

bors r+

@craig
Copy link
Contributor

craig bot commented Oct 22, 2020

Build succeeded:

@craig craig bot merged commit 3853555 into cockroachdb:master Oct 22, 2020
@jordanlewis jordanlewis deleted the fix-fetcher-decode-error branch November 6, 2020 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants