Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: improve SHOW TABLES to show row count #52203

Merged
merged 1 commit into from
Aug 10, 2020

Conversation

ekalinin
Copy link
Contributor

@ekalinin ekalinin commented Jul 31, 2020

Fixes #48755

This change adds column estimated_row_count into result of the "SHOW TABLES"
to show estimated (not real) number of rows.

Besides that the body of the delegator.delegateShowTables was a bit
simplified (we do not need two almost similar versions of the query) and
new virtual table crdb_internal.table_row_statistics was added (to
show stats for non-root users).

Release note (sql change): This change modifies SHOW TABLES to return
estimates number of rows. New column's name is estimated_row_count.
Number of rows is taken from system.table_statistics table (via
crdb_internal.table_row_statistics which shows only tables accessible
for current user).

@blathers-crl
Copy link

blathers-crl bot commented Jul 31, 2020

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

I have added a few people who may be able to assist in reviewing:

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Jul 31, 2020
@blathers-crl blathers-crl bot requested a review from awoods187 July 31, 2020 21:00
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ekalinin ekalinin force-pushed the show-tables-add-rows-estimate branch from 78c3875 to 36c0f80 Compare July 31, 2020 23:19
@ekalinin ekalinin requested review from a team and dt and removed request for a team July 31, 2020 23:19
@dt dt requested review from rohany and a team and removed request for dt July 31, 2020 23:21
@rohany
Copy link
Contributor

rohany commented Aug 1, 2020

Thanks for the contribution @ekalinin! I gave this a try earlier and ran into a problem that I think your implementation here also hits. The problem is that only the admin user is allowed to access the system.table_statistics table. A non-admin user is going to get a permission denied error running SHOW TABLES even if they have permissions to see all of the tables that they want to access statistics on. It is a limitation with the current way we do these delegate statements. cc @jordanlewis this is what I was talking about re delegate permissions are annoying.

@rohany
Copy link
Contributor

rohany commented Aug 1, 2020

A potential solution is to implement a new crdb internal virtual table that exposes the latest statistics on tables, and then restricts its own output to only the tables that the current user has privileges to access. Finally, the delegate command can join against it.

If this sounds like something you want to take on, I can link a commit that adds a new virtual table.

pkg/sql/delegate/show_tables.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @ekalinin, and @rohany)


pkg/sql/delegate/show_tables.go, line 54 at r1 (raw file):

            ELSE 'table'
       END                                AS "type"
     , (select max("rowCount")

The maximum may not be the most accurate. You'll probably want to look at the "createdAt" column to get the most recent value for the given table.

@ekalinin
Copy link
Contributor Author

ekalinin commented Aug 2, 2020

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @ekalinin, and @rohany)

pkg/sql/delegate/show_tables.go, line 54 at r1 (raw file):

            ELSE 'table'
       END                                AS "type"
     , (select max("rowCount")

The maximum may not be the most accurate. You'll probably want to look at the "createdAt" column to get the most recent value for the given table.

Thanks. Fixed.

@ekalinin ekalinin force-pushed the show-tables-add-rows-estimate branch from 36c0f80 to 0bb6dd6 Compare August 2, 2020 22:03
@blathers-crl
Copy link

blathers-crl bot commented Aug 2, 2020

Thank you for updating your pull request.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@ekalinin
Copy link
Contributor Author

ekalinin commented Aug 2, 2020

A potential solution is to implement a new crdb internal virtual table that exposes the latest statistics on tables, and then restricts its own output to only the tables that the current user has privileges to access. Finally, the delegate command can join against it.

@rohany sounds like a plan :).

If this sounds like something you want to take on, I can link a commit that adds a new virtual table.

Looks good to me. Could you link that commit, please?

Thanks.

@ekalinin ekalinin force-pushed the show-tables-add-rows-estimate branch from 0bb6dd6 to 206dd21 Compare August 2, 2020 22:27
@ekalinin
Copy link
Contributor Author

ekalinin commented Aug 2, 2020

Thanks for the contribution @ekalinin! I gave this a try earlier and ran into a problem that I think your implementation here also hits. The problem is that only the admin user is allowed to access the system.table_statistics table. A non-admin user is going to get a permission denied error running SHOW TABLES even if they have permissions to see all of the tables that they want to access statistics on. It is a limitation with the current way we do these delegate statements.

@rohany Are there any contraindications for allowing all users to read system.table_statistics?

I've updated the tests with this logic (moved a table into a separate db and added GRANT SELECT ON system.table_statistics):

statement ok
CREATE DATABASE rowtest

statement ok
GRANT CREATE, INSERT, SELECT, DELETE ON DATABASE rowtest TO testuser

statement ok
GRANT SELECT ON system.table_statistics TO testuser

user testuser

statement ok
CREATE TABLE rowtest.t1 (a INT)

statement ok
INSERT INTO rowtest.t1 SELECT a FROM generate_series(1, 1024) AS a(a)

query T
SELECT rows FROM [SHOW TABLES from rowtest]
----
NULL

statement ok
ANALYZE rowtest.t1

query I
SELECT rows FROM [SHOW TABLES from rowtest] where table_name = 't1'
----
1024

...

And it works good.

(see full test here: https://github.com/cockroachdb/cockroach/pull/52203/files#diff-cdf15f008dd1eaef8d7900a2736e8a97 )

@rohany
Copy link
Contributor

rohany commented Aug 3, 2020

Looks good to me. Could you link that commit, please?

#49771

Are there any contraindications for allowing all users to read system.table_statistics?

Yes -- we can't allow users to get information about tables that they don't have permissions for.

@ekalinin ekalinin force-pushed the show-tables-add-rows-estimate branch 2 times, most recently from 7b25483 to 45d614d Compare August 3, 2020 23:24
@ekalinin
Copy link
Contributor Author

ekalinin commented Aug 3, 2020

Looks good to me. Could you link that commit, please?

#49771

Thanks.

Just added a crdbInternalTablesTableLastStats (crdb_internal.table_row_stats). PTAL.
Am I heading in the right direction?
(tests are not fixed yet)

Are there any contraindications for allowing all users to read system.table_statistics?

Yes -- we can't allow users to get information about tables that they don't have permissions for.

Got it.

Copy link
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, this is definitely on the right track!

pkg/sql/crdb_internal.go Outdated Show resolved Hide resolved
pkg/sql/crdb_internal.go Outdated Show resolved Hide resolved
pkg/sql/crdb_internal.go Outdated Show resolved Hide resolved
pkg/sql/crdb_internal.go Outdated Show resolved Hide resolved
@ekalinin ekalinin force-pushed the show-tables-add-rows-estimate branch 3 times, most recently from b34aadb to ae78004 Compare August 4, 2020 17:14
@ekalinin
Copy link
Contributor Author

ekalinin commented Aug 4, 2020

Fixed some tests. PTAL.

@ekalinin ekalinin force-pushed the show-tables-add-rows-estimate branch from decf9a7 to 28bac8f Compare August 4, 2020 21:17
@ekalinin ekalinin requested a review from a team as a code owner August 4, 2020 21:17
Copy link
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if background stats collection could result in these test's row counts flaking. Could that happen @rytaft?

Also, what are the thoughts on displaying this row count only if requested, like SHOW TABLES ... WITH ROW COUNT or something @awoods187?

Also, this row count is irrelevant for virtual tables, so I would prefer if we reported null rather than 0 for them.

pkg/sql/crdb_internal.go Outdated Show resolved Hide resolved
pkg/sql/delegate/show_tables.go Outdated Show resolved Hide resolved
pkg/sql/delegate/show_tables.go Outdated Show resolved Hide resolved
, (
SELECT estimated_row_count
FROM crdb_internal.table_row_statistics
WHERE table_id = pc.oid::int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this handle when the requested table ID doesn't exist in the stats table?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate, please.
Currently, If there's no requested table ID in the stats table then we return NULL.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if background stats collection could result in these test's row counts flaking.

Yes, it could. I think we currently have automatic stats collection disabled for logic tests, but I think the hope was that we'd be able to re-enable it eventually. You should probably explicitly disable automatic stats collection with SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false in any file where an automatic stats run could cause flakiness.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @ekalinin, @knz, and @rohany)

@ekalinin ekalinin force-pushed the show-tables-add-rows-estimate branch from 28bac8f to 80876f5 Compare August 5, 2020 15:42
@ekalinin
Copy link
Contributor Author

ekalinin commented Aug 5, 2020

Also, what are the thoughts on displaying this row count only if requested, like SHOW TABLES ... WITH ROW COUNT or something?

Yeah, good point. Will update parser & ShowTables struct.

Also, this row count is irrelevant for virtual tables, so I would prefer if we reported null rather than 0 for them.

Done.

@rohany
Copy link
Contributor

rohany commented Aug 5, 2020

Yeah, good point. Will update parser & ShowTables struct.

Hold off on updating until we decide whether we should do this or not -- it's a good amount of work to be going back and forth.

@ekalinin
Copy link
Contributor Author

ekalinin commented Aug 5, 2020

Yeah, good point. Will update parser & ShowTables struct.

Hold off on updating until we decide whether we should do this or not -- it's a good amount of work to be going back and forth.

Ok :)

@ekalinin ekalinin force-pushed the show-tables-add-rows-estimate branch from 80876f5 to 45f12e2 Compare August 6, 2020 10:19
@ekalinin ekalinin requested a review from a team as a code owner August 6, 2020 10:19
@ekalinin
Copy link
Contributor Author

ekalinin commented Aug 6, 2020

Rebased. Fixed some tests.

Copy link
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also improve the release note here to say the new column name, and how it is calculated (for the docs team to document).

I'd also like you to add a test where you select from this new table and ensure that only tables the current user has permission to see show up

tableID := tree.DInt(table.ID)
rowCount := tree.DNull
// For Virtual Tables report NULL row count.
if !table.IsVirtualTable() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do virtual tables show up in the stats? I think we also want to just default row count to NULL if we cannot find the table in statMap

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, virtual tables will show up in crdb_internal.table_row_statistics (even if there's no any rows for it in system.table_statistics).
All virtual tables will have NULL number of rows as you asked here:

Also, this row count is irrelevant for virtual tables, so I would prefer if we reported null rather than 0 for them.

@ekalinin ekalinin force-pushed the show-tables-add-rows-estimate branch from 45f12e2 to 36f161c Compare August 6, 2020 19:57
@ekalinin
Copy link
Contributor Author

ekalinin commented Aug 6, 2020

I would also improve the release note here to say the new column name, and how it is calculated (for the docs team to document).

Done.

I'd also like you to add a test where you select from this new table and ensure that only tables the current user has permission to see show up

Done:

Test user can see only virtual tables (estimated_row_count is NULL) and tables user has access to (estimated_row_count >= 0).

@ekalinin ekalinin force-pushed the show-tables-add-rows-estimate branch 4 times, most recently from 9e8b49d to 2118a44 Compare August 8, 2020 12:40
@ekalinin
Copy link
Contributor Author

ekalinin commented Aug 8, 2020

This PR still has one failed test:

------- Stdout: -------
=== RUN   TestTenantLogic/3node-tenant/table
=== PAUSE TestTenantLogic/3node-tenant/table
=== CONT  TestTenantLogic/3node-tenant/table
        --- FAIL: TestTenantLogic/3node-tenant/table (9.41s)
            logic.go:2308: 
                
                ../../sql/logictest/testdata/logic_test/table:652: 
                expected success, but found
                (XX000) internal error: unexpected concurrency for a flow that was forced to be planned locally
                distsql_running.go:415: in Run()
                DETAIL: stack trace:
                github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:415: Run()
                github.com/cockroachdb/cockroach/pkg/sql/distsql_plan_stats.go:272: planAndRunCreateStats()
                github.com/cockroachdb/cockroach/pkg/sql/create_stats.go:502: func2()
                github.com/cockroachdb/cockroach/pkg/kv/db.go:707: func1()
                github.com/cockroachdb/cockroach/pkg/kv/txn.go:811: exec()
                github.com/cockroachdb/cockroach/pkg/kv/db.go:706: Txn()
                github.com/cockroachdb/cockroach/pkg/sql/create_stats.go:490: Resume()
                github.com/cockroachdb/cockroach/pkg/jobs/registry.go:851: stepThroughStateMachine()
                github.com/cockroachdb/cockroach/pkg/jobs/registry.go:987: func1()
                github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:347: func1()
                runtime/asm_amd64.s:1357: goexit()
                
                NOTE: internal errors may have more details in logs. Use -show-logs.
            logic.go:2644: 
                ../../sql/logictest/testdata/logic_test/table:652: error while processing
            logic.go:2644: pq: internal error: unexpected concurrency for a flow that was forced to be planned locally

It failed here:

And looks like it related to this issue:

@rohany
Copy link
Contributor

rohany commented Aug 10, 2020

Oh, it's started to fail because you've added a stats collection run. Just add a # LogicTest: !3node-tenant(50049) to the top of the table logictest file.

This change adds column "estimated_row_count" into result of the "SHOW TABLES"
to show estimated (not real) number of rows.

Besides that the body of the `delegator.delegateShowTables` was a bit
simplified (we do not need two almost similar versions of the query) and
new virtual table `crdb_internal.table_row_statistics` was added (to
show stats for non-root users).

Release note (sql change): This change modifies SHOW TABLES to return
estimates number of rows. New column's name is `estimated_row_count`.
Number of rows is taken from `system.table_statistics` table (via
`crdb_internal.table_row_statistics` which shows only tables accessible
for current user).
@ekalinin ekalinin force-pushed the show-tables-add-rows-estimate branch from 2118a44 to a9fbed6 Compare August 10, 2020 16:24
@ekalinin
Copy link
Contributor Author

Oh, it's started to fail because you've added a stats collection run. Just add a # LogicTest: !3node-tenant(50049) to the top of the table logictest file.

Thanks! Added.

@rohany
Copy link
Contributor

rohany commented Aug 10, 2020

This looks good to me! Thanks for your contribution.

bors r=rohany

@craig
Copy link
Contributor

craig bot commented Aug 10, 2020

Build succeeded:

@craig craig bot merged commit 82d1347 into cockroachdb:master Aug 10, 2020
@ekalinin ekalinin deleted the show-tables-add-rows-estimate branch August 11, 2020 11:32
@Amruta-Ranade
Copy link
Contributor

@ekalinin Thank you for contributing to CockroachDB this year. As a token of our appreciation, we would like to send you a gift. Please DM me on our community Slack @amruta so I can send you a link. (If you don’t want a gift, we also have a charitable contribution choice.) All orders need to be in by December 13, so please contact me as soon as possible :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-community Originated from the community X-blathers-triaged blathers was able to find an owner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: add row count estimates to show tables
6 participants