Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: Internal sessions exempt from disallow_full_table_scans setting #137681

Merged

Conversation

spilchen
Copy link
Contributor

Previously, enabling the cluster setting sql.defaults.disallow_full_table_scans.enabled prevented the creation of indexes due to the internal table scans required during the process. This change ensures that the setting is bypassed for internal operations, while still applying to user queries.

Epic: None
Closes: #137404
Release note (bug fix): Internal scans are now exempt from the sql.defaults.disallow_full_table_scans.enabled setting, allowing index creation even when the cluster setting is active.

@spilchen spilchen requested a review from a team December 18, 2024 13:06
@spilchen spilchen self-assigned this Dec 18, 2024
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rafiss rafiss added backport-23.2.x Flags PRs that need to be backported to 23.2. backport-24.1.x Flags PRs that need to be backported to 24.1. backport-24.2.x Flags PRs that need to be backported to 24.2 backport-24.3.x Flags PRs that need to be backported to 24.3 labels Dec 18, 2024
Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@@ -440,6 +440,27 @@ CREATE INVERTED INDEX ON opclasses(c DESC)
statement ok
CREATE INVERTED INDEX ON opclasses(a DESC, c)

# Regression test for GH issue #137404
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests are great!

we could add an additional one here, similar to what we have for statement_timeout:

# Ensure that StmtTimeout for a session-independent IE cannot be overriden.
subtest stmt_timeout
statement ok
SET CLUSTER SETTING sql.defaults.statement_timeout = '36000000ms';
statement ok
SET statement_timeout = '39600000ms';
query T
SELECT crdb_internal.execute_internally('SHOW statement_timeout;');
----
0
# Ensure that a session-bound IE still inherits from session vars, if available;
# otherwise, it inherits from the cluster setting.
query T
SELECT crdb_internal.execute_internally('SHOW statement_timeout;', true);
----
39600000

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. Thanks.

Previously, enabling the cluster setting
sql.defaults.disallow_full_table_scans.enabled prevented the creation of
indexes due to the internal table scans required during the process.
This change ensures that the setting is bypassed for internal
operations, while still applying to user queries.

Epic: None
Closes: cockroachdb#137404
Release note (bug fix): Internal scans are now exempt from the
sql.defaults.disallow_full_table_scans.enabled setting, allowing index
creation even when the cluster setting is active.
@spilchen spilchen force-pushed the gh-137404/241218/0757/disallow-full-ts-fix branch from 117a82b to 0024f4f Compare December 18, 2024 20:01
@spilchen spilchen requested a review from a team as a code owner December 18, 2024 20:01
@spilchen spilchen requested review from michae2 and removed request for a team December 18, 2024 20:01
Copy link
Contributor Author

@spilchen spilchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @rafiss)

@@ -440,6 +440,27 @@ CREATE INVERTED INDEX ON opclasses(c DESC)
statement ok
CREATE INVERTED INDEX ON opclasses(a DESC, c)

# Regression test for GH issue #137404
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. Thanks.

@spilchen
Copy link
Contributor Author

TFTR!

bors r+

@michae2 michae2 requested a review from rafiss December 18, 2024 21:44
Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I thought this check for executorType meant that internal executor was already exempt from disallow_full_table_scans?

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @spilchen)

@rafiss
Copy link
Collaborator

rafiss commented Dec 18, 2024

I didn't know about that exclusion. Since we have a simple test case, we should check with a debugger to see how it's possible to reach this condition for the verify-idx operation.

@spilchen
Copy link
Contributor Author

We never reach line 3195:

if flags.IsSet(planFlagContainsFullIndexScan) || flags.IsSet(planFlagContainsFullTableScan) {
if ex.executorType == executorTypeExec && planner.EvalContext().SessionData().DisallowFullTableScans {

The previous condition is false for CREATE INDEX. The only flags set are:

  • planFlagIsDDL
  • planFlagContainsMutation

Should I revert this fix and work on getting the planFlagContainsFullTableScan flag set for CREATE INDEX instead?

@rafiss
Copy link
Collaborator

rafiss commented Dec 19, 2024

Should I revert this fix and work on getting the planFlagContainsFullTableScan flag set for CREATE INDEX instead?

I'd prefer to leave to the SQL Queries team to decide how to tackle that (cc @michae2). In the meantime, I think the fix you've made here is a robust way of addressing the issue, and it uses the same pattern we have for statement_timeout so i don't think we need to change it.

@michae2
Copy link
Collaborator

michae2 commented Dec 19, 2024

We never reach line 3195:

if flags.IsSet(planFlagContainsFullIndexScan) || flags.IsSet(planFlagContainsFullTableScan) {
if ex.executorType == executorTypeExec && planner.EvalContext().SessionData().DisallowFullTableScans {

The previous condition is false for CREATE INDEX. The only flags set are:

  • planFlagIsDDL
  • planFlagContainsMutation

Should I revert this fix and work on getting the planFlagContainsFullTableScan flag set for CREATE INDEX instead?

If we never reach line 3195, then how is disallow_full_table_scans preventing the creation of indexes?

@spilchen
Copy link
Contributor Author

Without this fix, CREATE INDEX fails at this point:

err = pgerror.Newf(pgcode.WrongObjectType,
"index \"%s\" cannot be used for this query", idx.Name())

It's driving a full index scan of the form SELECT count(1) FROM [%d AS t]@[%d] from here:

cockroach/pkg/sql/backfill.go

Lines 2104 to 2111 in e6a5104

query := fmt.Sprintf(`SELECT count(1) FROM [%d AS t]@[%d]`, desc.GetID(), idx.GetID())
// If the index is a partial index the predicate must be added
// as a filter to the query to force scanning the index.
if idx.IsPartial() {
query = fmt.Sprintf(`%s WHERE %s`, query, idx.GetPredicate())
}
return txn.WithSyntheticDescriptors([]catalog.Descriptor{desc}, func() error {
row, err := txn.QueryRowEx(ctx, "verify-idx-count", txn.KV(), execOverride, query)

In conn_executor_exec.go, it's really close to the DisallowFullTableScan check you mentioned. It's at line 3187 here.

if err := planner.makeOptimizerPlan(ctx); err != nil {
log.VEventf(ctx, 1, "optimizer plan failed: %v", err)
return err
}
flags := planner.curPlan.flags
if flags.IsSet(planFlagContainsFullIndexScan) || flags.IsSet(planFlagContainsFullTableScan) {
if ex.executorType == executorTypeExec && planner.EvalContext().SessionData().DisallowFullTableScans {

I don't believe the statements in conn_executor_exec.go can be reordered, as the logic depends on the flags from the planner, which are still being built.

@michae2
Copy link
Collaborator

michae2 commented Dec 23, 2024

I see, thanks. I didn't realized we also checked disallow_full_table_scans in execbuild when the index is forced. And that check doesn't look at the executorType.

I'm going to open a follow-up PR removing the executorType part, since this fix is cleaner and the executorType part doesn't work in execbuild anyway.

michae2 added a commit to michae2/cockroach that referenced this pull request Dec 23, 2024
Now that cockroachdb#137681 explicitly turns off `disallow_full_table_scans` for
the internal executor, we no longer need to look at executorType in the
connExecutor check. This makes the connExecutor check match the
execBuilder check added in cockroachdb#71317.

Informs: cockroachdb#137404

Release note: None
craig bot pushed a commit that referenced this pull request Dec 23, 2024
137949: sql: remove executorType condition from disallow_full_table_scans check r=yuzefovich,mgartner a=michae2

Now that #137681 explicitly turns off `disallow_full_table_scans` for the internal executor, we no longer need to look at executorType in the connExecutor check. This makes the connExecutor check match the execBuilder check added in #71317.

Informs: #137404

Release note: None

Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
michae2 added a commit to michae2/cockroach that referenced this pull request Dec 24, 2024
We have various checks for disallow_full_table_scans:
1. in connExecutor we fail the statement with an error if it uses a full
   scan
2. in execbuild we fail the statement with an error if there is an index
   hint that causes a full scan
3. in the coster we add hugeCost to full scans, to try and avoid them

We don't want disallow_full_table_scans to apply to anything done by
internal executors. (1) was already checking whether this connExecutor
was internal. (2) and (3) were not, which led to cockroachdb#137404. cockroachdb#137681 fixed
this by setting disallow_full_table_scans in `NewInternalSessionData`,
but we only use `NewInternalSessionData` for some uses of the internal
executor.

This commit explicitly checks `SessionData().Internal` in (2) and (3) to
match (1), so that we don't get any of the disallow_full_table_scans
behavior for any use of the internal executor, including populating
virtual tables.

This commit also adds `SessionData().Internal` to the memo staleness
check.

This behavior can be observed with the following diff, which adds an
index hint to the `crdb_internal.system_jobs` virtual table SQL:

```
diff --git a/pkg/sql/crdb_internal.go b/pkg/sql/crdb_internal.go
index 80a5342b56a..99ccb8f1611 100644
--- a/pkg/sql/crdb_internal.go
+++ b/pkg/sql/crdb_internal.go
@@ -975,7 +975,7 @@ SELECT
 DISTINCT(id), status, created, payload.value AS payload, progress.value AS progress,
 created_by_type, created_by_id, claim_session_id, claim_instance_id, num_runs, last_run, job_type
 FROM
-system.jobs AS j
+system.jobs@jobs_status_created_idx AS j
 LEFT JOIN system.job_info AS progress ON j.id = progress.job_id AND progress.info_key = 'legacy_progress'
 INNER JOIN system.job_info AS payload ON j.id = payload.job_id AND payload.info_key = 'legacy_payload'
 `
```

With this diff, the following SQL hits (2) and fails with an error, even
though it uses the internal executor and thus should not error:

```
SET disallow_full_table_scans = on;
SELECT * FROM crdb_internal.jobs;
```

With this commit we no longer fail, matching the behavior without the
index hint.

Informs: cockroachdb#137404
Informs: cockroachdb#123783

Release note: None
michae2 added a commit to michae2/cockroach that referenced this pull request Dec 28, 2024
We have various checks for disallow_full_table_scans:
1. in connExecutor we fail the statement with an error if it uses a full
   scan
2. in execbuild we fail the statement with an error if there is an index
   hint that causes a full scan
3. in the coster we add hugeCost to full scans, to try and avoid them

We don't want disallow_full_table_scans to apply to anything done by
internal executors. (1) was already checking whether this connExecutor
was internal. (2) and (3) were not, which led to cockroachdb#137404. cockroachdb#137681 fixed
this by setting disallow_full_table_scans in `NewInternalSessionData`,
but we only use `NewInternalSessionData` for some uses of the internal
executor.

This commit explicitly checks `SessionData().Internal` in (2) and (3) to
match (1), so that we don't get any of the disallow_full_table_scans
behavior for any use of the internal executor, including populating
virtual tables.

This commit also adds `SessionData().Internal` to the memo staleness
check.

This behavior can be observed with the following diff, which adds an
index hint to the `crdb_internal.system_jobs` virtual table SQL:

```
diff --git a/pkg/sql/crdb_internal.go b/pkg/sql/crdb_internal.go
index 80a5342b56a..99ccb8f1611 100644
--- a/pkg/sql/crdb_internal.go
+++ b/pkg/sql/crdb_internal.go
@@ -975,7 +975,7 @@ SELECT
 DISTINCT(id), status, created, payload.value AS payload, progress.value AS progress,
 created_by_type, created_by_id, claim_session_id, claim_instance_id, num_runs, last_run, job_type
 FROM
-system.jobs AS j
+system.jobs@jobs_status_created_idx AS j
 LEFT JOIN system.job_info AS progress ON j.id = progress.job_id AND progress.info_key = 'legacy_progress'
 INNER JOIN system.job_info AS payload ON j.id = payload.job_id AND payload.info_key = 'legacy_payload'
 `
```

With this diff, the following SQL hits (2) and fails with an error, even
though it uses the internal executor and thus should not error:

```
SET disallow_full_table_scans = on;
SELECT * FROM crdb_internal.jobs;
```

With this commit we no longer fail, matching the behavior without the
index hint.

Informs: cockroachdb#137404
Informs: cockroachdb#123783

Release note: None
michae2 added a commit to michae2/cockroach that referenced this pull request Jan 2, 2025
We have various checks for disallow_full_table_scans:
1. in connExecutor we fail the statement with an error if it uses a full
   scan
2. in execbuild we fail the statement with an error if there is an index
   hint that causes a full scan
3. in the coster we add hugeCost to full scans, to try and avoid them

We don't want disallow_full_table_scans to apply to anything done by
internal executors. (1) was already checking whether this connExecutor
was internal. (2) and (3) were not, which led to cockroachdb#137404. cockroachdb#137681 fixed
this by setting disallow_full_table_scans in `NewInternalSessionData`,
but we only use `NewInternalSessionData` for some uses of the internal
executor.

This commit explicitly checks `SessionData().Internal` in (2) and (3) to
match (1), so that we don't get any of the disallow_full_table_scans
behavior for any use of the internal executor, including populating
virtual tables.

This commit also adds `SessionData().Internal` to the memo staleness
check.

This behavior can be observed with the following diff, which adds an
index hint to the `crdb_internal.system_jobs` virtual table SQL:

```
diff --git a/pkg/sql/crdb_internal.go b/pkg/sql/crdb_internal.go
index 80a5342b56a..99ccb8f1611 100644
--- a/pkg/sql/crdb_internal.go
+++ b/pkg/sql/crdb_internal.go
@@ -975,7 +975,7 @@ SELECT
 DISTINCT(id), status, created, payload.value AS payload, progress.value AS progress,
 created_by_type, created_by_id, claim_session_id, claim_instance_id, num_runs, last_run, job_type
 FROM
-system.jobs AS j
+system.jobs@jobs_status_created_idx AS j
 LEFT JOIN system.job_info AS progress ON j.id = progress.job_id AND progress.info_key = 'legacy_progress'
 INNER JOIN system.job_info AS payload ON j.id = payload.job_id AND payload.info_key = 'legacy_payload'
 `
```

With this diff, the following SQL hits (2) and fails with an error, even
though it uses the internal executor and thus should not error:

```
SET disallow_full_table_scans = on;
SELECT * FROM crdb_internal.jobs;
```

With this commit we no longer fail, matching the behavior without the
index hint.

Informs: cockroachdb#137404
Informs: cockroachdb#123783

Release note: None
michae2 added a commit to michae2/cockroach that referenced this pull request Jan 2, 2025
We have various checks for disallow_full_table_scans:
1. in connExecutor we fail the statement with an error if it uses a full
   scan
2. in execbuild we fail the statement with an error if there is an index
   hint that causes a full scan
3. in the coster we add hugeCost to full scans, to try and avoid them

We don't want disallow_full_table_scans to apply to anything done by
internal executors. (1) was already checking whether this connExecutor
was internal. (2) and (3) were not, which led to cockroachdb#137404. cockroachdb#137681 fixed
this by setting disallow_full_table_scans in `NewInternalSessionData`,
but we only use `NewInternalSessionData` for some uses of the internal
executor.

This commit explicitly checks `SessionData().Internal` in (2) and (3) to
match (1), so that we don't get any of the disallow_full_table_scans
behavior for any use of the internal executor, including populating
virtual tables.

This commit also adds `SessionData().Internal` to the memo staleness
check.

This behavior can be observed with the following diff, which adds an
index hint to the `crdb_internal.system_jobs` virtual table SQL:

```
diff --git a/pkg/sql/crdb_internal.go b/pkg/sql/crdb_internal.go
index 80a5342b56a..99ccb8f1611 100644
--- a/pkg/sql/crdb_internal.go
+++ b/pkg/sql/crdb_internal.go
@@ -975,7 +975,7 @@ SELECT
 DISTINCT(id), status, created, payload.value AS payload, progress.value AS progress,
 created_by_type, created_by_id, claim_session_id, claim_instance_id, num_runs, last_run, job_type
 FROM
-system.jobs AS j
+system.jobs@jobs_status_created_idx AS j
 LEFT JOIN system.job_info AS progress ON j.id = progress.job_id AND progress.info_key = 'legacy_progress'
 INNER JOIN system.job_info AS payload ON j.id = payload.job_id AND payload.info_key = 'legacy_payload'
 `
```

With this diff, the following SQL hits (2) and fails with an error, even
though it uses the internal executor and thus should not error:

```
SET disallow_full_table_scans = on;
SELECT * FROM crdb_internal.jobs;
```

With this commit we no longer fail, matching the behavior without the
index hint.

Informs: cockroachdb#137404
Informs: cockroachdb#123783

Release note: None
craig bot pushed a commit that referenced this pull request Jan 3, 2025
137961: opt: don't apply disallow_full_table_scans to internal executors r=yuzefovich a=michae2

We have various checks for disallow_full_table_scans:
1. in connExecutor we fail the statement with an error if it uses a full scan
2. in execbuild we fail the statement with an error if there is an index hint that causes a full scan
3. in the coster we add hugeCost to full scans, to try and avoid them

We don't want disallow_full_table_scans to apply to anything done by internal executors. (1) was already checking whether this connExecutor was internal. (2) and (3) were not, which led to #137404. #137681 fixed this by setting disallow_full_table_scans in `NewInternalSessionData`, but we only use `NewInternalSessionData` for some uses of the internal executor.

This commit explicitly checks `SessionData().Internal` in (2) and (3) to match (1), so that we don't get any of the disallow_full_table_scans behavior for any use of the internal executor, including populating virtual tables.

This commit also adds `SessionData().Internal` to the memo staleness check.

This behavior can be observed with the following diff, which adds an index hint to the `crdb_internal.system_jobs` virtual table SQL:

```
diff --git a/pkg/sql/crdb_internal.go b/pkg/sql/crdb_internal.go
index 80a5342b56a..99ccb8f1611 100644
--- a/pkg/sql/crdb_internal.go
+++ b/pkg/sql/crdb_internal.go
`@@` -975,7 +975,7 `@@` SELECT
 DISTINCT(id), status, created, payload.value AS payload, progress.value AS progress,
 created_by_type, created_by_id, claim_session_id, claim_instance_id, num_runs, last_run, job_type
 FROM
-system.jobs AS j
+system.jobs@jobs_status_created_idx AS j
 LEFT JOIN system.job_info AS progress ON j.id = progress.job_id AND progress.info_key = 'legacy_progress'
 INNER JOIN system.job_info AS payload ON j.id = payload.job_id AND payload.info_key = 'legacy_payload'
 `
```

With this diff, the following SQL hits (2) and fails with an error, even though it uses the internal executor and thus should not error:

```
SET disallow_full_table_scans = on;
SELECT * FROM crdb_internal.jobs;
```

With this commit we no longer fail, matching the behavior without the index hint.

Informs: #137404
Informs: #123783

Release note: None

Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.2.x Flags PRs that need to be backported to 23.2. backport-24.1.x Flags PRs that need to be backported to 24.1. backport-24.2.x Flags PRs that need to be backported to 24.2 backport-24.3.x Flags PRs that need to be backported to 24.3
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cluster setting sql.defaults.disallow_full_table_scans.enabled prevents index creation
4 participants