Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: old value for cluster setting sql.defaults.vectorize can prevent login after upgrade to 24.2 #133278

Closed
michae2 opened this issue Oct 23, 2024 · 1 comment · Fixed by #133281
Assignees
Labels
branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-1 Issues/test failures with a fix SLA of 1 month release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-queries SQL Queries Team

Comments

@michae2
Copy link
Collaborator

michae2 commented Oct 23, 2024

(Thanks to @spilchen for figuring this all out!)

In v20.2 we introduced the value 201auto for cluster setting sql.defaults.vectorize and session variable vectorize. This value was deprecated in v21.1, instead mapping to on. However, the remapping meant that any clusters running v21.1 would have the enum value '1' stored for 'on' (which in v20.2 was the enum value for '201auto'):

roachprod@localhost:26257/defaultdb> select version();
                                          version
--------------------------------------------------------------------------------------------
  CockroachDB CCL v21.1.0 (x86_64-unknown-linux-gnu, built 2021/05/17 13:49:40, go1.15.11)
(1 row)

roachprod@localhost:26257/defaultdb> set cluster setting sql.defaults.vectorize = 'on';
SET CLUSTER SETTING

roachprod@localhost:26257/defaultdb> select * from system.settings where name = 'sql.defaults.vectorize';
           name          | value |             lastUpdated             | valueType
-------------------------+-------+-------------------------------------+------------
  sql.defaults.vectorize | 1     | 2024-10-23 11:22:10.312679+00:00:00 | e
(1 row)

This remapping was subsequently removed in v21.2, and the enum value of '1' was no longer possible. The enum value '2' was stored for 'on'. Later, the enum value '0' was stored for 'on'.

In v24.2 we introduced stricter validation of enum settings. This validation throws an error when it encounters the enum value '1' for vectorize.

# Server version: CockroachDB CCL v24.2.3 (aarch64-apple-darwin21.2, built 2024/09/23 22:27:07, go1.22.5 X:nocoverageredesign) (same version as client)
# Cluster ID: 5c81eb9b-8873-4d27-aa7f-f7f21f29ed38
# Organization: Cockroach Demo
#
# Enter \? for a brief introduction.
#
demo@127.0.0.1:26257/defaultdb> UPSERT INTO system.settings (name, value, "valueType") VALUES ('sql.defaults.vectorize', '1', 'e');
INSERT 0 1

demo@127.0.0.1:26257/defaultdb> SHOW CLUSTER SETTING sql.defaults.vectorize;
  sql.defaults.vectorize
--------------------------
  1
(1 row)

demo@127.0.0.1:26257/defaultdb> SET vectorize = DEFAULT;
ERROR: invalid value for parameter "vectorize": "unknown(1)"
SQLSTATE: 22023
HINT: Available values: off,on,experimental_always

So any clusters that had SET CLUSTER SETTING sql.defaults.vectorize = 'on'; in v21.1 will throw an error when starting new sessions after upgrading to v24.2.

Jira issue: CRDB-43530

@michae2 michae2 added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs T-sql-queries SQL Queries Team branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 labels Oct 23, 2024
@michae2 michae2 self-assigned this Oct 23, 2024
@github-project-automation github-project-automation bot moved this to Triage in SQL Queries Oct 23, 2024
@michae2 michae2 added the P-1 Issues/test failures with a fix SLA of 1 month label Oct 23, 2024
@michae2 michae2 moved this from Triage to Active in SQL Queries Oct 23, 2024
michae2 added a commit to michae2/cockroach that referenced this issue Oct 23, 2024
Fixes: cockroachdb#133278

Release note (bug fix): This commit fixes a bug which causes new
connections to fail with the following error after upgrading to v24.2:

```
ERROR: invalid value for parameter "vectorize": "unknown(1)"
SQLSTATE: 22023
HINT: Available values: off,on,experimental_always
```

In order to hit this bug, the cluster must have:
1. been on version v21.1 at some point in the past
2. run `SET CLUSTER SETTING sql.defaults.vectorize = 'on';` on v21.1
3. not set sql.defaults.vectorize after upgrading past v21.1
4. upgraded all the way to v24.2

The conditions required for this bug can be detected using:

```
SELECT * FROM system.settings WHERE name = 'sql.defaults.vectorize';
```

If the value is '1', the following statement should be run to fix it
before upgrading to v24.2:

```
RESET CLUSTER SETTING sql.defaults.vectorize;
```

This commit fixes the bug by making '1' a legal value for
sql.defaults.vectorize again (mapping to 'on').
michae2 added a commit to michae2/cockroach that referenced this issue Oct 23, 2024
Fixes: cockroachdb#133278

Release note (bug fix): This commit fixes a bug which causes new
connections to fail with the following error after upgrading to v24.2:

```
ERROR: invalid value for parameter "vectorize": "unknown(1)"
SQLSTATE: 22023
HINT: Available values: off,on,experimental_always
```

In order to hit this bug, the cluster must have:
1. been on version v21.1 at some point in the past
2. run `SET CLUSTER SETTING sql.defaults.vectorize = 'on';` on v21.1
3. not set sql.defaults.vectorize after upgrading past v21.1
4. upgraded all the way to v24.2

The conditions required for this bug can be detected using:

```
SELECT * FROM system.settings WHERE name = 'sql.defaults.vectorize';
```

If the value is '1', the following statement should be run to fix it
before upgrading to v24.2:

```
RESET CLUSTER SETTING sql.defaults.vectorize;
```

This commit fixes the bug by making '1' a legal value for
sql.defaults.vectorize again (mapping to 'on').
@michae2 michae2 added GA-blocker branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. and removed GA-blocker labels Oct 23, 2024
@michae2
Copy link
Collaborator Author

michae2 commented Oct 23, 2024

Marking as release-blocker so that we don't cut v24.2.5 or v24.3.0-beta.3 without the fix.

michae2 added a commit to michae2/cockroach that referenced this issue Oct 23, 2024
Fixes: cockroachdb#133278

Release note (bug fix): This commit fixes a bug which causes new
connections to fail with the following error after upgrading to v24.2:

```
ERROR: invalid value for parameter "vectorize": "unknown(1)"
SQLSTATE: 22023
HINT: Available values: off,on,experimental_always
```

In order to hit this bug, the cluster must have:
1. been on version v21.1 at some point in the past
2. run `SET CLUSTER SETTING sql.defaults.vectorize = 'on';` on v21.1
3. not set sql.defaults.vectorize after upgrading past v21.1
4. upgraded all the way to v24.2

The conditions required for this bug can be detected using:

```
SELECT * FROM system.settings WHERE name = 'sql.defaults.vectorize';
```

If the value is '1', the following statement should be run to fix it
before upgrading to v24.2:

```
RESET CLUSTER SETTING sql.defaults.vectorize;
```

This commit fixes the bug by making '1' a legal value for
sql.defaults.vectorize again (mapping to 'on').
craig bot pushed a commit that referenced this issue Oct 24, 2024
131624: roachtest: add `failover` variants with leader leases r=nvanbenschoten a=nvanbenschoten

Part of #132762.

Leader leases have different availability properties than epoch leases under most failure modes. This patch adds failover test variants that use leader leases where possible.

Initial test results:

| test                                         | lease=epoch (ms) | lease=expiration (ms) | lease=leader (ms) | parity with expiration |
|:---------------------------------------------|-----------------:|----------------------:|------------------:|:----------------------:|
| failover/chaos/read-only                     | 60,129           | 18,253                | 60,129            | ✔                      |
| failover/chaos/read-write                    | 60,129           | 20,401                | 60,129            | ❌❌❌                    |
| failover/liveness/blackhole                  | 9,663            | 369                   | 335               | ✔                      |
| failover/liveness/blackhole-recv             | 11,274           | 402                   | 369               | ✔                      |
| failover/liveness/blackhole-send             | 9,663            | 385                   | 469               | ✔                      |
| failover/liveness/crash                      | 8,053            | 352                   | 318               | ✔                      |
| failover/liveness/deadlock                   | 24,696           | 385                   | 369               | ✔                      |
| failover/liveness/disk-stall                 | 26,843           | 369                   | 419               | ✔                      |
| failover/liveness/pause                      | 10,200           | 385                   | 436               | ✔                      |
| failover/non-system/blackhole                | 7,247            | 7,516                 | 15,032            | ❌❌                     |
| failover/non-system/blackhole-recv           | 12,348           | 10,737                | 18,253            | ❌❌                     |
| failover/non-system/blackhole-send           | 6,979            | 6,979                 | 8,053             | ❌                      |
| failover/non-system/crash                    | 7,247            | 6,979                 | 9,126             | ❌                      |
| failover/non-system/deadlock                 | 60,129           | 60,129                | 60,129            | ✔                      |
| failover/non-system/disk-stall               | 22,548           | 22,548                | 25,769            | ❌                      |
| failover/non-system/pause                    | 7,247            | 7,247                 | 9,126             | ❌                      |
| failover/partial/lease-gateway               | 8,589            | 19,327 [^1]           | 60,129            | ❌❌❌                    |
| failover/partial/lease-leader                | 60,129           | 22,549 [^2]           | 31,139 [^2]       | ❌❌                     |
| failover/partial/lease-liveness              | 8,589            | 301                   | 318               | ✔                      |
| failover/system-non-liveness/blackhole       | 369              | 402                   | 352               | ✔                      |
| failover/system-non-liveness/blackhole-recv  | 335              | 285                   | 318               | ✔                      |
| failover/system-non-liveness/blackhole-send  | 402              | 419                   | 335               | ✔                      |
| failover/system-non-liveness/crash           | 419              | 301                   | 453               | ✔                      |
| failover/system-non-liveness/deadlock        | 369              | 352                   | 402               | ✔                      |
| failover/system-non-liveness/disk-stall      | 402              | 318                   | 453               | ✔                      |
| failover/system-non-liveness/pause           | 369              | 385                   | 335               | ✔                      |

_note: because of the way the test measures pMax, anything under 1,000ms is essentially "no impact"_

**Key _(comparing leader vs. expiration)_**:
✔ = parity
❌ = minor regression
❌❌ = major regression
❌❌❌ = unavailability

[^1]: I don't understand why expiration-based lease perform worse than epoch-based leases on this test.
[^2]: With #133214.

Epic: none
Release note: None

133214: roachtest: enable DistSender circuit breakers in failover/partial/lease-leader r=nvanbenschoten a=nvanbenschoten

DistSender circuit breakers are useful in this test to avoid artificially inflated latencies due to the way the test measures failover time (pMax, no timeouts). Without circuit breakers, a request stuck on the partitioned leaseholder will get blocked indefinitely, despite the range recovering on the other side of the partition and becoming available to all new traffic. As a result, the test won't differentiate between temporary and permanent range unavailability. We have other tests which demonstrate the benefit of DistSender circuit breakers (especially when applications do not use statement timeouts), so we don't need to test them here.

With this change, the test's meaured failover time drops from:

| lease=epoch (ms) | lease=expiration (ms) | lease=leader (ms) |
|-----------------:|----------------------:|------------------:|
| 60,129           | 60,129                | 60,129            |

down to:

| lease=epoch (ms) | lease=expiration (ms) | lease=leader (ms) |
|-----------------:|----------------------:|------------------:|
| 60,129           | 22,549                | 31,139            |

This is because the circuit breakers place a 10s timeout on all KV requests, so no request gets stuck indefinitely. Notice that expiration and leader leases now recover, while epoch leases remain unavailable indefinitely.

Epic: None
Release note: None

133281: sql: allow old enum value '1' for sql.defaults.vectorize r=mw5h a=michae2

Fixes: #133278

Release note (bug fix): This commit fixes a bug which causes new connections to fail with the following error after upgrading to v24.2:

```
ERROR: invalid value for parameter "vectorize": "unknown(1)"
SQLSTATE: 22023
HINT: Available values: off,on,experimental_always
```

In order to hit this bug, the cluster must have:
1. been on version v21.1 at some point in the past
2. run `SET CLUSTER SETTING sql.defaults.vectorize = 'on';` on v21.1
3. not set sql.defaults.vectorize after upgrading past v21.1
4. upgraded all the way to v24.2

The conditions required for this bug can be detected using:

```
SELECT * FROM system.settings WHERE name = 'sql.defaults.vectorize';
```

If the value is '1', the following statement should be run to fix it before upgrading to v24.2:

```
RESET CLUSTER SETTING sql.defaults.vectorize;
```

This commit fixes the bug by making '1' a legal value for sql.defaults.vectorize again (mapping to 'on').

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
@craig craig bot closed this as completed in c756c80 Oct 24, 2024
@github-project-automation github-project-automation bot moved this from Active to Done in SQL Queries Oct 24, 2024
blathers-crl bot pushed a commit that referenced this issue Oct 24, 2024
Fixes: #133278

Release note (bug fix): This commit fixes a bug which causes new
connections to fail with the following error after upgrading to v24.2:

```
ERROR: invalid value for parameter "vectorize": "unknown(1)"
SQLSTATE: 22023
HINT: Available values: off,on,experimental_always
```

In order to hit this bug, the cluster must have:
1. been on version v21.1 at some point in the past
2. run `SET CLUSTER SETTING sql.defaults.vectorize = 'on';` on v21.1
3. not set sql.defaults.vectorize after upgrading past v21.1
4. upgraded all the way to v24.2

The conditions required for this bug can be detected using:

```
SELECT * FROM system.settings WHERE name = 'sql.defaults.vectorize';
```

If the value is '1', the following statement should be run to fix it
before upgrading to v24.2:

```
RESET CLUSTER SETTING sql.defaults.vectorize;
```

This commit fixes the bug by making '1' a legal value for
sql.defaults.vectorize again (mapping to 'on').
blathers-crl bot pushed a commit that referenced this issue Oct 24, 2024
Fixes: #133278

Release note (bug fix): This commit fixes a bug which causes new
connections to fail with the following error after upgrading to v24.2:

```
ERROR: invalid value for parameter "vectorize": "unknown(1)"
SQLSTATE: 22023
HINT: Available values: off,on,experimental_always
```

In order to hit this bug, the cluster must have:
1. been on version v21.1 at some point in the past
2. run `SET CLUSTER SETTING sql.defaults.vectorize = 'on';` on v21.1
3. not set sql.defaults.vectorize after upgrading past v21.1
4. upgraded all the way to v24.2

The conditions required for this bug can be detected using:

```
SELECT * FROM system.settings WHERE name = 'sql.defaults.vectorize';
```

If the value is '1', the following statement should be run to fix it
before upgrading to v24.2:

```
RESET CLUSTER SETTING sql.defaults.vectorize;
```

This commit fixes the bug by making '1' a legal value for
sql.defaults.vectorize again (mapping to 'on').
michae2 added a commit that referenced this issue Oct 24, 2024
Fixes: #133278

Release note (bug fix): This commit fixes a bug which causes new
connections to fail with the following error after upgrading to v24.2:

```
ERROR: invalid value for parameter "vectorize": "unknown(1)"
SQLSTATE: 22023
HINT: Available values: off,on,experimental_always
```

In order to hit this bug, the cluster must have:
1. been on version v21.1 at some point in the past
2. run `SET CLUSTER SETTING sql.defaults.vectorize = 'on';` on v21.1
3. not set sql.defaults.vectorize after upgrading past v21.1
4. upgraded all the way to v24.2

The conditions required for this bug can be detected using:

```
SELECT * FROM system.settings WHERE name = 'sql.defaults.vectorize';
```

If the value is '1', the following statement should be run to fix it
before upgrading to v24.2:

```
RESET CLUSTER SETTING sql.defaults.vectorize;
```

This commit fixes the bug by making '1' a legal value for
sql.defaults.vectorize again (mapping to 'on').
michae2 added a commit that referenced this issue Oct 24, 2024
Fixes: #133278

Release note (bug fix): This commit fixes a bug which causes new
connections to fail with the following error after upgrading to v24.2:

```
ERROR: invalid value for parameter "vectorize": "unknown(1)"
SQLSTATE: 22023
HINT: Available values: off,on,experimental_always
```

In order to hit this bug, the cluster must have:
1. been on version v21.1 at some point in the past
2. run `SET CLUSTER SETTING sql.defaults.vectorize = 'on';` on v21.1
3. not set sql.defaults.vectorize after upgrading past v21.1
4. upgraded all the way to v24.2

The conditions required for this bug can be detected using:

```
SELECT * FROM system.settings WHERE name = 'sql.defaults.vectorize';
```

If the value is '1', the following statement should be run to fix it
before upgrading to v24.2:

```
RESET CLUSTER SETTING sql.defaults.vectorize;
```

This commit fixes the bug by making '1' a legal value for
sql.defaults.vectorize again (mapping to 'on').
michae2 added a commit that referenced this issue Oct 24, 2024
Fixes: #133278

Release note (bug fix): This commit fixes a bug which causes new
connections to fail with the following error after upgrading to v24.2:

```
ERROR: invalid value for parameter "vectorize": "unknown(1)"
SQLSTATE: 22023
HINT: Available values: off,on,experimental_always
```

In order to hit this bug, the cluster must have:
1. been on version v21.1 at some point in the past
2. run `SET CLUSTER SETTING sql.defaults.vectorize = 'on';` on v21.1
3. not set sql.defaults.vectorize after upgrading past v21.1
4. upgraded all the way to v24.2

The conditions required for this bug can be detected using:

```
SELECT * FROM system.settings WHERE name = 'sql.defaults.vectorize';
```

If the value is '1', the following statement should be run to fix it
before upgrading to v24.2:

```
RESET CLUSTER SETTING sql.defaults.vectorize;
```

This commit fixes the bug by making '1' a legal value for
sql.defaults.vectorize again (mapping to 'on').
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-release-24.2 Used to mark GA and release blockers, technical advisories, and bugs for 24.2 branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs P-1 Issues/test failures with a fix SLA of 1 month release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant