Limit column or row size upon INSERT or UPDATE #67400

Closed
glennfawcett opened this issue Jul 9, 2021 · 12 comments
Labels
A-sql-execution Relating to SQL execution. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments


glennfawcett commented Jul 9, 2021

Is your feature request related to a problem? Please describe.
The application was inadvertently INSERTING or UPDATING JSONB columns that were 100MB or larger. When running batch operations, CockroachDB aborted with memory errors.

Describe the solution you'd like
It would be nice if CRDB took a defensive position and allowed users to set a limit on the size of a ROW. This would have prevented the erroneous application behavior that bloated row sizes within the database.

Additional limits that would be useful:

  • Limit the number of rows read per query
  • Limit the number of inserts/updates in a query
  • Limit the row size per table

Describe alternatives you've considered
As a workaround for the batch situation, we had to lower the batch size to commit only ONE row at a time. This allowed us to proceed, but it added processing delay while we diagnosed and resolved the issue.

gz#9005

@glennfawcett glennfawcett added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-sql-execution Relating to SQL execution. labels Jul 9, 2021
@glennfawcett glennfawcett changed the title from "Limit column or row size upon INSERT" to "Limit column or row size upon INSERT or UPDATE" Jul 9, 2021
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Jul 12, 2021

rytaft commented Jul 13, 2021

This will not be possible without increasing the raft command size limit (default 64MB), which limits the size of keys in a single transaction. The setting is kv.raft.command.max_size. Maybe we need to add a cluster setting that enforces a limit on a single batch? @vy-ton will follow up.
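
For context, that setting can be inspected and changed with standard cluster-setting statements; the value below is only illustrative (it mirrors the 1GB figure mentioned in the next comment), not a recommendation:

  SHOW CLUSTER SETTING kv.raft.command.max_size;
  -- Illustrative only; weigh the memory impact before raising this ceiling.
  SET CLUSTER SETTING kv.raft.command.max_size = '1GiB';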


michae2 commented Jul 14, 2021

According to @sumeerbhola they are setting kv.raft.command.max_size to 1GB because of non-deterministic batching of commands.


rytaft commented Jul 20, 2021

@glennfawcett @vy-ton -- @michae2 and I were discussing how we might implement limiting the row size, and we had some questions:

  • This limit should be on the entire row level, not the individual column family level, correct?
  • Should this limit be enforced on all INSERTs, UPDATEs, and UPSERTs?
  • What happens if a row that already exists in a table exceeds the limit (e.g., if the limit was added after the row was inserted)? Should any attempt to UPDATE that row fail, unless it reduces the size of the row below the limit?

@glennfawcett
Author

Yes to all above...


michae2 commented Jul 20, 2021

@glennfawcett @vy-ton -- @rytaft and I were talking more and wondering, would something like this fit the bill?

ALTER TABLE x ADD CONSTRAINT x_row_size_limit CHECK (pg_column_size(x.*) < limit)

This would be a per-table limit that would be enforced on all modifications. There would be a validation step when first applied to ensure all existing rows are within the limit.
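
For example, a sketch of the proposal with a hypothetical table and an arbitrary 8 MiB cap (this assumes pg_column_size accepts a whole-row reference, as written above; it is not confirmed syntax):

  CREATE TABLE x (id INT PRIMARY KEY, payload JSONB);
  -- Reject any row whose encoded size reaches 8 MiB.
  ALTER TABLE x ADD CONSTRAINT x_row_size_limit
    CHECK (pg_column_size(x.*) < 8 * 1024 * 1024);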


vy-ton commented Jul 20, 2021

A potential issue here is that the ADD CONSTRAINT option would require the AppDev teams to make changes in order to opt in to the guardrail. This is unlikely to be something the DBaaS team could enforce for the whole cluster, which is a reason why we would want the cluster setting.

@RaduBerinde
Member

Note that implementing something like this for UPDATE/UPSERT could significantly affect workloads if column families are in use. We would always have to read the entire row to check the limit, even if we would otherwise be touching only a single column family. This would remove the contention-related benefit of column families.
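
To make the concern concrete, here is a hypothetical table that splits a small, hot column from a large blob; today an UPDATE of only the hot column does not need to read the blob's family, but a whole-row size check would force that read:

  CREATE TABLE docs (
    id      INT PRIMARY KEY,
    status  STRING,
    payload JSONB,
    FAMILY f1 (id, status),  -- small, frequently updated columns
    FAMILY f2 (payload)      -- large, rarely touched column
  );
  -- Normally touches only FAMILY f1; a full-row limit would also require reading FAMILY f2.
  UPDATE docs SET status = 'done' WHERE id = 1;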


michae2 commented Jul 20, 2021

Based on a call with @vy-ton, it sounds like the row-size limit should initially be a cluster setting rather than a per-table setting, even if this causes some queries to break when violating data is already in the cluster. And based on @RaduBerinde's excellent point above, this probably needs to be a column-family-size limit rather than a full-row-size limit.


vy-ton commented Jul 20, 2021

If a table doesn't have column families, would a column family size limit then be equal to the whole row?

Other options we can validate with users:

  • The full row limit could be ignored for tables with column families


michae2 commented Jul 20, 2021

Yes, if a table doesn't have explicit column families, then it has one implicit column family consisting of all of the columns.
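
A quick way to see this is SHOW CREATE TABLE on a table defined without any FAMILY clauses (exact output formatting varies by version):

  CREATE TABLE t (k INT PRIMARY KEY, v STRING);
  SHOW CREATE TABLE t;
  -- The output lists a single implicit family covering every column, e.g.:
  --   FAMILY "primary" (k, v)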


michae2 commented Jul 21, 2021

It looks like #66927 might address limiting the number of rows modified in a query.
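
If that lands, the rows-modified guardrail would presumably be a separate knob from the row-size limit discussed here, along the lines of the transaction_rows_{read|written}_{log|err} settings mentioned later in this thread. The names and values below are assumptions, not confirmed from #66927:

  -- Hypothetical values; verify the exact setting names against #66927.
  SET CLUSTER SETTING sql.defaults.transaction_rows_written_err = 10000;
  SET CLUSTER SETTING sql.defaults.transaction_rows_read_err = 100000;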

michae2 added a commit to michae2/cockroach that referenced this issue Jul 22, 2021
Addresses: cockroachdb#67400

Add sql.mutations.kv_max_size, a new cluster setting which limits the
size in bytes of key-value pairs put into the KV layer by SQL INSERT,
UPSERT, and UPDATE statements. This is a cluster-wide setting, intended
to protect the layers below SQL from performance degradation or
undefined behavior caused by excessively large KV pairs.

We only apply the limit during Put, PutInline, CPut, CPutInline,
CPutAllowingIfNotExists, and InitPut requests coming from SQL INSERT,
UPSERT, and UPDATE statements. By selectively applying the limit we
intend to prevent new giant KV pairs from entering the database while
allowing existing giant KV pairs to be read, deleted, backed up, and
restored.

Note that existing giant KV pairs will *not* be updatable with the
limit in place unless the update shrinks the KV pair. From the
perspective of a database administrator this might be either good or
bad, depending on whether the risk of database instability is worse than
the pain of an application not being able to update an existing row. (If
the row is divided into multiple column families, then any column
families falling below the limit will still be updatable as normal.)

Release note (ops change): A new cluster setting,
sql.mutations.kv_max_size, was added which limits the size in bytes of
key-value pairs written by INSERT, UPSERT, and UPDATE statements. This
is effectively a limit on the size of column families in a row (or on
the size of an entire row if column families are not in use). It is
intended as a guardrail to protect the database from instability and
undefined behavior. Existing rows which violate the limit can still be
read with SELECT, deleted, backed up, and restored, and can be updated
if the update shrinks the violating column family below the limit.
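
A usage sketch of the setting as proposed in this commit (illustrative value only; later commits in this thread rework the guardrail into sql.mutations.max_row_size.* and eventually sql.guardrails.max_row_size_*):

  -- Proposed here; superseded by the max_row_size settings below.
  SET CLUSTER SETTING sql.mutations.kv_max_size = '4MiB';
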
michae2 added a commit to michae2/cockroach that referenced this issue Aug 17, 2021
Addresses: cockroachdb#67400

Add sql.mutations.max_row_size.warn, a new cluster setting which
controls large row logging. Rows larger than this size will have their
keys logged to the SQL_PERF or SQL_INTERNAL_PERF channels whenever the
SQL layer puts them into the KV layer.

This logging takes place in rowHelper, which is used by both
row.Inserter and row.Updater. Most of the work is plumbing
settings.Values and SessionData into rowHelper, and adding a new
structured event type.

Release note (ops change): A new cluster setting,
sql.mutations.max_row_size.warn, was added, which controls large row
logging. Whenever a row larger than this size is added (or a single
column family if multiple column families are in use) a LargeRow event
is logged to the SQL_PERF channel (or a LargeRowInternal event is logged
to SQL_INTERNAL_PERF if the row was added by an internal query). This
could occur for INSERT, UPSERT, UPDATE, CREATE TABLE AS, CREATE INDEX,
ALTER TABLE, ALTER INDEX, IMPORT, or RESTORE statements.
michae2 added a commit to michae2/cockroach that referenced this issue Aug 19, 2021
Addresses: cockroachdb#67400

Add sql.mutations.max_row_size.log, a new cluster setting which
controls large row logging. Rows larger than this size will have their
keys logged to the SQL_PERF or SQL_INTERNAL_PERF channels whenever the
SQL layer puts them into the KV layer.

This logging takes place in rowHelper, which is used by both
row.Inserter and row.Updater. Most of the work is plumbing
settings.Values and SessionData into rowHelper, and adding a new
structured event type.

Release note (ops change): A new cluster setting,
sql.mutations.max_row_size.log, was added, which controls large row
logging. Whenever a row larger than this size is written (or a single
column family if multiple column families are in use) a LargeRow event
is logged to the SQL_PERF channel (or a LargeRowInternal event is logged
to SQL_INTERNAL_PERF if the row was added by an internal query). This
could occur for INSERT, UPSERT, UPDATE, CREATE TABLE AS, CREATE INDEX,
ALTER TABLE, ALTER INDEX, IMPORT, or RESTORE statements. SELECT, DELETE,
TRUNCATE, and DROP are not affected by this setting.
michae2 added a commit to michae2/cockroach that referenced this issue Aug 19, 2021
Addresses: cockroachdb#67400

Add sql.mutations.max_row_size.err, a new cluster setting similar to
sql.mutations.max_row_size.log, which limits the size of rows written to
the database. Statements trying to write a row larger than this will
fail with an error. (Internal queries will not fail with an error, but
will log a LargeRowInternal event to the SQL_INTERNAL_PERF channel.)

We're reusing eventpb.CommonLargeRowDetails as the error type, out of
convenience.

Release note (ops change): A new cluster setting,
sql.mutations.max_row_size.err, was added, which limits the size of rows
written to the database (or individual column families, if multiple
column families are in use). Statements trying to write a row larger
than this will fail with a code 54000 (program_limit_exceeded) error.
(Internal queries writing a row larger than this will not fail, but will
log a LargeRowInternal event to the SQL_INTERNAL_PERF channel.) This
limit is enforced for INSERT, UPSERT, and UPDATE statements. CREATE
TABLE AS, CREATE INDEX, ALTER TABLE, ALTER INDEX, IMPORT, and RESTORE
will not fail with an error, but will log LargeRowInternal events to the
SQL_INTERNAL_PERF channel. SELECT, DELETE, TRUNCATE, and DROP are not
affected by this limit. **Note that existing rows violating the limit
will *not* be updatable, unless the update shrinks the size of the row
below the limit, but *will* be selectable, deletable, and able to be
altered, backed-up, and restored.** For this reason we recommend using
the accompanying setting sql.mutations.max_row_size.log in conjunction
with SELECT pg_column_size() queries to detect and fix any existing
large rows before lowering sql.mutations.max_row_size.err.
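
A sketch of the kind of detection query that note has in mind, against a hypothetical table t with primary key id (and assuming pg_column_size accepts a whole-row reference, as proposed earlier in this thread):

  SELECT id, pg_column_size(t.*) AS row_bytes
  FROM t
  ORDER BY row_bytes DESC
  LIMIT 10;
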
michae2 added a commit to michae2/cockroach that referenced this issue Sep 8, 2021
Addresses: cockroachdb#67400

Add sql.mutations.max_row_size.log, a new cluster setting which
controls large row logging. Rows larger than this size will have their
primary keys logged to the SQL_PERF or SQL_INTERNAL_PERF channels
whenever the SQL layer puts them into the KV layer.

This logging takes place in rowHelper, which is used by both
row.Inserter and row.Updater. Most of the work is plumbing
settings.Values and SessionData into rowHelper, and adding a new
structured event type.

Release note (ops change): A new cluster setting,
sql.mutations.max_row_size.log, was added, which controls large row
logging. Whenever a row larger than this size is written (or a single
column family if multiple column families are in use) a LargeRow event
is logged to the SQL_PERF channel (or a LargeRowInternal event is logged
to SQL_INTERNAL_PERF if the row was added by an internal query). This
could occur for INSERT, UPSERT, UPDATE, CREATE TABLE AS, CREATE INDEX,
ALTER TABLE, ALTER INDEX, IMPORT, or RESTORE statements. SELECT, DELETE,
TRUNCATE, and DROP are not affected by this setting.

Release justification: Low risk, high benefit change to existing
functionality. This adds logging whenever a large row is written to the
database. Default is 0 in 21.1 (meaning disabled).
michae2 added a commit to michae2/cockroach that referenced this issue Sep 8, 2021
Addresses: cockroachdb#67400

Add sql.mutations.max_row_size.err, a new cluster setting similar to
sql.mutations.max_row_size.log, which limits the size of rows written to
the database. Statements trying to write a row larger than this will
fail with an error. (Internal queries will not fail with an error, but
will log a LargeRowInternal event to the SQL_INTERNAL_PERF channel.)

We're reusing eventpb.CommonLargeRowDetails as the error type, out of
convenience.

Release note (ops change): A new cluster setting,
sql.mutations.max_row_size.err, was added, which limits the size of rows
written to the database (or individual column families, if multiple
column families are in use). Statements trying to write a row larger
than this will fail with a code 54000 (program_limit_exceeded) error.
(Internal queries writing a row larger than this will not fail, but will
log a LargeRowInternal event to the SQL_INTERNAL_PERF channel.) This
limit is enforced for INSERT, UPSERT, and UPDATE statements. CREATE
TABLE AS, CREATE INDEX, ALTER TABLE, ALTER INDEX, IMPORT, and RESTORE
will not fail with an error, but will log LargeRowInternal events to the
SQL_INTERNAL_PERF channel. SELECT, DELETE, TRUNCATE, and DROP are not
affected by this limit. **Note that existing rows violating the limit
*cannot* be updated, unless the update shrinks the size of the row
below the limit, but *can* be selected, deleted, altered, backed-up, and
restored.** For this reason we recommend using the accompanying setting
sql.mutations.max_row_size.log in conjunction with
SELECT pg_column_size() queries to detect and fix any existing large
rows before lowering sql.mutations.max_row_size.err.

Release justification: Low risk, high benefit change to existing
functionality. This causes statements adding large rows to fail with an
error. Default is 0 in 21.1 (meaning disabled).
michae2 added a commit to michae2/cockroach that referenced this issue Sep 10, 2021
Addresses: cockroachdb#67400

Rename sql.mutations.max_row_size.{log|err} to
sql.guardrails.max_row_size_{log|err} for consistency with
transaction_rows_{read|written}_{log|err} and upcoming metrics.

Release justification: Low-risk update to new functionality.

Release note (ops change): New variables
sql.mutations.max_row_size.{log|err} were renamed to
sql.guardrails.max_row_size_{log|err} for consistency with other
variables and metrics.
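
Under the final names, enabling the guardrails would look something like this (the byte values are illustrative, not recommendations):

  SET CLUSTER SETTING sql.guardrails.max_row_size_log = '64MiB';
  SET CLUSTER SETTING sql.guardrails.max_row_size_err = '512MiB';
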
michae2 added a commit to michae2/cockroach that referenced this issue Sep 10, 2021
Addresses: cockroachdb#67400

Add metrics for sql/row and pass them down from ExecutorConfig and
FlowCtx all the way to row.Helper. Like sql.Metrics, there are two
copies, one for user queries and one for internal queries. (I wanted to
make these part of sql.Metrics, but there are several users of sql/row
that do not operate under a sql.Server or connExecutor so we are forced
to add row.Metrics to the ExecutorConfig and FlowCtx instead.)

I ran into difficulty passing these through import, as FlowCtx itself
is not plumbed through.

There are only two metrics at first, corresponding to
violations of sql.guardrails.max_row_size_{log|err}.

Release justification: Low-risk update to new functionality.

Release note (ops): Added four new metrics,
sql.guardrails.max_row_size_{log|err}.count{.internal} which are
incremented whenever a large row violates the corresponding
sql.guardrails.max_row_size_{log|err} limit.
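
One way to watch these counters from SQL, assuming they are exported through crdb_internal.node_metrics like other node-level metrics:

  SELECT name, value
  FROM crdb_internal.node_metrics
  WHERE name LIKE 'sql.guardrails.max_row_size%';
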
michae2 added a commit to michae2/cockroach that referenced this issue Sep 13, 2021
Addresses: cockroachdb#67400, cockroachdb#69477

Remove ViolatesMaxRowSizeErr from CommonLargeRowDetails, as was done
for CommonTxnRowsLimitDetails in cockroachdb#69945.

Also remove the SafeDetails methods from CommonLargeRowDetails,
txnRowsReadLimitErr, and txnRowsWrittenLimitErr, as I don't think we
need them.

Release note: None (there was no release between the introduction of the
LargeRow and LargeRowInternal events and this commit).
craig bot pushed a commit that referenced this issue Sep 13, 2021
69300: jobs: retry non-cancelable running and all reverting jobs r=ajwerner a=sajjadrizvi

Previously, non-cancelable jobs were retried in the running state
only if their errors were marked as retryable, and only
non-cancelable reverting jobs were retried by default. This
commit makes non-cancelable jobs always retry in the running
state unless their error is marked as permanent. In addition,
it makes all reverting jobs retry when they fail. As a result,
non-cancelable running jobs and all reverting jobs no longer
fail due to transient errors.

Release justification: low-risk updates to new functionality.

Release note (general change): Non-cancelable jobs, such as
schema-change GC jobs, now do not fail unless they fail with
a permanent error. They retry with exponential backoff if
they fail due to a transient error. Furthermore, jobs that
perform reverting tasks do not fail; instead, they are retried
with exponential backoff if an error is encountered while
reverting. As a result, transient errors do not impact jobs
that are reverting.

Fixes: #66685

69982: docs/tech-notes: admission control overview r=sumeerbhola a=sumeerbhola

Release justification: Non-production code change.
Release note: None

70094: tenantcostserver: fix erroneous panic in tests r=RaduBerinde a=RaduBerinde

The test-only code that checks the invariants of the `tenant_usage`
table inadvertently panics if the query hits an error (such as one
that would be expected if the server is shutting down). We now just
log the error instead.

Fixes #70089.

Release note: None

Release justification: non-production code change to fix test failure.

70095: tenantcostclient: restrict allowed configuration from the tenant side r=RaduBerinde a=RaduBerinde

This change restricts the configuration of tenant cost control from
the tenant side. In the future, we will want to have settings where
the values come from the host cluster but we don't have that
infrastructure today.

With tenants being able to set their own settings, they could easily
sabotage the cost control mechanisms. This change restricts the
allowed values for the target period and the CPU usage allowance, and
fixes the cost model configuration to the default.

Release note: None

Release justification: Necessary fix for the distributed rate limiting
functionality, which is vital for the upcoming Serverless MVP release.
It allows CRDB to throttle clusters that have run out of free or paid
request units (which measure CPU and I/O usage). This functionality is
only enabled in multi-tenant scenarios and should have no impact on
our dedicated customers.

70102: sql: clean up large row errors and events r=knz,yuzefovich a=michae2

Addresses: #67400, #69477

Remove ViolatesMaxRowSizeErr from CommonLargeRowDetails, as was done
for CommonTxnRowsLimitDetails in #69945.

Also remove the SafeDetails methods from CommonLargeRowDetails,
txnRowsReadLimitErr, and txnRowsWrittenLimitErr, as I don't think we
need them.

Release note: None (there was no release between the introduction of the
LargeRow and LargeRowInternal events and this commit).

70118: kv: lock mutexes for `TxnCoordSender.Epoch()` and `Txn.status()` r=ajwerner a=erikgrinaker

### kvcoord: lock mutex in `TxnCoordSender.Epoch()`

Methods that access `TxnCoordSender.mu` fields must lock the mutex
first. `Epoch()` didn't.

Resolves #70071.

Release note: None

### kv: fix mutex locking for `Txn.status`

`Txn.status()` fetches the transaction status from the mutex-protected
`Txn.mu.sender` field, but callers did not take out the mutex lock when
calling it.

This patch renames the method to `Txn.statusLocked()`, and updates all
callers to take out the lock before calling it.

Release note: None

Co-authored-by: Sajjad Rizvi <sajjad@cockroachlabs.com>
Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>

michae2 commented Sep 21, 2021

This is now fixed in 21.1, 21.2, and master.
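
To confirm the guardrails are available (and whether they are enabled) on an upgraded cluster, using the setting names from the rename above:

  SHOW CLUSTER SETTING sql.guardrails.max_row_size_log;
  SHOW CLUSTER SETTING sql.guardrails.max_row_size_err;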
