Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support VIRTUAL computed columns #57608

Closed
26 of 30 tasks
RaduBerinde opened this issue Dec 5, 2020 · 3 comments
Closed
26 of 30 tasks

sql: support VIRTUAL computed columns #57608

RaduBerinde opened this issue Dec 5, 2020 · 3 comments
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@RaduBerinde
Copy link
Member

RaduBerinde commented Dec 5, 2020

This issue tracks implementing virtual computed columns (using VIRTUAL instead of STORED). This is closely related to #9682 - the expression-based index columns will be implemented as virtual columns internally. It is also related to hash shared indexes where we would like the bucket column to be virtual.

Running checklist:

  • parser support
  • test catalog support
  • descriptors and catalog support
  • support querying tables with virtual columns
  • mutations of tables with virtual columns (with or without indexes on virtual columns)
    • INSERT
    • DELETE
    • UPDATE
    • UPSERT
      • Add rules for lookup joins with projections to fix plans for UPSERT
  • disallow PK on virtual columns
  • disallow indexes STORING virtual columns
  • disallow FKs on virtual columns
  • disallow computed/virtual columns depending on other virtual columns (already disallowed for all computed columns)
  • support virtual columns in partial index predicates
  • support adding / removing virtual columns
  • support adding / removing indexes on virtual columns
  • disallow or support virtual columns in check constraints
  • support or disallow inverted indexes on virtual columns
  • support unique (and partial unique) indexes on virtual columns
  • support virtual column as leading column on a multi-column inverted index
  • support partial indexes on virtual columns
  • support virtual columns in unique constraints without index
  • teach SQLsmith about virtual columns
  • inline virtual column projections into filters (allowing use of indexes on virtual columns more cases)
  • add backup/restore test with table with virtual columns

Non-critical items:

  • figure out how to prove implication for cases like:
CREATE TABLE t (
  k INT PRIMARY KEY,
  a INT,
  b INT,
  c INT AS (b + 10) VIRTUAL,
  INDEX (a) WHERE c IN (10, 20, 30)
)

SELECT k FROM t WHERE c = 20

(filed separate issue #61565).

  • handle numeric references case where we specify a virtual column but not a dependent one (applies to stored computed columns as well); filed separate issue opt: better validation of column numeric references #61563.
  • teach indexConstraintCtx.simplifyFilter to remove the redundant filters like the b:2 = 3 below:
exec-ddl
CREATE TABLE tmp (
  a INT,
  b INT,
  v INT AS (b + 10) VIRTUAL,
  INDEX idx (v)
)
----

opt
SELECT * FROM tmp WHERE v = 13
----
  project
   ├── columns: a:1 b:2!null v:3!null
   ├── immutable
   ├── fd: ()-->(2,3)
   ├── select
   │    ├── columns: a:1 b:2!null
   │    ├── fd: ()-->(2)
   │    ├── index-join tmp
   │    │    ├── columns: a:1 b:2
   │    │    └── scan tmp@idx
   │    │         ├── columns: rowid:4!null
   │    │         ├── constraint: /3/4: [/13 - /13]
   │    │         └── key: (4)
   │    └── filters
   │         └── b:2 = 3 [outer=(2), constraints=(/2: [/3 - /3]; tight), fd=()-->(2)]
   └── projections
        └── b:2 + 10 [as=v:3, outer=(2), immutable]

(filed separate issue #61566).

21.2 items:

@RaduBerinde RaduBerinde added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Dec 5, 2020
@RaduBerinde RaduBerinde self-assigned this Dec 5, 2020
craig bot pushed a commit that referenced this issue Dec 7, 2020
57584: sql: lift multi-region functions on to immutable descriptors r=ajwerner a=arulajmani

Previously, we added some accessors on type and database descriptors
that operated on the raw proto. Since, I've learned that this stuff
should really be on immutable types instead, as we're moving to a new
world where we move away from raw proto accesses. This patch lifts
accessors from the raw proto and on to the immutable types.

Release note: None

57606: sql: make CREATE STATISTICS work properly with EXPLAIN ANALYZE (DEBUG) r=RaduBerinde a=RaduBerinde

#### sql: migrate other EXPLAIN variants to new exec factory interface

This change migrates the other EXPLAIN variants to the new interface,
which uses an explain factory to build the explained plan.

This fixes a TODO in the experimental factory.

Release note: None

#### opt: add operator for CREATE STATISTICS

This change adds an optimizer operator for CREATE STATISTICS (instead
of using the opaque operator).

Release note: None

#### sql: reimplement EXPLAIN handling of CREATE STATISTICS

CREATE STATISTICS runs the statistics plan inside a job. The plan node
tree ends in a dead end at the createStatsNode. For explain purposes,
we want to see information about the actual plan that runs inside the
job.

The current solution is to pretend that we build the plan directly in
certain contexts. This solution is hard to extend for the new
top-level instrumentation for EXPLAIN ANALYZE.

This commit changes this solution to add a `runAsJob` flag and
actually plan the distributed plan directly when we don't run as a
job. We automatically set `runAsJob=false` whenever we are building
inside Explain.

Release note: None

#### sql: make CREATE STATISTICS work properly with EXPLAIN ANALYZE (DEBUG)

With this change, the bundle now contains the diagram for the
underlying plan.

Release note: None

57622: opt: parser, test catalog, query support for virtual columns r=RaduBerinde a=RaduBerinde

This set of changes add parser, testcat, and query support for virtual columns. I have come full circle and realized that even though user-defined virtual columns don't help with the optimizer-side work for expression-based indexes, they help with integrating with the existing sql code. In particular, we can carve out pieces of work and write more targeted tests related to virtual columns, without having to define them through an expression-based index.

Informs #57608.

#### sql: support virtual columns in parser

Support VIRTUAL keyword in the parser. We still error out if it is
ever used.

Release note: None

#### opt: clean up hack in test catalog

To ensure that PK columns are non-nullable, the test catalog
reinitializes them to override the nullable flag. We replace this
hacky mechanism with figuring out the PK columns beforehand and
marking them as non-nullable in the definition before creating the
table columns.

Release note: None

#### opt: test catalog support for virtual columns

Release note: None

#### opt: optbuilder support for scanning virtual columns

This change adds support for scanning tables with virtual columns. The
virtual columns are projected on top of the Scan.

Release note: None

Co-authored-by: arulajmani <arulajmani@gmail.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Dec 12, 2020
57803: sql: initial support for virtual columns r=RaduBerinde a=RaduBerinde

This change adds very basic support for virtual columns to the
non-test descriptors and catalog. This is gated behind an experimental
session setting.

Informs #57608.

Release note: None

Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Dec 17, 2020
57998: sql: check use of virtual columns in indexes r=RaduBerinde a=RaduBerinde

This commit adds checks that disallow virtual columns being used as
primary key columns or stored columns in indexes.

Release note: None

Informs #57608.

Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Dec 17, 2020
57931: opt: support DELETE with virtual columns r=RaduBerinde a=RaduBerinde

Release note: None

Informs #57608.

Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
@ajwerner
Copy link
Contributor

disallow indexes STORING virtual columns

Why this restriction? The virtual columns are going to end up as index keys, why not let them end up as index values?

@RaduBerinde
Copy link
Member Author

No strong reason - just trying to limit the initial work. Something we can allow later for sure.

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Feb 1, 2021
Disallow FK references to or from virtual columns. Note that there is
no big reason why we couldn't support them other than some extra work
and testing surface; this is tracked by cockroachdb#59671.

Informs cockroachdb#57608.

Release note: None
craig bot pushed a commit that referenced this issue Feb 4, 2021
59675: sql: disallow virtual columns in FK references r=RaduBerinde a=RaduBerinde

Disallow FK references to or from virtual columns. Note that there is
no big reason why we couldn't support them other than some extra work
and testing surface; this is tracked by #59671.

Informs #57608.

Release note: None

59800: roachprod: extend error reporting for DNS sync r=rail a=rail

Related to #58905

In some cases `roachprod sync` fails updating the crdb.io DNS entries
without giving enough information about the failure. It looks like the
root cause may be the file format passed to the `gcloud dns` command.

Added extra information in the error message in order to get a better
picture when it happens again.

Release note: None

Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Co-authored-by: Rail Aliiev <rail@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Feb 19, 2021
60748: sql: remove experimental setting for virtual columns r=RaduBerinde a=RaduBerinde

Informs #57608.

Release note (sql change): CockroachDB now supports VIRTUAL computed
columns (as opposed to STORED). These are computed columns that are
not stored in the primary index and are recomputed as necessary.

Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Feb 19, 2021
60550: sql: add nodes for each EXPLAIN ANALYZE operator r=RaduBerinde a=RaduBerinde

Show the cluster nodes involved in the execution of each operator.

Note that this info does not show up in the non-analyze EXPLAIN. It is
technically much more challenging to do that because of the indirect
way we do distsql planning. Once we have the new DistSQL exec factory,
we will be able to add it.

Note on the implementation: I started by trying to make execution set
`ComponentID.NodeID` in all cases, but I got stuck in `ProcessorBase`
where we only have a `SQLIDContainer` available. I don't fully
understand the new abstraction and whether the distsql components and
flows should really use SQLIDs instead of NodeIDs.

Unfortunately, there is not much we can do to test this currently
(other than manual testing). I will investigate making the
"deterministic" flag more fine-grained, so that we can hide truly
non-deterministic values (like timings) separately from those that
just vary with the configuration (query distribution).

Example:
```
movr> EXPLAIN ANALYZE SELECT * FROM rides JOIN vehicles ON rides.vehicle_id = vehicles.id;
                    info
--------------------------------------------
  planning time: 158µs
  execution time: 7ms
  distribution: full
  vectorized: true

   hash join
  │ cluster nodes: n1, n2, n3
  │ actual row count: 500
  │ equality: (vehicle_id) = (id)
  │
  ├──  scan
  │     cluster nodes: n1, n2, n3
  │     actual row count: 500
  │     KV rows read: 500
  │     KV bytes read: 86 KiB
  │     missing stats
  │     table: rides@primary
  │     spans: FULL SCAN
  │
  └──  scan
        cluster nodes: n1, n2, n3
        actual row count: 15
        KV rows read: 15
        KV bytes read: 2.3 KiB
        missing stats
        table: vehicles@primary
        spans: FULL SCAN
```

Release note (sql change): EXPLAIN ANALYZE now includes the nodes
which were involved in the execution of each operator in the tree.

Informs #59860.

60748: sql: remove experimental setting for virtual columns r=RaduBerinde a=RaduBerinde

Informs #57608.

Release note (sql change): CockroachDB now supports VIRTUAL computed
columns (as opposed to STORED). These are computed columns that are
not stored in the primary index and are recomputed as necessary.

Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Feb 23, 2021
60827: sql: propagate result types to inputs during set ops with nulls and tuples r=yuzefovich a=yuzefovich

**sql: move MergeResultTypes from physicalplan package and unexport**

Release note: None

**sql: propagate result types to inputs in set ops with nulls and tuples**

The physical planning for set operations is special because it allows
for the result types from its inputs to be of different type family in
some cases. Namely, it is ok if one input has `types.Unknown` whereas
the other input is of some "known" type. In order to handle such case
the execbuilder plans casts; however, currently it is not possible to
plan casts to `Tuple` type. As a result, the vectorized engine is not
able to execute the query because it expects the data coming from both
inputs to be of the same type.

This commit modifies the logic of reconciling the result types from both
inputs to additionally update unknown types on the inputs if the other
input is of a Tuple type. Such behavior is acceptable because in the
original plan we only expected to have NULL values, and the Tuple type
is able to handle such case too.

The problem can also be reproduced in a logic test with the row-by-row
engine in the `fakedist` setting. This bug was introduced in
a76ee31, and before that change we had
the following behavior: overwrite `plan.ResultTypes` to the merged types
and then plan a stage of distinct processors if we have UNION. This
resulted in the input spec for the input stream to distinct having
correctly set types. After the change, that was no longer the case.

Before that change the vectorized engine couldn't handle such queries
due to type mismatch during `SupportsVectorized` check, so we fell back
to the row-by-row engine. However, this commit makes it so that the
vectorized engine is able to execute such queries.

Fixes: #59611.

Release note (bug fix): CockroachDB previously could encounter an
internal error when performing UNION operation when the first input
resulted only in NULL values and consequent inputs produce tuples, and
this is now fixed. Only 21.1 alpha versions are affected.

60853: sql: do not allow reference FK partial unique constraints r=mgartner a=mgartner

This commit prevents users from creating a FK with a reference column
that has a partial unique constraint. These constraints do not guarantee
uniqueness in the entire table so they cannot be used.

There is no release note because these constraints are gated behind the
experimental_enable_unique_without_index_constraints session variable.

Release note: None

60944: bulkio: Avoid using AssertionFailed error when planning replication. r=miretskiy a=miretskiy

AssertionFailed error message is not appropriate.
Return "unsupported error" instead.

Release Notes: None

60945: sql: add tests with unique constraints and virtual columns r=RaduBerinde a=RaduBerinde

Logic tests for unique indexes and constraints involving virtual
columns.

Informs #57608.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Mar 5, 2021
61359: tracing: use byte-limits for logs/structured events per span r=irfansharif a=irfansharif

Touches #59188. Follow-on work from #60678. We can introduce byte-limits for
verbose logging and structured events, instead of limiting things based on
count.

This PR also:
- adds a _dropped tag to recordings with dropped logs/structured events.
- squashes a bug where reset spans (as used in SessionTracing) still
  held onto earlier structured events
- moves away from the internal usage of the opentracing.LogRecord type,
  it's unnecessary

Release justification: low risk, high benefit changes to existing
functionality

Release note: None

---

+cc @knz / @erikgrinaker / @angelapwen for pod-visibility.

61482: jobs: add job metrics per-type to track success, failure, and cancel r=fqazi a=fqazi

Fixes: #59711

Previously, there were only over all counters tracking how many
jobs were completed, cancelled, or failed. This was inadequate
because it didn't make it easy to tell in aggregate what job
types they were. To address this, this patch will add counters
for different job types for tracking success, failure, and
cancellation.

Release justification: Low risk change only adding a metric inside
the crdb_internal.feature_usage table
Release note: None

61491: sqlsmith: add support for computed columns r=RaduBerinde a=RaduBerinde

This changes the random table generator to also create computed
columns (either STORED or VIRTUAL). Some example of definitions:
 - `col1_14 STRING NOT NULL AS (lower(CAST(col1_8 AS STRING))) VIRTUAL`
 - `col2_6 DECIMAL NOT NULL AS (col2_2 + 1:::DECIMAL) STORED`
 - `col1_13 INT4 AS (col1_0 + col1_10) STORED`

Release justification: non-production code change.

Release note: None

Informs #57608.

61509: sql: add a regression test r=RaduBerinde a=RaduBerinde

This commit adds a regression test for #58104 (the problem was already
fixed).

Resolves #58104.

Release justification: non-production code change.

Release note: None

61522: opt: fix fetch scope in UPDATE..FROM statements r=mgartner a=mgartner

Previously, the fetch scope incorrectly included columns in the FROM
clause of an UPDATE..FROM statement. As a result, column names shared by
the FROM clause and the mutating table lead to ambiguity when resolving
partial index DEL column expressions. This commit ensures that the fetch
scope does not include columns in the FROM clause.

Fixes #61284

Release justification: This is a low-risk bug fix to existing
functionality.

Release note (bug fix): An UPDATE..FROM statement where the FROM clause
contained column names that match table column names erred if the table
had a partial index predicate referencing those columns. This bug,
present since partial indexes were released in version 20.2.0, has been
fixed.

61553: ccl,server: error.Wrap on previously handled errors r=[dt,miretskiy] a=stevendanna

These errors.Wrap calls are wrapping errors that are nil and thus will
always return a nil error.

Release justification: Minor error handling fixes
Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Steven Danna <danna@cockroachlabs.com>
craig bot pushed a commit that referenced this issue Mar 6, 2021
61567: backupccl: add a BACKUP/RESTORE test with a virtual column r=RaduBerinde a=RaduBerinde

Release justification: non-production code change.

Release note: None

Informs #57608.

Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
@RaduBerinde
Copy link
Member Author

Outstanding items (none of which are critical) have been filed as separate issues; closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

No branches or pull requests

2 participants