schematelemetry,eventpb: add schema telemetry
This commit adds:
  - the event definitions and logic for generating them,
  - the scheduling and jobs boilerplate to periodically log them.

Care is taken to redact all strings present in descriptors which might
unintentionally leak PII.

The event generation logic is tested on the schema of a bootstrapped
test cluster: the test checks that the events match expectations.

Fixes cockroachdb#84284.

Release note (general change): CRDB will now collect schema info if
phoning home is enabled. This schema info is added to the telemetry log
by a built-in scheduled job which runs on a weekly basis by default.
This recurrence can be changed via the sql.schema.telemetry.recurrence
cluster setting.  The schedule can also be paused via PAUSE SCHEDULE
followed by its ID, which can be retrieved by querying
SELECT * FROM [SHOW SCHEDULES] WHERE label = 'sql-schema-telemetry'.
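
The schedule described in this note can be managed with ordinary SQL
statements. A minimal sketch (the schedule ID below is hypothetical; retrieve
the real one with the SHOW SCHEDULES query first):

```sql
-- Change the recurrence from the default @weekly, e.g. to daily at 2am.
SET CLUSTER SETTING sql.schema.telemetry.recurrence = '0 2 * * *';

-- Find the schedule's ID by its label.
SELECT id FROM [SHOW SCHEDULES] WHERE label = 'sql-schema-telemetry';

-- Pause (and later resume) the schedule using that ID.
PAUSE SCHEDULE 123456789;
RESUME SCHEDULE 123456789;
```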
Marius Posta committed Aug 8, 2022
1 parent 40d59b5 commit 3892138
Showing 53 changed files with 1,903 additions and 37 deletions.
52 changes: 52 additions & 0 deletions docs/generated/eventlog.md
@@ -2577,6 +2577,58 @@ contains common SQL event/execution details.
| `FullIndexScan` | Whether the query contains a full secondary index scan of a non-partial index. | no |
| `TxnCounter` | The sequence number of the SQL transaction inside its session. | no |

### `schema_descriptor`

An event of type `schema_descriptor` is an event for schema telemetry, whose purpose is
to take periodic snapshots of the cluster's SQL schema and publish them in
the telemetry log channel. For all intents and purposes, the data in such a
snapshot can be thought of as the outer join of certain system tables:
namespace, descriptor, and at some point perhaps zones, etc.
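
As a sketch of what one snapshot record corresponds to, assuming only the two
tables mentioned above (this query is illustrative, not the implementation;
column names follow the system.namespace schema of this release):

```sql
-- One row per namespace entry, outer-joined with the raw descriptor bytes.
SELECT n."parentID"       AS parent_database_id,
       n."parentSchemaID" AS parent_schema_id,
       n.name,
       n.id               AS desc_id,
       d.descriptor       AS "desc"
FROM system.namespace AS n
FULL OUTER JOIN system.descriptor AS d ON n.id = d.id;
```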

Snapshots are too large to conveniently be published as a single log event,
so instead they are broken down into SchemaDescriptor events, each of which
contains the data of one record of this outer join projection. These events
are prefixed by a header (a SchemaSnapshotMetadata event).


| Field | Description | Sensitive |
|--|--|--|
| `SnapshotID` | SnapshotID is the unique identifier of the snapshot that this event is part of. | no |
| `ParentDatabaseID` | ParentDatabaseID matches the same key column in system.namespace. | no |
| `ParentSchemaID` | ParentSchemaID matches the same key column in system.namespace. | no |
| `Name` | Name matches the same key column in system.namespace. | no |
| `DescID` | DescID matches the 'id' column in system.namespace and system.descriptor. | no |
| `Desc` | Desc matches the 'descriptor' column in system.descriptor. Some contents of the descriptor may be redacted to prevent leaking PII. | no |


#### Common fields

| Field | Description | Sensitive |
|--|--|--|
| `Timestamp` | The timestamp of the event. Expressed as nanoseconds since the Unix epoch. | no |
| `EventType` | The type of the event. | no |

### `schema_snapshot_metadata`

An event of type `schema_snapshot_metadata` is an event describing a schema snapshot, which
is a set of SchemaDescriptor messages sharing the same SnapshotID.


| Field | Description | Sensitive |
|--|--|--|
| `SnapshotID` | SnapshotID is the unique identifier of this snapshot. | no |
| `NumRecords` | NumRecords is how many SchemaDescriptor events are in the snapshot. | no |
| `AsOfTimestamp` | AsOfTimestamp is when the snapshot was taken. This is equivalent to the timestamp given in the AS OF SYSTEM TIME clause when querying the namespace and descriptor tables in the system database. Expressed as nanoseconds since the Unix epoch. | no |
| `Errors` | Errors records any errors encountered when post-processing this snapshot, which includes the redaction of any potential PII. | yes |
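
The `AsOfTimestamp` field means the snapshot is a consistent historical read
of the system tables; conceptually it is equivalent to a query of this shape
(the timestamp literal is illustrative):

```sql
-- Read the descriptor table at the snapshot's fixed timestamp.
SELECT id, descriptor
FROM system.descriptor
AS OF SYSTEM TIME '2022-08-08 00:00:00';
```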


#### Common fields

| Field | Description | Sensitive |
|--|--|--|
| `Timestamp` | The timestamp of the event. Expressed as nanoseconds since the Unix epoch. | no |
| `EventType` | The type of the event. | no |

## Zone config events

Events in this category pertain to zone configuration changes on
3 changes: 2 additions & 1 deletion docs/generated/settings/settings-for-tenants.txt
@@ -248,6 +248,7 @@ sql.multiple_modifications_of_table.enabled boolean false if true, allow stateme
sql.multiregion.drop_primary_region.enabled boolean true allows dropping the PRIMARY REGION of a database if it is the last region
sql.notices.enabled boolean true enable notices in the server/client protocol being sent
sql.optimizer.uniqueness_checks_for_gen_random_uuid.enabled boolean false if enabled, uniqueness checks may be planned for mutations of UUID columns updated with gen_random_uuid(); otherwise, uniqueness is assumed due to near-zero collision probability
sql.schema.telemetry.recurrence string @weekly cron-tab recurrence for SQL schema telemetry job
sql.spatial.experimental_box2d_comparison_operators.enabled boolean false enables the use of certain experimental box2d comparison operators
sql.stats.automatic_collection.enabled boolean true automatic statistics collection mode
sql.stats.automatic_collection.fraction_stale_rows float 0.2 target fraction of stale rows per table that will trigger a statistics refresh
@@ -284,4 +285,4 @@ trace.jaeger.agent string the address of a Jaeger agent to receive traces using
trace.opentelemetry.collector string address of an OpenTelemetry trace collector to receive traces using the otel gRPC protocol, as <host>:<port>. If no port is specified, 4317 will be used.
trace.span_registry.enabled boolean true if set, ongoing traces can be seen at https://<ui>/#/debug/tracez
trace.zipkin.collector string the address of a Zipkin instance to receive traces, as <host>:<port>. If no port is specified, 9411 will be used.
version version 22.1-40 set the active cluster version in the format '<major>.<minor>'
version version 22.1-42 set the active cluster version in the format '<major>.<minor>'
3 changes: 2 additions & 1 deletion docs/generated/settings/settings.html
@@ -179,6 +179,7 @@
<tr><td><code>sql.multiregion.drop_primary_region.enabled</code></td><td>boolean</td><td><code>true</code></td><td>allows dropping the PRIMARY REGION of a database if it is the last region</td></tr>
<tr><td><code>sql.notices.enabled</code></td><td>boolean</td><td><code>true</code></td><td>enable notices in the server/client protocol being sent</td></tr>
<tr><td><code>sql.optimizer.uniqueness_checks_for_gen_random_uuid.enabled</code></td><td>boolean</td><td><code>false</code></td><td>if enabled, uniqueness checks may be planned for mutations of UUID columns updated with gen_random_uuid(); otherwise, uniqueness is assumed due to near-zero collision probability</td></tr>
<tr><td><code>sql.schema.telemetry.recurrence</code></td><td>string</td><td><code>@weekly</code></td><td>cron-tab recurrence for SQL schema telemetry job</td></tr>
<tr><td><code>sql.spatial.experimental_box2d_comparison_operators.enabled</code></td><td>boolean</td><td><code>false</code></td><td>enables the use of certain experimental box2d comparison operators</td></tr>
<tr><td><code>sql.stats.automatic_collection.enabled</code></td><td>boolean</td><td><code>true</code></td><td>automatic statistics collection mode</td></tr>
<tr><td><code>sql.stats.automatic_collection.fraction_stale_rows</code></td><td>float</td><td><code>0.2</code></td><td>target fraction of stale rows per table that will trigger a statistics refresh</td></tr>
@@ -215,6 +216,6 @@
<tr><td><code>trace.opentelemetry.collector</code></td><td>string</td><td><code></code></td><td>address of an OpenTelemetry trace collector to receive traces using the otel gRPC protocol, as <host>:<port>. If no port is specified, 4317 will be used.</td></tr>
<tr><td><code>trace.span_registry.enabled</code></td><td>boolean</td><td><code>true</code></td><td>if set, ongoing traces can be seen at https://<ui>/#/debug/tracez</td></tr>
<tr><td><code>trace.zipkin.collector</code></td><td>string</td><td><code></code></td><td>the address of a Zipkin instance to receive traces, as <host>:<port>. If no port is specified, 9411 will be used.</td></tr>
<tr><td><code>version</code></td><td>version</td><td><code>22.1-40</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>
<tr><td><code>version</code></td><td>version</td><td><code>22.1-42</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>
</tbody>
</table>
2 changes: 2 additions & 0 deletions docs/generated/sql/functions.md
@@ -3023,6 +3023,8 @@ SELECT * FROM crdb_internal.check_consistency(true, ‘\x02’, ‘\x04’)</p>
</span></td><td>Volatile</td></tr>
<tr><td><a name="crdb_internal.create_session_revival_token"></a><code>crdb_internal.create_session_revival_token() &rarr; <a href="bytes.html">bytes</a></code></td><td><span class="funcdesc"><p>Generate a token that can be used to create a new session for the current user.</p>
</span></td><td>Volatile</td></tr>
<tr><td><a name="crdb_internal.create_sql_schema_telemetry_job"></a><code>crdb_internal.create_sql_schema_telemetry_job() &rarr; <a href="int.html">int</a></code></td><td><span class="funcdesc"><p>This function is used to create a schema telemetry job instance.</p>
</span></td><td>Volatile</td></tr>
<tr><td><a name="crdb_internal.decode_cluster_setting"></a><code>crdb_internal.decode_cluster_setting(setting: <a href="string.html">string</a>, value: <a href="string.html">string</a>) &rarr; <a href="string.html">string</a></code></td><td><span class="funcdesc"><p>Decodes the given encoded value for a cluster setting.</p>
</span></td><td>Immutable</td></tr>
<tr><td><a name="crdb_internal.deserialize_session"></a><code>crdb_internal.deserialize_session(session: <a href="bytes.html">bytes</a>) &rarr; <a href="bool.html">bool</a></code></td><td><span class="funcdesc"><p>This function deserializes the serialized variables into the current session.</p>
6 changes: 6 additions & 0 deletions pkg/BUILD.bazel
@@ -274,6 +274,7 @@ ALL_TESTS = [
"//pkg/sql/catalog/resolver:resolver_test",
"//pkg/sql/catalog/schemadesc:schemadesc_test",
"//pkg/sql/catalog/schemaexpr:schemaexpr_test",
"//pkg/sql/catalog/schematelemetry:schematelemetry_test",
"//pkg/sql/catalog/seqexpr:seqexpr_disallowed_imports_test",
"//pkg/sql/catalog/seqexpr:seqexpr_test",
"//pkg/sql/catalog/systemschema_test:systemschema_test_test",
@@ -1314,6 +1315,9 @@ GO_TARGETS = [
"//pkg/sql/catalog/schemadesc:schemadesc_test",
"//pkg/sql/catalog/schemaexpr:schemaexpr",
"//pkg/sql/catalog/schemaexpr:schemaexpr_test",
"//pkg/sql/catalog/schematelemetry/schematelemetrycontroller:schematelemetrycontroller",
"//pkg/sql/catalog/schematelemetry:schematelemetry",
"//pkg/sql/catalog/schematelemetry:schematelemetry_test",
"//pkg/sql/catalog/seqexpr:seqexpr",
"//pkg/sql/catalog/seqexpr:seqexpr_test",
"//pkg/sql/catalog/systemschema:systemschema",
@@ -2466,6 +2470,8 @@ GET_X_DATA_TARGETS = [
"//pkg/sql/catalog/rewrite:get_x_data",
"//pkg/sql/catalog/schemadesc:get_x_data",
"//pkg/sql/catalog/schemaexpr:get_x_data",
"//pkg/sql/catalog/schematelemetry:get_x_data",
"//pkg/sql/catalog/schematelemetry/schematelemetrycontroller:get_x_data",
"//pkg/sql/catalog/seqexpr:get_x_data",
"//pkg/sql/catalog/systemschema:get_x_data",
"//pkg/sql/catalog/systemschema_test:get_x_data",
1 change: 1 addition & 0 deletions pkg/base/testing_knobs.go
@@ -38,6 +38,7 @@ type TestingKnobs struct {
JobsTestingKnobs ModuleTestingKnobs
BackupRestore ModuleTestingKnobs
TTL ModuleTestingKnobs
SchemaTelemetry ModuleTestingKnobs
Streaming ModuleTestingKnobs
UpgradeManager ModuleTestingKnobs
IndexUsageStatsKnobs ModuleTestingKnobs
7 changes: 7 additions & 0 deletions pkg/clusterversion/cockroach_versions.go
@@ -337,6 +337,9 @@ const (
UsersHaveIDs
// SetUserIDNotNull sets the user_id column in system.users to not null.
SetUserIDNotNull
// SQLSchemaTelemetryScheduledJobs adds an automatic schedule for SQL schema
// telemetry logging jobs.
SQLSchemaTelemetryScheduledJobs

// *************************************************
// Step (1): Add new versions here.
@@ -590,6 +593,10 @@ var versionsSingleton = keyedVersions{
Key: SetUserIDNotNull,
Version: roachpb.Version{Major: 22, Minor: 1, Internal: 40},
},
{
Key: SQLSchemaTelemetryScheduledJobs,
Version: roachpb.Version{Major: 22, Minor: 1, Internal: 42},
},
// *************************************************
// Step (2): Add new versions here.
// Do not add new versions to a patch release.
5 changes: 3 additions & 2 deletions pkg/clusterversion/key_string.go

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions pkg/gen/protobuf.bzl
@@ -38,6 +38,7 @@ PROTOBUF_SRCS = [
"//pkg/settings:settings_go_proto",
"//pkg/sql/catalog/catpb:catpb_go_proto",
"//pkg/sql/catalog/descpb:descpb_go_proto",
"//pkg/sql/catalog/schematelemetry/schematelemetrycontroller:schematelemetrycontroller_go_proto",
"//pkg/sql/contentionpb:contentionpb_go_proto",
"//pkg/sql/execinfrapb:execinfrapb_go_proto",
"//pkg/sql/inverted:inverted_go_proto",
22 changes: 12 additions & 10 deletions pkg/jobs/jobspb/jobs.proto
@@ -91,7 +91,6 @@ message StreamIngestionDetails {

// Stream of tenant data will be ingested as a new tenant with 'new_tenant_id'.
roachpb.TenantID new_tenant_id = 7 [(gogoproto.customname) = "NewTenantID", (gogoproto.nullable) = false];
// NEXT ID: 8.
}

message StreamIngestionCheckpoint {
@@ -261,8 +260,6 @@ message BackupDetails {
// timestamp and the timestamp resolved by the AS OF SYSTEM TIME expression.
// The interval is expressed in nanoseconds.
int64 as_of_interval = 22;

// NEXT ID: 23;
}

message BackupProgress {
@@ -384,8 +381,6 @@ message RestoreDetails {
// RestoreValidation determines whether to skip certain parts of the restore
// job if its only purpose is to validate the user's restore command.
RestoreValidation validation = 24;

// NEXT ID: 25.
}

enum RestoreValidation {
@@ -754,8 +749,6 @@ message SchemaChangeDetails {
// WriteTimestamp is the timestamp at which a backfill may want to write, e.g.
// a time that has been identified via a scan as safe for writing.
util.hlc.Timestamp write_timestamp = 10 [(gogoproto.nullable) = false];

// NEXT ID: 11.
}

message SchemaChangeProgress {
@@ -858,7 +851,6 @@ message ChangefeedDetails {
string select = 10;
reserved 1, 2, 5;
reserved "targets";
// NEXT ID: 11
}

message ResolvedSpan {
@@ -998,6 +990,12 @@ message RowLevelTTLProgress {
int64 row_count = 1;
}

message SchemaTelemetryDetails {
}

message SchemaTelemetryProgress {
}

message Payload {
string description = 1;
// If empty, the description is assumed to be the statement.
@@ -1045,6 +1043,10 @@ message Payload {
AutoSQLStatsCompactionDetails autoSQLStatsCompaction = 30;
StreamReplicationDetails streamReplication = 33;
RowLevelTTLDetails row_level_ttl = 34 [(gogoproto.customname)="RowLevelTTL"];
// SchemaTelemetry jobs collect a snapshot of the cluster's SQL schema
// and publish it to the telemetry event log. These jobs are typically
// created by a built-in schedule named "sql-schema-telemetry".
SchemaTelemetryDetails schema_telemetry = 37;
}
reserved 26;
// PauseReason is used to describe the reason that the job is currently paused
@@ -1067,8 +1069,6 @@
// cluster version, in case a job resuming later needs to use this information
// to migrate or update the job.
roachpb.Version creation_cluster_version = 36 [(gogoproto.nullable) = false];

// NEXT ID: 37.
}

message Progress {
@@ -1095,6 +1095,7 @@
AutoSQLStatsCompactionProgress autoSQLStatsCompaction = 23;
StreamReplicationProgress streamReplication = 24;
RowLevelTTLProgress row_level_ttl = 25 [(gogoproto.customname)="RowLevelTTL"];
SchemaTelemetryProgress schema_telemetry = 26;
}

uint64 trace_id = 21 [(gogoproto.nullable) = false, (gogoproto.customname) = "TraceID", (gogoproto.customtype) = "github.com/cockroachdb/cockroach/pkg/util/tracing/tracingpb.TraceID"];
@@ -1123,6 +1124,7 @@ enum Type {
AUTO_SQL_STATS_COMPACTION = 14 [(gogoproto.enumvalue_customname) = "TypeAutoSQLStatsCompaction"];
STREAM_REPLICATION = 15 [(gogoproto.enumvalue_customname) = "TypeStreamReplication"];
ROW_LEVEL_TTL = 16 [(gogoproto.enumvalue_customname) = "TypeRowLevelTTL"];
AUTO_SCHEMA_TELEMETRY = 17 [(gogoproto.enumvalue_customname) = "TypeAutoSchemaTelemetry"];
}

message Job {
