Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

job: add schema telemetry scheduled job #84284

Closed
postamar opened this issue Jul 12, 2022 · 0 comments · Fixed by #84761
Closed

job: add schema telemetry scheduled job #84284

postamar opened this issue Jul 12, 2022 · 0 comments · Fixed by #84761
Assignees
Labels
A-schema-catalog Related to the schema descriptors collection and the catalog API in general. A-sql-logging-and-telemetry Issues related to slow query log, SQL audit log, SQL internal logging telemetry, etc. A-telemetry C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@postamar
Copy link
Contributor

postamar commented Jul 12, 2022

A scheduled job is the way to go as far as within what the telemetry logging takes place.
As far as what to log I'm currently leaning towards logging an element decomposition of each descriptor.
As for as how much to log I currently intend to log all of it and we can retroactively add sampling policies and whatnot.

Epic: CRDB-2463

Jira issue: CRDB-17568

@postamar postamar added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 12, 2022
@blathers-crl blathers-crl bot added the T-sql-schema-deprecated Use T-sql-foundations instead label Jul 12, 2022
@postamar postamar added A-telemetry A-sql-logging-and-telemetry Issues related to slow query log, SQL audit log, SQL internal logging telemetry, etc. A-schema-catalog Related to the schema descriptors collection and the catalog API in general. and removed T-sql-schema-deprecated Use T-sql-foundations instead labels Jul 12, 2022
@postamar postamar self-assigned this Jul 12, 2022
@exalate-issue-sync exalate-issue-sync bot added T-sql-schema-deprecated Use T-sql-foundations instead A-schema-catalog Related to the schema descriptors collection and the catalog API in general. A-sql-logging-and-telemetry Issues related to slow query log, SQL audit log, SQL internal logging telemetry, etc. A-telemetry and removed A-telemetry A-sql-logging-and-telemetry Issues related to slow query log, SQL audit log, SQL internal logging telemetry, etc. A-schema-catalog Related to the schema descriptors collection and the catalog API in general. labels Jul 12, 2022
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 21, 2022
This commit introduces the logic which will be used in a later commit
for logging SQL schema telemetry. In the meantime, this commit applies
it in a test which checks that the schema of a bootstrapped cluster
matches expectations.

Informs cockroachdb#84284.

Release note: None
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 21, 2022
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 21, 2022
This commit adds all the boilerplate for automatically scheduling SQL
schema telemetry jobs, by default on a daily basis.

This commit also adds a couple of builtins to create the job schedule
and also to immediately create a schema telemetry logging job.

Fixes cockroachdb#84284.

Release note (sql change): this change ensures the existence of
a new scheduled job for collecting SQL schema telemetry. By default it
runs daily but this can be adjusted via the
sql.schema.telemetry.recurrence cluster setting. The schedule can also
be paused via PAUSE SCHEDULE followed by its ID, which can be retrieved
by SELECT * FROM [SHOW SCHEDULES] WHERE label = 'sql-schema-telemetry'.
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 25, 2022
This commit introduces the logic which will be used in a later commit
for logging SQL schema telemetry. In the meantime, this commit applies
it in a test which checks that the schema of a bootstrapped cluster
matches expectations.

Informs cockroachdb#84284.

Release note: None
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 25, 2022
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 26, 2022
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 26, 2022
This commit adds all the boilerplate for automatically scheduling SQL
schema telemetry jobs, by default on a daily basis.

This commit also adds a couple of builtins to create the job schedule
and also to immediately create a schema telemetry logging job.

Fixes cockroachdb#84284.

Release note (sql change): this change ensures the existence of
a new scheduled job for collecting SQL schema telemetry. By default it
runs daily but this can be adjusted via the
sql.schema.telemetry.recurrence cluster setting. The schedule can also
be paused via PAUSE SCHEDULE followed by its ID, which can be retrieved
by SELECT * FROM [SHOW SCHEDULES] WHERE label = 'sql-schema-telemetry'.
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 26, 2022
This commit introduces the event logic for logging SQL schema telemetry.
This is tested on the schema of a bootstrapped test cluster, the test
checks that the schema events match expectations.

Care is taken to redact all strings present in descriptors which might
unintentionally be leaking PIIs.

Informs cockroachdb#84284.

Release note: None
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 26, 2022
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 26, 2022
This commit adds all the boilerplate for automatically scheduling SQL
schema telemetry jobs, by default on a daily basis.

This commit also adds a couple of builtins to create the job schedule
and also to immediately create a schema telemetry logging job.

Fixes cockroachdb#84284.

Release note (sql change): this change ensures the existence of
a new scheduled job for collecting SQL schema telemetry. By default it
runs daily but this can be adjusted via the
sql.schema.telemetry.recurrence cluster setting. The schedule can also
be paused via PAUSE SCHEDULE followed by its ID, which can be retrieved
by SELECT * FROM [SHOW SCHEDULES] WHERE label = 'sql-schema-telemetry'.
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 27, 2022
This commit introduces the event logic for logging SQL schema telemetry.
This is tested on the schema of a bootstrapped test cluster, the test
checks that the schema events match expectations.

Care is taken to redact all strings present in descriptors which might
unintentionally be leaking PIIs.

Informs cockroachdb#84284.

Release note (general change): Added a new 'schema' telemetry log event
for the purpose of logging a snapshot of the cluster's SQL schema, done
on a daily basis by default.
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 27, 2022
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 27, 2022
This commit adds all the boilerplate for automatically scheduling SQL
schema telemetry jobs, by default on a daily basis.

This commit also adds a couple of builtins to create the job schedule
and also to immediately create a schema telemetry logging job.

Fixes cockroachdb#84284.

Release note (sql change): this change ensures the existence of
a new scheduled job for collecting SQL schema telemetry. By default it
runs daily but this can be adjusted via the
sql.schema.telemetry.recurrence cluster setting. The schedule can also
be paused via PAUSE SCHEDULE followed by its ID, which can be retrieved
by SELECT * FROM [SHOW SCHEDULES] WHERE label = 'sql-schema-telemetry'.
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 27, 2022
This commit introduces the event logic for logging SQL schema telemetry.
This is tested on the schema of a bootstrapped test cluster, the test
checks that the schema events match expectations.

Care is taken to redact all strings present in descriptors which might
unintentionally be leaking PIIs.

Informs cockroachdb#84284.

Release note (general change): Added a new 'schema' telemetry log event
for the purpose of logging a snapshot of the cluster's SQL schema, done
on a daily basis by default.
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 28, 2022
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 28, 2022
This commit adds all the boilerplate for automatically scheduling SQL
schema telemetry jobs, by default on a daily basis.

This commit also adds a couple of builtins to create the job schedule
and also to immediately create a schema telemetry logging job.

Fixes cockroachdb#84284.

Release note (sql change): this change ensures the existence of
a new scheduled job for collecting SQL schema telemetry. By default it
runs daily but this can be adjusted via the
sql.schema.telemetry.recurrence cluster setting. The schedule can also
be paused via PAUSE SCHEDULE followed by its ID, which can be retrieved
by SELECT * FROM [SHOW SCHEDULES] WHERE label = 'sql-schema-telemetry'.
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 28, 2022
This commit introduces the event logic for logging SQL schema telemetry.
This is tested on the schema of a bootstrapped test cluster, the test
checks that the schema events match expectations.

Care is taken to redact all strings present in descriptors which might
unintentionally be leaking PIIs.

Informs cockroachdb#84284.

Release note (general change): CRDB will now collect schema info if
phoning home is enabled. This schema info is added to the telemetry log
on a daily basis by default.
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 28, 2022
This commit adds all the boilerplate for automatically scheduling SQL
schema telemetry jobs, by default on a daily basis.

This commit also adds a couple of builtins to create the job schedule
and also to immediately create a schema telemetry logging job.

Fixes cockroachdb#84284.

Release note (sql change): this change ensures the existence of
a new scheduled job for collecting SQL schema telemetry. By default it
runs daily but this can be adjusted via the
sql.schema.telemetry.recurrence cluster setting. The schedule can also
be paused via PAUSE SCHEDULE followed by its ID, which can be retrieved
by SELECT * FROM [SHOW SCHEDULES] WHERE label = 'sql-schema-telemetry'.
postamar pushed a commit to postamar/cockroach that referenced this issue Jul 28, 2022
This commit introduces the event logic for logging SQL schema telemetry.
This is tested on the schema of a bootstrapped test cluster, the test
checks that the schema events match expectations.

Care is taken to redact all strings present in descriptors which might
unintentionally be leaking PIIs.

Informs cockroachdb#84284.

Release note (general change): CRDB will now collect schema info if
phoning home is enabled. This schema info is added to the telemetry log
on a daily basis by default.
postamar pushed a commit to postamar/cockroach that referenced this issue Aug 4, 2022
This commit adds:
  - the event definitions and logic for generating them,
  - the scheduling and jobs boilerplate to periodically log them.

Care is taken to redact all strings present in descriptors which might
unintentionally be leaking PIIs.

The event generation logic is tested on the schema of a bootstrapped
test cluster: the test checks that the events match expectations.

Fixes cockroachdb#84284.

Release note (general change): CRDB will now collect schema info if
phoning home is enabled. This schema info is added to the telemetry log
by a built-in scheduled job which runs on a weekly basis by default.
This recurrence can be changed via the sql.schema.telemetry.recurrence
cluster setting.  The schedule can also be paused via PAUSE SCHEDULE
followed by its ID, which can be retrieved by querying
SELECT * FROM [SHOW SCHEDULES] WHERE label = 'sql-schema-telemetry'.
postamar pushed a commit to postamar/cockroach that referenced this issue Aug 4, 2022
This commit adds:
  - the event definitions and logic for generating them,
  - the scheduling and jobs boilerplate to periodically log them.

Care is taken to redact all strings present in descriptors which might
unintentionally be leaking PIIs.

The event generation logic is tested on the schema of a bootstrapped
test cluster: the test checks that the events match expectations.

Fixes cockroachdb#84284.

Release note (general change): CRDB will now collect schema info if
phoning home is enabled. This schema info is added to the telemetry log
by a built-in scheduled job which runs on a weekly basis by default.
This recurrence can be changed via the sql.schema.telemetry.recurrence
cluster setting.  The schedule can also be paused via PAUSE SCHEDULE
followed by its ID, which can be retrieved by querying
SELECT * FROM [SHOW SCHEDULES] WHERE label = 'sql-schema-telemetry'.
postamar pushed a commit to postamar/cockroach that referenced this issue Aug 8, 2022
This commit adds:
  - the event definitions and logic for generating them,
  - the scheduling and jobs boilerplate to periodically log them.

Care is taken to redact all strings present in descriptors which might
unintentionally be leaking PIIs.

The event generation logic is tested on the schema of a bootstrapped
test cluster: the test checks that the events match expectations.

Fixes cockroachdb#84284.

Release note (general change): CRDB will now collect schema info if
phoning home is enabled. This schema info is added to the telemetry log
by a built-in scheduled job which runs on a weekly basis by default.
This recurrence can be changed via the sql.schema.telemetry.recurrence
cluster setting.  The schedule can also be paused via PAUSE SCHEDULE
followed by its ID, which can be retrieved by querying
SELECT * FROM [SHOW SCHEDULES] WHERE label = 'sql-schema-telemetry'.
postamar pushed a commit to postamar/cockroach that referenced this issue Aug 8, 2022
This commit adds:
  - the event definitions and logic for generating them,
  - the scheduling and jobs boilerplate to periodically log them.

Care is taken to redact all strings present in descriptors which might
unintentionally be leaking PIIs.

The event generation logic is tested on the schema of a bootstrapped
test cluster: the test checks that the events match expectations.

Fixes cockroachdb#84284.

Release note (general change): CRDB will now collect schema info if
phoning home is enabled. This schema info is added to the telemetry log
by a built-in scheduled job which runs on a weekly basis by default.
This recurrence can be changed via the sql.schema.telemetry.recurrence
cluster setting.  The schedule can also be paused via PAUSE SCHEDULE
followed by its ID, which can be retrieved by querying
SELECT * FROM [SHOW SCHEDULES] WHERE label = 'sql-schema-telemetry'.
craig bot pushed a commit that referenced this issue Aug 9, 2022
84761: schematelemetry,eventpb: add schema telemetry r=postamar a=postamar

This commit adds:
  - the event definitions and logic for generating them,
  - the scheduling and jobs boilerplate to periodically log them.

Care is taken to redact all strings present in descriptors which might
unintentionally be leaking PIIs.

The event generation logic is tested on the schema of a bootstrapped
test cluster: the test checks that the events match expectations.

Fixes #84284.

Release note (general change): CRDB will now collect schema info if
phoning home is enabled. This schema info is added to the telemetry log
by a built-in scheduled job which runs on a weekly basis by default.
This recurrence can be changed via the sql.schema.telemetry.recurrence
cluster setting.  The schedule can also be paused via PAUSE SCHEDULE
followed by its ID, which can be retrieved by querying
SELECT * FROM [SHOW SCHEDULES] WHERE label = 'sql-schema-telemetry'.

85059: admission,kvserver: improved byte token estimation for writes r=irfansharif,tbg a=sumeerbhola

The existing scheme for byte token estimation simply looked
at the total bytes added to L0 and divided it among the number
of requests. This was because (a) the parameters to provide
better size information for the request were not populated by
kvserver, (b) the basic estimation approach was flawed since
it assumed that regular writes would be roughly equal sized,
and assumed that ingests would tell what fraction went into L0.

The kvserver-side plumbing for improving (a) were done in a
preceding PR (#83937). This one completes that plumbing to pass on
admission.StoreWorkDoneInfo to the admission control package.
In this scheme the {WriteBytes,IngestedBytes} are provided
post-proposal evaluation, and the IngestedBytes is for the
whole LSM. This PR makes changes to the plumbing in the
admission package: specifically, the post-work-done token
adjustments are performed via the granterWithStoreWriteDone
interface and the addition to granterWithIOTokens. The former
also returns the token adjustments to StoreWorkQueue so
that the per-tenant fairness accounting in WorkQueue can be
updated.

The main changes in this PR are in the byte token
estimation logic in the admission package, where the
estimation now uses a linear model y=a.x + b, where
x is the bytes provided in admission.StoreWorkDoneInfo,
and y is the bytes added to L0 via write or ingestion.
If we consider regular writes, one can expect that even
with many different sized workloads concurrently being
active on a node, we should be able to fit a model where
a is roughly 2 and b is tiny -- this is because x is the
bytes written to the raft log and does not include the
subsequent state machine application. Similarly, one can
expect the a term being in the interval [0,1] for ingested
work. The linear model is meant to fix flaw (b) mentioned
earlier. The current linear model fitting in
store_token_estimation.go is very simple and can be
independently improved in the future -- there are code
comments outlining this. Additionally, all the byte token
estimation logic in granter.go has been removed, which
is better from a code readability perspective.

This change was evaluated with a single node that first
saw a kv0 workload that writes 64KB blocks, then
additionally a kv0 workload that writes 4KB blocks, and
finally a third workload that starts doing an index
backfill due to creating an index on the v column in
the kv table.

Here are snippets from a sequence of log statements when
only the first workload (64KB writes) was running:
```
write-model 1.46x+1 B (smoothed 1.50x+1 B) + ingested-model 0.00x+0 B (smoothed 0.75x+1 B) + at-admission-tokens 78 KiB
write-model 1.37x+1 B (smoothed 1.36x+1 B) + ingested-model 0.00x+0 B (smoothed 0.75x+1 B) + at-admission-tokens 80 KiB
write-model 1.50x+1 B (smoothed 1.43x+1 B) + ingested-model 0.00x+0 B (smoothed 0.75x+1 B) + at-admission-tokens 79 KiB
write-model 1.39x+1 B (smoothed 1.30x+1 B) + ingested-model 0.00x+0 B (smoothed 0.75x+1 B) + at-admission-tokens 77 KiB
```
Note that the parameter a, in a.x does fluctuate. The
additive value b stays at the minimum of 1 bytes, which
is desirable. There is no change to the starting ingest
model since there are no ingestions.

After both the 4KB and 64KB writes are active the log
statements look like:
```
write-model 1.85x+1 B (smoothed 1.78x+1 B) + ingested-model 0.00x+0 B (smoothed 0.75x+1 B) + at-admission-tokens 59 KiB
write-model 1.23x+1 B (smoothed 1.51x+1 B) + ingested-model 0.00x+0 B (smoothed 0.75x+1 B) + at-admission-tokens 47 KiB
write-model 1.21x+1 B (smoothed 1.36x+1 B) + ingested-model 0.00x+0 B (smoothed 0.75x+1 B) + at-admission-tokens 40 KiB
```
Note that the b value stays at 1 byte. The tokens consumed
at admission time are evenly divided among requests, so
the value has dropped.

When the index backfill is also running, the sstables are
ingested into L5 and L6, so the x value in the ingested
model is high, but what is ingested into L0 is low, which
means a becomes very small for the ingested-model -- see
the smoothed 0.00x+1 B below. There is choppiness in this
experiment wrt the write model and the at-admission-tokens,
which is caused by a high number of write stalls. This
was not planned for, and is a side-effect of huge Pebble
manifests caused by 64KB keys. So ignore those values in
the following log statements.
```
write-model 1.93x+1 B (smoothed 1.56x+2 B) + ingested-model 0.00x+1 B (smoothed 0.00x+1 B) + at-admission-tokens 120 KiB
write-model 2.34x+1 B (smoothed 1.95x+1 B) + ingested-model 0.00x+1 B (smoothed 0.00x+1 B) + at-admission-tokens 157 KiB
```

Fixes #79092
Informs #82536

Release note: None

Co-authored-by: Marius Posta <marius@cockroachlabs.com>
Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
@craig craig bot closed this as completed in 262a70d Aug 9, 2022
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-schema-catalog Related to the schema descriptors collection and the catalog API in general. A-sql-logging-and-telemetry Issues related to slow query log, SQL audit log, SQL internal logging telemetry, etc. A-telemetry C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant