server: allow users to run conformance reports for their schemas #100004
Labels
A-cluster-observability
Related to cluster observability
A-multitenancy
Related to multi-tenancy
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-observability
Is your feature request related to a problem? Please describe.
Background
Users can configure various properties, such as quorum size, number of non-voters, data placement etc. on schema objects. They can either do so directly, via zone configurations, or indirectly using multi-region abstractions (which are internally translated to zone configurations). However, the effects of such changes are asynchronous.
The zone configurations table lives in a tenant's keyspace. At a high level, once a zone configuration is committed, it must be converted to a SpanConfig and reconciled to KV (where it lives in the system.span_configurations table). Doing so entails hydrating the zone configuration (by walking up its inheritance chain) to convert it to a SpanConfig. This SpanConfig is then linked with the keyspan associated with the schema object, and persisted in KV using an RPC.
All KV nodes maintain an in-memory, incremental view over system.span_configurations. Once a KV node receives a SpanConfig update, ranges that overlap with the update's keyspans are pushed through various queues (e.g. Split, Merge, Replicate). It's these queues that are responsible for taking action to fulfill the user's intention.
SpanConfigBounds
Until very recently, the async application of user-specified configurations was only a matter of time. This changed with the introduction of SpanConfigBounds. SpanConfigBounds were motivated by a desire to disallow secondary tenants unfettered access to multi-region features (or zone configurations) in deployments where operators desire such control (read: serverless).
SpanConfigBounds give operators the ability to declare bounds on (almost) all SpanConfig fields at a per-tenant level. These only work for secondary tenants. Operators can use SpanConfigBounds to override tenant-reconciled span configurations by "clamping" any or all fields. For example, operators are able to do things like constrain a tenant to specific region(s) regardless of what the tenant requested. They can also do so retroactively, after a tenant has successfully committed and reconciled such configurations.
Describe the solution you'd like
Arguably, users care more about when their data is in conformance, as opposed to a promise that it eventually will be. With the introduction of SpanConfigBounds, tenants no longer have the latter either. This elevates the need to make point-in-time conformance easily observable.
Conveniently, we already have a lot of the pieces needed to provide such conformance reports. We just need to possibly enhance them, stitch things together, and provide a mechanism to consume such information. This issue asks to do exactly that.
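To make it concrete why a committed configuration may never be satisfied once bounds are in play, here is a minimal sketch of clamping and bound checking. The Config, Bounds and Range types and their methods are hypothetical, heavily simplified stand-ins written for this issue; they are not the actual SpanConfig or SpanConfigBounds definitions.

```go
package main

import "fmt"

// Range is a hypothetical inclusive bound on an integer field.
type Range struct{ Min, Max int32 }

// Config is a hypothetical, heavily simplified stand-in for a SpanConfig.
type Config struct {
	NumReplicas int32
	Region      string
}

// Bounds is a hypothetical stand-in for an operator's per-tenant bounds.
type Bounds struct {
	NumReplicas    Range
	AllowedRegions []string // empty means "no restriction" in this sketch
}

// regionAllowed reports whether region is permitted by the bounds.
func (b Bounds) regionAllowed(region string) bool {
	if len(b.AllowedRegions) == 0 {
		return true
	}
	for _, r := range b.AllowedRegions {
		if r == region {
			return true
		}
	}
	return false
}

// Check returns the names of the fields in cfg that fall outside the bounds.
func (b Bounds) Check(cfg Config) []string {
	var violations []string
	if cfg.NumReplicas < b.NumReplicas.Min || cfg.NumReplicas > b.NumReplicas.Max {
		violations = append(violations, "num_replicas")
	}
	if !b.regionAllowed(cfg.Region) {
		violations = append(violations, "region")
	}
	return violations
}

// Clamp returns a copy of cfg with every out-of-bounds field forced in bounds.
func (b Bounds) Clamp(cfg Config) Config {
	if cfg.NumReplicas < b.NumReplicas.Min {
		cfg.NumReplicas = b.NumReplicas.Min
	} else if cfg.NumReplicas > b.NumReplicas.Max {
		cfg.NumReplicas = b.NumReplicas.Max
	}
	if !b.regionAllowed(cfg.Region) {
		// Fall back to the first allowed region (an arbitrary choice here).
		cfg.Region = b.AllowedRegions[0]
	}
	return cfg
}

func main() {
	bounds := Bounds{NumReplicas: Range{Min: 3, Max: 5}, AllowedRegions: []string{"us-east1"}}
	requested := Config{NumReplicas: 7, Region: "europe-west1"}

	fmt.Println("violations:", bounds.Check(requested))
	fmt.Println("applied:   ", bounds.Clamp(requested))
}
```

In this sketch the tenant's requested configuration commits fine, but what KV acts on is the clamped copy, so waiting longer never brings the data into conformance with what the tenant asked for.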
Specifically, users should be able to run conformance reports that give them information about which table(s)/index(es) are in violation of their zone configurations. There should also be 2 variations -- one that takes SpanConfigBounds into account and another that doesn't. This will allow users to discriminate between cases where all they need to do is "just wait" and cases that will never be satisfied.
High level sketch
We already have a conformance reporter. However, it doesn't give the caller a point-in-time snapshot -- this may need to change if we want stronger guarantees when stitching the report back with SQL state.
Note: this Reporter is not to be confused with the other Reporter in kvserver/reports/reporter.go. That latter construct is older and deprecated, and we don't see much of a future for it. It should be removed. (#100180)
The Reporter also doesn't know about SpanConfigBounds yet. We should extend it to return a list of SpanConfigs that fail bound checks in its response. This might simply be about giving the reporter a handle to the BoundsReader so that it can access a tenant's Bounds and call Check() on it.
The tenant (SQL) is the only thing that has access to both:
As such, the tenant would be responsible for taking the contents of a SpanConfigConformanceReport (which only associates raw keys to a conformance status) and mapping it back to which tables/indexes are in violation (if any).
I'm not sure what the best way to consume such information is -- maybe a new endpoint users can query? Or, better yet, we can build some sort of DB console page using it? Alternatively, we could run such a thing periodically and maybe increment some metrics.
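The mapping step on the tenant side could look roughly like the sketch below. Everything in it is an assumption made for illustration: the Span, IndexSpan, Violation and Finding types are hypothetical, keys are plain strings rather than encoded bytes, and the BlockedByBounds flag stands in for whatever the bounds-aware variation of the report would return.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a hypothetical half-open key span [Start, End). Real keys are
// encoded bytes; plain strings are used here only for readability.
type Span struct{ Start, End string }

// Overlaps reports whether two half-open spans share at least one key.
func (s Span) Overlaps(o Span) bool { return s.Start < o.End && o.Start < s.End }

// IndexSpan is what the tenant (SQL) knows: which keyspan backs which index.
type IndexSpan struct {
	Table, Index string
	Span         Span
}

// Violation is a hypothetical report entry: a raw keyspan that is out of
// conformance, plus whether the bounds-aware variation flagged it.
type Violation struct {
	Span            Span
	BlockedByBounds bool
}

// Finding is the user-facing result for one table/index.
type Finding struct {
	Table, Index, Status string
}

// MapViolations joins raw violating spans against the tenant's index spans.
func MapViolations(indexes []IndexSpan, violations []Violation) []Finding {
	blocked := map[[2]string]bool{} // (table, index) -> blocked by bounds?
	for _, v := range violations {
		for _, idx := range indexes {
			if !idx.Span.Overlaps(v.Span) {
				continue
			}
			key := [2]string{idx.Table, idx.Index}
			blocked[key] = blocked[key] || v.BlockedByBounds
		}
	}
	findings := make([]Finding, 0, len(blocked))
	for key, isBlocked := range blocked {
		status := "pending (just wait)"
		if isBlocked {
			status = "blocked by bounds (will never be satisfied)"
		}
		findings = append(findings, Finding{Table: key[0], Index: key[1], Status: status})
	}
	// Sort for deterministic output, since map iteration order is random.
	sort.Slice(findings, func(i, j int) bool {
		if findings[i].Table != findings[j].Table {
			return findings[i].Table < findings[j].Table
		}
		return findings[i].Index < findings[j].Index
	})
	return findings
}

func main() {
	indexes := []IndexSpan{
		{Table: "users", Index: "primary", Span: Span{"/Table/104/1", "/Table/104/2"}},
		{Table: "users", Index: "by_email", Span: Span{"/Table/104/2", "/Table/104/3"}},
		{Table: "orders", Index: "primary", Span: Span{"/Table/105/1", "/Table/105/2"}},
	}
	violations := []Violation{
		{Span: Span{"/Table/104/2/a", "/Table/104/2/m"}},
		{Span: Span{"/Table/105/1", "/Table/105/1/z"}, BlockedByBounds: true},
	}
	for _, f := range MapViolations(indexes, violations) {
		fmt.Printf("%s@%s: %s\n", f.Table, f.Index, f.Status)
	}
}
```

The nested loop is quadratic and only meant to show the join; with many ranges, sorted spans or an interval tree would be the more natural choice.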
cc @ajwerner @knz
Jira issue: CRDB-26176
Epic CRDB-26686
As part of addressing this, we should make sure to delete FIXMEIDONTKNOWWHICHCODECTOUSE usage (introduced as part of #48123).