
server: add poller for tenant cluster settings #57926

Closed
wants to merge 3 commits

Conversation


@dt dt commented Dec 15, 2020

System tenants can maintain their in-memory cluster settings caches by
listening for the gossip updates on their system.settings table and then
updating the in-memory cache directly off the decoded gossip events.

Tenants, however, do not have changes to their system tables gossiped,
and thus have to use either polling or rangefeeds if they want to
notice changes therein.

This change adds a simple poller that re-reads the system.settings
table to update the in-memory settings; it is started for tenant
SQL servers, which do not run the old gossip updater.

This change also removes the limits in the SET CLUSTER SETTING
code path that prevented tenants from writing to their settings table
back when those writes would have had no effect.

Release note: none.
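
For illustration, a minimal, self-contained sketch of the ticker-driven polling loop described above; this is not the code in this PR, and the settingRow, readSettings, applySetting, and startSettingsPoller names are hypothetical stand-ins for the real internal-executor query and settings-cache update:

```go
package settingspoller

import (
	"context"
	"log"
	"time"
)

// settingRow mirrors one row of system.settings: name, raw value, and value type.
type settingRow struct {
	Name, Value, ValueType string
}

// readSettings re-reads the settings table (e.g. SELECT name, value, "valueType"
// FROM system.settings) and returns the currently overridden settings.
type readSettings func(ctx context.Context) ([]settingRow, error)

// applySetting updates the in-memory settings cache for one row.
type applySetting func(row settingRow) error

// startSettingsPoller runs a simple polling loop until ctx is canceled.
func startSettingsPoller(ctx context.Context, interval time.Duration, read readSettings, apply applySetting) {
	refresh := func() {
		rows, err := read(ctx)
		if err != nil {
			log.Printf("settings refresh failed: %v", err)
			return
		}
		for _, row := range rows {
			if err := apply(row); err != nil {
				log.Printf("could not apply setting %q: %v", row.Name, err)
			}
		}
	}
	refresh() // prime the cache once at startup.
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			refresh()
		case <-ctx.Done():
			return
		}
	}
}
```

The apply step is independent of the delivery mechanism, so a rangefeed could later replace the ticker without changing it.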

@dt dt requested a review from a team as a code owner December 15, 2020 00:12
@cockroach-teamcity
This change is Reviewable

@dt dt marked this pull request as draft December 15, 2020 00:13
dt commented Dec 15, 2020

Just wanted to see how hard this would be and let CI show me what I might have missed.

@dt dt force-pushed the tenant-settings branch 2 times, most recently from dd68f90 to 1304a5d on December 15, 2020 21:51
dt commented Dec 15, 2020

Okay, based on some Slack chat, I've added a couple of patches to this PR. Some settings are only read by the KV and storage layers, e.g. to control storage behaviors like rebalancing, so they'd have no effect if set within a guest tenant. While setting them there may be harmless, it could still confuse a user who sets one and then does not see any effect.

This PR now adds a bit, on the setting itself, to indicate whether a setting is system-only or applicable to both system and guest tenant behavior. Attempts to set settings that are system-only using SET CLUSTER SETTING are rejected.

Additionally, the list-of-settings generation (settings.html) is extended to generate the list of settings that are not system-only, i.e. those settings that tenants can set. This generated listing is checked in for reference like the current listing; as with that listing, it is there so that changes which introduce tenant-applicable settings will have them highlighted in the diff for ease of review.
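
To make the flag concrete, a minimal sketch of how such a system-only bit and the SET CLUSTER SETTING check could be modeled; the Visibility, Setting, and checkSettable names are hypothetical and do not reflect the actual settings registry API:

```go
package settingsclass

import "fmt"

// Visibility records whether a setting applies only to the system tenant or to
// all tenants.
type Visibility int

const (
	SystemAndTenants Visibility = iota // settable and effective everywhere
	SystemOnly                         // only meaningful in the system tenant
)

// Setting is a minimal stand-in for a registered cluster setting.
type Setting struct {
	Name       string
	Visibility Visibility
}

// checkSettable rejects attempts by a non-system tenant to set a system-only
// setting, mirroring the guard described above.
func checkSettable(s Setting, forSystemTenant bool) error {
	if s.Visibility == SystemOnly && !forSystemTenant {
		return fmt.Errorf("setting %s is only settable in the system tenant", s.Name)
	}
	return nil
}
```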

@dt dt marked this pull request as ready for review December 15, 2020 22:02
@dt dt requested a review from tbg December 15, 2020 22:02
dt commented Dec 15, 2020

Oh, and just to act as an example and to exercise it in a test, the system-only rebalancing rate is marked as such herein. I have a local stash that marks all the kvserver settings too, but I figured I'd hold off to keep the diff here manageable for review.

@tbg tbg left a comment

This all looks good, my main sticking point is around the use of a range feed for the settings polling (this should work, right @nvanbenschoten?), and switching to an allowlist approach for the tenant-accessible settings. Thanks for working on this! It'll be nice to have this settings story ironed out and it looks like you're most of the way there.

Reviewed 5 of 5 files at r1, 11 of 11 files at r2, 4 of 4 files at r3.


docs/generated/settings/settings-for-tenants.txt, line 1 at r3 (raw file):

Setting	Type	Default	Description

Just glancing at this list, most of these seem system-only.
Was there a particular reason why new settings are by default available to tenants? I would've expected the opposite approach, where settings are system-only by default and need to be explicitly allowlisted for tenants; looking at this list, I still consider that the better approach.


pkg/server/settingsworker.go, line 159 at r1 (raw file):

	stopper.RunWorker(ctx, func(ctx context.Context) {
		fn := func() {
			rows, err := s.internalExecutor.Query(ctx, "settings-refresh", nil, `SELECT name, value, "valueType" FROM system.settings`)

Can we use a rangefeed here? It seems expensive to poll every 15s from every tenant, and besides this will result in traffic/charges for the tenant that may be confusing.


pkg/server/settingsworker.go, line 180 at r1 (raw file):

		tick := time.NewTicker(time.Second * 15)
		log.Infof(ctx, "stating polling for settings changes...")

starting


pkg/sql/set_cluster_setting.go, line 240 at r1 (raw file):

			if params.p.execCfg.TenantTestingKnobs != nil {
				return params.p.execCfg.TenantTestingKnobs.ClusterSettingsUpdater.Set(n.name, encoded, n.setting.Typ())

I don't think this should be a return now, but just an interceptor.


pkg/sql/set_cluster_setting.go, line 305 at r1 (raw file):

	wait := 10 * time.Second
	if !execCfg.Codec.ForSystemTenant() {
		wait = 20 * time.Second // poller may be slower than gossip.

Wouldn't be an issue with rangefeed.

@dt dt left a comment

Inline replies below, but I agree 100% on rangefeeds; I just wanted to start with a simple poller to get an MVP working, and then someone with rangefeed expertise can swap that bit out as a follow-up optimization.



docs/generated/settings/settings-for-tenants.txt, line 1 at r3 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Just glancing at this list most of those seem system-only.
Was there a particular reason why new settings are by default available to tenants? I would've expected the opposite approach, where settings are by default system-only and they need to explicitly be allowlisted for tenants, and looking at this list consider this the better approach still.

This was just a starting point; I have another diff locally that switches most of the kvserver ones to system-only, which takes out some of the ones here.

As far as the default, I went back and forth between .SystemOnly and .SystemOrGuestTenant. But ultimately, I think opting system settings out makes more sense. I think our goal is that ~everyone should be able to run their workload as a tenant.

We have settings-backed knobs throughout the SQL and KV layers so that if/when someone's workload hits some limit or constant in a way we didn't foresee when testing and tuning, we have a quick, no-downtime option to re-tune it. Given their usefulness in resolving or mitigating production issues, I'd be wary of defaulting to not having them -- inevitably, if someone forgets to opt a setting in, we'd regret that at just the wrong time. On the other hand, if someone forgets to opt a system-only behavior control out, it may be confusing that setting it has no effect, but it should be harmless.

(SET CLUSTER SETTING, and settings in general, shouldn't be where we impose any actual behavior/security controls, because root can write to the settings table, restore it from backups, etc -- if something should not be allowed in a tenant, it should be gated in an if ...codec.ForSystemTenant() { block.)


pkg/server/settingsworker.go, line 159 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Can we use a rangefeed here? It seems expensive to poll every 15s from every tenant, and besides this will result in traffic/charges for the tenant that may be confusing.

We can and we should, and I believe that is the longer-term plan, but I just wanted something quick and easy to get started so we could start working the rest out.

Note that system.settings contains only the changed settings, so this should be pulling back very few, if any, rows (and at most, if every single setting had been changed, could pull back a hundred or so), so it should be no worse than the job adoption pollers.


pkg/server/settingsworker.go, line 180 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

starting

Done.


pkg/sql/set_cluster_setting.go, line 240 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I don't think this should be a return now, but just an interceptor.

Done.


pkg/sql/set_cluster_setting.go, line 305 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Wouldn't be an issue with rangefeed.

Indeed. Again though, I just wanted correct first, fast second.

This change adds a flag to indicate a setting is only applicable to the
system tenant, such as those that control storage layer behaviors, and
thus should not be set by guest tenants (where it would be ignored).

Release note: none.
This adds a generated listing of the settings available to tenants, i.e.
settings which are not system-only. This generated listing is checked in
so that reviewers will notice if a new setting is there that should not
be presented to tenants.

Release note: none.
@nvanbenschoten nvanbenschoten left a comment

my main sticking point is around the use of a range feed for the settings polling (this should work, right @nvanbenschoten?)

Yes, this should work. Rangefeeds are enabled for tenants.

There's also the option to hook this up to the tenant's existing gossip subscription, though that would only work if we put these settings in gossip, which we probably don't want to do.

switching to an allowlist approach for the tenant-accessible settings

+1

Thanks for working on this! It'll be nice to have this settings story ironed out and it looks like you're most of the way there.

Also +1. This is a nice improvement that will come with large benefits.


@dt dt closed this Dec 28, 2020
@dt dt deleted the tenant-settings branch December 28, 2020 17:54
@dt dt restored the tenant-settings branch December 28, 2020 18:15
@dt dt reopened this Dec 28, 2020
@knz knz left a comment



docs/generated/settings/settings-for-tenants.txt, line 1 at r3 (raw file):
So there are two dimensions here: the features controlled by settings, and the mechanism to allow/block access to settings.

Regarding the features:

We have settings-backed knobs throughout SQL and KV layers so that if/when someone's workload hits some limit or constant in a way we didn't foresee when testing and tuning, we have a quick, no-downtime option to re-tune it.

The large majority of settings that have an impact on the behavior of tenants are KV system-level and can be set in the system tenant (by an SRE). Customizations of these settings by tenants would be ineffective, and we can just ignore them here. (I'd recommend a blanket block for all KV-level settings in tenants to prevent UX surprises, but that's just polish.)

The number of settings that have an impact on SQL pods is much smaller and limited; we have two flavors of them:

  • those settings which we want to encourage tenants to use and are very well delimited by name (specifically sql.defaults)
  • those settings which we want to disallow tenants from using because they would be disruptive to CC (everything else)

In that world, I can see a reason why we'd want an operator (e.g. an SRE), from a root account logged into the system tenant, to adjust SQL settings on behalf of a tenant; however, I am 100% sure I would prefer the tenants themselves not to have access to these tuning knobs -- they could put our business on the line by accident by doing so.

Regarding mechanism:

The above suggests an allow list, with glob or regexp patterns rather than just an enumeration, so we can have a blanket allow for sql.defaults.*.

There's another argument in favor of allow instead of block that's mechanism-specific: our eng team does not yet have the discipline or the skill to safely make a call about whether a setting is safe or not for tenants. We want to block in case of doubt, until we grow that skill / understanding.
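
A rough sketch of the glob-based allowlist mechanism suggested here; the pattern set is only an example, not a vetted list, and tenantMaySet is a hypothetical helper:

```go
package settingsallow

import "path"

// tenantAllowlist is an example pattern set; sql.defaults.* is blanket-allowed
// as suggested above, plus one explicitly named setting.
var tenantAllowlist = []string{
	"sql.defaults.*",
	"sql.notices.enabled",
}

// tenantMaySet reports whether a tenant is allowed to modify the named setting.
// Setting names contain no '/', so path.Match treats '*' as "any suffix" here.
func tenantMaySet(name string) bool {
	for _, pattern := range tenantAllowlist {
		if ok, err := path.Match(pattern, name); err == nil && ok {
			return true
		}
	}
	return false
}
```

Anything that matches no pattern is blocked by default, which is the block-in-case-of-doubt posture described above.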

@dt dt left a comment



docs/generated/settings/settings-for-tenants.txt, line 1 at r3 (raw file):

Previously, knz (kena) wrote…


If a setting would be disruptive to our business if set inside a tenant, we shouldn't just be preventing SET CLUSTER SETTING -- we should not even be reading its value, in the code, in the first place.

I don't know that it is as narrow as just sql.defaults.*.

Currently we're aiming for paying CC users to have access to ~all the same functionality of dedicated clusters, e.g. including hooking up CDC to their own infra or importing/exporting data to their own servers, so cloudstorage.http.custom_ca and cloudstorage.timeout are important. We expect them to run backups but they may have different ideas of average txn duration, so they may set bulkio.backup.read_with_priority_after or bulkio.backup.read_retry_delay. We regularly set jobs.retention_time to mitigate incidents, or could need to set schemachanger.bulk_index_backfill.batch_size in a pinch if someone's rows are unusually shaped. sql.notices.enabled could be important too if a client doesn't support them. This is from a quick scan -- I'm sure there are others, and new settings are added as escape hatches.

If there is a setting that, if set, could truly jeopardize the stability of CC itself, of course that's bad and it shouldn't be settable by a customer. But in many cases I suspect that if we look at the mechanism by which it would actually do so -- e.g. sending too many/large requests, using too much of a resource, etc -- there is probably already another way they could do that without that setting. In that case, preventing the setting is the wrong place to defend against it -- I'd much rather defend against it at the vulnerable point: let you set the setting, let it try to do the thing, and reject it there instead.

I think we should be aiming, as much as possible, to make the SQL pod experience as close to the dedicated cluster experience as we can (and aim to replace it), with the only difference being that your persistence layer isn't yours, but is shared. But your execution layer is still yours, so if a dedicated cluster user might need to turn some knob that affects execution, I'd default to assuming that a shared-tier user would too. If we go the other way, every time we add something outside sql.defaults, like sql.metrics.statement_details.threshold or sql.spatial.experimental_box2d_comparison_operators.enabled or sql.stats.automatic_collection.min_stale_rows, we'll think "good that we have this here, in case we need it", but then, if we realize we need it for some customer, we're stuck if that customer happens to be running on shared KV rather than their own nodes, even though that setting has nothing to do with the persistence layer at all.

(Also, another argument for not using SET CLUSTER SETTING as the point for disallowing settings that could cause truly unsafe behavior: it also implies we'd e.g. make RESTORE start digging deep into the rows in the tables it is restoring, to e.g. not restore some settings in some cases. Also, users with admin can write to the settings table. All this points me to the fact that if something is truly unsafe, the code shouldn't even consider doing it, regardless of what the settings say.)

@knz knz left a comment



docs/generated/settings/settings-for-tenants.txt, line 1 at r3 (raw file):

Previously, dt (David Taylor) wrote…


David, the truth in this matter is that our design of cluster settings so far was not principled in this way, and we have too much confusion and fuzzy boundaries to enable "making the SQL pod experience as close to the dedicated cluster experience".

In fact, I'd argue that our cluster setting infra is not even ready for the CC dedicated cluster experience. Too many knobs are currently such that we can't let CC users tune them without violating our CC SLAs or even our profitability.

Let's take just two of your examples:

  • sql.metrics.statement_details.threshold - we can't let users change that, because our own monitoring and TSE processes mandate specific values for this and our customizations would run afoul of user wishes.
  • sql.spatial.experimental_box2d_comparison_operators.enabled - we can't let users change that without supervision, because that may introduce cluster-wide instability

Generally, we need to do the following 4 things to even start exposing cluster settings safely to CC users (even dedicated!)

  1. annotate a separation between "system" and "application" settings; initially, all settings would be "system" unless specifically opted in to become "application"

  2. design a set of requirements for what it means for a setting to be "application" level; in these requirements, we need at least the following:

    • a privilege model (i.e. not just "admin" vs "non-admin")
    • an equivalent setting in the "system" space so that an SRE can silently override the setting for troubleshooting, policy enforcement, etc., without running afoul of customer scripts / introspection etc.
    • an argument in the setting's code (comments, algorithm design etc) that demonstrates that the effect of that setting is "local" and cannot affect multiple tenants
    • an argument that demonstrates that a user changing the setting does not run afoul of our CC infrastructure needs
  3. audit all the current settings to identify those that already match these requirements to become "application" (e.g. the sql.defaults settings do, and only to the extent that we've never needed to override them for monitoring or troubleshooting; cloudstorage.http.custom_ca and bulkio.backup.read_retry_delay definitely do not);

  4. develop a product design process (involving PMs) to identify what knobs we have today whose design runs afoul of our requirements but for which users/customers have expressed a customizability need, so that we can design new knobs that follow the requirements.

Until those steps are realized (especially steps 3 and 4) I don't think it's sane to expose anything to CC users.

@knz knz left a comment



docs/generated/settings/settings-for-tenants.txt, line 1 at r3 (raw file):

Previously, knz (kena) wrote…


FWIW, I was the one who introduced the distinction between "public" and "non-public" back in September, upon a similar request from PM to the one that is driving this review thread. I am now realizing in hindsight that this distinction was misguided; the true discussion back then was already how to introduce system-level vs app-level settings. Depending on how folk like @piyush-singh or @bdarnell chime in on this discussion, I could volunteer to walk back my changes from back then and work on the new distinction instead.

@dt dt left a comment



docs/generated/settings/settings-for-tenants.txt, line 1 at r3 (raw file):

Previously, knz (kena) wrote…


I don't completely buy the assumption that tenants somehow need more guard rails / supervision. Why can't we let them enable sql.spatial.experimental_box2d_comparison_operators.enabled? Can it destabilize anything outside their SQL execution?

In another thread, it was mentioned that exposing more settings created a "worry about giving tenants ways to shoot themselves in the foot by accident" -- I'd push back on this: again, I don't think we should be treating tenants as less capable users by default, so if a dedicated user is given some setting with which to shoot their foot, a shared-tier user should be able to as well. Of course, we should aim (no pun intended) to put safeties in place to prevent foot shooting, but those should be for tenants and non-tenants alike.

Similarly, why is sql.metrics.statement_details.threshold any different for CC Shared vs CC Dedicated?

I think the tenant distinction is the wrong place to take out our anxiety about the fact that users might set settings that break things. Can you set a setting that makes your cluster misbehave? Yes. You can also send traffic that does that, though. The stability of CC itself should be ensured with robust limits on the pods, on their ability to harm the shared tier, etc., but beyond that I don't think we should be treating users who have their data persisted in the shared tier vs. their own nodes as somehow needing different guard rails on their SQL behavior.

@knz knz left a comment



docs/generated/settings/settings-for-tenants.txt, line 1 at r3 (raw file):
I was clear with my concern above:

In fact, I'd argue that our cluster setting infra is not even ready for the CC dedicated cluster experience.

The proposal I spelled out above is very much needed even for CC dedicated. I don't actually care about multi-tenant that much in this discussion.

However, the reason why we can't just ignore this is the following:

  • with CC dedicated and shared tier, customers have a monetary incentive to behave properly with our infra
  • with CC free tier, they do not, and we're expecting more folk to try things out mindlessly because they have nothing of value to lose

@dt dt left a comment



docs/generated/settings/settings-for-tenants.txt, line 1 at r3 (raw file):

Previously, knz (kena) wrote…


Ah, I think I'm seeing the distinction for free users and the fact that allowing them to customize their experience may create support load that we're not set up for. That indeed seems like a concern any operator of a multi-tenant deployment might have -- that they simply aren't ready, operationally, to support users who've customized their SQL behavior (because it breaks their automation, they don't have the support bandwidth, etc.).

In that case, though, I'd argue that we should start with a flag on the tenant that says "this tenant cannot set any settings" and blanket-prevents setting as well as reading from their settings table. Later, we could extend this to "this tenant can set/read the subset of settings marked as 'basic customization'", or whatever we come up with e.g. for defaults. But I think as a user story, "customizing cluster settings requires a paid cluster" is understandable, whereas "shared users can't enable box2d_comparison, but dedicated can" seems really artificial and goes against the goal of having the shared experience be the default experience.

@andy-kimball andy-kimball left a comment

I also think an allowlist is preferable, in order to prevent bugs. Usually there's an easy way to add a test that fails when new settings are added without specifying whether to allow or block.
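
A test along those lines might look roughly like the sketch below; registeredSettingNames and tenantSettable are hypothetical, the two example entries are illustrative only, and in a real test the list of settings would come from the settings registry rather than a literal slice:

```go
package settingsclass

import "testing"

// registeredSettingNames stands in for the full list of registered cluster
// settings; in a real test this would be read from the settings registry.
var registeredSettingNames = []string{
	"kv.snapshot_rebalance.max_rate",
	"sql.defaults.distsql",
}

// tenantSettable records an explicit allow/block decision per setting.
var tenantSettable = map[string]bool{
	"kv.snapshot_rebalance.max_rate": false, // example: system-only
	"sql.defaults.distsql":           true,  // example: tenants may set it
}

// TestEverySettingHasTenantDecision fails when a setting is registered without
// an explicit decision on whether tenants may set it, forcing authors to choose.
func TestEverySettingHasTenantDecision(t *testing.T) {
	for _, name := range registeredSettingNames {
		if _, ok := tenantSettable[name]; !ok {
			t.Errorf("setting %s was added without an allow/block decision for tenants", name)
		}
	}
}
```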


@irfansharif irfansharif removed their request for review January 4, 2021 15:29
@piyush-singh piyush-singh left a comment



docs/generated/settings/settings-for-tenants.txt, line 1 at r3 (raw file):

In fact, I'd argue that our cluster setting infra is not even ready for the CC dedicated cluster experience.

I think I'm seeing the distinction for free users and the fact that allowing them to customize their experience may create support load that we're not setup for.

Completely agree here. Even with the relatively small number of customers we have for the dedicated tier, we still get quite a few SRE pages/support issues when customers tweak settings they should not be changing without a proper understanding of what they control under the hood. Allowing free tier customers access to these foot guns could result in a poor user experience, as:

  • we expect there to be a much larger number of free tier users which could result in a much higher support request volume
  • these users are not paying customers and therefore our incentive to respond quickly to support issues is lower

In that case though, I'd argue that we should start with a flag on the tenant, that says "this tenant cannot set any settings" and blanket prevents settings as well as reading from their settings table.

I like this solution, as it gives customers who would like to self-host a multi-tenant cluster the option to let end users configure cluster settings if they wish.

Later, we could extend this to "this tenant can set/read the subset of settings marked as 'basic customization'" whatever we come up with e.g. for defaults.

FWIW, we've already seen customers that self-host start to request similar behavior. Recently, JPMC requested the ability to set granular permissions on which users can control which cluster settings. That was for individual users, but I could see us extend that functionality to apply to the tenant level as well.

design a set of requirements for what it means for a setting to be "application" level; in these requirements, we need at least the following:

Vy has started a doc to categorize these, which I think both David and Raphael are aware of. Tommy, John, and I are also contributing, so we should have clarity here over the coming weeks.

@bdarnell bdarnell left a comment

Reviewed 2 of 5 files at r1, 10 of 11 files at r5, 4 of 4 files at r6.


docs/generated/settings/settings-for-tenants.txt, line 1 at r3 (raw file):

Can you set a setting that makes your cluster misbehave? yes. You can also send traffic that does that, though.

Not all misbehavior is overload-related. The ability to set cluster settings lets you cause problems that you might not be able to cause in other ways. For example, enabling OCSP lets you create behind-the-firewall traffic; setting it to strict mode can lead to unavailability (in a way that locks us out from being able to fix it).

It's true that this does not map directly to the shared vs dedicated distinction - many of these settings should be blocked from users in dedicated CC clusters too. But since free tier users are more or less anonymous and untraceable, it's good to place limits here especially on things that could lead to external abuse.

And one place where the multi-tenant distinction really matters is in the fact that many settings are simply no-ops in tenants. That's a bad user experience whether or not it's creating other problems.

So I support an effort to categorize our cluster settings (and force people to think about this categorization when introducing the settings), and to think about how it works for cc-serverless, cc-dedicated, and on-prem users.

@knz knz left a comment



docs/generated/settings/settings-for-tenants.txt, line 1 at r3 (raw file):

Previously, bdarnell (Ben Darnell) wrote…


Filed this issue to track the discussion:

@dt dt closed this Jan 21, 2021
@dt dt deleted the tenant-settings branch January 21, 2021 16:22
craig bot pushed a commit that referenced this pull request Feb 18, 2021
58362: *: craft rangefeed abstraction, support cluster settings in tenants r=nvanbenschoten a=ajwerner

This PR is a reworking of @dt's excellent effort in #57926
to pick up rangefeeds for tenant cluster settings after sprucing up the libraries around
them. @tbg and @nvanbenschoten I'm tagging you accordingly given that you reviewed
that PR.

### First two commits (`kvclient/rangefeed` package and adoption in `catalog/lease`)

The first commit in this PR abstract out some rangefeed client logic around
retries to make them more generally usable in other contexts. The second
commit adopts this package in `sql/catalog/lease`. These two commits can
(and probably should) be merged separately and exist in
#58361.

### Next two commits (`server/settingswatcher` package and adoption in `server`)

These commits introduce code to update the settings using the above rangefeed
package and then hooks it up to the sql server.

### Next three commits

Remove the restrictions from tenants updating settings, mark which settings can be
updated, generate docs. 

These are lifted straight from @dt in #57926.

Maybe closes #56623.
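
For intuition, a minimal sketch of the rangefeed-driven shape of such a settings watcher, using hypothetical kvEvent, rangefeed, and decodeAndApply types rather than the actual kvclient/rangefeed or server/settingswatcher APIs:

```go
package settingswatch

import "context"

// kvEvent is a simplified stand-in for a rangefeed event on the settings
// table: a key/value pair the watcher decodes into a setting update.
type kvEvent struct {
	Key, Value []byte
}

// rangefeed is a hypothetical subscription that delivers events for the
// settings table's key span.
type rangefeed interface {
	Events() <-chan kvEvent
}

// decodeAndApply decodes one settings-table row and updates the in-memory cache.
type decodeAndApply func(ev kvEvent) error

// watchSettings consumes rangefeed events instead of polling, so tenants see
// setting changes without periodically re-reading the table.
func watchSettings(ctx context.Context, feed rangefeed, apply decodeAndApply) error {
	for {
		select {
		case ev, ok := <-feed.Events():
			if !ok {
				return nil
			}
			if err := apply(ev); err != nil {
				return err
			}
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}
```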


59571: kv: route present-time reads to global_read follower replicas r=nvanbenschoten a=nvanbenschoten

This commit updates the kv client routing logic to account for the new `LEAD_FOR_GLOBAL_READS` `RangeClosedTimestampPolicy` added in #59505. In enterprise builds, non-stale read-only requests to ranges with this closed timestamp policy can now be routed to follower replicas, just like stale read-only requests to normal ranges currently are.

In addition to this main change, there are a few smaller changes in this PR that were hard to split out, so are included in this commit.

First, non-transactional requests are no longer served by followers even if the follower replica detects that the request can be. Previously, non-transactional requests would never be routed intentionally to a follower replica, but `Replica.canServeFollowerRead` would allow them through if their timestamp was low enough. This change was made in order to keep the client and server logic in-sync and because this does not seem safe for non-transactional requests that get their timestamp assigned on the server. We're planning to remove non-transactional requests soon anyway (#58459), so this isn't a big deal. It mostly just caused some testing fallout.

Second, transactional requests that had previously written intents are now allowed to have their read-only requests served by followers, as long as those followers have a closed timestamp above the transaction's read *and* write timestamp. Previously, we had avoided this because it seemed to cause issues with read-your-writes. However, it appears safe as long as the write timestamp is below the closed timestamp, because we know all of the transaction's intents are at or below its write timestamp. This is very important for multi-region work, where we want a transaction to be able to write to a REGIONAL table and then later perform local reads (maybe foreign key checks) on GLOBAL tables.

Third, a max clock offset shift in `canUseFollowerRead` was removed. It wasn't exactly clear what this was doing. It was introduced in the original 70be833 and seemed to allow a follower read in cases where they otherwise shouldn't be expected to succeed. I thought at first that it was accounting for the fact that the kv client's clock might be leading the kv server's clock and so it was being pessimistic about the expected closed timestamp, but it actually seems to be shifting the other way, so I don't know. I might be missing something.

Finally, the concept of a `replicaoracle.OracleFactoryFunc` was removed and the existing `replicaoracle.OracleFactory` takes its place. We no longer need the double-factory approach because the transaction object is now passed directly to `Oracle.ChoosePreferredReplica`. This was a necessary change, because the process of determining whether a follower read can be served now requires transaction information and range information (i.e. the closed timestamp policy), so we need to make it in the Oracle itself instead of in the OracleFactory. This all seems like a simplification anyway.

This is still waiting on changes to the closed timestamp system to be able to write end-to-end tests using this new functionality.

60021: roachpb: use LocalUncertaintyLimit from err in readWithinUncertaintyIntervalRetryTimestamp r=nvanbenschoten a=nvanbenschoten

Fixes #57685.

This commit updates `readWithinUncertaintyIntervalRetryTimestamp` to use the `LocalUncertaintyLimit` field from `ReadWithinUncertaintyIntervalError` in place of `ObservedTimestamps`. This addresses a bug where `ReadWithinUncertaintyIntervalErrors` thrown by follower replicas during follower reads would use meaningless observed timestamps.

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>