
stats: break Gossip dependency #47925

Closed
tbg opened this issue Apr 22, 2020 · 19 comments · Fixed by #62563
Labels
A-multitenancy Related to multi-tenancy T-sql-queries SQL Queries Team X-anchored-telemetry The issue number is anchored by telemetry references.

Comments

@tbg (Member)

tbg commented Apr 22, 2020

Table stats uses gossip to notify the nodes about new statistics:

func NewTableStatisticsCache(

Multi-tenancy Phase 2 won't have gossip available. On the plus side, there will be only a single SQL server running on behalf of each tenant, so Gossip is not actually necessary: you just need to notify the running process.

Post Phase 2, when we consider multiple SQL servers for a given tenant, we will need a proper substitute for Gossip. For now, the work here is to use Gossip only when it's available, but to unconditionally notify the local server. This suggests that the statistics cache should not know about gossip but should instead operate on a suitable notifier abstraction.
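A minimal sketch of that notifier abstraction (all type and function names below are hypothetical, not actual CockroachDB APIs): the stats cache depends only on an interface, the local server is always notified directly, and gossip becomes one optional implementation that can be absent in a tenant SQL server.

```go
package main

import "fmt"

// TableStatsNotifier is a hypothetical abstraction: the statistics
// cache depends on this interface instead of on gossip directly.
type TableStatsNotifier interface {
	// NotifyStatsChanged signals that new statistics are available
	// for the given table.
	NotifyStatsChanged(tableID int64)
}

// localNotifier refreshes the cache on the local server; always used.
type localNotifier struct {
	refresh func(tableID int64)
}

func (n *localNotifier) NotifyStatsChanged(tableID int64) {
	n.refresh(tableID)
}

// compositeNotifier always notifies the local server and additionally
// broadcasts via gossip when gossip is available (it is nil in the
// single-SQL-server multi-tenant case).
type compositeNotifier struct {
	local  TableStatsNotifier
	gossip TableStatsNotifier // nil when gossip is unavailable
}

func (n *compositeNotifier) NotifyStatsChanged(tableID int64) {
	n.local.NotifyStatsChanged(tableID)
	if n.gossip != nil {
		n.gossip.NotifyStatsChanged(tableID)
	}
}

func main() {
	var refreshed []int64
	c := &compositeNotifier{
		local: &localNotifier{refresh: func(id int64) { refreshed = append(refreshed, id) }},
		// gossip left nil: the tenant SQL server case.
	}
	c.NotifyStatsChanged(53)
	fmt.Println(refreshed) // [53]
}
```

With this shape, the cache works identically whether gossip exists or not; only the wiring at server startup changes.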

Tentatively marking as a blocker since I think we'll want to avoid disabling stats, even for the small workloads we expect to support in the beginning.

tbg added the A-multitenancy (Related to multi-tenancy) and A-multitenancy-blocker labels Apr 22, 2020
tbg added a commit to tbg/cockroach that referenced this issue May 5, 2020
Deprecated() calls are blocking phase 2, but the ones
made optional here were not.

Touches cockroachdb#47900.
Touches cockroachdb#47925.

Release note: None
tbg added a commit to tbg/cockroach that referenced this issue May 7, 2020
@tbg (Member, Author)

tbg commented Jul 8, 2020

@RaduBerinde I feel like you told me before but can you write down what the consequences would be of not doing this. Would stats just arrive on the node with a delay?

@RaduBerinde (Member)

Currently they won't be updated at all without gossip (except on the local node where they are generated). There is no "polling" in place for table stats.

@RaduBerinde (Member)

For Phase 2, I believe things would just work without Gossip, as we also notify the local stats cache directly (this was to guarantee in tests that the next statement would use new stats).

@asubiotto (Contributor)

I'm trying to run the distsql_automatic_stats logic test in a multitenant configuration but it seems to fail to show any stats for the first SHOW STATISTICS statement (even after a 45s retry):

# Disable automatic stats
statement ok
SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false

statement ok
CREATE TABLE data (a INT, b INT, c FLOAT, d DECIMAL, PRIMARY KEY (a, b, c), INDEX d_idx (d))

# Enable automatic stats
statement ok
SET CLUSTER SETTING sql.stats.automatic_collection.enabled = true

# Generate all combinations of values 1 to 10.
statement ok
INSERT INTO data SELECT a, b, c::FLOAT, NULL FROM
   generate_series(1, 10) AS a(a),
   generate_series(1, 10) AS b(b),
   generate_series(1, 10) AS c(c)

query TTIII colnames,rowsort,retry
SELECT statistics_name, column_names, row_count, distinct_count, null_count
FROM [SHOW STATISTICS FOR TABLE data] ORDER BY column_names::STRING, created
----
statistics_name  column_names  row_count  distinct_count  null_count
__auto__         {a,b,c}       1000       1000            0
__auto__         {a,b}         1000       100             0
__auto__         {a}           1000       10              0
__auto__         {b}           1000       10              0
__auto__         {c}           1000       10              0
__auto__         {d}           1000       1               1000

I wondered whether it had anything to do with disabling auto stats on the backing KV cluster (since setting a cluster setting on a tenant only affects the in-memory setting), but enabling it there doesn't make a difference.

To reproduce, paste the above logic test into a __test file and run `make test PKG=./pkg/ccl/logictestccl TESTS=TestTenantLogic//distsql_automatic_stats`. @RaduBerinde do you mind taking a look?

@RaduBerinde (Member)

The logs show this error:

failed to create statistics on table 53: create-stats: unexpected concurrency for a flow that was forced to be planned locally

I guess the stats plan can't be fused into a single processor. @andreimatei I think you ran into this at some point and fixed something, do you remember?

@andreimatei (Contributor)

I think maybe this is what #52621 is trying to address.
Is there a good reason why these stats processors can't be fused?

@RaduBerinde (Member)

@asubiotto I don't understand why you expect this to work when there is a known issue that you have a PR out for. Or is the problem still happening with your PR?

@asubiotto (Contributor)

I guess the stats plan can't be fused into a single processor. @andreimatei I think you ran into this at some point and fixed something, do you remember?

Whoops, I did not see the logs. I just posted here because I saw no stats and the expectation was that this should work. #52621 should fix this; sorry for the noise.

@tbg (Member, Author)

tbg commented Sep 4, 2020

Should be fixed now then, right? Can we confirm, add the test, and close?

@asubiotto (Contributor)

distsql_automatic_stats runs with 3node-tenant.

knz reopened this Sep 11, 2020
@knz (Contributor)

knz commented Sep 11, 2020

The INJECT STATS statement still uses gossip in a way that refers to this issue.

knz added the X-anchored-telemetry (The issue number is anchored by telemetry references.) label Sep 11, 2020
@knz (Contributor)

knz commented Sep 11, 2020

CREATE STATS too

@rytaft (Collaborator)

rytaft commented Sep 14, 2020

So just to confirm, the remaining work here is to remove all references to Gossip in stats for post-Phase 2, in which tenants may have multiple SQL servers? What is the timeline for this?

Is the suggested solution to use polling? Are there other DB features that have moved from Gossip to polling? Or have they used something else?

craig bot pushed a commit that referenced this issue Sep 14, 2020
54256: sql: make multi-tenancy errors report to telemetry r=asubiotto a=knz

Fixes #48164. 

Issues referenced, I also added the X-anchored-telemetry label to them on github (we keep those issues open on github until their reference in the code is removed):

#54254 
#54255 
#54250 
#54251 
#54252 
#49854 
#48274
#47150
#47971
#47970
#47900 
#47925 


Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
@knz (Contributor)

knz commented Sep 14, 2020

There is a new subsystem alongside gossip called "gossip subscriptions":
the SQL server can register for certain gossip updates, and the KV server will push updates over the subscription stream.
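The subscription pattern knz describes can be sketched roughly as follows (all names here are hypothetical, not the actual kvtenant API): the SQL server registers interest in a key prefix, and the KV side pushes matching updates over a per-subscriber stream.

```go
package main

import "fmt"

// GossipUpdate is one pushed key/value pair; hypothetical type.
type GossipUpdate struct {
	Key   string
	Value []byte
}

// subscriptionServer stands in for the KV side: it fans out updates
// to subscribers that registered for a given key prefix.
type subscriptionServer struct {
	subs map[string][]chan GossipUpdate
}

func newSubscriptionServer() *subscriptionServer {
	return &subscriptionServer{subs: map[string][]chan GossipUpdate{}}
}

// Subscribe registers the caller (e.g. a tenant SQL server) for
// updates on keys with the given prefix and returns its stream.
func (s *subscriptionServer) Subscribe(prefix string) <-chan GossipUpdate {
	ch := make(chan GossipUpdate, 16)
	s.subs[prefix] = append(s.subs[prefix], ch)
	return ch
}

// Publish pushes an update to every subscription whose prefix matches.
func (s *subscriptionServer) Publish(u GossipUpdate) {
	for prefix, chans := range s.subs {
		if len(u.Key) >= len(prefix) && u.Key[:len(prefix)] == prefix {
			for _, ch := range chans {
				ch <- u
			}
		}
	}
}

func main() {
	srv := newSubscriptionServer()
	updates := srv.Subscribe("table-stats/")
	srv.Publish(GossipUpdate{Key: "table-stats/53"})
	u := <-updates
	fmt.Println(u.Key) // table-stats/53
}
```

The key difference from plain gossip is directionality: the tenant process never participates in the gossip network itself, it only consumes a filtered push stream from KV.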

@rytaft (Collaborator)

rytaft commented Sep 14, 2020

Got it - thanks!

@RaduBerinde (Member)

It looks like the best option is to set up a RangeFeed on the entire table_statistics table and look at the modified keys to see which tables need a refresh. There is a similar use case in sql/catalog/lease/lease.go.
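A rough sketch of that approach (hypothetical names; not the actual rangefeed API, and in the real system the table ID would be decoded from the row's key): each event for a modified row of system.table_statistics invalidates the cache entry for the corresponding table, so the next read refreshes it.

```go
package main

import "fmt"

// statsRowEvent models a rangefeed event for one modified row of
// system.table_statistics, with the table ID already decoded from
// the row's key. All names here are hypothetical.
type statsRowEvent struct {
	TableID int64
}

// statsCache caches statistics per table and invalidates entries
// when the rangefeed reports a modified row for that table.
type statsCache struct {
	valid map[int64]bool
}

func (c *statsCache) onRangefeedEvent(ev statsRowEvent) {
	// Marking the entry stale forces the next reader to re-read
	// system.table_statistics for this table.
	delete(c.valid, ev.TableID)
}

func main() {
	c := &statsCache{valid: map[int64]bool{53: true, 54: true}}
	c.onRangefeedEvent(statsRowEvent{TableID: 53})
	fmt.Println(c.valid[53], c.valid[54]) // false true
}
```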

@RaduBerinde (Member)

Note: @ajwerner is working on a library for setting up rangefeeds, which we should use once it's ready (#58361).

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Mar 25, 2021
We currently use gossip to keep table stats caches updated when new
statistics become available. This mechanism is not available in
multi-tenant configurations (currently we only support single SQL
nodes in that configuration and we refresh the cache on the same node
out of band).

This commit changes this mechanism to use a range feed that listens
for updates to system.table_statistics. This is a cleaner mechanism as
it doesn't require any out-of-band coordination.

The range feed events are on a row-by-row basis; a statistics
collection typically inserts multiple rows. We handle this by
remembering the timestamp of the table_statistics row that triggered
the last update. In addition, we remember information about the last
event to avoid unnecessary overhead.

Note that we continue to send gossip updates when new statistics are
available, to keep things working for mixed-version clusters during
upgrade.

Fixes cockroachdb#47925.

Release note: None
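The timestamp-based coalescing described in the commit message above can be sketched like this (hypothetical names; a real implementation would use hlc.Timestamp and the actual rangefeed events): a single stats collection inserts several table_statistics rows that share a write timestamp, and remembering the timestamp that triggered the last refresh per table turns that burst of per-row events into one refresh.

```go
package main

import "fmt"

// rowEvent is a rangefeed event for one row of table_statistics; a
// single stats collection inserts several rows sharing one write
// timestamp. All names here are hypothetical.
type rowEvent struct {
	TableID   int64
	Timestamp int64
}

// refresher coalesces per-row events: it remembers the timestamp that
// triggered the last refresh of each table and skips events at or
// below it.
type refresher struct {
	lastRefreshed map[int64]int64 // tableID -> last triggering timestamp
	refreshes     int
}

func (r *refresher) onEvent(ev rowEvent) {
	if ev.Timestamp <= r.lastRefreshed[ev.TableID] {
		return // already refreshed for this (or a newer) collection
	}
	r.lastRefreshed[ev.TableID] = ev.Timestamp
	r.refreshes++ // stands in for kicking off a cache refresh
}

func main() {
	r := &refresher{lastRefreshed: map[int64]int64{}}
	// Six rows from one stats collection at ts=100: one refresh.
	for i := 0; i < 6; i++ {
		r.onEvent(rowEvent{TableID: 53, Timestamp: 100})
	}
	fmt.Println(r.refreshes) // 1
}
```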
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Mar 25, 2021
craig bot pushed a commit that referenced this issue Mar 28, 2021
62563: sql: use a range feed for refreshing the table stats cache r=RaduBerinde a=RaduBerinde

Fixes #47925.

Release note: None

Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
craig bot closed this as completed in 8905b9b Mar 28, 2021
@RaduBerinde (Member)

Keeping open until we have something backported.

RaduBerinde reopened this Mar 29, 2021
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue Apr 29, 2021
This is a modified backport of cockroachdb#62563 (description below). The
difference is that we keep both mechanisms and only use the range
feed when gossip is not available (i.e. multi-tenant). This avoids
potential regressions caused by the new mechanism.

---

We currently use gossip to keep table stats caches updated when new
statistics become available. This mechanism is not available in
multi-tenant configurations (currently we only support single SQL
nodes in that configuration and we refresh the cache on the same node
out of band).

This commit changes this mechanism to use a range feed that listens
for updates to system.table_statistics. This is a cleaner mechanism as
it doesn't require any out-of-band coordination.

The range feed events are on a row-by-row basis; a statistics
collection typically inserts multiple rows. We handle this by
remembering the timestamp of the table_statistics row that triggered
the last update. In addition, we remember information about the last
event to avoid unnecessary overhead.

Note that we continue to send gossip updates when new statistics are
available, to keep things working for mixed-version clusters during
upgrade.

Fixes cockroachdb#47925.

Release note: None
RaduBerinde added a commit to RaduBerinde/cockroach that referenced this issue May 13, 2021
jlinder added the T-sql-queries (SQL Queries Team) label Jun 16, 2021
@ajwerner (Contributor)

The backport merged in #64360.
