Skip to content

Conversation

@dodeca12
Copy link
Contributor

@dodeca12 dodeca12 commented Sep 10, 2025

This PR ensures that all spanconfig updates bypass the backpressure mechanism. This is done by excluding the system.span_configurations table from backpressurable spans by splitting the backpressurable range around it.

Implementation

Commit 1: Reproduce the Issue
  • Added comprehensive test TestSpanConfigUpdatesBlockedByRangeSizeBackpressureOnDefaultRanges that reproduces the catch-22 scenario
  • The test repeatedly writes spanconfig updates for a single key until the range becomes too large to split, triggering backpressure
  • Demonstrates how spanconfig updates (including protected timestamp deletions) get blocked by backpressure:
W250910 14:26:03.302675 144058 kv/kvserver/replica_backpressure.go:263  [T1,Vsystem,n1,s1,r77/1:/Table/4{7/1/"\xf…-8}] 481  applying backpressure to limit range growth on batch Put [/Table/47/1/"\xfa"/0], [txn: cfbe5191], [can-forward-ts]
I250910 14:26:03.304650 50 kv/kvserver_test/client_spanconfig_backpressure_test.go:209  [-] 482  ERROR! BREAKING OUT OF LOOP, numWrites successful: 62690, error: upsert-span-cfgs: split failed while applying backpressure to Put [/Table/47/1/"\xfa"/0], [txn: cfbe5191], [can-forward-ts] on range r77:/Table/4{7/1/"\xfa"-8} [(n1,s1):1, next=2, gen=1]: could not find valid split key
I250910 14:26:03.304650 50 kv/kvserver_test/client_spanconfig_backpressure_test.go:209  [-] 482 +(1) attached stack trace
I250910 14:26:03.304650 50 kv/kvserver_test/client_spanconfig_backpressure_test.go:209  [-] 482 +  -- stack trace:
I250910 14:26:03.304650 50 kv/kvserver_test/client_spanconfig_backpressure_test.go:209  [-] 482 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).execInternal.func1.1
I250910 14:26:03.304650 50 kv/kvserver_test/client_spanconfig_backpressure_test.go:209  [-] 482 +  | 	pkg/sql/internal.go:1260
I250910 14:26:03.304650 50 kv/kvserver_test/client_spanconfig_backpressure_test.go:209  [-] 482 +  | github.com/cockroachdb/cockroach/pkg/sql.(*rowsIterator).Close
I250910 14:26:03.304650 50 kv/kvserver_test/client_spanconfig_backpressure_test.go:209  [-] 482 +  | 	pkg/sql/internal.go:608
I250910 14:26:03.304650 50 kv/kvserver_test/client_spanconfig_backpressure_test.go:209  [-] 482 +  | github.com/cockroachdb/cockroach/pkg/sql.(*rowsIterator).Next
I250910 14:26:03.304650 50 kv/kvserver_test/client_spanconfig_backpressure_test.go:209  [-] 482 +  | 	pkg/sql/internal.go:584
I250910 14:26:03.304650 50 kv/kvserver_test/client_spanconfig_backpressure_test.go:209  [-] 482 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).execIEStmt
I250910 14:26:03.304650 50 kv/kvserver_test/client_spanconfig_backpressure_test.go:209  [-] 482 +  | 	pkg/sql/internal.go:872
I250910 14:26:03.304650 50 kv/kvserver_test/client_spanconfig_backpressure_test.go:209  [-] 482 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).ExecEx
I250910 14:26:03.304650 50 kv/kvserver_test/client_spanconfig_backpressure_test.go:209  [-] 482 +  | 	pkg/sql/internal.go:820
.
.
.
Commit 2:
  • Modified backpressurableSpans in replica_backpressure.go: Split the original backpressurable span {SystemConfigTableDataMax, TableDataMax} into two spans that exclude the span_configurations table:
    • {SystemConfigTableDataMax, span_configurations_table_start}
    • {span_configurations_table_end, TableDataMax}
  • Added explanatory comments: Documented that this prevents catch-22 situations where protected timestamp updates or garbage collection TTL updates are blocked by backpressure
  • Updated test expectations: Modified the aforementioned test to expect that spanconfig updates should succeed (bypass backpressure) rather than fail

Expected behaviour - any span config updates now bypass the range-size backpressure mechanism.

Fixes: #146982
Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@dodeca12 dodeca12 marked this pull request as ready for review September 11, 2025 14:51
@dodeca12 dodeca12 requested a review from a team as a code owner September 11, 2025 14:51
@dodeca12 dodeca12 force-pushed the sg_spanconfig_backpressure branch from e1266f9 to 0dce811 Compare September 12, 2025 14:17
Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat, it'll be good to address this failure condition.

I left some drive-by comments.

@stevendanna reviewed 1 of 2 files at r1, 2 of 2 files at r3, 1 of 2 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dodeca12)


-- commits line 11 at r3:
This largely duplicated the previous paragraph. We can probably make it a bit more concise.


-- commits line 13 at r3:

Suggestion:

This commit adds a test that reproduces the

-- commits line 20 at r3:
This commit does not fix the issue.


-- commits line 4 at r4:
In this comment, they can still be blocked.


-- commits line 30 at r4:
Do we want to say some words about why we are only doing this for the system.span_configurations table and not other system database tables?


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 40 at r3 (raw file):

	const (
		overloadMaxRangeBytes = 64 << 20 // Set to 64 MiB, a saner value than the default of 512 MiB.
		overloadMinRangeBytes = 16 << 10

Is there a reason not to set these much lower for the purposes of the test so that we don't have to do so many writes?


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 79 at r3 (raw file):

		RangeMaxBytes: 64 << 20, // 64 MiB.
		RangeMinBytes: 16 << 20, // 16 MiB.
	}

Should we re-use the constants you defined above here? I'd also either move it up near the other declarations or down closer to its usage.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 81 at r3 (raw file):

	}

	// This is a fat spanconfig with all fields set to maximum int64 and int32 values.

Suggestion:

	// This is a large spanconfig with all fields set to maximum int64 and int32 values.

pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 122 at r3 (raw file):

	require.NoError(t, err)

	log.Dev.Infof(ctx, "Size of configBytes: %d bytes (%d KiB)\n", len(configBytes), len(configBytes)>>10)

For some of these log statements t.Logf might be more appropriate.

In this particular case I might replace the log statement with an assertion that the configBytes is large enough.


pkg/kv/kvserver/replica_backpressure.go line 93 at r4 (raw file):

	// Backpressure from the end of the system config forward instead of
	// over all table data to avoid backpressuring unsplittable ranges.
	// Note: The above comment is no longer true.

Rather than this as a note, let's perhaps update the comment.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 30 at r4 (raw file):

)

func TestSpanConfigUpdatesBlockedByRangeSizeBackpressureOnDefaultRanges(t *testing.T) {

This test is complex enough that I would add a comment about the genera strategy being used.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 46 at r4 (raw file):

		defaultMaxBytes = 512 << 20 // default max bytes for a range
	)
	tc := serverutils.StartServerOnly(t, base.TestServerArgs{

nit: Usually tc is used for a "TestCluster" in our unit tests. So this might be better to name srv or s.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 49 at r4 (raw file):

		Knobs: base.TestingKnobs{
			Store: &kvserver.StoreTestingKnobs{
				DisableMergeQueue: true,

Why do we need to disable the merge queue?


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 137 at r4 (raw file):

	log.Dev.Infof(ctx,
		"Configuring span_configurations table with custom zone settings...\n")

We probably one need 1 of these two log message.

Copy link
Contributor

@iskettaneh iskettaneh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dodeca12 and @stevendanna)


pkg/kv/kvserver/replica_backpressure.go line 93 at r4 (raw file):

Previously, stevendanna (Steven Danna) wrote…

Rather than this as a note, let's perhaps update the comment.

+1


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 29 at r3 (raw file):

	"github.com/stretchr/testify/require"
)

Is there a test file for backpressure already? maybe we could add this test there instead of creating a new file.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 58 at r3 (raw file):

	require.NoError(t, err)

	waitForSpanConfig := func(t *testing.T, tc serverutils.TestServerInterface, tablePrefix roachpb.Key, exp int64) {

nit: consider changing the function parameter "exp" to another name that represents what that field is all about exactly. In this case, it could be named to "expRangeMaxBytes"


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 65 at r3 (raw file):

				return err
			}
			if log.V(1) {

You probably don't need to check for verbosity in tests


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 66 at r3 (raw file):

			}
			if log.V(1) {
				log.Dev.Infof(ctx, "RangeMaxBytes for tablePrefix %s: %d\n", tablePrefix, conf.RangeMaxBytes)

nit: We usually start each log line with a small case letter.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 66 at r3 (raw file):

			}
			if log.V(1) {
				log.Dev.Infof(ctx, "RangeMaxBytes for tablePrefix %s: %d\n", tablePrefix, conf.RangeMaxBytes)

Maybe logging inside a testutils.SucceedsSoon might be spammy. The returned error should be enough.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 69 at r3 (raw file):

			}
			if conf.RangeMaxBytes != exp {
				return fmt.Errorf("expected %d, got %d", exp, conf.RangeMaxBytes)

maybe add "expected RangeMaxBytes %d, got %d"


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 87 at r3 (raw file):

	// See func exceedsMultipleOfSplitSize in /pkg/kv/kvserver/replica_metrics.go for the logic.
	spanConfig2KiB := roachpb.SpanConfig{ // 2078 bytes ~ 2 KiB.
		RangeMaxBytes: math.MaxInt64, // Maximum int64 value.

I am probably missing something here, but why are we using MaxInt64 for the RanMaxBytes? Shouldn't we reduce it so that we trigger the backpressure issue faster?


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 96 at r3 (raw file):

				},
				{
					ProtectedTimestamp: hlc.MaxTimestamp,

question: Why do we need two ProtectedTimestamp here?


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 106 at r3 (raw file):

			{
				Constraints: []roachpb.Constraint{
					{Key: "max_key", Value: strings.Repeat("x", 1000)}, // Very long constraint value.

nit #1: 1024 instead of 1000
nit #2: inline comments start with a small case letter, and they don't end with a full stop.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 157 at r3 (raw file):

	// Wait for the zone configuration to be applied.
	log.Dev.Infof(ctx, "Waiting for zone configuration to be applied...\n")
	testutils.SucceedsSoon(t, func() error {

Is this redundant given that we call waitForSpanConfig() a couple of statements above?


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 215 at r3 (raw file):

		}
		if err != nil {
			log.Dev.Infof(ctx, "ERROR! BREAKING OUT OF LOOP, numWrites successful: %d, error: %+v\n", i, err)

Instead of having logs, can you replace them with expect. statements?


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 245 at r3 (raw file):

	smallSpancofnRecordWriteErr := kvaccessor.UpdateSpanConfigRecords(ctx, []spanconfig.Target{testTarget}, []spanconfig.Record{smallSpancofnRecord}, hlc.MinTimestamp, hlc.MaxTimestamp)
	if smallSpancofnRecordWriteErr != nil {
		log.Dev.Infof(ctx, "ERROR: smallSpancofnRecord write failed: %v\n", smallSpancofnRecordWriteErr)

Can you instead of having log statements here, replace it with require.Error()?


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 250 at r4 (raw file):

	// Assert that the operation failed due to backpressure.
	require.NoError(t, err, "Expected span config writes to NOT be backpressured")
	log.Dev.Infof(ctx, "Verified that span config writes NOT be backpressured")

nit: You could remove this log line as I don't think it adds much debugging help.

Copy link
Contributor

@iskettaneh iskettaneh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dodeca12 and @stevendanna)


pkg/kv/kvserver/replica_backpressure.go line 98 at r4 (raw file):

	// collection TTL updates are blocked by backpressure.
	{Key: keys.SystemConfigTableDataMax, EndKey: keys.SystemSQLCodec.TablePrefix(keys.SpanConfigurationsTableID)},
	{Key: keys.SystemSQLCodec.TablePrefix(keys.SpanConfigurationsTableID + 1), EndKey: keys.TableDataMax},

To make this a little bit more readable, how about we do the following:

In the pkg/keys/constants.go file, similar to how we have NamespaceTableMin and NamespaceTableMax, can you add SpanConfigTableDataMin and SpanConfigTableDataMax.
Then, we here can just reference those constants.

Copy link
Contributor Author

@dodeca12 dodeca12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a little test to see if file comments show up on github integration on vscode/goland

@dodeca12 dodeca12 force-pushed the sg_spanconfig_backpressure branch from 0dce811 to 3a0f82f Compare September 15, 2025 17:32
Copy link
Contributor Author

@dodeca12 dodeca12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @iskettaneh and @stevendanna)


-- commits line 11 at r3:

Previously, stevendanna (Steven Danna) wrote…

This largely duplicated the previous paragraph. We can probably make it a bit more concise.

Done.


-- commits line 20 at r3:

Previously, stevendanna (Steven Danna) wrote…

This commit does not fix the issue.

Done.


-- commits line 4 at r4:

Previously, stevendanna (Steven Danna) wrote…

In this comment, they can still be blocked.

did you mean commit?


-- commits line 30 at r4:

Previously, stevendanna (Steven Danna) wrote…

Do we want to say some words about why we are only doing this for the system.span_configurations table and not other system database tables?

Done.


pkg/kv/kvserver/replica_backpressure.go line 93 at r4 (raw file):

Previously, iskettaneh wrote…

+1

Done.


pkg/kv/kvserver/replica_backpressure.go line 98 at r4 (raw file):

Previously, iskettaneh wrote…

To make this a little bit more readable, how about we do the following:

In the pkg/keys/constants.go file, similar to how we have NamespaceTableMin and NamespaceTableMax, can you add SpanConfigTableDataMin and SpanConfigTableDataMax.
Then, we here can just reference those constants.

Done.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 29 at r3 (raw file):

Previously, iskettaneh wrote…

Is there a test file for backpressure already? maybe we could add this test there instead of creating a new file.

Done.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 40 at r3 (raw file):

Previously, stevendanna (Steven Danna) wrote…

Is there a reason not to set these much lower for the purposes of the test so that we don't have to do so many writes?

64 MiB is the smallest size I can configure the MaxRangeBytes


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 65 at r3 (raw file):

Previously, iskettaneh wrote…

You probably don't need to check for verbosity in tests

Done.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 66 at r3 (raw file):

Previously, iskettaneh wrote…

Maybe logging inside a testutils.SucceedsSoon might be spammy. The returned error should be enough.

Done.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 69 at r3 (raw file):

Previously, iskettaneh wrote…

maybe add "expected RangeMaxBytes %d, got %d"

Done.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 79 at r3 (raw file):

Previously, stevendanna (Steven Danna) wrote…

Should we re-use the constants you defined above here? I'd also either move it up near the other declarations or down closer to its usage.

Done.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 87 at r3 (raw file):

Previously, iskettaneh wrote…

I am probably missing something here, but why are we using MaxInt64 for the RanMaxBytes? Shouldn't we reduce it so that we trigger the backpressure issue faster?

These are span configs for the scratch range - the systemSpanConfig is the one that targets the system.span_configurations table and is the one responsible for changing the range size to a lower value. We need these configs to be quite large to write as little as possible to trigger the backpressure behaviour.

Additionally, math.MaxInt64 is used here to create a larger span config since the span config needs to be marshalled into protobuf, which uses variant encoding


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 96 at r3 (raw file):

Previously, iskettaneh wrote…

question: Why do we need two ProtectedTimestamp here?

For a couple of reasons - 1) to make the span config larger in size so as to not have an unreasonable number of writes, 2) to simulate ranges having PTS, 3) and to test if removing the PTS (see smallSpanConfig) in a span config update would bypass the backpressure


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 122 at r3 (raw file):

Previously, stevendanna (Steven Danna) wrote…

For some of these log statements t.Logf might be more appropriate.

In this particular case I might replace the log statement with an assertion that the configBytes is large enough.

Done.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 157 at r3 (raw file):

Previously, iskettaneh wrote…

Is this redundant given that we call waitForSpanConfig() a couple of statements above?

Done.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 215 at r3 (raw file):

Previously, iskettaneh wrote…

Instead of having logs, can you replace them with expect. statements?

Done.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 245 at r3 (raw file):

Previously, iskettaneh wrote…

Can you instead of having log statements here, replace it with require.Error()?

Done.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 137 at r4 (raw file):

Previously, stevendanna (Steven Danna) wrote…

We probably one need 1 of these two log message.

Done.


-- commits line 13 at r3:
Done.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 81 at r3 (raw file):

	}

	// This is a fat spanconfig with all fields set to maximum int64 and int32 values.

Done.

Copy link
Contributor Author

@dodeca12 dodeca12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @iskettaneh and @stevendanna)


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 30 at r4 (raw file):

Previously, stevendanna (Steven Danna) wrote…

This test is complex enough that I would add a comment about the genera strategy being used.

Done.

Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dodeca12 and @iskettaneh)


-- commits line 4 at r4:

Previously, dodeca12 (Swapneeth Gorantla) wrote…

did you mean commit?

Yup


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 40 at r3 (raw file):

Previously, dodeca12 (Swapneeth Gorantla) wrote…

64 MiB is the smallest size I can configure the MaxRangeBytes

Can you say a bit more about what you encountered when you tried to set this smaller? Locally I tried 16MB and it seemed to work.

The system has a number of "testing hooks" that allow us to get around various limitations and we can always ad new testing hooks if needed.

I ask because when I run this test locally it takes a long time so it would be nice if we could find some ways to speed it up, either by lowering the range size or doing a smaller number of larger writes.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 226 at r5 (raw file):

	if repl != nil {
		stats := repl.GetMVCCStats()
		t.Logf(`Range size after all writes: %d bytes (KeyCount: %d, 

I'm not sure why, but I'm not sure this is printing what we expect because when I run it locally I am seeing:

    client_spanconfig_backpressure_test.go:221: Range size after all writes: 4807 bytes (KeyCount: 76,
                        LiveBytes: 4499)\n

Even when I modify the code such that I know we are hitting size-based backpressure.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 236 at r5 (raw file):

	}

	smallSpancofnRecord, err := spanconfig.MakeRecord(testTarget, smallSpanConfig)

nit: smallSpancofnRecord -> smallSpanconfRecord (and related log entries below).

@dodeca12 dodeca12 force-pushed the sg_spanconfig_backpressure branch from 3a0f82f to e2455fa Compare September 17, 2025 11:07
Copy link
Contributor Author

@dodeca12 dodeca12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @iskettaneh and @stevendanna)


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 40 at r3 (raw file):

Previously, stevendanna (Steven Danna) wrote…

Can you say a bit more about what you encountered when you tried to set this smaller? Locally I tried 16MB and it seemed to work.

The system has a number of "testing hooks" that allow us to get around various limitations and we can always ad new testing hooks if needed.

I ask because when I run this test locally it takes a long time so it would be nice if we could find some ways to speed it up, either by lowering the range size or doing a smaller number of larger writes.

Hmm you're right - looking into this, I think it's because I didn't change it from when I tried different approaches previously. I ran into errors that I can't configure the range to be any smaller than 64MiB when I was previously trying to run the test via LocalTestCluster:

cfg := roachpb.SpanConfig{
    RangeMinBytes: 0,
    RangeMaxBytes: 16 << 20, // 16MiB
    GCPolicy: roachpb.GCPolicy{
        TTLSeconds: 60 * 60 * 4, // 4 hours
    },
    NumReplicas: 3,
}

tc := &localtestcluster.LocalTestCluster{
    Cfg: kvserver.StoreConfig{
        DefaultSpanConfig: cfg, // This only applies to NEW ranges
    },
    StoreTestingKnobs: &kvserver.StoreTestingKnobs{
        DisableMergeQueue: true,
        DisableSplitQueue: true, // Prevent splits so range gets oversized
        DisableGCQueue:    true,
    },
}

this didn't work due as DefaultSpanConfig only applies to new ranges created in the store. The span_configurations table range already exists with its own hardcoded configuration.

and then also via:

_, err = tdb.Exec("ALTER DEFAULT ZONE CONFIGURE ZONE USING range_max_bytes = $1, range_min_bytes = $2",
   16 << MiB, defaultMinRangeBytes)

System tables have restricted zone configuration and so this approach also failed.

However it seems that SpanConfigKVAccessor updates the span configuration, bypassing the zone config inheritance logic.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 49 at r4 (raw file):

Previously, stevendanna (Steven Danna) wrote…

Why do we need to disable the merge queue?

remnant from my previous approaches where not disabling the merge queue caused issues - removed as we no longer need to do so since we can use KVAccessor to set RangeMaxBytes to any value


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 226 at r5 (raw file):

Previously, stevendanna (Steven Danna) wrote…

I'm not sure why, but I'm not sure this is printing what we expect because when I run it locally I am seeing:

    client_spanconfig_backpressure_test.go:221: Range size after all writes: 4807 bytes (KeyCount: 76,
                        LiveBytes: 4499)\n

Even when I modify the code such that I know we are hitting size-based backpressure.

so this got me into a little rabbit hole - turns out this is printing the stats for a different replica than what is relevant, I've addressed this by printing out the aggregate MVCC stats for all the ranges in the table. Thanks to @iskettaneh for helping me with identifying the underlying issue

Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for following up. I left a few more minor comments, but I defer to @iskettaneh for final review. Thanks for digging into this!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dodeca12 and @iskettaneh)


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 40 at r3 (raw file):

Previously, dodeca12 (Swapneeth Gorantla) wrote…

Hmm you're right - looking into this, I think it's because I didn't change it from when I tried different approaches previously. I ran into errors that I can't configure the range to be any smaller than 64MiB when I was previously trying to run the test via LocalTestCluster:

cfg := roachpb.SpanConfig{
    RangeMinBytes: 0,
    RangeMaxBytes: 16 << 20, // 16MiB
    GCPolicy: roachpb.GCPolicy{
        TTLSeconds: 60 * 60 * 4, // 4 hours
    },
    NumReplicas: 3,
}

tc := &localtestcluster.LocalTestCluster{
    Cfg: kvserver.StoreConfig{
        DefaultSpanConfig: cfg, // This only applies to NEW ranges
    },
    StoreTestingKnobs: &kvserver.StoreTestingKnobs{
        DisableMergeQueue: true,
        DisableSplitQueue: true, // Prevent splits so range gets oversized
        DisableGCQueue:    true,
    },
}

this didn't work due as DefaultSpanConfig only applies to new ranges created in the store. The span_configurations table range already exists with its own hardcoded configuration.

and then also via:

_, err = tdb.Exec("ALTER DEFAULT ZONE CONFIGURE ZONE USING range_max_bytes = $1, range_min_bytes = $2",
   16 << MiB, defaultMinRangeBytes)

System tables have restricted zone configuration and so this approach also failed.

However it seems that SpanConfigKVAccessor updates the span configuration, bypassing the zone config inheritance logic.

Great, thanks for taking another look.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 226 at r5 (raw file):

Previously, dodeca12 (Swapneeth Gorantla) wrote…

so this got me into a little rabbit hole - turns out this is printing the stats for a different replica than what is relevant, I've addressed this by printing out the aggregate MVCC stats for all the ranges in the table. Thanks to @iskettaneh for helping me with identifying the underlying issue

Nice, glad you got to the bottom of this.


-- commits line 32 at r8:
I'm not sure "self referential" is the best description here and my own understanding of the problem implies that there could be some value in not backpressuring other tables as well.

Let's imagine that the span_configuration table is getting large because of a protected timestamp. The range can't be cleaned up until the protected timestamp is removed. To remove the protected timestamp, we write to the system.protected_ts_records table. Then the "reconciliation process" attempts to "translate" that protected timestamp into span configuration and write it to the system.span_configurations table. From there, the replica will eventually learn of the updated span configuration and the next GC run will allow it to collect the MVCC garbage. As you can see, cleaning up this protected timestamp actually required 2 writes: one to actually remove the PTS record and one to translate that removal into new span configuration. You can also see that it isn't so much that the span_configuration update triggers reconciliation, but rather that successfully processing the reconciliation requires a span_configuration update.

All that said, I do think it is reasonable to only put this table into the allow list for now because there is a bit of a fanout that happens. Namely 1 PTS record can very easily cause many span_configuration updates.


pkg/keys/constants.go line 380 at r7 (raw file):

	SpanConfigTableDataMin = SystemSQLCodec.TablePrefix(SpanConfigurationsTableID)
	// SpanConfigTableDataMax is the end key of system.span_configurations.
	SpanConfigTableDataMax = SystemSQLCodec.TablePrefix(SpanConfigurationsTableID + 1)

For consistency with the namespace table I might name these SpanConfigTableMin and SpanConfigTableMax, but I leave that to y'alls judgement.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 242 at r8 (raw file):

	t.Logf("aggregate table size after all writes: %d bytes (%d MiB), "+
		"key count: %d, live count: %d",
		aggregateStats.Total(), aggregateStats.Total()>>20,

Just as an FYI, we have: humanize.Bytes which pretty-prints sizes.

Copy link
Contributor

@iskettaneh iskettaneh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dodeca12 and @stevendanna)


pkg/keys/constants.go line 380 at r7 (raw file):

Previously, stevendanna (Steven Danna) wrote…

For consistency with the namespace table I might name these SpanConfigTableMin and SpanConfigTableMax, but I leave that to y'alls judgement.

+1


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 29 at r3 (raw file):

Previously, dodeca12 (Swapneeth Gorantla) wrote…

Done.

I might be looking at a slightly stale PR, but is there already a backpressure test file that we can add this test to, instead of creating a new file?


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 87 at r3 (raw file):

Previously, dodeca12 (Swapneeth Gorantla) wrote…

These are span configs for the scratch range - the systemSpanConfig is the one that targets the system.span_configurations table and is the one responsible for changing the range size to a lower value. We need these configs to be quite large to write as little as possible to trigger the backpressure behaviour.

Additionally, math.MaxInt64 is used here to create a larger span config since the span config needs to be marshalled into protobuf, which uses variant encoding

I tested it manually without filling those fields with MaxInt64, and the test succeeds fine. I mean an integer with a value of 0 or MAX_INT will have the same bytes anyways. The main size is coming from strings.Repeat(...


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 106 at r8 (raw file):

	}

	configBytessdfsdf, err := protoutil.Marshal(&systemSpanConfig)

typo: configBytessdfsdf


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 195 at r8 (raw file):

	// Write span configurations using KVAccessor.
	// We expect this to fail due to backpressure.

Can you fix this such that in the first commit, it fails, and and the second commit after the fix, it doesn't fail?


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 217 at r8 (raw file):

	distSender := s.DistSenderI().(*kvcoord.DistSender)

	// Track aggregate MVCC stats across all discovered ranges

nit: it's not very clear to me what discovered ranges are. Can you change this comment to something like: Track aggregate MVCC stats across all SpanConfigurationsTable ranges?


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 240 at r8 (raw file):

	}

	t.Logf("aggregate table size after all writes: %d bytes (%d MiB), "+

Can you replace the log with an assertion? Something like:

require.Greater(t, aggregateStats.Total(), int64(overloadMaxRangeBytes))


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 254 at r8 (raw file):

	require.NoError(t, err)

	t.Logf("testing one more write with a small span config...")

No need for this log line, as there is a clear assertion below!


pkg/kv/kvserver/replica_backpressure.go line 91 at r8 (raw file):

var backpressurableSpans = []roachpb.Span{
	{Key: keys.TimeseriesPrefix, EndKey: keys.TimeseriesKeyMax},
	// Split the span to exclude the span_configurations table to avoid

nit: I would change this comment slightly to:

Exclude the span_configurations table to avoid ....


pkg/keys/constants.go line 378 at r8 (raw file):

	NamespaceTableMax = SystemSQLCodec.TablePrefix(NamespaceTableID + 1)
	// SpanConfigTableDataMin is the start key of system.span_configurations.
	SpanConfigTableDataMin = SystemSQLCodec.TablePrefix(SpanConfigurationsTableID)

I see that the commit "kvserver: reproduce range-size backpressure for spanconfig updates" has this, and it also has the fix. While the commit "kvserver: allowlist spanconfig updates to bypass range-size backpressure" only changes some comments

Copy link
Contributor

@iskettaneh iskettaneh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this! I think it is good to go after this review round

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dodeca12 and @stevendanna)

Spanconfig updates could be blocked by range-size backpressure when
range-size gets too large, creating a catch-22 situation where
garbage collection and protected timestamp removal operations were
prevented from running, which in turn prevented range
splitting and alleviation of the backpressure condition - leading to
cluster wide issues.

This commit adds a test that reproduces the issue by repeatedly writing
spanconfig updates for a single key until the range becomes too large
to split, triggering backpressure. The test demonstrates how spanconfig
updates (including protected timestamp deletions) get blocked by
backpressure, preventing the very operations needed to resolve
the backpressure condition.

Fixes: None
Release note: None
@dodeca12 dodeca12 force-pushed the sg_spanconfig_backpressure branch from e2455fa to cb458e3 Compare September 18, 2025 15:25
Copy link
Contributor Author

@dodeca12 dodeca12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @iskettaneh and @stevendanna)


pkg/keys/constants.go line 380 at r7 (raw file):

Previously, iskettaneh wrote…

+1

Done.


pkg/keys/constants.go line 378 at r8 (raw file):

Previously, iskettaneh wrote…

I see that the commit "kvserver: reproduce range-size backpressure for spanconfig updates" has this, and it also has the fix. While the commit "kvserver: allowlist spanconfig updates to bypass range-size backpressure" only changes some comments

Done.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 29 at r3 (raw file):

Previously, iskettaneh wrote…

I might be looking at a slightly stale PR, but is there already a backpressure test file that we can add this test to, instead of creating a new file?

Done.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 87 at r3 (raw file):

Previously, iskettaneh wrote…

I tested it manually without filling those fields with MaxInt64, and the test succeeds fine. I mean an integer with a value of 0 or MAX_INT will have the same bytes anyways. The main size is coming from strings.Repeat(...

integers are fixed width in go, but for protobuf they are not since pb uses variant encoding. as a little example:

	spanConfig2KiB := roachpb.SpanConfig{ // 2078 bytes ~ 2 KiB.
		RangeMaxBytes: math.MaxInt64,
		RangeMinBytes: math.MaxInt64,
		GCPolicy: roachpb.GCPolicy{
			TTLSeconds: math.MaxInt32,
			ProtectionPolicies: []roachpb.ProtectionPolicy{
				{
					ProtectedTimestamp: hlc.MaxTimestamp,
				},
				{
					ProtectedTimestamp: hlc.MaxTimestamp,
				},
			},
		},
		NumReplicas: math.MaxInt32,
		GlobalReads: true,
		NumVoters:   math.MaxInt32,
		VoterConstraints: []roachpb.ConstraintsConjunction{
			{
				Constraints: []roachpb.Constraint{
					{Key: "max_key", Value: strings.Repeat("x", 1024)}, // very long constraint value
				},
			},
		},
		LeasePreferences: []roachpb.LeasePreference{
			{
				Constraints: []roachpb.Constraint{
					{Key: "max_key", Value: strings.Repeat("y", 1024)}, // very long constraint value
				},
			},
		},
	}

	spanConfig2KiBWithOneIntFieldEqualToZero := roachpb.SpanConfig{ // 2078 bytes ~ 2 KiB.
		RangeMaxBytes: 0,
		RangeMinBytes: math.MaxInt64,
		GCPolicy: roachpb.GCPolicy{
			TTLSeconds: math.MaxInt32,
			ProtectionPolicies: []roachpb.ProtectionPolicy{
				{
					ProtectedTimestamp: hlc.MaxTimestamp,
				},
				{
					ProtectedTimestamp: hlc.MaxTimestamp,
				},
			},
		},
		NumReplicas: math.MaxInt32,
		GlobalReads: true,
		NumVoters:   math.MaxInt32,
		VoterConstraints: []roachpb.ConstraintsConjunction{
			{
				Constraints: []roachpb.Constraint{
					{Key: "max_key", Value: strings.Repeat("x", 1024)}, // very long constraint value
				},
			},
		},
		LeasePreferences: []roachpb.LeasePreference{
			{
				Constraints: []roachpb.Constraint{
					{Key: "max_key", Value: strings.Repeat("y", 1024)}, // very long constraint value
				},
			},
		},
	}

	cfgBytesOneZero, err := protoutil.Marshal(&spanConfig2KiBWithOneIntFieldEqualToZero)


	configBytes, err := protoutil.Marshal(&spanConfig2KiB)

	fmt.Printf("spanConfig2KiBWithOneIntFieldEqualToZero marshalled size: %d bytes\n", len(cfgBytesOneZero))
	fmt.Printf("spanConfig2KiB marshalled size: %d bytes\n", len(configBytes))

which when run prints:

spanConfig2KiBWithOneIntFieldEqualToZero marshalled size: 2156 bytes
spanConfig2KiB marshalled size: 2166 bytes

I set it to math.MaxInt64 wherever I could before when I was under the assumption that I couldn't set the RangeMaxBytes to be below 64MiB but I suppose the ~single digit byte difference doesn't make too much of a difference now that we can set RangeMaxBytes to any arbitrarily low value now


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 106 at r8 (raw file):

Previously, iskettaneh wrote…

typo: configBytessdfsdf

thanks for the catch


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 195 at r8 (raw file):

Previously, iskettaneh wrote…

Can you fix this such that in the first commit, it fails, and and the second commit after the fix, it doesn't fail?

I may have messed up the rebase since I did have the test passing (as in asserting that there's an error ) in the first commit. This also explains why some file changes from the next commit show up in this earlier commit


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 240 at r8 (raw file):

Previously, iskettaneh wrote…

Can you replace the log with an assertion? Something like:

require.Greater(t, aggregateStats.Total(), int64(overloadMaxRangeBytes))

Done.


pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 254 at r8 (raw file):

Previously, iskettaneh wrote…

No need for this log line, as there is a clear assertion below!

Done.

Copy link
Contributor Author

@dodeca12 dodeca12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @iskettaneh and @stevendanna)


-- commits line 32 at r8:

Previously, stevendanna (Steven Danna) wrote…

I'm not sure "self referential" is the best description here and my own understanding of the problem implies that there could be some value in not backpressuring other tables as well.

Let's imagine that the span_configuration table is getting large because of a protected timestamp. The range can't be cleaned up until the protected timestamp is removed. To remove the protected timestamp, we write to the system.protected_ts_records table. Then the "reconciliation process" attempts to "translate" that protected timestamp into span configuration and write it to the system.span_configurations table. From there, the replica will eventually learn of the updated span configuration and the next GC run will allow it to collect the MVCC garbage. As you can see, cleaning up this protected timestamp actually required 2 writes: one to actually remove the PTS record and one to translate that removal into new span configuration. You can also see that it isn't so much that the span_configuration update triggers reconciliation, but rather that successfully processing the reconciliation requires a span_configuration update.

All that said, I do think it is reasonable to only put this table into the allow list for now because there is a bit of a fanout that happens. Namely 1 PTS record can very easily cause many span_configuration updates.

Reworded to make the description better

@dodeca12 dodeca12 requested a review from iskettaneh September 18, 2025 15:28
This commit modifies `backpressurableSpans` to exclude the
`system.span_configurations` table by splitting the original span into
two ranges that create a gap around the `system.span_configurations`
table. This allows spanconfig updates to bypass backpressure entirely,
preventing the catch-22 deadlock. The corresponding test is then
modified to assert the new behaviour (i.e., span config updates are not
blocked).

We only allowlist the \`system.span_configurations\` table because
protected timestamp cleanup requires two writes: one to remove the PTS
record from \`system.protected_ts_records\` and another to translate that
removal into updated span configuration in \`system.span_configurations\`.
When backpressure blocks the spanconfig update, the reconciliation process
cannot complete, preventing protected timestamp cleanup and creating a
deadlock where the system cannot clean up its own protected timestamps.
Other system tables lack this circular dependency and are not excluded.

Fixes: cockroachdb#146982
Release note: None
@dodeca12 dodeca12 force-pushed the sg_spanconfig_backpressure branch from cb458e3 to 04e58b8 Compare September 18, 2025 16:58
Copy link
Contributor

@iskettaneh iskettaneh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @stevendanna)

@dodeca12
Copy link
Contributor Author

bors r=iskettaneh

@craig
Copy link
Contributor

craig bot commented Sep 20, 2025

@craig craig bot merged commit 3b1e9fa into cockroachdb:master Sep 20, 2025
23 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kvserver: range-size backpressure can result in blocked spanconfig updates

4 participants