-
Notifications
You must be signed in to change notification settings - Fork 4k
kvserver: resolve range-size backpressure preventing spanconfig updates #153373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: resolve range-size backpressure preventing spanconfig updates #153373
Conversation
e1266f9 to
0dce811
Compare
stevendanna
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat, it'll be good to address this failure condition.
I left some drive-by comments.
@stevendanna reviewed 1 of 2 files at r1, 2 of 2 files at r3, 1 of 2 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @dodeca12)
-- commits line 11 at r3:
This largely duplicated the previous paragraph. We can probably make it a bit more concise.
Suggestion:
This commit adds a test that reproduces the
-- commits line 20 at r3:
This commit does not fix the issue.
-- commits line 4 at r4:
In this comment, they can still be blocked.
-- commits line 30 at r4:
Do we want to say some words about why we are only doing this for the system.span_configurations table and not other system database tables?
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 40 at r3 (raw file):
const ( overloadMaxRangeBytes = 64 << 20 // Set to 64 MiB, a saner value than the default of 512 MiB. overloadMinRangeBytes = 16 << 10
Is there a reason not to set these much lower for the purposes of the test so that we don't have to do so many writes?
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 79 at r3 (raw file):
RangeMaxBytes: 64 << 20, // 64 MiB. RangeMinBytes: 16 << 20, // 16 MiB. }
Should we re-use the constants you defined above here? I'd also either move it up near the other declarations or down closer to its usage.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 81 at r3 (raw file):
} // This is a fat spanconfig with all fields set to maximum int64 and int32 values.
Suggestion:
// This is a large spanconfig with all fields set to maximum int64 and int32 values.pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 122 at r3 (raw file):
require.NoError(t, err) log.Dev.Infof(ctx, "Size of configBytes: %d bytes (%d KiB)\n", len(configBytes), len(configBytes)>>10)
For some of these log statements t.Logf might be more appropriate.
In this particular case I might replace the log statement with an assertion that the configBytes is large enough.
pkg/kv/kvserver/replica_backpressure.go line 93 at r4 (raw file):
// Backpressure from the end of the system config forward instead of // over all table data to avoid backpressuring unsplittable ranges. // Note: The above comment is no longer true.
Rather than this as a note, let's perhaps update the comment.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 30 at r4 (raw file):
) func TestSpanConfigUpdatesBlockedByRangeSizeBackpressureOnDefaultRanges(t *testing.T) {
This test is complex enough that I would add a comment about the genera strategy being used.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 46 at r4 (raw file):
defaultMaxBytes = 512 << 20 // default max bytes for a range ) tc := serverutils.StartServerOnly(t, base.TestServerArgs{
nit: Usually tc is used for a "TestCluster" in our unit tests. So this might be better to name srv or s.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 49 at r4 (raw file):
Knobs: base.TestingKnobs{ Store: &kvserver.StoreTestingKnobs{ DisableMergeQueue: true,
Why do we need to disable the merge queue?
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 137 at r4 (raw file):
log.Dev.Infof(ctx, "Configuring span_configurations table with custom zone settings...\n")
We probably one need 1 of these two log message.
iskettaneh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dodeca12 and @stevendanna)
pkg/kv/kvserver/replica_backpressure.go line 93 at r4 (raw file):
Previously, stevendanna (Steven Danna) wrote…
Rather than this as a note, let's perhaps update the comment.
+1
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 29 at r3 (raw file):
"github.com/stretchr/testify/require" )
Is there a test file for backpressure already? maybe we could add this test there instead of creating a new file.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 58 at r3 (raw file):
require.NoError(t, err) waitForSpanConfig := func(t *testing.T, tc serverutils.TestServerInterface, tablePrefix roachpb.Key, exp int64) {
nit: consider changing the function parameter "exp" to another name that represents what that field is all about exactly. In this case, it could be named to "expRangeMaxBytes"
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 65 at r3 (raw file):
return err } if log.V(1) {
You probably don't need to check for verbosity in tests
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 66 at r3 (raw file):
} if log.V(1) { log.Dev.Infof(ctx, "RangeMaxBytes for tablePrefix %s: %d\n", tablePrefix, conf.RangeMaxBytes)
nit: We usually start each log line with a small case letter.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 66 at r3 (raw file):
} if log.V(1) { log.Dev.Infof(ctx, "RangeMaxBytes for tablePrefix %s: %d\n", tablePrefix, conf.RangeMaxBytes)
Maybe logging inside a testutils.SucceedsSoon might be spammy. The returned error should be enough.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 69 at r3 (raw file):
} if conf.RangeMaxBytes != exp { return fmt.Errorf("expected %d, got %d", exp, conf.RangeMaxBytes)
maybe add "expected RangeMaxBytes %d, got %d"
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 87 at r3 (raw file):
// See func exceedsMultipleOfSplitSize in /pkg/kv/kvserver/replica_metrics.go for the logic. spanConfig2KiB := roachpb.SpanConfig{ // 2078 bytes ~ 2 KiB. RangeMaxBytes: math.MaxInt64, // Maximum int64 value.
I am probably missing something here, but why are we using MaxInt64 for the RanMaxBytes? Shouldn't we reduce it so that we trigger the backpressure issue faster?
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 96 at r3 (raw file):
}, { ProtectedTimestamp: hlc.MaxTimestamp,
question: Why do we need two ProtectedTimestamp here?
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 106 at r3 (raw file):
{ Constraints: []roachpb.Constraint{ {Key: "max_key", Value: strings.Repeat("x", 1000)}, // Very long constraint value.
nit #1: 1024 instead of 1000
nit #2: inline comments start with a small case letter, and they don't end with a full stop.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 157 at r3 (raw file):
// Wait for the zone configuration to be applied. log.Dev.Infof(ctx, "Waiting for zone configuration to be applied...\n") testutils.SucceedsSoon(t, func() error {
Is this redundant given that we call waitForSpanConfig() a couple of statements above?
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 215 at r3 (raw file):
} if err != nil { log.Dev.Infof(ctx, "ERROR! BREAKING OUT OF LOOP, numWrites successful: %d, error: %+v\n", i, err)
Instead of having logs, can you replace them with expect. statements?
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 245 at r3 (raw file):
smallSpancofnRecordWriteErr := kvaccessor.UpdateSpanConfigRecords(ctx, []spanconfig.Target{testTarget}, []spanconfig.Record{smallSpancofnRecord}, hlc.MinTimestamp, hlc.MaxTimestamp) if smallSpancofnRecordWriteErr != nil { log.Dev.Infof(ctx, "ERROR: smallSpancofnRecord write failed: %v\n", smallSpancofnRecordWriteErr)
Can you instead of having log statements here, replace it with require.Error()?
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 250 at r4 (raw file):
// Assert that the operation failed due to backpressure. require.NoError(t, err, "Expected span config writes to NOT be backpressured") log.Dev.Infof(ctx, "Verified that span config writes NOT be backpressured")
nit: You could remove this log line as I don't think it adds much debugging help.
iskettaneh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dodeca12 and @stevendanna)
pkg/kv/kvserver/replica_backpressure.go line 98 at r4 (raw file):
// collection TTL updates are blocked by backpressure. {Key: keys.SystemConfigTableDataMax, EndKey: keys.SystemSQLCodec.TablePrefix(keys.SpanConfigurationsTableID)}, {Key: keys.SystemSQLCodec.TablePrefix(keys.SpanConfigurationsTableID + 1), EndKey: keys.TableDataMax},
To make this a little bit more readable, how about we do the following:
In the pkg/keys/constants.go file, similar to how we have NamespaceTableMin and NamespaceTableMax, can you add SpanConfigTableDataMin and SpanConfigTableDataMax.
Then, we here can just reference those constants.
dodeca12
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a little test to see if file comments show up on github integration on vscode/goland
0dce811 to
3a0f82f
Compare
dodeca12
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @iskettaneh and @stevendanna)
Previously, stevendanna (Steven Danna) wrote…
This largely duplicated the previous paragraph. We can probably make it a bit more concise.
Done.
Previously, stevendanna (Steven Danna) wrote…
This commit does not fix the issue.
Done.
Previously, stevendanna (Steven Danna) wrote…
In this comment, they can still be blocked.
did you mean commit?
Previously, stevendanna (Steven Danna) wrote…
Do we want to say some words about why we are only doing this for the system.span_configurations table and not other system database tables?
Done.
pkg/kv/kvserver/replica_backpressure.go line 93 at r4 (raw file):
Previously, iskettaneh wrote…
+1
Done.
pkg/kv/kvserver/replica_backpressure.go line 98 at r4 (raw file):
Previously, iskettaneh wrote…
To make this a little bit more readable, how about we do the following:
In the pkg/keys/constants.go file, similar to how we have
NamespaceTableMinandNamespaceTableMax, can you addSpanConfigTableDataMinandSpanConfigTableDataMax.
Then, we here can just reference those constants.
Done.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 29 at r3 (raw file):
Previously, iskettaneh wrote…
Is there a test file for backpressure already? maybe we could add this test there instead of creating a new file.
Done.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 40 at r3 (raw file):
Previously, stevendanna (Steven Danna) wrote…
Is there a reason not to set these much lower for the purposes of the test so that we don't have to do so many writes?
64 MiB is the smallest size I can configure the MaxRangeBytes
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 65 at r3 (raw file):
Previously, iskettaneh wrote…
You probably don't need to check for verbosity in tests
Done.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 66 at r3 (raw file):
Previously, iskettaneh wrote…
Maybe logging inside a testutils.SucceedsSoon might be spammy. The returned error should be enough.
Done.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 69 at r3 (raw file):
Previously, iskettaneh wrote…
maybe add "expected RangeMaxBytes %d, got %d"
Done.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 79 at r3 (raw file):
Previously, stevendanna (Steven Danna) wrote…
Should we re-use the constants you defined above here? I'd also either move it up near the other declarations or down closer to its usage.
Done.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 87 at r3 (raw file):
Previously, iskettaneh wrote…
I am probably missing something here, but why are we using MaxInt64 for the RanMaxBytes? Shouldn't we reduce it so that we trigger the backpressure issue faster?
These are span configs for the scratch range - the systemSpanConfig is the one that targets the system.span_configurations table and is the one responsible for changing the range size to a lower value. We need these configs to be quite large to write as little as possible to trigger the backpressure behaviour.
Additionally, math.MaxInt64 is used here to create a larger span config since the span config needs to be marshalled into protobuf, which uses variant encoding
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 96 at r3 (raw file):
Previously, iskettaneh wrote…
question: Why do we need two ProtectedTimestamp here?
For a couple of reasons - 1) to make the span config larger in size so as to not have an unreasonable number of writes, 2) to simulate ranges having PTS, 3) and to test if removing the PTS (see smallSpanConfig) in a span config update would bypass the backpressure
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 122 at r3 (raw file):
Previously, stevendanna (Steven Danna) wrote…
For some of these log statements
t.Logfmight be more appropriate.In this particular case I might replace the log statement with an assertion that the configBytes is large enough.
Done.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 157 at r3 (raw file):
Previously, iskettaneh wrote…
Is this redundant given that we call waitForSpanConfig() a couple of statements above?
Done.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 215 at r3 (raw file):
Previously, iskettaneh wrote…
Instead of having logs, can you replace them with expect. statements?
Done.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 245 at r3 (raw file):
Previously, iskettaneh wrote…
Can you instead of having log statements here, replace it with require.Error()?
Done.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 137 at r4 (raw file):
Previously, stevendanna (Steven Danna) wrote…
We probably one need 1 of these two log message.
Done.
-- commits line 13 at r3:
Done.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 81 at r3 (raw file):
} // This is a fat spanconfig with all fields set to maximum int64 and int32 values.
Done.
dodeca12
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @iskettaneh and @stevendanna)
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 30 at r4 (raw file):
Previously, stevendanna (Steven Danna) wrote…
This test is complex enough that I would add a comment about the genera strategy being used.
Done.
stevendanna
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dodeca12 and @iskettaneh)
Previously, dodeca12 (Swapneeth Gorantla) wrote…
did you mean commit?
Yup
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 40 at r3 (raw file):
Previously, dodeca12 (Swapneeth Gorantla) wrote…
64 MiB is the smallest size I can configure the MaxRangeBytes
Can you say a bit more about what you encountered when you tried to set this smaller? Locally I tried 16MB and it seemed to work.
The system has a number of "testing hooks" that allow us to get around various limitations and we can always ad new testing hooks if needed.
I ask because when I run this test locally it takes a long time so it would be nice if we could find some ways to speed it up, either by lowering the range size or doing a smaller number of larger writes.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 226 at r5 (raw file):
if repl != nil { stats := repl.GetMVCCStats() t.Logf(`Range size after all writes: %d bytes (KeyCount: %d,
I'm not sure why, but I'm not sure this is printing what we expect because when I run it locally I am seeing:
client_spanconfig_backpressure_test.go:221: Range size after all writes: 4807 bytes (KeyCount: 76,
LiveBytes: 4499)\n
Even when I modify the code such that I know we are hitting size-based backpressure.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 236 at r5 (raw file):
} smallSpancofnRecord, err := spanconfig.MakeRecord(testTarget, smallSpanConfig)
nit: smallSpancofnRecord -> smallSpanconfRecord (and related log entries below).
3a0f82f to
e2455fa
Compare
dodeca12
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @iskettaneh and @stevendanna)
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 40 at r3 (raw file):
Previously, stevendanna (Steven Danna) wrote…
Can you say a bit more about what you encountered when you tried to set this smaller? Locally I tried 16MB and it seemed to work.
The system has a number of "testing hooks" that allow us to get around various limitations and we can always ad new testing hooks if needed.
I ask because when I run this test locally it takes a long time so it would be nice if we could find some ways to speed it up, either by lowering the range size or doing a smaller number of larger writes.
Hmm you're right - looking into this, I think it's because I didn't change it from when I tried different approaches previously. I ran into errors that I can't configure the range to be any smaller than 64MiB when I was previously trying to run the test via LocalTestCluster:
cfg := roachpb.SpanConfig{
RangeMinBytes: 0,
RangeMaxBytes: 16 << 20, // 16MiB
GCPolicy: roachpb.GCPolicy{
TTLSeconds: 60 * 60 * 4, // 4 hours
},
NumReplicas: 3,
}
tc := &localtestcluster.LocalTestCluster{
Cfg: kvserver.StoreConfig{
DefaultSpanConfig: cfg, // This only applies to NEW ranges
},
StoreTestingKnobs: &kvserver.StoreTestingKnobs{
DisableMergeQueue: true,
DisableSplitQueue: true, // Prevent splits so range gets oversized
DisableGCQueue: true,
},
}
this didn't work due as DefaultSpanConfig only applies to new ranges created in the store. The span_configurations table range already exists with its own hardcoded configuration.
and then also via:
_, err = tdb.Exec("ALTER DEFAULT ZONE CONFIGURE ZONE USING range_max_bytes = $1, range_min_bytes = $2",
16 << MiB, defaultMinRangeBytes)
System tables have restricted zone configuration and so this approach also failed.
However it seems that SpanConfigKVAccessor updates the span configuration, bypassing the zone config inheritance logic.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 49 at r4 (raw file):
Previously, stevendanna (Steven Danna) wrote…
Why do we need to disable the merge queue?
remnant from my previous approaches where not disabling the merge queue caused issues - removed as we no longer need to do so since we can use KVAccessor to set RangeMaxBytes to any value
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 226 at r5 (raw file):
Previously, stevendanna (Steven Danna) wrote…
I'm not sure why, but I'm not sure this is printing what we expect because when I run it locally I am seeing:
client_spanconfig_backpressure_test.go:221: Range size after all writes: 4807 bytes (KeyCount: 76, LiveBytes: 4499)\nEven when I modify the code such that I know we are hitting size-based backpressure.
so this got me into a little rabbit hole - turns out this is printing the stats for a different replica than what is relevant, I've addressed this by printing out the aggregate MVCC stats for all the ranges in the table. Thanks to @iskettaneh for helping me with identifying the underlying issue
stevendanna
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for following up. I left a few more minor comments, but I defer to @iskettaneh for final review. Thanks for digging into this!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dodeca12 and @iskettaneh)
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 40 at r3 (raw file):
Previously, dodeca12 (Swapneeth Gorantla) wrote…
Hmm you're right - looking into this, I think it's because I didn't change it from when I tried different approaches previously. I ran into errors that I can't configure the range to be any smaller than 64MiB when I was previously trying to run the test via
LocalTestCluster:cfg := roachpb.SpanConfig{ RangeMinBytes: 0, RangeMaxBytes: 16 << 20, // 16MiB GCPolicy: roachpb.GCPolicy{ TTLSeconds: 60 * 60 * 4, // 4 hours }, NumReplicas: 3, } tc := &localtestcluster.LocalTestCluster{ Cfg: kvserver.StoreConfig{ DefaultSpanConfig: cfg, // This only applies to NEW ranges }, StoreTestingKnobs: &kvserver.StoreTestingKnobs{ DisableMergeQueue: true, DisableSplitQueue: true, // Prevent splits so range gets oversized DisableGCQueue: true, }, }this didn't work due as
DefaultSpanConfigonly applies to new ranges created in the store. Thespan_configurationstable range already exists with its own hardcoded configuration.and then also via:
_, err = tdb.Exec("ALTER DEFAULT ZONE CONFIGURE ZONE USING range_max_bytes = $1, range_min_bytes = $2", 16 << MiB, defaultMinRangeBytes)System tables have restricted zone configuration and so this approach also failed.
However it seems that
SpanConfigKVAccessorupdates the span configuration, bypassing the zone config inheritance logic.
Great, thanks for taking another look.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 226 at r5 (raw file):
Previously, dodeca12 (Swapneeth Gorantla) wrote…
so this got me into a little rabbit hole - turns out this is printing the stats for a different replica than what is relevant, I've addressed this by printing out the aggregate MVCC stats for all the ranges in the table. Thanks to @iskettaneh for helping me with identifying the underlying issue
Nice, glad you got to the bottom of this.
-- commits line 32 at r8:
I'm not sure "self referential" is the best description here and my own understanding of the problem implies that there could be some value in not backpressuring other tables as well.
Let's imagine that the span_configuration table is getting large because of a protected timestamp. The range can't be cleaned up until the protected timestamp is removed. To remove the protected timestamp, we write to the system.protected_ts_records table. Then the "reconciliation process" attempts to "translate" that protected timestamp into span configuration and write it to the system.span_configurations table. From there, the replica will eventually learn of the updated span configuration and the next GC run will allow it to collect the MVCC garbage. As you can see, cleaning up this protected timestamp actually required 2 writes: one to actually remove the PTS record and one to translate that removal into new span configuration. You can also see that it isn't so much that the span_configuration update triggers reconciliation, but rather that successfully processing the reconciliation requires a span_configuration update.
All that said, I do think it is reasonable to only put this table into the allow list for now because there is a bit of a fanout that happens. Namely 1 PTS record can very easily cause many span_configuration updates.
pkg/keys/constants.go line 380 at r7 (raw file):
SpanConfigTableDataMin = SystemSQLCodec.TablePrefix(SpanConfigurationsTableID) // SpanConfigTableDataMax is the end key of system.span_configurations. SpanConfigTableDataMax = SystemSQLCodec.TablePrefix(SpanConfigurationsTableID + 1)
For consistency with the namespace table I might name these SpanConfigTableMin and SpanConfigTableMax, but I leave that to y'alls judgement.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 242 at r8 (raw file):
t.Logf("aggregate table size after all writes: %d bytes (%d MiB), "+ "key count: %d, live count: %d", aggregateStats.Total(), aggregateStats.Total()>>20,
Just as an FYI, we have: humanize.Bytes which pretty-prints sizes.
iskettaneh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dodeca12 and @stevendanna)
pkg/keys/constants.go line 380 at r7 (raw file):
Previously, stevendanna (Steven Danna) wrote…
For consistency with the namespace table I might name these
SpanConfigTableMinandSpanConfigTableMax, but I leave that to y'alls judgement.
+1
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 29 at r3 (raw file):
Previously, dodeca12 (Swapneeth Gorantla) wrote…
Done.
I might be looking at a slightly stale PR, but is there already a backpressure test file that we can add this test to, instead of creating a new file?
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 87 at r3 (raw file):
Previously, dodeca12 (Swapneeth Gorantla) wrote…
These are span configs for the scratch range - the
systemSpanConfigis the one that targets thesystem.span_configurationstable and is the one responsible for changing the range size to a lower value. We need these configs to be quite large to write as little as possible to trigger the backpressure behaviour.Additionally,
math.MaxInt64is used here to create a larger span config since the span config needs to be marshalled into protobuf, which uses variant encoding
I tested it manually without filling those fields with MaxInt64, and the test succeeds fine. I mean an integer with a value of 0 or MAX_INT will have the same bytes anyways. The main size is coming from strings.Repeat(...
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 106 at r8 (raw file):
} configBytessdfsdf, err := protoutil.Marshal(&systemSpanConfig)
typo: configBytessdfsdf
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 195 at r8 (raw file):
// Write span configurations using KVAccessor. // We expect this to fail due to backpressure.
Can you fix this such that in the first commit, it fails, and and the second commit after the fix, it doesn't fail?
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 217 at r8 (raw file):
distSender := s.DistSenderI().(*kvcoord.DistSender) // Track aggregate MVCC stats across all discovered ranges
nit: it's not very clear to me what discovered ranges are. Can you change this comment to something like: Track aggregate MVCC stats across all SpanConfigurationsTable ranges?
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 240 at r8 (raw file):
} t.Logf("aggregate table size after all writes: %d bytes (%d MiB), "+
Can you replace the log with an assertion? Something like:
require.Greater(t, aggregateStats.Total(), int64(overloadMaxRangeBytes))
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 254 at r8 (raw file):
require.NoError(t, err) t.Logf("testing one more write with a small span config...")
No need for this log line, as there is a clear assertion below!
pkg/kv/kvserver/replica_backpressure.go line 91 at r8 (raw file):
var backpressurableSpans = []roachpb.Span{ {Key: keys.TimeseriesPrefix, EndKey: keys.TimeseriesKeyMax}, // Split the span to exclude the span_configurations table to avoid
nit: I would change this comment slightly to:
Exclude the span_configurations table to avoid ....
pkg/keys/constants.go line 378 at r8 (raw file):
NamespaceTableMax = SystemSQLCodec.TablePrefix(NamespaceTableID + 1) // SpanConfigTableDataMin is the start key of system.span_configurations. SpanConfigTableDataMin = SystemSQLCodec.TablePrefix(SpanConfigurationsTableID)
I see that the commit "kvserver: reproduce range-size backpressure for spanconfig updates" has this, and it also has the fix. While the commit "kvserver: allowlist spanconfig updates to bypass range-size backpressure" only changes some comments
iskettaneh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this! I think it is good to go after this review round
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dodeca12 and @stevendanna)
Spanconfig updates could be blocked by range-size backpressure when range-size gets too large, creating a catch-22 situation where garbage collection and protected timestamp removal operations were prevented from running, which in turn prevented range splitting and alleviation of the backpressure condition - leading to cluster wide issues. This commit adds a test that reproduces the issue by repeatedly writing spanconfig updates for a single key until the range becomes too large to split, triggering backpressure. The test demonstrates how spanconfig updates (including protected timestamp deletions) get blocked by backpressure, preventing the very operations needed to resolve the backpressure condition. Fixes: None Release note: None
e2455fa to
cb458e3
Compare
dodeca12
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @iskettaneh and @stevendanna)
pkg/keys/constants.go line 380 at r7 (raw file):
Previously, iskettaneh wrote…
+1
Done.
pkg/keys/constants.go line 378 at r8 (raw file):
Previously, iskettaneh wrote…
I see that the commit "kvserver: reproduce range-size backpressure for spanconfig updates" has this, and it also has the fix. While the commit "kvserver: allowlist spanconfig updates to bypass range-size backpressure" only changes some comments
Done.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 29 at r3 (raw file):
Previously, iskettaneh wrote…
I might be looking at a slightly stale PR, but is there already a backpressure test file that we can add this test to, instead of creating a new file?
Done.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 87 at r3 (raw file):
Previously, iskettaneh wrote…
I tested it manually without filling those fields with MaxInt64, and the test succeeds fine. I mean an integer with a value of 0 or MAX_INT will have the same bytes anyways. The main size is coming from strings.Repeat(...
integers are fixed width in go, but for protobuf they are not since pb uses variant encoding. as a little example:
spanConfig2KiB := roachpb.SpanConfig{ // 2078 bytes ~ 2 KiB.
RangeMaxBytes: math.MaxInt64,
RangeMinBytes: math.MaxInt64,
GCPolicy: roachpb.GCPolicy{
TTLSeconds: math.MaxInt32,
ProtectionPolicies: []roachpb.ProtectionPolicy{
{
ProtectedTimestamp: hlc.MaxTimestamp,
},
{
ProtectedTimestamp: hlc.MaxTimestamp,
},
},
},
NumReplicas: math.MaxInt32,
GlobalReads: true,
NumVoters: math.MaxInt32,
VoterConstraints: []roachpb.ConstraintsConjunction{
{
Constraints: []roachpb.Constraint{
{Key: "max_key", Value: strings.Repeat("x", 1024)}, // very long constraint value
},
},
},
LeasePreferences: []roachpb.LeasePreference{
{
Constraints: []roachpb.Constraint{
{Key: "max_key", Value: strings.Repeat("y", 1024)}, // very long constraint value
},
},
},
}
spanConfig2KiBWithOneIntFieldEqualToZero := roachpb.SpanConfig{ // 2078 bytes ~ 2 KiB.
RangeMaxBytes: 0,
RangeMinBytes: math.MaxInt64,
GCPolicy: roachpb.GCPolicy{
TTLSeconds: math.MaxInt32,
ProtectionPolicies: []roachpb.ProtectionPolicy{
{
ProtectedTimestamp: hlc.MaxTimestamp,
},
{
ProtectedTimestamp: hlc.MaxTimestamp,
},
},
},
NumReplicas: math.MaxInt32,
GlobalReads: true,
NumVoters: math.MaxInt32,
VoterConstraints: []roachpb.ConstraintsConjunction{
{
Constraints: []roachpb.Constraint{
{Key: "max_key", Value: strings.Repeat("x", 1024)}, // very long constraint value
},
},
},
LeasePreferences: []roachpb.LeasePreference{
{
Constraints: []roachpb.Constraint{
{Key: "max_key", Value: strings.Repeat("y", 1024)}, // very long constraint value
},
},
},
}
cfgBytesOneZero, err := protoutil.Marshal(&spanConfig2KiBWithOneIntFieldEqualToZero)
configBytes, err := protoutil.Marshal(&spanConfig2KiB)
fmt.Printf("spanConfig2KiBWithOneIntFieldEqualToZero marshalled size: %d bytes\n", len(cfgBytesOneZero))
fmt.Printf("spanConfig2KiB marshalled size: %d bytes\n", len(configBytes))which when run prints:
spanConfig2KiBWithOneIntFieldEqualToZero marshalled size: 2156 bytes
spanConfig2KiB marshalled size: 2166 bytes
I set it to math.MaxInt64 wherever I could before when I was under the assumption that I couldn't set the RangeMaxBytes to be below 64MiB but I suppose the ~single digit byte difference doesn't make too much of a difference now that we can set RangeMaxBytes to any arbitrarily low value now
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 106 at r8 (raw file):
Previously, iskettaneh wrote…
typo: configBytessdfsdf
thanks for the catch
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 195 at r8 (raw file):
Previously, iskettaneh wrote…
Can you fix this such that in the first commit, it fails, and and the second commit after the fix, it doesn't fail?
I may have messed up the rebase since I did have the test passing (as in asserting that there's an error ) in the first commit. This also explains why some file changes from the next commit show up in this earlier commit
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 240 at r8 (raw file):
Previously, iskettaneh wrote…
Can you replace the log with an assertion? Something like:
require.Greater(t, aggregateStats.Total(), int64(overloadMaxRangeBytes))
Done.
pkg/kv/kvserver/client_spanconfig_backpressure_test.go line 254 at r8 (raw file):
Previously, iskettaneh wrote…
No need for this log line, as there is a clear assertion below!
Done.
dodeca12
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @iskettaneh and @stevendanna)
Previously, stevendanna (Steven Danna) wrote…
I'm not sure "self referential" is the best description here and my own understanding of the problem implies that there could be some value in not backpressuring other tables as well.
Let's imagine that the span_configuration table is getting large because of a protected timestamp. The range can't be cleaned up until the protected timestamp is removed. To remove the protected timestamp, we write to the
system.protected_ts_recordstable. Then the "reconciliation process" attempts to "translate" that protected timestamp into span configuration and write it to thesystem.span_configurationstable. From there, the replica will eventually learn of the updated span configuration and the next GC run will allow it to collect the MVCC garbage. As you can see, cleaning up this protected timestamp actually required 2 writes: one to actually remove the PTS record and one to translate that removal into new span configuration. You can also see that it isn't so much that the span_configuration update triggers reconciliation, but rather that successfully processing the reconciliation requires a span_configuration update.All that said, I do think it is reasonable to only put this table into the allow list for now because there is a bit of a fanout that happens. Namely 1 PTS record can very easily cause many span_configuration updates.
Reworded to make the description better
This commit modifies `backpressurableSpans` to exclude the `system.span_configurations` table by splitting the original span into two ranges that create a gap around the `system.span_configurations` table. This allows spanconfig updates to bypass backpressure entirely, preventing the catch-22 deadlock. The corresponding test is then modified to assert the new behaviour (i.e., span config updates are not blocked). We only allowlist the \`system.span_configurations\` table because protected timestamp cleanup requires two writes: one to remove the PTS record from \`system.protected_ts_records\` and another to translate that removal into updated span configuration in \`system.span_configurations\`. When backpressure blocks the spanconfig update, the reconciliation process cannot complete, preventing protected timestamp cleanup and creating a deadlock where the system cannot clean up its own protected timestamps. Other system tables lack this circular dependency and are not excluded. Fixes: cockroachdb#146982 Release note: None
cb458e3 to
04e58b8
Compare
iskettaneh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @stevendanna)
|
bors r=iskettaneh |
This PR ensures that all spanconfig updates bypass the backpressure mechanism. This is done by excluding the
system.span_configurationstable from backpressurable spans by splitting the backpressurable range around it.Implementation
Commit 1: Reproduce the Issue
Commit 2:
backpressurableSpansinreplica_backpressure.go: Split the original backpressurable span{SystemConfigTableDataMax, TableDataMax}into two spans that exclude the span_configurations table:{SystemConfigTableDataMax, span_configurations_table_start}{span_configurations_table_end, TableDataMax}Expected behaviour - any span config updates now bypass the range-size backpressure mechanism.
Fixes: #146982
Release note: None