roachtest: cdc/scan/catchup/nodes=5/cpu=16/rows=1G/ranges=100k/protocol=rangefeed/format=json/sink=null failed #113489
Comments
@erikgrinaker
I think the goal was to manage the goroutine creation rate, and we didn't think about the indirect impact the fixed-size semaphore had on the (untracked) memory usage in http2/gRPC land. I wonder if we should add the "non-empty" checkpoint bit.
The empty checkpoint is emitted after the catchup scan. Its purpose is exactly to signal to the client that the catchup scan is completed.
Hmm... Then it's rather strange in this case, and I'm confused how we could have used up so much memory in http2 buffers.
We are going to reintroduce the catchup semaphore (in addition to the rate limiter) for regular rangefeeds.
I just confirmed that we got a pretty bad plan when the changefeed restarted:
113966: kvcoord: Reintroduce catchup scan semaphore for regular rangefeed r=miretskiy a=miretskiy

Re-introduce catchup scan semaphore limit, removed by #110919, for regular rangefeed. This hard limit on the number of catchup scans is necessary to avoid OOMs when handling large scan rangefeeds (large fan-in factor) when executing many non-local ranges.

Fixes #113489

Release note: None

114000: colfetcher: disable metamorphic randomization for direct scans r=yuzefovich a=yuzefovich

This commit makes it so that we no longer - for now - use metamorphic randomization for the default value of `sql.distsql.direct_columnar_scans.enabled` cluster setting that controls whether the direct columnar scans (aka "KV projection pushdown") is enabled. It appears that we might be missing some memory accounting in the local fast path of this feature, and some backup-related roachtests run into OOMs with binaries with "enabled assertions". Disabling this metamorphization for now seems good to silence failures in case of this now-known issue.

Informs: #113816

Epic: None

Release note: None

114026: kvnemesis: bump default steps to 100 r=erikgrinaker a=erikgrinaker

50 steps is usually too small to trigger interesting behaviors. Bump it to 100, which is still small enough to be easily debuggable. The nightlies already run with 1000 steps.

Epic: none

Release note: None

Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
roachtest.cdc/scan/catchup/nodes=5/cpu=16/rows=1G/ranges=100k/protocol=rangefeed/format=json/sink=null failed with artifacts on release-23.2 @ 348d11425a1184ce4de8a6f8dc85995cd3c653bc:
Parameters:
ROACHTEST_arch=amd64
ROACHTEST_cloud=gce
ROACHTEST_cpu=16
ROACHTEST_encrypted=false
ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
See: Grafana
This test on roachdash | Improve this report!
Jira issue: CRDB-33017
Epic CRDB-26372