CI Failure (key symptom) in ShadowIndexingCacheSpaceLeakTest.test_si_cache
#21597
Comments
The failure is caused by PR #21556. The test now takes 3x longer and sometimes fails due to timeouts. @abhijat @Lazin @jcipar any insights on why disabling trim carryover makes this test so slow? Are cache puts getting blocked? Do we expect this? Should we just bump the timeout? Reverting that PR makes the test run just fine.
@nvartolomei Should we backport the fix? Seeing the error in 24.2.
@nvartolomei Hmm, but #23006 (which was backported as #23179 and #23024) fixes the scale test. I was talking about #22796 (which wasn't backported)
Got confused. Backporting now.
This test is too slow with default configuration making the test flaky. Instead of raising the timeouts I'm trying to reduce the cache eviction throttling which makes the test 3x faster.

The test became flaky after in-memory trim was introduced in redpanda-data#21556.

The main insight was provided by https://github.com/abhijat in a private exchange:

> I think it might be the extra throttling. With the carry over
> disabled, we always have to do a trim when reserving space, which
> results in a lot more throttling and sleep:
>
> ```
> $ grep -Ri "Cache trimming throttled" * | grep -c cache
> 139
> ```
>
> With the carryover list in place, about half of the calls to reserve
> space end up in an early return because the list provides enough room
> to clear up space, which does not cause the trimming to be throttled
> as much:
>
> ```
> $ grep -Ri "Cache trimming throttled" * | grep -c cache
> 63
> ```
>
> Although that doesn't explain how this test used to work before, IIRC
> carryover is a fairly new feature

Fixes redpanda-data#21597

(cherry picked from commit 7763669)
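For anyone who wants to repeat the throttle count from the quoted exchange inside a Python test helper rather than a shell, here is a minimal sketch that approximates `grep -Ri "Cache trimming throttled" * | grep -c cache`. The function name `count_throttled` and its parameters are hypothetical and not part of the Redpanda test suite; it also assumes the logs are plain text files under a single directory.

```python
import os

PHRASE = "cache trimming throttled"  # compared in lowercase, like grep -i


def count_throttled(root, phrase=PHRASE, must_contain="cache"):
    """Count log lines under `root` that mention `phrase` (case-insensitive).

    Hypothetical helper approximating
    `grep -Ri PHRASE * | grep -c cache` run from inside `root`.
    """
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # grep -R prints paths relative to where it was invoked.
            rel = os.path.relpath(path, root)
            with open(path, errors="replace") as fh:
                for line in fh:
                    if phrase not in line.lower():
                        continue
                    # The second grep is case-sensitive and sees "path:line",
                    # so lowercase "cache" may match in the path or the line.
                    if must_contain in f"{rel}:{line}":
                        total += 1
    return total
```

Running `count_throttled(log_dir)` on the log directories of two test runs (with and without carryover) would give numbers comparable to the 139 and 63 quoted above, assuming the same log layout. Unlike the shell glob, this sketch also descends into hidden files at the top level.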
https://buildkite.com/redpanda/redpanda/builds/51904
JIRA Link: CORE-5760