CI Failure in kafka_server_rpfixture.fetch_leader_epoch #17943

Closed
travisdowns opened this issue Apr 18, 2024 · 2 comments · Fixed by #18468
Labels
kind/bug Something isn't working rpunit unit test ci-failure (not ducktape)

Comments

@travisdowns
Member

travisdowns commented Apr 18, 2024

This happened in a PR that only changed Python DT code, so the PR couldn't be the cause of this failure (I assume it's a flake):

https://buildkite.com/redpanda/redpanda/builds/47976#018eedeb-71b6-445d-bd66-0d7f1d4bba6c

I couldn't quickly pull out the actual failing assertion.

JIRA Link: CORE-2428

@travisdowns travisdowns added kind/bug Something isn't working rpunit unit test ci-failure (not ducktape) labels Apr 18, 2024
@bharathv
Contributor

/var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-02971863fd9d6bf64-1/redpanda/redpanda/src/v/redpanda/tests/fixture.h(515): fatal error: in "fetch_leader_epoch": Timed out at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-02971863fd9d6bf64-1/redpanda/redpanda/src/v/redpanda/tests/fixture.h:515

The failure is in the test fetch_leader_epoch; updating the title.

@bharathv bharathv changed the title from "CI Failure in kafka_server_rpfixture.test_replicated_partition_end_offset" to "CI Failure in kafka_server_rpfixture.fetch_leader_epoch" Apr 19, 2024
@bharathv
Contributor

INFO  2024-04-17 22:54:55,304 [shard 0:main] raft - [group_id:1, {kafka/foo/0}] consensus.cc:214 - [external_stepdown - trigger epoch change] Stepping down as leader in term 1, dirty offset 113
....
TRACE 2024-04-17 22:54:58,127 [shard 0:main] kvstore - kvstore.cc:212 - Apply op: update: key={bytes:10} value={{bytes=8, fragments=1}}
/var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-02971863fd9d6bf64-1/redpanda/redpanda/src/v/redpanda/tests/fixture.h(515): fatal error: in "fetch_leader_epoch": Timed out at /var/lib/buildkite-agent/builds/buildkite-amd64-builders-i-02971863fd9d6bf64-1/redpanda/redpanda/src/v/redpanda/tests/fixture.h:515
...

The vote timer didn't fire in time (1.5s + jitter, so <= 2.25s), perhaps because the debug build is slow. The wait time is 3s. The obvious fix is to bump the timeout, but we could probably do something more deterministic.
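
To make the timing budget concrete, here is a rough, self-contained C++ sketch. It is not the code in fixture.h: the constant names, the `slack` function, and the `wait_until` helper are all hypothetical, and the 0.75s maximum jitter is only inferred from the "<= 2.25s" bound above. It shows how little headroom a 3s wait leaves over the worst-case vote timer, and what a generic poll-with-deadline helper with a configurable timeout might look like.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

using namespace std::chrono_literals;
using ms = std::chrono::milliseconds;

// Values taken from the comment above; the names are hypothetical.
constexpr ms election_timeout = 1500ms; // base vote timer
constexpr ms max_jitter = 750ms;        // assumption: implied by "<= 2.25s"
constexpr ms test_wait = 3000ms;        // how long the test waits for a leader

// Headroom left for scheduling delays and clock drift before the
// test gives up, given a total wait budget.
constexpr ms slack(ms wait) {
    return wait - (election_timeout + max_jitter);
}

// Poll `pred` until it returns true or `timeout` elapses.
// Returns true if the predicate was satisfied before the deadline.
inline bool wait_until(
  const std::function<bool()>& pred, ms timeout, ms poll_interval = 10ms) {
    assert(poll_interval > 0ms);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!pred()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false; // timed out
        }
        std::this_thread::sleep_for(poll_interval);
    }
    return true;
}
```

With these numbers `slack(test_wait)` is 750ms, so a slow debug build or a late timer can plausibly use up the whole margin. Raising the wait budget widens that margin, but the test still depends on wall-clock timing.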

mmaslankaprv added a commit to mmaslankaprv/redpanda that referenced this issue May 14, 2024
Drift in seastar lowres clock may lead to situation in which a leader
election will not trigger in 1.5 seconds. Increased a wait for leader
timeout to eliminate test spurious failures.

Fixes: redpanda-data#17943
Fixes: redpanda-data#18059

Signed-off-by: Michał Maślanka <michal@redpanda.com>
Lazin pushed a commit to Lazin/redpanda that referenced this issue Jun 1, 2024
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue Jun 12, 2024
(cherry picked from commit c5fe910)
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue Jun 12, 2024
(cherry picked from commit c5fe910)
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue Jun 12, 2024
(cherry picked from commit c5fe910)
Lazin pushed a commit to Lazin/redpanda that referenced this issue Jun 12, 2024