CI Failure (Consumed from an unexpected offset) in `PartitionMoveInterruption.test_cancelling_partition_move` #17847

vbotbuildovich · 2024-04-12T21:42:09Z

https://buildkite.com/redpanda/redpanda/builds/47713

Module: rptest.tests.partition_move_interruption_test
Class: PartitionMoveInterruption
Method: test_cancelling_partition_move
Arguments: {
    "recovery": "restart_recovery",
    "compacted": false,
    "unclean_abort": true,
    "replication_factor": 3
}

test_id:    PartitionMoveInterruption.test_cancelling_partition_move
status:     FAIL
run time:   141.675 seconds

Exception('VerifiableConsumer-0-139821824201184-worker-1: Traceback (most recent call last):\n  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/services/background_thread.py", line 38, in _protected_worker\n    self._worker(idx, node)\n  File "/root/tests/rptest/services/verifiable_consumer.py", line 356, in _worker\n    raise e\n  File "/root/tests/rptest/services/verifiable_consumer.py", line 338, in _worker\n    handler.handle_records_consumed(event, self.logger)\n  File "/root/tests/rptest/services/verifiable_consumer.py", line 101, in handle_records_consumed\n    raise AssertionError(msg)\nAssertionError: Consumed from an unexpected offset (1455, 0) for partition TopicPartition(topic=\'topic-zrjtbbdhfp\', partition=0)\n')
Traceback (most recent call last):
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 184, in _do_run
    data = self.run_test()
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 276, in run_test
    return self.test_context.function(self.test)
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/mark/_mark.py", line 535, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/services/cluster.py", line 104, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/partition_move_interruption_test.py", line 199, in test_cancelling_partition_move
    self.consumer.stop()
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/services/background_thread.py", line 86, in stop
    self._propagate_exceptions()
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/services/background_thread.py", line 100, in _propagate_exceptions
    raise Exception(self.errors)
Exception: VerifiableConsumer-0-139821824201184-worker-1: Traceback (most recent call last):
  File "/opt/.ducktape-venv/lib/python3.10/site-packages/ducktape/services/background_thread.py", line 38, in _protected_worker
    self._worker(idx, node)
  File "/root/tests/rptest/services/verifiable_consumer.py", line 356, in _worker
    raise e
  File "/root/tests/rptest/services/verifiable_consumer.py", line 338, in _worker
    handler.handle_records_consumed(event, self.logger)
  File "/root/tests/rptest/services/verifiable_consumer.py", line 101, in handle_records_consumed
    raise AssertionError(msg)
AssertionError: Consumed from an unexpected offset (1455, 0) for partition TopicPartition(topic='topic-zrjtbbdhfp', partition=0)

JIRA Link: CORE-2353


vbotbuildovich · 2024-04-13T21:14:05Z

* https://buildkite.com/redpanda/redpanda/builds/47752
* https://buildkite.com/redpanda/redpanda/builds/47762

vbotbuildovich · 2024-04-16T21:13:53Z

* https://buildkite.com/redpanda/redpanda/builds/47858

Previously, when force-aborting a reconfiguration, we appended an aborting configuration on all replicas. This can lead to log inconsistencies because the configuration ends up duplicated on followers (one copy from the follower's own append, one replicated by the leader). Although these inconsistencies are expected for force-abort, if the leader is alive we can minimize the chance of them appearing by having followers wait for the aborting config to be replicated from the leader. Fixes redpanda-data#17847
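To make the duplication concrete, here is a minimal toy model of the two behaviours described in the commit message above. It is only a sketch: `Replica`, `replicate_from`, and both `force_abort_*` helpers are hypothetical names, logs are plain Python lists, and Raft terms, truncation, and acknowledgements are ignored entirely. It is not Redpanda's implementation.

```python
from dataclasses import dataclass, field
from typing import List

ABORT = "abort-config"  # stand-in for the aborting configuration batch


@dataclass
class Replica:
    """Hypothetical replica holding a flat list of log entries."""
    name: str
    log: List[str] = field(default_factory=list)


def replicate_from(leader: Replica, follower: Replica, start: int) -> None:
    # Toy replication: ship every leader entry from index `start` onwards.
    follower.log.extend(leader.log[start:])


def force_abort_on_all(leader: Replica, followers: List[Replica]) -> None:
    """Old behaviour: every replica appends the aborting config locally."""
    start = len(leader.log)
    for replica in [leader, *followers]:
        replica.log.append(ABORT)
    # The leader then replicates its own copy, so followers end up with two.
    for follower in followers:
        replicate_from(leader, follower, start)


def force_abort_on_leader_only(leader: Replica, followers: List[Replica]) -> None:
    """New behaviour: followers wait for the leader's copy to arrive."""
    start = len(leader.log)
    leader.log.append(ABORT)
    for follower in followers:
        replicate_from(leader, follower, start)


def make_group():
    # Three replicas (replication_factor=3) with identical starting logs.
    base = ["batch-0", "batch-1"]
    leader = Replica("leader", list(base))
    followers = [Replica("follower-1", list(base)), Replica("follower-2", list(base))]
    return leader, followers
```

In this model, `force_abort_on_all` leaves each follower with two `abort-config` entries while the leader has one, so the logs diverge. With `force_abort_on_leader_only`, every follower's log matches the leader's.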

ztlpn · 2024-04-23T11:25:47Z

This was indirectly caused by #17789, which fixed a bug in offset translation of the log end offset (and, as a result, fetch offset validation became stricter). In the force-abort case there is a log discrepancy between leaders and followers that, after a leadership change, leads to an offset-out-of-range error and a fetch offset reset (previously this didn't happen because fetch offset validation was incorrect). Although this discrepancy is somewhat expected for force-abort, we can minimize the chance of it occurring; see the attached PR.
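A rough sketch of how the chain described above could surface as the assertion in the report. Everything here is an assumption made for illustration: the helper names are hypothetical, the reset policy is assumed to be "earliest", the log end offset of 1454 is invented, and the tuple `(1455, 0)` is read as (expected position, offset actually received). None of this is taken from the Redpanda or `verifiable_consumer.py` source.

```python
class OffsetOutOfRange(Exception):
    """Raised when a fetch offset falls outside the leader's log."""


def validate_fetch_offset(fetch_offset: int, log_start: int, log_end: int) -> None:
    # Strict validation: the fetch offset must lie within [log_start, log_end].
    if not (log_start <= fetch_offset <= log_end):
        raise OffsetOutOfRange(
            f"offset {fetch_offset} not in [{log_start}, {log_end}]")


def next_fetch_position(position: int, log_start: int, log_end: int) -> int:
    """Return where the consumer fetches from next."""
    try:
        validate_fetch_offset(position, log_start, log_end)
        return position
    except OffsetOutOfRange:
        # Assumed reset policy: fall back to the earliest available offset.
        return log_start


def check_consumed(expected_position: int, first_offset: int, partition: str) -> None:
    # The consumer expects to resume exactly where it left off.
    if expected_position != first_offset:
        raise AssertionError(
            "Consumed from an unexpected offset (%d, %d) for partition %s"
            % (expected_position, first_offset, partition))
```

Under these assumptions, a consumer at position 1455 that fetches from a new leader whose log end offset is 1454 gets `next_fetch_position(1455, 0, 1454) == 0`, and the subsequent `check_consumed(1455, 0, ...)` raises an `AssertionError` with the same message shape as the one in the failure report.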

vbotbuildovich · 2024-04-23T21:16:40Z

* https://buildkite.com/redpanda/vtools/builds/13121

Previously, when force-aborting a reconfiguration, we appended an aborting configuration on all replicas. This can lead to log inconsistencies because the configuration ends up duplicated on followers (one copy from the follower's own append, one replicated by the leader). Although these inconsistencies are expected for force-abort, if the leader is alive we can minimize the chance of them appearing by having followers wait for the aborting config to be replicated from the leader. Fixes redpanda-data#17847 (cherry picked from commit 8e221d3)

dotnwat · 2024-04-26T20:51:41Z

It seems that this failure popped up in a PR run today:

#18105

https://buildkite.com/redpanda/redpanda/builds/48353#018f1bd9-a4db-4853-ad9c-e9b416447aca

https://ci-artifacts.dev.vectorized.cloud/redpanda/48353/018f1bd9-a4db-4853-ad9c-e9b416447aca/vbuild/ducktape/results/final/report.html

vbotbuildovich · 2024-04-27T21:14:11Z

* https://buildkite.com/redpanda/redpanda/builds/48381

vbotbuildovich · 2024-04-29T21:13:57Z

* https://buildkite.com/redpanda/redpanda/builds/48422

vbotbuildovich · 2024-04-30T21:13:04Z

* https://buildkite.com/redpanda/redpanda/builds/48470
* https://buildkite.com/redpanda/redpanda/builds/48468

vbotbuildovich · 2024-05-05T21:15:12Z

* https://buildkite.com/redpanda/redpanda/builds/48728
* https://buildkite.com/redpanda/redpanda/builds/48729
* https://buildkite.com/redpanda/redpanda/builds/48726

vbotbuildovich · 2024-05-07T21:14:27Z

* https://buildkite.com/redpanda/redpanda/builds/48752
* https://buildkite.com/redpanda/redpanda/builds/48760

vbotbuildovich · 2024-05-08T21:12:53Z

* https://buildkite.com/redpanda/redpanda/builds/48798

vbotbuildovich · 2024-05-10T21:13:13Z

* https://buildkite.com/redpanda/redpanda/builds/48915

vbotbuildovich · 2024-05-11T21:15:19Z

* https://buildkite.com/redpanda/redpanda/builds/48958

vbotbuildovich · 2024-05-12T21:15:34Z

* https://buildkite.com/redpanda/redpanda/builds/48979

vbotbuildovich · 2024-05-14T21:16:55Z

* https://buildkite.com/redpanda/redpanda/builds/49036
* https://buildkite.com/redpanda/redpanda/builds/49106

vbotbuildovich · 2024-05-15T21:14:14Z

* https://buildkite.com/redpanda/redpanda/builds/49174

vbotbuildovich · 2024-05-21T21:04:55Z

* https://buildkite.com/redpanda/redpanda/builds/49295
* https://buildkite.com/redpanda/redpanda/builds/49315
* https://buildkite.com/redpanda/redpanda/builds/49332
* https://buildkite.com/redpanda/redpanda/builds/49356

vbotbuildovich · 2024-05-24T21:12:26Z

* https://buildkite.com/redpanda/redpanda/builds/49519

vbotbuildovich · 2024-05-25T21:05:43Z

* https://buildkite.com/redpanda/redpanda/builds/49553

vbotbuildovich · 2024-05-26T21:09:03Z

* https://buildkite.com/redpanda/redpanda/builds/49567

vbotbuildovich · 2024-05-27T21:08:53Z

* https://buildkite.com/redpanda/redpanda/builds/49583

vbotbuildovich · 2024-05-29T21:08:05Z

* https://buildkite.com/redpanda/redpanda/builds/49619

vbotbuildovich · 2024-06-05T20:44:38Z

* https://buildkite.com/redpanda/redpanda/builds/49797
* https://buildkite.com/redpanda/redpanda/builds/49866

vbotbuildovich added the auto-triaged (used to know which issues have been opened from a CI job) and ci-failure labels Apr 12, 2024

nvartolomei mentioned this issue Apr 17, 2024

[v23.3.x] id_allocator: do not forward requests beyond first hop #17898

Merged

travisdowns added the area/replication label Apr 17, 2024

ztlpn mentioned this issue Apr 18, 2024

raft: in append_entries skip batches that we already have #17895

Merged

6 tasks

andijcr mentioned this issue Apr 19, 2024

CORE-2452 followup for azure managed identities #17840 #17953

Merged

6 tasks

mmaslankaprv mentioned this issue Apr 22, 2024

Added detection of the node with longest log during leader election process #17915

Merged

6 tasks

ztlpn self-assigned this Apr 22, 2024

ztlpn mentioned this issue Apr 23, 2024

c/controller_backend: try to force-abort reconfiguration only on leaders #18021

Merged

7 tasks

piyushredpanda closed this as completed in #18021 Apr 24, 2024

ztlpn changed the title ~~CI Failure (key symptom) in PartitionMoveInterruption.test_cancelling_partition_move~~ CI Failure (Consumed from an unexpected offset) in PartitionMoveInterruption.test_cancelling_partition_move Apr 24, 2024

ztlpn added the ci-rca/redpanda (CI Root Cause Analysis - Redpanda Issue) label Apr 24, 2024

michael-redpanda mentioned this issue Apr 25, 2024

[v23.3.x] Address oversized allocs across kafka API and schema registry #18053

Merged

dotnwat reopened this Apr 26, 2024

dotnwat mentioned this issue Apr 26, 2024

[CORE-2062] rpc: convert rpc module to new-style module #18105

Merged

7 tasks

mmaslankaprv mentioned this issue Apr 30, 2024

Fixed possible log discrepancy when using forced reconfiguration #18153

Merged

7 tasks

bharathv mentioned this issue May 8, 2024

ducktape: deflake test_adding_nodes_to_cluster #17956

Merged

6 tasks

WillemKauf mentioned this issue May 13, 2024

[v24.1.x] cloud_storage: correct list_object() request headers and parameters (manual backport) #18448

Merged

7 tasks

ztlpn mentioned this issue May 16, 2024

[v23.2.x] Fix some concurrent memory access problems in partition balancer #18498

Merged

bharathv mentioned this issue May 20, 2024

rm_stm: prepping for log_state port #18516

Merged

7 tasks

piyushredpanda mentioned this issue May 23, 2024

[v24.1.x] http: Fix double call to stop() in http::client #18428

Merged

mmaslankaprv closed this as completed in #18153 Jun 6, 2024

vbotbuildovich mentioned this issue Jun 6, 2024

[v24.1.x] CI Failure (Consumed from an unexpected offset) in PartitionMoveInterruption.test_cancelling_partition_move #18834

Closed
