Use remote recovery when topic is configured to use remote data #22908

mmaslankaprv · 2024-08-16T12:43:55Z

When a topic is configured to use remote data, that data can be used when
force-recovering partitions that lost their majority. In this case, instead of
creating an empty partition replica instance, we customize the arguments
passed into the `partition_manager::manage` method to enable remote
recovery of replica data.
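
As a rough illustration only, the decision described above can be modeled in a few lines of Python. The actual change is C++ in `src/v/cluster/controller_backend.cc`; every class and function name below is hypothetical and exists only to show the shape of the logic: when the topic uses remote data, mark recovery as enabled and pass the remote topic properties, otherwise create the replica empty as before.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Overrides:
    # Hypothetical stand-in for the per-topic overrides in the ntp config.
    recovery_enabled: bool = False


@dataclass
class RemoteTopicProperties:
    # Hypothetical stand-in for the remote topic properties argument.
    initial_revision: int
    partition_count: int


def args_for_force_recovered_replica(
    overrides: Optional[Overrides],
    uses_remote_data: bool,
    initial_revision: int,
    partition_count: int,
) -> Tuple[Optional[Overrides], Optional[RemoteTopicProperties]]:
    """Pick the arguments used to create a replica during force recovery."""
    if not uses_remote_data:
        # No remote data to restore from: keep the arguments unchanged,
        # which results in an empty replica.
        return overrides, None
    # Remote data is available: ask for recovery from Tiered Storage.
    return (
        Overrides(recovery_enabled=True),
        RemoteTopicProperties(initial_revision, partition_count),
    )
```

In this model the caller would forward the returned pair to whatever creates the partition instance; the `None` case corresponds to the previous behavior.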

Backports Required

Release Notes

Improvements

With this improvement, force-reconfigured partitions are backfilled from Tiered Storage even if they lost all local data.

vbotbuildovich · 2024-08-16T15:35:50Z

new failures in https://buildkite.com/redpanda/redpanda/builds/53044#01915b88-9afa-4011-9e70-1a139c563e0f:

"rptest.tests.cluster_config_test.ClusterConfigTest.test_valid_settings"

new failures in https://buildkite.com/redpanda/redpanda/builds/53548#01918f2f-9bd9-49ec-b693-5be4f3e5348e:

"rptest.tests.partition_force_reconfiguration_test.PartitionForceReconfigurationTest.test_basic_reconfiguration.acks=1.restart=True.controller_snapshots=False"

new failures in https://buildkite.com/redpanda/redpanda/builds/54942#01921e82-34f4-4454-997c-574449687a2a:

"rptest.tests.leadership_transfer_test.AutomaticLeadershipBalancingTest.test_automatic_rebalance"

vbotbuildovich · 2024-08-16T15:37:17Z

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53044#01915b87-27a9-43bf-9b32-91e32ab0071a

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53044#01915b88-9afd-4965-b953-53c8457c84fc

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53119#019169bf-5f90-4ecd-b3f3-24507e924a68

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53207#01916feb-b5ff-484b-a33d-613e63733355

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53207#01916fed-9b30-482e-bac9-8fe6c015522d

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53207#01916fed-9b2f-4907-b3ce-ac2f712a72e7

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53207#01916fed-9b2d-4d0b-9df3-e6bdb31126dd

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53207#0191752f-380d-4ad6-b97a-6cf7cba5dbfd

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53548#01918f2f-9bd4-493d-a3a1-88bb8ef2a647

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53548#01918f2f-9bd9-49ec-b693-5be4f3e5348e

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53548#01918f49-493a-47a4-9e86-81a155b3d090

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53548#01918f49-4934-4011-9d02-c276f30102a7

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53600#01919316-c335-432e-a052-8a191ce548c4

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53600#01919316-c330-4abb-8d73-4d50eedf87d4

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53662#01919807-fe50-4ae8-a476-72514e6cfd3f

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53662#01919807-fe4c-4b13-9001-086a4c517a9b

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53880#0191b189-f141-4c76-98c8-b35d46fb5f91

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/53880#0191b189-f148-4ad6-8b9a-ec57a5c7ad2f

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54508#0191f9aa-8240-4ac8-9c5a-d63ce498fad7

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54508#0191f9ab-d05d-4db2-a684-8dfb53170198

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54508#0191f9ab-d05a-4a42-858e-5dbdaa42692e

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54508#0191f9ab-d061-484c-b2a5-28634b79e61d

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54508#0191f9ab-d064-4acb-a24a-0be7aec62676

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54942#01921e82-34f3-4107-86b6-4496dd60bcb0

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54942#01921e82-34f4-4454-997c-574449687a2a

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/54942#01921e82-34f6-4e04-85d8-30f6c0f7f81f

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/55444#0192436b-e48b-420f-ac68-83da7bf83a1e

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/55444#0192436b-e487-4b93-b517-6f6854e2c381

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/56133#01927238-3452-412c-96e7-637887f14eaf

ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/56133#01927238-3456-4e01-b896-a0248bad2ad4

src/v/cluster/controller_backend.cc

tests/rptest/services/redpanda.py

bashtanov · 2024-08-16T13:30:22Z

tests/rptest/services/redpanda.py

        admin_client = admin_client or self._admin
+        if tolerate_stopped_nodes:
+            started_node_ids = {self.node_id(n) for n in self.started_nodes()}


TBH I find having a variable declared/undeclared depending on the condition somewhat error-prone and harder to read too. What do you think of something like the following?

    if tolerate_stopped_nodes:
        started_node_ids = {self.node_id(n) for n in self.started_nodes()}
        node_check_predicate = lambda n: n in started_node_ids
    else:
        node_check_predicate = lambda n: True
    ...
    ready = all([n['config_version'] >= config_version for n in status if node_check_predicate(n)])
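
For reference, a self-contained version of this predicate idea might look like the sketch below. It is not the code that was merged. It assumes each status entry is a dict with `node_id` and `config_version` keys (the `node_id` key is an assumption here), and the function name is made up for illustration. The predicate is applied to the node id rather than to the whole status entry, since a dict cannot be looked up in a set.

```python
def all_nodes_at_version(status, config_version, started_node_ids=None):
    """Return True when every checked node reports config_version or newer.

    status: list of dicts with 'node_id' and 'config_version' keys (assumed).
    started_node_ids: ids of nodes to check; None means check every node.
    """
    if started_node_ids is None:
        # Strict mode: every node in the status list must be up to date.
        should_check = lambda node_id: True
    else:
        # Tolerant mode: ignore nodes that are not currently started.
        should_check = lambda node_id: node_id in started_node_ids

    return all(
        entry['config_version'] >= config_version
        for entry in status
        if should_check(entry['node_id'])
    )
```

Both branches define `should_check`, so no name is conditionally bound, which is the readability point raised above.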

src/v/cluster/controller_backend.cc

tests/rptest/tests/partition_force_reconfiguration_test.py

bashtanov

only nits left, so approving in case it's really urgent to merge

tests/rptest/tests/partition_force_reconfiguration_test.py

Signed-off-by: Michał Maślanka <michal@redpanda.com>

Added log entry emitted when partition instance is being created. The entry will allow us to quickly identify partition configuration. Signed-off-by: Michał Maślanka <michal@redpanda.com>

When a topic is configured to use remote data, that data can be used when force-recovering partitions that lost their majority. In this case, instead of creating an empty partition replica instance, we customize the arguments passed into the `partition_manager::manage` method to enable remote recovery of replica data. Signed-off-by: Michał Maślanka <michal@redpanda.com>

mmaslankaprv · 2024-08-28T06:25:27Z

/ci-repeat 1

mmaslankaprv · 2024-08-28T14:17:20Z

unrelated test failure: https://redpandadata.atlassian.net/issues/CORE-7002

tests/rptest/tests/partition_force_reconfiguration_test.py

Added replicating some data and waiting for it to be uploaded to the cloud when executing node-wise recovery. This way the test can verify that cloud storage data is used when force-reconfiguring partitions that lost their majority. Signed-off-by: Michał Maślanka <michal@redpanda.com>

mmaslankaprv · 2024-09-16T05:55:27Z

/ci-repeat 1

mmaslankaprv · 2024-09-23T09:29:20Z

/ci-repeat 1

mmaslankaprv · 2024-09-30T13:33:38Z

/ci-repeat 1

bharathv · 2024-10-04T04:45:03Z

src/v/cluster/controller_backend.cc

+            // topic being cloud enabled implies existence of overrides
+            ntp_config.get_overrides().recovery_enabled
+              = storage::topic_recovery_enabled::yes;
+            rtp.emplace(*initial_rev, cfg->partition_count);


I'm not too familiar with initial_version; perhaps @ztlpn can take another look.

mmaslankaprv · 2024-10-09T15:43:26Z

/ci-repeat 1

vbotbuildovich · 2024-10-09T18:55:21Z

Retry command for Build#56133

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/cloud_storage_timing_stress_test.py::CloudStorageTimingStressTest.test_cloud_storage@{"cleanup_policy":"delete"}

mmaslankaprv · 2024-10-10T06:38:30Z

known ci failure:

https://redpandadata.atlassian.net/browse/CORE-7002

github-actions bot added the area/redpanda label Aug 16, 2024

mmaslankaprv requested review from bharathv, bashtanov and ztlpn August 16, 2024 12:44

mmaslankaprv force-pushed the recovery-ts branch from f461d52 to 837455f Compare August 16, 2024 12:44

bashtanov changed the title ~~Use remove recovery when topic is configured to use remote data~~ Use remote recovery when topic is configured to use remote data Aug 16, 2024

mmaslankaprv force-pushed the recovery-ts branch from 837455f to 1321799 Compare August 16, 2024 12:55

mmaslankaprv force-pushed the recovery-ts branch from 1321799 to 12c75b9 Compare August 19, 2024 06:44

bharathv reviewed Aug 19, 2024

src/v/cluster/controller_backend.cc (outdated, resolved)

bashtanov reviewed Aug 19, 2024

mmaslankaprv force-pushed the recovery-ts branch from 12c75b9 to 010e107 Compare August 20, 2024 11:59

mmaslankaprv requested review from bharathv and bashtanov August 21, 2024 14:03

bashtanov previously approved these changes Aug 22, 2024

mmaslankaprv dismissed bashtanov’s stale review via 2cadadd August 23, 2024 06:55

mmaslankaprv force-pushed the recovery-ts branch from 010e107 to 2cadadd Compare August 23, 2024 06:55

mmaslankaprv requested a review from bashtanov August 23, 2024 07:16

mmaslankaprv force-pushed the recovery-ts branch 2 times, most recently from c766d9d to fef7f59 Compare August 26, 2024 13:41

mmaslankaprv added 4 commits August 27, 2024 08:01

tests/redpanda: tolerate some nodes not being alive when scrubbing

7611c78

Signed-off-by: Michał Maślanka <michal@redpanda.com>

c/backend: fixed small typo in log message

eea417a

Signed-off-by: Michał Maślanka <michal@redpanda.com>

c/partition_manager: added log describing created partition

313358a

Added log entry emitted when partition instance is being created. The entry will allow us to quickly identify partition configuration. Signed-off-by: Michał Maślanka <michal@redpanda.com>

mmaslankaprv force-pushed the recovery-ts branch from fef7f59 to b964da2 Compare August 27, 2024 08:01

bashtanov previously approved these changes Aug 30, 2024

tests/rptest/tests/partition_force_reconfiguration_test.py (outdated, resolved)

mmaslankaprv dismissed bashtanov’s stale review via de93ab9 September 2, 2024 05:42

mmaslankaprv force-pushed the recovery-ts branch from b964da2 to de93ab9 Compare September 2, 2024 05:42

mmaslankaprv requested a review from bashtanov September 2, 2024 05:42

bashtanov approved these changes Sep 2, 2024

bharathv approved these changes Oct 4, 2024

mmaslankaprv merged commit 082c700 into redpanda-data:dev Oct 10, 2024
15 of 18 checks passed

mmaslankaprv deleted the recovery-ts branch October 10, 2024 06:38

dotnwat requested a review from Lazin October 10, 2024 21:11
