
Metadata not rebuilt after replacing one store and hot reloading #131

Open
scottyeager opened this issue Nov 22, 2024 · 17 comments · May be fixed by #134

@scottyeager

scottyeager commented Nov 22, 2024

I am trying to test this behavior, as described in the README:

A metadata store can be replaced by a new one, by removing the old one in the config and inserting the new one. The repair subsystem will take care of rebuilding the data, regenerating the shards, and storing the new shards on the new metadata store.

As I understand it, these steps should be sufficient:

  1. Start zstor with an initial config and write some data
  2. Replace the config file with a new one in which three of the four data backends and three of the four metadata backends are unchanged and one of each is replaced (one original backend of each type has been decommissioned, so its namespace is deleted)
  3. Do a hot reload of the config with SIGUSR1

Here's my status output after step 1:

+----------------------------------------------------------------+-----------+---------+------------+------------+------------------+
| backend                                                        | reachable | objects | used space | free space | usage percentage |
+================================================================+===========+=========+============+============+==================+
| [2a02:1802:5e:0:e8f8:dcff:fe44:6b2]:9900 - 5545-708153-meta10  | Yes       |      44 |      20380 | 1073741824 |                0 |
+----------------------------------------------------------------+-----------+---------+------------+------------+------------------+
| [2a02:1802:5e:0:8c0e:bfff:fe33:fa8b]:9900 - 5545-708150-meta11 | Yes       |      44 |      20380 | 1073741824 |                0 |
+----------------------------------------------------------------+-----------+---------+------------+------------+------------------+
| [2a02:1802:5e:0:107d:6dff:fe38:c095]:9900 - 5545-708152-meta13 | Yes       |      44 |      20380 | 1073741824 |                0 |
+----------------------------------------------------------------+-----------+---------+------------+------------+------------------+
| [2a02:1802:5e:0:a89b:bfff:fec4:5205]:9900 - 5545-708151-meta8  | Yes       |      44 |      20380 | 1073741824 |                0 |
+----------------------------------------------------------------+-----------+---------+------------+------------+------------------+

Then after step 3:

+----------------------------------------------------------------+-----------+---------+------------+------------+------------------+
| backend                                                        | reachable | objects | used space | free space | usage percentage |
+================================================================+===========+=========+============+============+==================+
| [2a02:1802:5e:0:e8f8:dcff:fe44:6b2]:9900 - 5545-708153-meta10  | Yes       |      44 |      20380 | 1073741824 |                0 |
+----------------------------------------------------------------+-----------+---------+------------+------------+------------------+
| [2a02:1802:5e:0:8c0e:bfff:fe33:fa8b]:9900 - 5545-708150-meta11 | Yes       |      44 |      20380 | 1073741824 |                0 |
+----------------------------------------------------------------+-----------+---------+------------+------------+------------------+
| [2a02:1802:5e:0:107d:6dff:fe38:c095]:9900 - 5545-708152-meta24 | Yes       |       0 |          0 | 1073741824 |                0 |
+----------------------------------------------------------------+-----------+---------+------------+------------+------------------+
| [2a02:1802:5e:0:a89b:bfff:fec4:5205]:9900 - 5545-708151-meta8  | Yes       |      44 |      20380 | 1073741824 |                0 |
+----------------------------------------------------------------+-----------+---------+------------+------------+------------------+

Even after some hours of waiting for the repair system to kick in and recreate the metadata shards on the new backend, there are no objects and no space used on the new backend.

I also noticed that zstor rejects writes in this state, indicating that it doesn't detect that it has four healthy metadata stores for writing new data.

@scottyeager
Author

scottyeager commented Nov 23, 2024

See #132 (comment). After writing new data with the new config in place, the metadata is rebuilt as expected.

I need to double check the bit about zstor rejecting writes, as that doesn't seem consistent. I was able to perform writes after trying to recreate the same conditions from the test that prompted this issue.

@iwanbk
Member

iwanbk commented Nov 25, 2024

From checking the code a bit, there is indeed no logic that triggers a rebuild based on the metastore backends; the detection is based only on the data backends.

cc @LeeSmet
This is what I mean:

```rust
// Query the backend manager for the state of the backends referenced by the
// object's shards. Note: backend_requests only covers the *data* backends,
// so the state of the metadata backends never influences this decision.
let backends = match backend_manager
    .send(RequestBackends {
        backend_requests,
        interest: StateInterest::Readable,
    })
    .await
{
    Err(e) => {
        error!("Failed to request backends: {}", e);
        return;
    }
    Ok(backends) => backends,
};
// A rebuild is only triggered when one of those data backends is missing or
// not readable.
let must_rebuild = backends.into_iter().any(|b| !matches!(b, Ok(Some(_))));
```
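For illustration, here is a minimal sketch of the missing piece: compare the metadata backends of the old and new config on a reload and flag a metadata rebuild when they differ. The names (`needs_meta_rebuild`, plain `SocketAddr` identifiers) are hypothetical and not zstor's actual API; in zstor a backend is identified by more than an address, but the set comparison is the relevant idea.

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// Hypothetical helper: decide whether a metadata rebuild is needed after a
/// config reload, based purely on the configured metadata backend addresses.
fn needs_meta_rebuild(
    old_meta_backends: &[SocketAddr],
    new_meta_backends: &[SocketAddr],
) -> bool {
    let old: HashSet<_> = old_meta_backends.iter().collect();
    let new: HashSet<_> = new_meta_backends.iter().collect();
    // Any backend that was removed or added means the metadata shards are no
    // longer stored on the full configured set, so a rebuild should be queued.
    old != new
}
```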

@iwanbk
Member

iwanbk commented Nov 25, 2024

I need to double check the bit about zstor rejecting writes, as that doesn't seem consistent. I was able to perform writes after trying to recreate the same conditions from the test that prompted this issue.

Do you mean you could write when one of the metastores was down?

@scottyeager
Author

Do you mean you could write when one of the metastores was down?

No, that part is okay.

I also noticed that zstor rejects writes in this state, indicating that it doesn't detect that it has four healthy metadata stores for writing new data.

I was referring to this condition: writes were rejected even though a full set of live metadata backends was present after a hot reload of the config. It only happened once and I'm not sure how to reproduce it reliably.

To summarize, these are the concerns:

  1. After replacing one meta and one data backend in the manner I described, repair operations for both are blocked until something happens. Making a new write unblocks them, but there may be other events that do so as well
  2. In one case I also wasn't able to make new writes, despite reloading the config with four live meta backends (this is more of a side note for now, pending reproducibility)

@iwanbk
Member

iwanbk commented Nov 26, 2024

Found a few things that we could improve:

  1. When replacing meta backends

     Needs to be implemented.

  2. When replacing data backends

     Same as for meta backends, it is not implemented yet.
     But it is already handled by the periodic checking, so perhaps we don't need it at all.

  3. Periodic checking

     Some improvements here:
  • get all the keys from the metadata: this currently returns a Vector, which does not scale if the number of keys is large
  • checking the dead data backends: currently the same backend can be checked multiple times, which is a waste of time (see the sketch after this list)
  • not sure how, but detecting replaced meta backends should be doable as well
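On the duplicate-checking bullet above, a minimal sketch (illustrative types only, not zstor's own) of collapsing the backends referenced by all shards into a distinct set before probing them:

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// Collect the distinct set of backends referenced by the shards, so each
/// backend is probed at most once per periodic check.
fn backends_to_check(shard_locations: &[SocketAddr]) -> Vec<SocketAddr> {
    let mut seen = HashSet::new();
    shard_locations
        .iter()
        .copied()
        .filter(|addr| seen.insert(*addr))
        .collect()
}
```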

The periodic checking is currently run every 10 minutes, which might be overkill.

I'm thinking along these lines:

  • The main mechanism would be (1) and (2)
  • The periodic checking frequency could be reduced to once a day

wdyt @LeeSmet ?

@scottyeager
Author

I think it definitely makes sense to trigger a repair whenever the backends are updated.

As for the ten-minute cycle, I'd say maybe we can reduce it to one hour, but a day feels too long.

@iwanbk
Member

iwanbk commented Nov 26, 2024

As for the ten-minute cycle, I'd say maybe we can reduce it to one hour, but a day feels too long.

Actually, if (1) and (2) work properly, we should not need this periodic checking at all.
I haven't proposed removing it because I'm still not sure I know the whole architecture.

@scottyeager
Author

The periodic check will always be relevant, I think. The reason is that the user can provide more data backends than expected shards. So there can still be a case where some backend goes down and there's a chance to restore the shard count for the affected files by rebuilding them on the remaining data backends.
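To make that reasoning concrete, a small hypothetical example (the numbers and the function are made up purely for illustration, assuming each shard of a file is placed on a distinct backend):

```rust
/// Hypothetical check: can the shard count for a file be restored after some
/// backends went down, given how many data backends are still healthy?
fn can_restore_shards(expected_shards: usize, healthy_backends: usize) -> bool {
    // Each shard of a file lives on a distinct backend, so we need at least
    // `expected_shards` healthy backends to hold a full set again.
    healthy_backends >= expected_shards
}

fn main() {
    // 6 configured backends, 1 down: 5 healthy >= 4 expected shards -> rebuild possible.
    assert!(can_restore_shards(4, 5));
    // Only 3 healthy backends left: a full set of 4 shards can no longer be placed.
    assert!(!can_restore_shards(4, 3));
}
```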

@iwanbk
Member

iwanbk commented Nov 26, 2024

So there can still be a case where some backend goes down and there's a chance to restore the shard count for the affected files by rebuilding them on the remaining data backends.

In this case, 10 minutes is definitely too fast.
Imagine that a node is temporarily down, we rebuild the data, and then it comes back up again.

So I think the definition of "down" should be more precise, e.g. down for 1 hour?

@scottyeager
Author

Agreed. I think adding this time period as a config option with a one hour default makes sense.
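A minimal sketch of what such an option could look like, assuming an Instant-based tracker inside the periodic checker. The field name `backend_down_threshold` and the surrounding types are made up for illustration; only the one-hour default mirrors the suggestion above.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::{Duration, Instant};

/// Hypothetical config knob: how long a backend must stay unreachable before
/// the periodic checker treats it as down and triggers a rebuild.
struct RepairConfig {
    backend_down_threshold: Duration,
}

impl Default for RepairConfig {
    fn default() -> Self {
        // One-hour default, as discussed above.
        Self { backend_down_threshold: Duration::from_secs(3600) }
    }
}

/// Track when each backend was first seen unreachable; only report it as down
/// once it has stayed unreachable longer than the configured threshold.
struct DowntimeTracker {
    first_unreachable: HashMap<SocketAddr, Instant>,
}

impl DowntimeTracker {
    fn is_down(&mut self, backend: SocketAddr, reachable: bool, cfg: &RepairConfig) -> bool {
        if reachable {
            // Backend recovered: forget any previous outage.
            self.first_unreachable.remove(&backend);
            return false;
        }
        let since = *self.first_unreachable.entry(backend).or_insert_with(Instant::now);
        since.elapsed() >= cfg.backend_down_threshold
    }
}
```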

@LeeSmet
Contributor

LeeSmet commented Nov 26, 2024

2. When replacing data backends
Need to be implemented

This is actually not needed (unless the data backend is down, in which case it's picked up by the periodic rebuilder, though you could manually trigger it). The reasoning here is as follows: the location of the shards (i.e. the 0-db backend) is recorded in the object metadata. Therefore, as long as the backend is reachable, the data can be recovered, even if said backend is no longer part of the running config, and thus it does not need to be rebuilt. The running config is relevant for stores, but not for loads.

For metadata backends we should rebuild the data if they are replaced, since we don't have a separate reference, so here the running config is relevant for both stores and loads.

The vector is indeed not good, and should be replaced by a Stream or channel.
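A rough sketch of the channel-based variant, assuming tokio and tokio-stream; `ObjectMeta` and `load_meta_page` are stand-ins for zstor's real metadata type and lookup, not its actual API:

```rust
use tokio::sync::mpsc;
use tokio_stream::wrappers::ReceiverStream;
use tokio_stream::Stream;

/// Stand-in for zstor's object metadata type.
#[derive(Debug, Clone)]
struct ObjectMeta {
    key: String,
}

/// Instead of collecting every key into a Vec up front, stream them out of a
/// bounded channel so the caller can process metadata incrementally.
fn object_metas_stream() -> impl Stream<Item = ObjectMeta> {
    let (tx, rx) = mpsc::channel(128);
    tokio::spawn(async move {
        // Hypothetical paged lookup against the metadata store; each page is
        // forwarded as soon as it is fetched. Sending fails if the consumer
        // dropped the stream, in which case we simply stop.
        for page in 0.. {
            let metas = load_meta_page(page).await;
            if metas.is_empty() {
                break;
            }
            for meta in metas {
                if tx.send(meta).await.is_err() {
                    return;
                }
            }
        }
    });
    ReceiverStream::new(rx)
}

/// Placeholder for the actual metadata-store lookup.
async fn load_meta_page(_page: u64) -> Vec<ObjectMeta> {
    Vec::new()
}
```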

Periodic check frequency can be reduced, but if the period is longer we should make sure it still runs even across multiple process restarts.

@iwanbk
Member

iwanbk commented Nov 26, 2024

This is actually not needed (unless the data backend is down, in which case it's picked up by the periodic rebuilder, though you could manually trigger it). The reasoning here is as follows: the location of the shards (i.e. the 0-db backend) is recorded in the object metadata.

Yes, on second thought, after more checking, I also don't think it is needed; periodic checking will do it anyway, and the procedure will be the same.

Therefore, as long as the backend is reachable, the data can be recovered, even if said backend is no longer part of the running config, and thus it does not need to be rebuilt. The running config is relevant for stores, but not for loads.

Oh, I didn't know about this. More reason to only do it in the periodic checking.

@iwanbk
Member

iwanbk commented Nov 26, 2024

For metadata backends we should rebuild the data if they are replaced, since we don't have a separate reference, so here the running config is relevant for both stores and loads.

I was thinking about which keys need to be rebuilt, but then I realized that it should be ALL the keys, CMIIW @LeeSmet

@iwanbk
Member

iwanbk commented Nov 27, 2024

To summarize, this is the plan:

  1. For data backends, improve periodic checking:
    a. replace object_metas (the function that gets all metadata), which returns a Vector, with something that returns a Stream
    b. avoid duplicated backend checking
    c. make the period longer; 10 minutes is too fast, especially for big storage

  2. For meta backends, implement a metadata rebuild when the meta backends are replaced

Today I tried to implement 1.a, but faced many difficulties.
At first I thought it was because I'm still not advanced in the Rust language, but after some thinking, implementing it as a stream is indeed tricky, because the related backends could go away at any time.
So I think I'll change 1.a to use Redis SCAN and expose the cursor to the caller.
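For reference, a minimal sketch of that cursor-based approach with the `redis` crate's SCAN command. This assumes the metadata backend can be scanned with standard Redis SCAN semantics; the function name and the COUNT hint of 100 are illustrative choices, not zstor's actual code.

```rust
use redis::aio::MultiplexedConnection;
use redis::RedisResult;

/// One page of a cursor-based key scan: returns the keys found plus the cursor
/// to pass back in on the next call. A returned cursor of 0 means the scan is done.
async fn scan_meta_keys(
    con: &mut MultiplexedConnection,
    cursor: u64,
) -> RedisResult<(u64, Vec<Vec<u8>>)> {
    redis::cmd("SCAN")
        .arg(cursor)
        .arg("COUNT")
        .arg(100)
        .query_async(con)
        .await
}
```

Exposing the cursor to the caller keeps the function stateless, so the periodic checker can resume or abandon a scan at any point without holding a long-lived stream open against a backend that might disappear.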

@iwanbk linked a pull request Nov 29, 2024 that will close this issue
@iwanbk
Member

iwanbk commented Nov 29, 2024

For meta backends, implement a metadata rebuild when the meta backends are replaced

I've fixed this on #134

@iwanbk
Member

iwanbk commented Nov 29, 2024

For data backends, improve periodic checking:
a. replace object_metas (the function that gets all metadata), which returns a Vector, with something that returns a Stream
b. avoid duplicated backend checking
c. make the period longer; 10 minutes is too fast, especially for big storage

For this, it looks better to create a new issue.
Created #136.

@iwanbk
Member

iwanbk commented Nov 29, 2024

As a side note, we also have to make sure that only one rebuild runs at a time, tracked in #135.
