Metadata not rebuilt after replacing one store and hot reloading #131
See #132 (comment). After writing new data to the new config, the metadata is rebuilt as expected. I need to double check the bit about zstor rejecting writes, as that doesn't seem consistent. I was able to perform writes after trying to recreate the same conditions from the test that prompted this issue.
From checking the code a bit, there is indeed no code that detects changes to the metastore backends. cc @LeeSmet See 0-stor_v2/zstor/src/actors/repairer.rs, lines 89 to 102 at cd24f42.
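For reference, detecting a metastore change on reload could be as simple as comparing backend sets. This is only a minimal sketch under assumed types; `MetaBackend` and `meta_backends_changed` are hypothetical stand-ins, not the real zstor code:

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the real zstor metadata backend type.
#[derive(Clone, Debug, Hash, PartialEq, Eq)]
struct MetaBackend(String);

/// Returns true if the set of metadata backends changed between the old
/// and new config, in which case a metadata rebuild should be scheduled.
/// Order is deliberately ignored: only membership matters.
fn meta_backends_changed(old: &[MetaBackend], new: &[MetaBackend]) -> bool {
    let old: HashSet<_> = old.iter().collect();
    let new: HashSet<_> = new.iter().collect();
    old != new
}

fn main() {
    let old = vec![MetaBackend("10.0.0.1:9900".into())];
    let new = vec![MetaBackend("10.0.0.2:9900".into())];
    assert!(meta_backends_changed(&old, &new));
}
```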
Do you mean you could do …?
No, that part is okay.
I was comparing against the condition where writes were rejected even though a full set of live metadata backends was present after a hot reload of the config. This only happened once and I'm not sure how to reproduce it reliably. To summarize, these are the concerns:
Found a few things that we could improve.
Needs to be implemented.
Same as the meta backends, it is not implemented yet.
The periodic check currently runs every 10 minutes, which might be overkill. I'm thinking of it this way:
wdyt @LeeSmet?
I think it definitely makes sense to trigger a repair whenever the backends are updated. As for the ten minute cycle, I'd say maybe we can reduce it to one hour, but a day feels too long.
Actually, if (1) and (2) are working properly, we should not need this periodic checking at all.
The periodic check will always be relevant, I think. The reason is that the user can provide more data backends than …
In this case, 10 minutes is definitely too fast. So I think the …
Agreed. I think adding this time period as a config option with a one hour default makes sense.
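As a sketch of what that option could look like with a serde-based config; the `RepairConfig` struct and the `repair_interval_secs` field name are assumptions for illustration, not the actual zstor config schema:

```rust
use serde::Deserialize;
use std::time::Duration;

/// Hypothetical config section with a tunable repair check interval.
#[derive(Deserialize)]
struct RepairConfig {
    /// Seconds between periodic repair checks; defaults to one hour.
    #[serde(default = "default_repair_interval")]
    repair_interval_secs: u64,
}

fn default_repair_interval() -> u64 {
    3600 // one hour, per the discussion above
}

impl RepairConfig {
    fn interval(&self) -> Duration {
        Duration::from_secs(self.repair_interval_secs)
    }
}

fn main() {
    // An empty config section falls back to the one hour default.
    let cfg: RepairConfig = toml::from_str("").unwrap();
    assert_eq!(cfg.interval(), Duration::from_secs(3600));
}
```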
This is actually not needed (unless the data backend is down, in which case it's picked up by the periodic rebuilder, though you could manually trigger it). The reasoning here is as follows: the location of the shards (i.e. the 0-db backend) is recorded in the object metadata. Therefore, as long as the backend is reachable, the data can be recovered, even if said backend is no longer part of the running config, and thus it does not need to be rebuilt. The running config is relevant for stores, but not for loads.

For metadata backends, we should rebuild the data if they are replaced, since we don't have a separate reference, so here the running config is relevant for both stores and loads.

The vector is indeed not good, and should be replaced by a …

The periodic check frequency can be reduced, but if the period is longer we should make sure it still runs even across multiple process restarts.
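On the last point, one way to make a longer period hold across restarts is to persist the time of the last completed check and only run when it is due. A std-only sketch; the state file location and plain-text format are assumptions, not the real zstor layout:

```rust
use std::fs;
use std::path::Path;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// True if at least `period` has passed since the last recorded check.
/// A missing or unreadable state file counts as "never ran", so the
/// check fires on first start and after a corrupted file.
fn check_is_due(state_file: &Path, period: Duration) -> bool {
    let last = fs::read_to_string(state_file)
        .ok()
        .and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(0);
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before unix epoch")
        .as_secs();
    now.saturating_sub(last) >= period.as_secs()
}

/// Record the current time as the last completed check.
fn record_check(state_file: &Path) -> std::io::Result<()> {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before unix epoch")
        .as_secs();
    fs::write(state_file, now.to_string())
}

fn main() -> std::io::Result<()> {
    let state = Path::new("/tmp/zstor-last-repair-check");
    if check_is_due(state, Duration::from_secs(3600)) {
        // ... run the repair check here ...
        record_check(state)?;
    }
    Ok(())
}
```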
Yes, on second thought, after more checking, I also don't think it is needed; the periodic check will do it anyway, and the procedure will be the same.
Oh, I didn't know about this. More reason to only do it in the periodic check.
I was thinking about which keys need to be rebuilt.
To summarize, this is the plan:
2. For meta backends, implement a meta rebuild when the meta backends are replaced.

Today I tried to implement 1.a, but I'm facing many difficulties.
I've fixed this in #134.
For this, it looks better to create a new issue.
As a side note, we also have to make sure that only one rebuild is running at a time, tracked in #135.
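For illustration, a guard like this would refuse a second concurrent rebuild; the actual fix tracked in #135 may well take a different shape (e.g. inside the repairer actor):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Global flag marking a rebuild in progress (sketch only).
static REBUILD_RUNNING: AtomicBool = AtomicBool::new(false);

/// Returns true for the single caller that gets to run the rebuild;
/// all others see false and skip this cycle instead of piling up.
fn try_start_rebuild() -> bool {
    REBUILD_RUNNING
        .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
}

fn finish_rebuild() {
    REBUILD_RUNNING.store(false, Ordering::Release);
}

fn main() {
    assert!(try_start_rebuild());
    assert!(!try_start_rebuild()); // a second rebuild is refused
    finish_rebuild();
    assert!(try_start_rebuild()); // allowed again once the first is done
}
```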
I am trying to test this behavior, as described in the README:
As I understand it, these steps should be sufficient:

1. Write some data with the original config
2. Replace one of the metadata backends in the config file
3. Send SIGUSR1 to the zstor process to hot reload the config
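For context, the hot reload in step 3 hinges on the daemon listening for SIGUSR1. A sketch of that wiring using tokio's signal handling, where `reload_config` is a placeholder rather than the real zstor code:

```rust
use tokio::signal::unix::{signal, SignalKind};

/// Placeholder for the actual config reload logic.
async fn reload_config() {
    println!("reloading config and re-checking backends");
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Each SIGUSR1 delivered to the process triggers one reload.
    let mut usr1 = signal(SignalKind::user_defined1())?;
    loop {
        usr1.recv().await;
        reload_config().await;
    }
}
```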
Here's my status output after step 1:
Then after step 3:
Even after some hours of waiting for the repair system to kick in and recreate the metadata shards on the new backend, there are no objects and no space used on the new backend.
I also noticed that zstor rejects writes in this state, indicating that it doesn't detect that it has four healthy metadata stores for writing new data.