-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: delete lots of data from indexers when leaving project #469
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading through this and related code I have some thoughts:
- We need to close cores before we unlink/purge them - this should cancel replication requests and closes file descriptors
- We should probably unreplicate cores that we are deleting before deleting them
- We should write the updated role for leaving a project before deleting anything
- We should probably wait for (pre)sync to complete before deleting a project - ensuring connected peers get all the data from the departing device, including the role record that they have left.
- I'm not entirely clear about the state of a project instance after leaving - not everything is closed, but I'm not sure if the instance will work any more, or just have undefined behaviour?
- I'm not sure about deleting schemas from the indexer separately from deleting the multi-core-indexer state - what happens if we don't delete a schema in a namespace but we do delete the multi-core-indexer?
- I'm worried we're introducing complicated states where the indexes could be out-of-sync with the underlying cores. Would it be safer to just delete all indexes and index reader states and then re-index the cores we have left?
- I think we need to maintain some kind of left project state, so that when a new project instance is created for a project that has been left, it doesn't try to re-sync everything and re-create the writer cores.
Sorry for inconclusive review - this PR raises a lot of questions / highlights some potential issues.
I think we're doing that here:
I'm not super familiar with its internals but it looks like Hypercore might do this for us?
Done in ab4d9c1.
At worst, we might've left a project but not cleaned up one or both of the indexers due to a crash or other failure. We could add something to check this at startup. Pseudocode: onStartup(() => {
for (const leftProject in getLeftProjects()) {
leftProject.makeSureCoresAndIndexesAreDestroyed()
}
})
I defer to your expertise here but I think we're just clearing the indexes and closing/deleting the cores. I suppose cores could have data in them that hasn't made it to the indexes and never will, but I think that's fine? This feels like one of a few questions we haven't answered about leaving projects. Others include:
I think these are separate tasks, but I agree. |
I'm going to file the following issues if that seems reasonable to you:
Do those three tasks seem reasonable? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems ok to me, although I admit I can't quite get my head around possible edge-cases, but what is here looks like it should work. It could be worth spending some time writing out some more detailed tests of how we expect devices that have left a project to behave when connecting to others. I think the other aims of project leaving are:
- Remove others data to free up disk space - can we test for this?
- Keep own data so that it can be recovered later - can we test for this?
Good idea. I'll plan to do this in #585. |
When you leave a project, we now delete all indexed data outside of the
auth
namespace. Fixes #427.Co-Authored-By: Andrew Chou andrewchou@fastmail.com