Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vault FS/Git Consistency - Atomic Writes/Consistent Writes in the Lower FS #180

Closed
joshuakarp opened this issue Jun 2, 2021 · 16 comments
Closed
Assignees
Labels
development Standard development r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management

Comments

@joshuakarp
Copy link
Contributor

joshuakarp commented Jun 2, 2021

Created by @CMCDragonkai

The lower fs provides the persistence of writes to our "files" in Polykey. We allow incremental mutation of "files" in Polykey. So that means given a 1 GiB file, editing the middle of the file doesn't mean rewriting the entire file. It's supposed to just edit and mutate that small partition in the middle.

However let's say that the buffer we are writing into the middle is 10 MiB. Then the problem is that when passing 10 MiB into the lower FS which is backed by the Node FS, that results in potentially multiple calls to the write syscall. And multiple write syscalls are not atomic. So that means it is possible to corrupt the encrypted ciphertext backing the plaintext file.

We want to make sure that we have "consistent" updates to our ciphertext files so we don't leave it in an inconsistent state. That would be quite bad.

There are a couple ways of doing this:

  1. Write to temporary and perform atomic rename. (This defeats the purpose of incremental mutation)
  2. Some sort of COW system

We should look into databases and ACID implementations for some ideas here.


It appears that using Git should mean this is not a problem for us. Because we don't care atomicity on a block basis or even on a single ciphertext basis. We only care about atomicity from 1 commit transaction to another. So if something fails in the middle of a commit transaction. An automatic rollback should be made.


Resolved in https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/205

@joshuakarp
Copy link
Contributor Author

@CMCDragonkai : The ability to rollback inconsistent commit transactions can be introduced in a later version. For now, for the initial release, we can handle potential inconsistent writes as a just a bug.

@CMCDragonkai
Copy link
Member

@scottmmorris this is the key point here:

It appears that using Git should mean this is not a problem for us. Because we don't care atomicity on a block basis or even on a single ciphertext basis. We only care about atomicity from 1 commit transaction to another. So if something fails in the middle of a commit transaction. An automatic rollback should be made.

So we need to be able to detect if we have screwed up a write to the Vault. If this occurs, we would want to rollback to an earlier commit since isogit should have kept a snapshot from before.

First thing first is how do we know if there has been a corruption? Does the user get alerted of this? Is there some hashing mechanism we can make use of?

A potential situation is a powerful while changing a secret. At that point the write to the vault may not fully complete.

We need to consider that if we don't have an automatic commit to the isogit, then it's not considered a successful write, and therefore when we launch the vault again, it should reset to the last good commit, no unclean working space is allowed.

@CMCDragonkai CMCDragonkai changed the title Atomic Writes/Consistent Writes in the Lower FS Vault FS/Git Consistency - Atomic Writes/Consistent Writes in the Lower FS Jul 3, 2021
@CMCDragonkai CMCDragonkai added the design Requires design label Jul 9, 2021
@CMCDragonkai CMCDragonkai added this to the Release Polykey CLI 1.0.0 milestone Jul 9, 2021
@CMCDragonkai
Copy link
Member

We should review edgecases for when partial commits occur, when there is dirty data in the working directory of a vault what happens.

@CMCDragonkai
Copy link
Member

This is primarily about creating test cases for dirty commits, or partial commits. What happens and how do we ensure that the vault stays clean? I think you encountered a bit of this already @scottmmorris in the vaults refactoring. You can address this issue there too.

@CMCDragonkai
Copy link
Member

One idea is to reset the git repository everytime you start the vault. That way at soon as you start js-polykey, the vaults are on a clean state.

@CMCDragonkai
Copy link
Member

@CMCDragonkai CMCDragonkai added development Standard development and removed design Requires design labels Jul 26, 2021
@scottmmorris
Copy link
Contributor

scottmmorris commented Aug 3, 2021

It appears iso-git doesn't have a simple way of resetting the repository either through the likes of a git reset or git clean.

This might mean we need to do a recursive walk through the vault directory, checking the status of each file and deleting it if the status is added.

Another possible (but slightly more out there) method would be to clone the repo from itself each time on start up and overwrite it. I'm not even sure this would be possible, but it seems more complex and harder to implement.

I think I will try the first solution now, as we have a recursive walk function already and the expense to resources for this I don't think will be too high.

I'm still trying to work out the best solution for the modification and deletion commit fail scenarios

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Aug 3, 2021 via email

@scottmmorris
Copy link
Contributor

So after testing you I couldn't find a way to use git reset to discard untracked files. The only command that seems to deal with this is git clean. Unfortunately libgit2 also does not specify a clean command along with iso-git.

@scottmmorris
Copy link
Contributor

The stack overflow answer that Roger provided uses the git stash command and that appears to work. libgit2 does have the stash command and uses it. I think the way it uses it is that it adds and commits all the untracked changes and then this allows it to group all these untracked changes into one commit. This should work for our case, on starting up a vault we add all changes that have not been staged and then we can even just force checkout from there and not need to commit those changes. This also requires us to be able to git add all files within the git directory. Which isn't something that isomophic git explicitly implements, although should be fairly simple to do.

@scottmmorris
Copy link
Contributor

So something like this should work (but I will run a test first) when starting up the vault:

        await git.add({
          fs: this.efs,
          dir: '',
          filepath: '',
        });
        await git.checkout({
          fs: this.efs,
          dir: '',
          force: true,
        });

Then the filepath parameter here would have to be changed. Here we would need something that indicated all the file paths within the directory. Then we would add them all and then the checkout would delete them

@scottmmorris
Copy link
Contributor

scottmmorris commented Aug 5, 2021

Okay so I've run into a problem, when trying to checkout there are some cases where it will fail. i.e. if you have just started up and your first commit fails, you close and start up again. git will not be able to find the master branch because there isn't one technically. Because no commits have been made to the branch.

I found the solution to this is to just introduce an intial commit on startup of the vault. This wont interfere with the history as nothing will be done on this commit.

So after having this, the approach works!

EDIT: After having a look it turns out we already had an initial empty commit already.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Aug 5, 2021 via email

@scottmmorris
Copy link
Contributor

This is being delayed until the changes in EFS are merged as there is a bug in EFS preventing this from being completed

@CMCDragonkai
Copy link
Member

This was done by merging of vaultsrefactoring https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/205.

Docs will be MatrixAI/Polykey-Docs#3.

@CMCDragonkai
Copy link
Member

Follow up issue here MatrixAI/js-encryptedfs#55 to maintain better consistency using the Git system. Right now we have a hack using the git status.

@CMCDragonkai CMCDragonkai added the r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management label Jul 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Standard development r&d:polykey:core activity 1 Secret Vault Sharing and Secret History Management
Development

No branches or pull requests

4 participants