[question] Is there any documentation on how the export directory works? #37

Open
Fuco1 opened this issue May 28, 2017 · 4 comments


@Fuco1
Contributor

Fuco1 commented May 28, 2017

It seems like it could be used for backups, but how does it work in detail? Won't it blow up exponentially if I have lots of files? How can I restore such an export in case of disk failure?

@StrumentiResistenti
Owner

The export/ directory is generated dynamically, so there's no risk of it filling your disk. Its structure is very basic. The first level contains all your tags. Each tag directory lists every object carrying that tag, as symbolic links into the archive/ directory. So if you tar or zip export/ together with archive/, you're guaranteed a backup of your repository (bugs excepted: it seems that triple tags are not listed properly in export/).
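As a rough illustration of the backup described above, here is a minimal Python sketch that puts archive/ and export/ into one tarball. It assumes both directories sit directly under the same mount point; the function name `backup_repository` and the paths in the usage note are made up for the example and are not part of the project.

```python
import tarfile
from pathlib import Path


def backup_repository(mount_point, out_path):
    """Archive archive/ and export/ from a mounted repository into one tarball.

    Assumption: both directories live directly under `mount_point`.
    """
    mount_point = Path(mount_point)
    with tarfile.open(out_path, "w:gz") as tar:
        for name in ("archive", "export"):
            # tarfile stores symbolic links as links by default
            # (dereference=False), so the links in export/ are not
            # expanded into full copies of the files.
            tar.add(str(mount_point / name), arcname=name)
    return out_path
```

It could be called as `backup_repository("/path/to/mountpoint", "/path/to/backup.tar.gz")`, with the output file kept outside the mounted repository.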

I don't have an automatic procedure or script for recovery. If you naively copy the export back into the repository you are restoring, tag by tag, you should end up with a copy of your original repo. However, this approach is really expensive, because a file with several tags is copied once per tag and has to be deduplicated each time. A better solution would be (in pseudocode):

foreach tag {
  foreach file in tag {
    if (file exists in store/ALL/) {
      mv store/ALL/file store/tag/  # just retag an existing file
    } else {
      cp export/tag/file store/tag/ # copy the missing file
    }
  }
}

Or something like this. You can reliably detect objects in ALL/ because every entry name in export/ starts with its inode number.
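The pseudocode above can be sketched in Python as a dry-run planner that only decides, for each exported entry, whether it needs a copy or just a retag. This is a sketch under assumptions, not a tested recovery tool: `plan_restore` is a hypothetical helper, it relies only on entry names starting with a decimal inode number, and it treats "already seen in this run" as a stand-in for "exists in store/ALL/".

```python
import re
from pathlib import Path

# Assumption: entry names begin with the decimal inode number, as described
# above; whatever delimiter follows the number is not relied upon here.
INODE_PREFIX = re.compile(r"^(\d+)")


def plan_restore(export_dir):
    """Return a list of (action, tag, entry_name) steps for a restore.

    The first time an inode is seen the step is "copy"; after that the
    object would already exist in the restored repository, so the step
    is "retag".
    """
    seen_inodes = set()
    steps = []
    for tag_dir in sorted(Path(export_dir).iterdir()):
        if not tag_dir.is_dir():
            continue
        for entry in sorted(tag_dir.iterdir()):
            match = INODE_PREFIX.match(entry.name)
            if match is None:
                continue  # not an inode-prefixed entry; skip it
            inode = match.group(1)
            action = "retag" if inode in seen_inodes else "copy"
            seen_inodes.add(inode)
            steps.append((action, tag_dir.name, entry.name))
    return steps
```

Each "copy" step corresponds to the `cp export/tag/file store/tag/` branch and each "retag" step to the `mv store/ALL/file store/tag/` branch. Printing the plan first makes it possible to check it before touching the repository.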

@Fuco1
Contributor Author

Fuco1 commented May 29, 2017

Oh I see, so it is basically a "filesystem" dump of the database, indexed by tag: a file that is under, say, three tags is symlinked to the archive three times.

This is great. I'm currently migrating to a new system, so I will put this to the test :)

@StrumentiResistenti
Owner

Have you found it useful?

@Fuco1
Contributor Author

Fuco1 commented Sep 22, 2017

Heh, I don't actually remember how I migrated the data, but I have the repository working so I suppose I've used this method :D It's been some time.
