Perform a "backup" of a nested dataset hierarchy on a crippled-fs harddrive #62
`get` on a crippled FS
This had a follow-up in the office hour chat and in today's office hour. Out of multiple subdatasets, most were pushed to the RIA store without an issue, but two were not:
The same error was reported in datalad/datalad#5613, which was solved by upgrading git-annex (to 10.20220128). Since the current issue was reported with an older version, we need to wait and see whether an update solves the problem.
IMO there are two issues here:
Paraphrased steps followed by the user:
Creating the backup
Cloning from the backup
Fetching updates
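The original commands for these three steps did not survive in this thread, so the following is only a minimal sketch of what they could look like, not the user's actual invocation. It assumes a reasonably recent DataLad (one that has `create-sibling-ria --new-store-ok` and `update --how`). The sibling name `backup`, the store URL, the dataset ID, and the helper function names are all hypothetical placeholders. Commands are only printed unless `dry_run=False` is passed.

```python
import shlex
import subprocess


def backup_commands(store_url, sibling="backup"):
    """Step 1, run inside the superdataset: create the RIA siblings
    recursively, then push all datasets (and their annexed data) to them."""
    return [
        ["datalad", "create-sibling-ria", "-s", sibling, "-r",
         "--new-store-ok", store_url],
        ["datalad", "push", "--to", sibling, "-r"],
    ]


def clone_command(store_url, dataset_id, target):
    """Step 2: clone a dataset from the store, addressed by its dataset ID."""
    return ["datalad", "clone", f"{store_url}#{dataset_id}", target]


def update_commands(sibling="origin"):
    """Step 3, run inside the clone: merge new versions from the store,
    then retrieve file content. 'origin' is assumed to be the sibling
    name that the clone points at."""
    return [
        ["datalad", "update", "-s", sibling, "--how", "merge", "-r"],
        ["datalad", "get", "-r", "."],
    ]


def run(commands, cwd=None, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    shown = []
    for cmd in commands:
        line = shlex.join(cmd)
        print(line)
        shown.append(line)
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)
    return shown


if __name__ == "__main__":
    # Hypothetical store location on the external drive.
    store = "ria+file:///mnt/drive/store"
    run(backup_commands(store))
    run([clone_command(store, "<dataset-id>", "restored")])
    run(update_commands())
```

Keeping the commands as argument lists avoids shell quoting problems with paths that contain spaces, which are common on external drives.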
Most of the above is explained in https://handbook.datalad.org/en/latest/beyond_basics/101-147-riastores.html, but I think this compact use case can still stand on its own as a KBI.
More traffic on this issue today. I have a Windows machine and will spend some time looking into RIA stores on NTFS.
The user reported that this is no longer an issue for them (they no longer need that solution and will not spend more time debugging it). So for the purpose of solving the user's problem, this issue is not needed anymore. For the purpose of writing a KBI, it can remain open, pending a test on a Windows system and the KBI writeup.
Origin: DataLad office hour chat 2023-05-08
While performing a recursive `get` of a superdataset clone (multiple subdatasets) onto an external harddrive with a crippled filesystem, the user aborted the command and was left with modified dataset clones.
TODO (not necessarily to be performed in this order)
Capturing relevant pieces from my reply:
Instead of getting a nested hierarchy of a single version snapshot of your data, it would actually be a full backup (all data, all versions), and it would not suffer from the limitations of your hard-drive file system as much (unverified speculation).
The downside is that it won't look as pretty
But this is our standard solution for collaboration (push/pull) using a location that is not ready for git-annex
if you like papers more than online handbooks: https://doi.org/10.1038/s41597-022-01163-2
Roughly summarizing the difference between what you tried and what this different approach would mean:
This means you will work exclusively in your main dataset clone.
The resulting "RIA store" on the harddrive can be added to other existing clones as a remote, and they will be able to pull data from it. You would be able to continue to push data (new versions) onto the drive without having to replace or delete anything, until you run out of space.
(At which point you can detect and clean up versions you no longer need.)
RIA stores also support compressed archives, so your harddrive might last for quite a while.
CAUTION: I am not aware of anyone having actually tried putting a RIA store on an external harddrive with a non-POSIX filesystem. I expect this to work, but there is no hard evidence for this claim.