Please describe the problem you'd like to be solved.
As someone transferring information into Archivematica, I'd like to find duplicate content across AIPs so that I can understand whether the content has already been stored for preservation and access, or whether there is an excessive amount of redundancy in the direct copies that I am maintaining.
Describe the solution you'd like to see implemented.
I would like a checksum comparison to be available somewhere in the workflow that allows me to identify duplicates. I can then make decisions based on the information returned.
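To illustrate the kind of comparison being requested, here is a minimal sketch. It assumes the AIPs are available as extracted directories on local disk, which is a simplification because stored AIPs are often compressed packages. It also assumes SHA-256 as the checksum algorithm, chosen only for illustration. `sha256_of` and `find_duplicates` are hypothetical helpers, not part of Archivematica. The sketch groups every file by its digest and reports any digest that appears more than once, together with the AIP and relative path of each copy.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(aip_dirs: dict[str, Path]) -> dict[str, list[tuple[str, str]]]:
    """Hypothetical helper: map each checksum seen more than once
    to the (aip_name, relative_path) pairs that share it."""
    seen: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for aip_name, root in aip_dirs.items():
        # Walk every file in the (assumed already extracted) AIP directory.
        for path in sorted(root.rglob("*")):
            if path.is_file():
                rel = path.relative_to(root).as_posix()
                seen[sha256_of(path)].append((aip_name, rel))
    # Keep only checksums that occur in more than one place.
    return {checksum: hits for checksum, hits in seen.items() if len(hits) > 1}
```

The result is informational only. It tells the user where identical bytes live, which matches the request to "make decisions based on the information returned" rather than having duplicates removed automatically.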
Describe alternatives you've considered.
I can detect duplicates before transfer using tools that generate checksums, but it is difficult to maintain that state over long periods of time. If I already have many AIPs stored, there is no easy way for me to know whether content identical to what I am transferring has already been stored.
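The state problem described above could, for example, be addressed with a persistent checksum index that is updated each time an AIP is stored and queried before each new transfer. The sketch below uses SQLite for that purpose. The schema and the helpers `open_index`, `record_aip` and `already_stored` are hypothetical, and the path-to-checksum manifest passed to `record_aip` is assumed to come from elsewhere, such as a BagIt manifest. SHA-256 is again an assumption.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_index(db_path: str) -> sqlite3.Connection:
    """Open (or create) a checksum index that persists between transfers."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " checksum TEXT NOT NULL,"
        " aip TEXT NOT NULL,"
        " path TEXT NOT NULL,"
        " PRIMARY KEY (aip, path))"
    )
    # Secondary index so lookups by checksum stay fast as the table grows.
    conn.execute("CREATE INDEX IF NOT EXISTS by_checksum ON files (checksum)")
    return conn


def record_aip(conn: sqlite3.Connection, aip: str, checksums: dict[str, str]) -> None:
    """Store an AIP's path -> checksum manifest once the AIP has been stored."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO files (checksum, aip, path) VALUES (?, ?, ?)",
            [(checksum, aip, path) for path, checksum in checksums.items()],
        )


def already_stored(conn: sqlite3.Connection, path: Path) -> list[tuple[str, str]]:
    """Return (aip, path) rows whose checksum matches a file about to be transferred."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return conn.execute(
        "SELECT aip, path FROM files WHERE checksum = ? ORDER BY aip, path",
        (digest,),
    ).fetchall()
```

Keeping the index outside any single transfer is what lets the comparison span many AIPs collected over a long period. An empty result from `already_stored` means no identical content has been recorded yet.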
For Artefactual use:
Please make sure these steps are taken before moving this issue from Review to Verified in Waffle:
All PRs related to this issue are properly linked 👍
All PRs related to this issue have been merged 👍
Test plan for this issue has been implemented and passed 👍
Documentation regarding this issue has been written and it has been added to the release notes, if needed 👍
ross-spencer changed the title from "Problem: Detect duplicate files across AIPs and pipelines" to "Problem: Detect duplicate files and folders across AIPs and pipelines" on Jan 21, 2019.