Problem: Detect duplicate files and folders across AIPs and pipelines #448

Open
ross-spencer opened this issue Jan 21, 2019 · 0 comments
Labels
IISH International Institute of Social History

Comments

@ross-spencer
Contributor

Please describe the problem you'd like to be solved.

As someone transferring information into Archivematica, I'd like to find duplicate content across AIPs so that I can understand whether the content has already been stored for preservation and access, or whether there is excessive redundancy in the direct copies that I am maintaining.

Describe the solution you'd like to see implemented.

I would like a checksum comparison to be available somewhere in the workflow that will allow me to identify duplicates. I can then make decisions based on the information returned.
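To illustrate the kind of comparison meant here, a minimal sketch follows. It assumes the AIP contents are available as plain directories on disk and that SHA-256 is an acceptable checksum; the function names `sha256_of` and `find_duplicates` are hypothetical and are not part of Archivematica.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(roots):
    """Group files under the given directories by checksum.

    Returns a dict mapping each checksum seen more than once to the
    list of paths that share it. `roots` is assumed to be a list of
    extracted AIP (or transfer) directories.
    """
    by_checksum = defaultdict(list)
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                by_checksum[sha256_of(path)].append(path)
    # Keep only checksums shared by two or more files.
    return {c: paths for c, paths in by_checksum.items() if len(paths) > 1}
```

Calling `find_duplicates(["/path/to/aip_a", "/path/to/aip_b"])` (placeholder paths) would return the groups of byte-identical files, which is the information I would then use to make decisions.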

Describe alternatives you've considered.

I can detect duplicates before transfer using tools that generate checksums, but it is difficult to maintain state over long periods of time. If I have many AIPs already stored, there isn't an easy way for me to know whether any stored content is identical to the content that I am transferring.
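The state-keeping problem in this alternative can be sketched as a persistent checksum index that is updated whenever an AIP is stored and consulted before each new transfer. This is only an illustration: the JSON file layout, the `aip_id:relative/path` location format, and all function names are assumptions, not anything Archivematica provides.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_index(index_path):
    """Load the checksum index, or start an empty one if none exists."""
    path = Path(index_path)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_index(index, index_path):
    """Write the index back to disk so it survives between transfers."""
    text = json.dumps(index, indent=2, sort_keys=True)
    Path(index_path).write_text(text, encoding="utf-8")


def record_aip(index, aip_id, aip_dir):
    """Add every file of a stored AIP to the index (hypothetical layout)."""
    for path in sorted(Path(aip_dir).rglob("*")):
        if path.is_file():
            location = f"{aip_id}:{path.relative_to(aip_dir).as_posix()}"
            locations = index.setdefault(sha256_of(path), [])
            if location not in locations:
                locations.append(location)


def already_stored(index, transfer_dir):
    """Map each file in a new transfer to the stored copies it matches."""
    matches = {}
    for path in sorted(Path(transfer_dir).rglob("*")):
        if path.is_file():
            locations = index.get(sha256_of(path))
            if locations:
                name = path.relative_to(transfer_dir).as_posix()
                matches[name] = list(locations)
    return matches
```

Maintaining a file like this by hand, outside the system, is exactly the burden described above, which is why I would prefer the comparison to live in the workflow itself.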


For Artefactual use:
Please make sure these steps are taken before moving this issue from Review to Verified in Waffle:

  • All PRs related to this issue are properly linked 👍
  • All PRs related to this issue have been merged 👍
  • Test plan for this issue has been implemented and passed 👍
  • Documentation regarding this issue has been written and it has been added to the release notes, if needed 👍
@ross-spencer changed the title from "Problem: Detect duplicate files across AIPs and pipelines" to "Problem: Detect duplicate files and folders across AIPs and pipelines" on Jan 21, 2019
@ross-spencer added the "IISH International Institute of Social History" and "Status: in progress" (Issue that is currently being worked on.) labels on Jan 21, 2019
@ross-spencer self-assigned this on Jan 21, 2019
@sevein removed the "Status: in progress" (Issue that is currently being worked on.) label on Mar 14, 2021