Citation: Remove MD5s, if you have UNF #2192
Comments
MD5s are commonly used to verify that files were not corrupted during download. Every Mac and Linux box has the native ability to calculate an MD5 of a file. For Windows it's a supported addon: https://support.microsoft.com/en-us/kb/841290 |
This issue is not well defined as it stands. @thegaryking, does this question apply to subsettable tabular files? Why is it bad to have both UNF and MD5? As @pdurbin comments, MD5s are the most commonly used by preservation groups and libraries, and can be useful in addition to UNF to verify the original deposited file. |
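For readers unfamiliar with the download check described above, here is a minimal Python sketch of it, using only the standard library's `hashlib`. The function names `md5_of_file` and `matches_published_md5` are made up for illustration and are not part of Dataverse.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published_md5):
    """Compare a downloaded file against the MD5 shown for it (hypothetical helper)."""
    return md5_of_file(path) == published_md5.strip().lower()
```

If the comparison fails, the downloaded bytes differ from what was deposited, which is the bit-level fixity check that preservation groups rely on.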
we're first and foremost trying to communicate with users, almost none of them from a UI/user understanding point of view, there will be only one GaryGary King - Albert J. Weatherhead III University Professor - Director, On Fri, May 22, 2015 at 2:06 PM, Merce Crosas notifications@github.com
|
Yes, I agree with the general approach, but we need to do some research to
I agree it would be great, as you say, to generalize UNF and make it work Mercè Crosas, Ph.D. On Sat, May 23, 2015 at 10:27 AM, Gary King notifications@github.com
|
I agree with @mcrosas that the MD5 checksum should exist for every file to ensure bit-level preservation. When I presented a preview of Dataverse 4.0 to the Library of Congress' National Digital Stewardship Alliance in the Fall they were particularly impressed that we included MD5s for all our files. Here's a blog post from them discussing the importance of file fixity/data integrity: http://blogs.loc.gov/digitalpreservation/2014/04/protect-your-data-file-fixity-and-data-integrity/ MD5 is a standard that the digital archival community trusts whereas UNF was unknown to them. I don't think we should replace MD5s for all files with UNFs if the community isn't using them outside of Dataverse. |
ok, but let's get MD5s out of the file list now. we can stick it in the and separately, let's create a google doc or something with specifications GaryGary King - Albert J. Weatherhead III University Professor - Director, On Sat, May 23, 2015 at 11:01 AM, Eleni Castro notifications@github.com
|
Maybe we should remove both UNFs and MD5s from the default listing for files. They add a lot of noise. I just clicked on a random dataset and saw this for a PowerPoint file: Isn't this a little... noisy... busy... unfriendly? Who cares that its MD5 is Sure, show that it's 3 megabytes. Show the date it was uploaded. Stuff like MD5 and UNF could be hidden behind a "details" link, perhaps with some definitions of what MD5 and UNF even are. |
@pdurbin we should not just remove these from the file cards without the appropriate research and consideration of preservation good practices - it has been an expectation from users and partners to easily find the fixity even if it's not used all the time. But you bring up good points; it's worth reviewing whether they can be displayed in another place. I'm assigning this issue to @mheppler and adding it to the "In Design" milestone, following our process. Once a design is proposed and reviewed (by @thegaryking and partners who requested the MD5), we'll move it to a Release Candidate milestone. To summarize, based on @thegaryking comments above:
|
@pdurbin - yes, the long Microsoft mime types are terrible. But we have a mechanism for dealing with this - it's just a matter of adding the "friendly" version of it (such as "PowerPoint") to the list we maintain. (it's a .property file). |
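A rough Python sketch of the "friendly" lookup described above, assuming the list is stored as simple `key=value` lines in the style of a Java properties file. The parsing rules, the sample entry, and the names `parse_properties` and `friendly_type` are illustrative assumptions, not the actual Dataverse file or code.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' separator
            mapping[key.strip()] = value.strip()
    return mapping


def friendly_type(mime_type, mapping):
    """Return the friendly label, falling back to the raw MIME type."""
    return mapping.get(mime_type, mime_type)


# Hypothetical entry for illustration only.
SAMPLE = """
# long Microsoft MIME type -> friendly label
application/vnd.openxmlformats-officedocument.presentationml.presentation=PowerPoint
"""

labels = parse_properties(SAMPLE)
print(friendly_type(
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    labels,
))
```

Falling back to the raw MIME type means file types that have not been given a label yet still display something, just not something pretty.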
Thank you for commenting on that @landreev. I was going to ask you about these "friendly" file types, since I recall going over these with you for the file icons. We should separate out that task of identifying as many of these file types as we can in our current production data, and giving them friendly labels. |
@mheppler |
Need to discuss this during a UI/UX team meeting to brainstorm ideas on how to show more file metadata without being overwhelming in the file card on the dataset page. Perhaps having a files metadata section in the metadata tab. @mcrosas @mheppler |
After reviewing it with @eaquigley and @mheppler we plan to move this to 4.0.3. |
Have a section in the metadata tab that is "Files" and displays this extra metadata (MD5 shows here and not on the files card if a UNF is available). |
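One possible reading of this display rule, sketched in Python. The function names and the choice to list the UNF alongside the MD5 in the "Files" section are assumptions made for illustration; the thread does not specify them.

```python
def file_card_fixity(md5, unf=None):
    """Pick the single identifier shown on the file card.

    Assumed rule: show the UNF when one exists, otherwise fall back to the MD5.
    """
    if unf:
        return ("UNF", unf)
    return ("MD5", md5)


def files_section_fixity(md5, unf=None):
    """List the identifiers shown in the metadata tab's "Files" section.

    Assumed rule: the MD5 is always listed here; the UNF is added when present.
    """
    entries = [("MD5", md5)]
    if unf:
        entries.append(("UNF", unf))
    return entries
```

Under this reading, a file with no UNF (for example, a non-tabular file) still shows its MD5 on the card, so every file keeps at least one visible fixity value.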
I had one question about this: is there a safeguard in place to ensure the MD5 gets added when tabular ingest fails for any reason? We have enough failures at the moment to cause me to ask. Sonia Barbosa |
Note: With the file landing page being pushed to 4.3, this removes the "Original File MD5" for tabular files completely from the UI. |
OK looks good, closing. |
Gary's sent the following:
why are there MD5's? these I think should all be removed. we have UNFs
instead.