Fido identifying some XLSX, PPTX, and DOCX as fido-fmt/{x} #152
Comments
If this error affects only this handful of files and not other files of the same types, would it be a better outcome for FIDO to return the generic Microsoft OOXML match with the standard PRONOM ID fmt/189, rather than the custom fido-fmt ID? I ask because I get the same results on master but don't have the bandwidth to fully investigate, change, and test a larger solution for these Microsoft files. I can, however, remove the custom fido-fmt entries, which will produce fmt/189 results (better for preservation?).
@carlwilson I investigated this using commit 6211d66 of the From what I can see for the files in the
Then the priority logic determines that The difference with the files in the
From it a container type Do you have any advice on how to proceed with this?
Hackathon 2023 Review: Selected for initial tasks. @replaceafill, sorry to do this again, but you're already here. I suggest prioritising this over #94, as it's likely a quicker win.
@carlwilson if we remove these custom
That's a good question, @replaceafill, and one I'm a little too busy to think about right now. Feel free to have a think and suggest something; if not, I'll give this some serious thought in the week starting 31/7.
Dev Effort
1D
Description
Via @sromkey, the MS-Office Open XML files in this Archivematica test data zip are being identified as fido-fmt/{x} in Fido:
If the fido-fmt/{x} entries are removed as per here: #36 (comment), then the closest match seems to be generic OOXML:
Unfortunately, the Skeleton Suite looks like it won't help debug here, as the extracted samples (three per puid) all identify correctly.
I have extracted the samples and the skeleton files here for easy access.
NB. Also noted by Sarah is that Siegfried will identify the formats correctly:
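For anyone triaging the sample files by hand, here is a minimal sketch of a container-style check for OOXML packages. It is not how Fido or Siegfried implement identification. It only reads `[Content_Types].xml` from the ZIP and maps the main-part content type to a PUID, falling back to generic fmt/189. The function name `guess_ooxml_puid` and the mapping table are hypothetical, and the specific PUIDs (fmt/412, fmt/214, fmt/215) are my understanding of the PRONOM entries, so please verify them against PRONOM before relying on them.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by [Content_Types].xml in Open Packaging Convention files.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
OOXML_PREFIX = "application/vnd.openxmlformats-officedocument."

# Hypothetical lookup table: main-part content type -> PUID (assumed values).
MAIN_PART_PUIDS = {
    OOXML_PREFIX + "wordprocessingml.document.main+xml": "fmt/412",      # DOCX
    OOXML_PREFIX + "spreadsheetml.sheet.main+xml": "fmt/214",            # XLSX
    OOXML_PREFIX + "presentationml.presentation.main+xml": "fmt/215",    # PPTX
}
GENERIC_OOXML = "fmt/189"


def guess_ooxml_puid(source):
    """Return a PUID guess for an OOXML package, or None if it is not one.

    `source` may be a file path or a binary file-like object.
    """
    try:
        with zipfile.ZipFile(source) as zf:
            if "[Content_Types].xml" not in zf.namelist():
                return None  # a ZIP, but not an OPC/OOXML package
            raw = zf.read("[Content_Types].xml")
    except zipfile.BadZipFile:
        return None  # not a ZIP at all

    try:
        root = ET.fromstring(raw)
    except ET.ParseError:
        return GENERIC_OOXML  # package marker present but unreadable

    # The main document part is declared through an Override element.
    for override in root.iter(CT_NS + "Override"):
        puid = MAIN_PART_PUIDS.get(override.get("ContentType"))
        if puid:
            return puid
    return GENERIC_OOXML
```

Running `guess_ooxml_puid(path)` over the extracted samples and comparing the result with Fido's output might help narrow down whether the problem lies in the container contents or in the signature priority handling.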