PDF/UA Requirements #1012
Comments
Slightly related to 7.x issue https://jira.duraspace.org/browse/ISLANDORA-2350 |
I'll drop a note here to say that accessibility has a relatively high priority in the Proposed Technical Roadmap. |
Should we tag this as a meta issue? |
Do we have a meta issue around accessibility? I'll note that the Manitoba government requires government websites (and the University falls under that) to adhere to the W3C Web Content Accessibility Guidelines (WCAG) 2.0 at Level AA. So I need it. |
In the States, we're required to adhere to the Section 508 online accessibility standards (WCAG 2.0 Level A and AA) as well, so this is an absolute requirement. This is a simple breakdown (aka checklist) that would be a quick reference for anyone trying to create a solution for this. Although I don't know much about commonlook.com, it is referenced from webaim.org, so it should be a good place to start. |
Do you have any suggestions for OS tools that can be used to generate PDF/UA? As per your ticket, the current derivatives https://github.com/Islandora/islandora_solution_pack_pdf/blob/7.x/includes/derivatives.inc#L154 don't meet the accessibility criteria. Maybe gs can do the job with the right configuration? |
According to the following docs, gs supports ISO 32000-1 as well as the PDF 2.0 standard, which means we should be able to generate PDF/UA using gs. Samples attached below. https://artifex.com/news/ghostscript-9-22-release/ @dannylamb
[Attached images: pdfua1, pdfua2] |
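A quick way to sanity-check samples like these is to look at what the file itself declares. The sketch below is a heuristic, not a validator: it scans the raw bytes for the `pdfuaid:part` XMP property that PDF/UA files use to declare their part number, and for a `/Marked true` entry that flags tagged PDF. The function names are made up for illustration, and the scan assumes the metadata and catalog are stored uncompressed, so a miss does not prove the file is non-conforming (and a hit does not prove it conforms).

```python
import re
from pathlib import Path

# XMP property declaring the PDF/UA part number, in element form
# (<pdfuaid:part>1</pdfuaid:part>) or attribute form (pdfuaid:part="1").
_PDFUA_PART = re.compile(rb'pdfuaid:part\s*(?:>|=\s*["\'])\s*(\d+)')
# MarkInfo entry that flags a document as tagged PDF.
_MARKED = re.compile(rb"/Marked\s+true")


def declared_pdfua_part(data: bytes):
    """Return the declared PDF/UA part number, or None if none is found."""
    match = _PDFUA_PART.search(data)
    return int(match.group(1)) if match else None


def looks_tagged(data: bytes) -> bool:
    """Return True if the raw bytes contain a '/Marked true' entry."""
    return _MARKED.search(data) is not None


def inspect(path):
    """Hypothetical helper: summarize both checks for a file on disk."""
    data = Path(path).read_bytes()
    return {"pdfua_part": declared_pdfua_part(data), "tagged": looks_tagged(data)}
```

Running `inspect("output.pdf")` on a derivative gives a first indication of whether it even claims PDF/UA; a dedicated validator would still be needed to confirm real conformance.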
@Natkeeran Circling back to this one. Yes!!!! If we can do it with |
For a recent effort outside of work, I needed to access PDF documents provided in response to a Freedom of Information request to my provincial government. The documents are image-only and completely inaccessible, and that is how I discovered the "ocrmypdf" utility. It is available via apt-get on Ubuntu and Homebrew on macOS and works very well in my testing. It wraps so many tesseract configurations that re-engineering it would be impossible, so I propose we make a microservice for it. I have run into a wall, though: there is no documentation for making a microservice. Before I embark on another copy-and-paste effort, I think we need an example of adding a simple microservice that wraps a straightforward Unix command. This documentation should give a step-by-step set of files to add, and in what order, which is one of the harder things to know when embarking on a new application in an unfamiliar framework. The output of ocrmypdf indicates that it produces PDF/A files:
iMac:Charlottetown STR FOIPP Vol. 2 aoneill$ ocrmypdf "FOIPP0012 Tab 5 - Peter Kelly.pdf" output.pdf
Scanning contents: 100%|███████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 34.92page/s]
Start processing 4 pages concurrently
5 [tesseract] lots of diacritics - possibly poor OCR
8 [tesseract] lots of diacritics - possibly poor OCR
OCR: 100%|█████████████████████████████████████████████████████████████████████████████████████| 20.0/20.0 [00:08<00:00, 2.35page/s]
Postprocessing...
PDF/A conversion: 100%|████████████████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00, 8.39page/s]
JPEGs: 0image [00:00, ?image/s]
JBIG2: 0%| | 0/20 [00:00<?, ?item/s]
Optimize ratio: 1.96 savings: 49.0%
Output file is a PDF/A-2B (as expected)
iMac:Charlottetown STR FOIPP Vol. 2 aoneill$ |
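To make the "microservice that wraps a Unix command" idea concrete, here is a minimal sketch using only the Python standard library. It is not how the project's existing microservices are built and does not follow their conventions; it only illustrates the shape: accept a PDF in a POST body, run `ocrmypdf <input> <output>` exactly as in the session above, and return the result. The names `build_command`, `convert`, `OcrHandler`, and `serve`, and the port, are all assumptions made for illustration.

```python
import subprocess
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path


def build_command(src, dst, executable="ocrmypdf"):
    """Return the argument list for `ocrmypdf <input> <output>`."""
    return [executable, str(src), str(dst)]


def convert(pdf_bytes, runner=subprocess.run):
    """Write the upload to a temp dir, run the command, return the output bytes."""
    with tempfile.TemporaryDirectory() as workdir:
        src = Path(workdir) / "input.pdf"
        dst = Path(workdir) / "output.pdf"
        src.write_bytes(pdf_bytes)
        # check=True raises CalledProcessError on a non-zero exit status.
        runner(build_command(src, dst), check=True, capture_output=True)
        return dst.read_bytes()


class OcrHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the raw PDF from the request body.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            result = convert(body)
        except (subprocess.CalledProcessError, FileNotFoundError):
            # Command failed, is not installed, or produced no output file.
            self.send_error(500, "OCR failed")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/pdf")
        self.send_header("Content-Length", str(len(result)))
        self.end_headers()
        self.wfile.write(result)


def serve(port=8000):
    """Serve on localhost only; the port number is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), OcrHandler).serve_forever()
```

Calling `serve()` starts the listener, and a client would POST the PDF bytes to it. Keeping the command construction in `build_command` and passing the runner into `convert` means the wrapper can be exercised in tests without ocrmypdf installed.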
Ingesting a PDF should generate a PDF/UA file (ISO 14289-1, which builds on ISO 32000-1) that the viewer then uses for accessibility compliance.
AIIM Technical Implementation Guide 32000-1
One suggestion is to explore something like pandoc or drupalwxt.org to achieve this.