Improve accessibility of generated PDF #97128

oddball-lindsay · 2024-11-14T00:22:27Z

Background

We know that PDF accessibility is a larger issue across the VA.gov platform, and while we may not be able to reach the gold standard, there are some things we can do to improve accessibility as much as we can.

These are some issues that were reported in our accessibility audit:

There’s no default language set on the PDF.
The PDF doesn’t have a title.
The generated PDF filled-out forms aren’t tagged, so a screen reader user would have no way of knowing if text on a page was a paragraph, a heading, in a table, etc. The screen reader and the PDF reader will make their “best attempt” to figure it out, but it could easily be wrong. Here’s an extreme example.
- There will likely be accessibility issues after it’s tagged that require fixing or improving the tags.
- The form is laid out as a table, but isn’t tabular data. So someone would likely need to pull content out of any <table> tags and remediate.
- The blank 21-22 does not have issues (it has a title, a language, and tagging) so something is going on with our generation to "flatten" the PDF and remove these details.

Some additional guidance from accessibility:

It's not dissimilar to an HTML page:

page language needs to be defined (English, Spanish, etc)

PDF needs a title (just like HTML needs a <title>)

all content needs to be accurately tagged (the tags themselves are a bit different, but the concept is the same as HTML)

color contrast needs to meet WCAG standards

needs to work with reflow / resizing (just like an HTML page!)

etc
Section508.gov has some guides on how to make accessible PDFs: https://www.section508.gov/create/pdfs/authoring-guides/
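For the color contrast item above, here is a minimal Ruby sketch of the WCAG 2.x contrast ratio calculation. It is not part of our codebase; the method names are hypothetical, and the 4.5:1 and 3:1 thresholds are the WCAG AA values for normal and large text.

```ruby
# WCAG 2.x relative luminance of an sRGB color given as "#rrggbb".
def relative_luminance(hex)
  channels = hex.delete('#').scan(/../).map { |pair| pair.to_i(16) / 255.0 }
  # Undo the sRGB gamma curve for each channel.
  linear = channels.map do |c|
    c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055)**2.4
  end
  0.2126 * linear[0] + 0.7152 * linear[1] + 0.0722 * linear[2]
end

# Contrast ratio between two colors; ranges from 1.0 to 21.0.
def contrast_ratio(foreground, background)
  lighter, darker = [relative_luminance(foreground),
                     relative_luminance(background)].sort.reverse
  (lighter + 0.05) / (darker + 0.05)
end

# WCAG AA: 4.5:1 for normal text, 3:1 for large text.
def meets_wcag_aa?(foreground, background, large_text: false)
  contrast_ratio(foreground, background) >= (large_text ? 3.0 : 4.5)
end
```

Black text on a white background gives the maximum ratio of 21:1, so `meets_wcag_aa?('#000000', '#ffffff')` is true.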

Tasks

Set a default language for generated 21-22 and 21-22a PDFs
Set a title for generated 21-22 and 21-22a PDFs (we can match the title of the downloaded PDF)
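A rough sketch of what the two tasks above could look like with hexapdf. It assumes the default language lives in the document catalog's /Lang entry and the title in the Info dictionary's /Title entry. The method names, paths, and title value are hypothetical, and this has not been run against our generated 21-22 files.

```ruby
# Sets the two document-level properties flagged in the audit.
# `catalog` and `info` only need to support []=, so plain Hashes work too.
def apply_pdf_metadata(catalog, info, lang:, title:)
  catalog[:Lang] = lang  # document default language, e.g. "en-US"
  info[:Title] = title   # document title stored in the Info dictionary
  nil
end

# Hypothetical file-level wrapper; the paths and the default language are assumptions.
def add_metadata_to_file(input_path, output_path, title:, lang: 'en-US')
  require 'hexapdf' # third-party gem, loaded lazily so the helper above has no dependencies
  doc = HexaPDF::Document.open(input_path)
  apply_pdf_metadata(doc.catalog, doc.trailer.info, lang: lang, title: title)
  doc.write(output_path)
end
```

Usage would be something like `add_metadata_to_file('filled.pdf', 'filled_with_metadata.pdf', title: 'Placeholder title')`, with the real title matching the downloaded file name as noted above.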

Tagging:

Generate PDFs at each stage of the process (combining pages, filling out data, creating the "next steps" cover page) to try pinpointing where the tags are being lost.
Connect with Jamie Klenetsky Fay, our CAIA accessibility partner, to assess the level of accessibility of the PDFs at each stage
Assess/address next steps based on conversation with Jamie
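To help pinpoint the stage where tags are lost, a small check along these lines might be useful. This is a sketch under assumptions: it treats a PDF as tagged when the catalog has a /StructTreeRoot entry and /MarkInfo has /Marked set to true, and the method names and stage names are hypothetical. It says nothing about whether the tags are correct, which still needs a manual review with Jamie.

```ruby
# Summarizes the document-level accessibility properties from the audit.
# `catalog` and `info` only need to support [], so plain Hashes work too.
def accessibility_report(catalog, info)
  mark_info = catalog[:MarkInfo]
  {
    language: catalog[:Lang],
    title: info[:Title],
    struct_tree: !catalog[:StructTreeRoot].nil?,
    marked: !mark_info.nil? && mark_info[:Marked] == true
  }
end

# Given [stage_name, report] pairs in pipeline order, returns the name of the
# first stage whose output lost the structure tree, or nil if none did.
def first_regression(stage_reports)
  stage_reports.each_cons(2) do |(_, before), (name, after)|
    return name if before[:struct_tree] && !after[:struct_tree]
  end
  nil
end

# Hypothetical file-level wrapper around hexapdf.
def report_for_file(path)
  require 'hexapdf' # third-party gem, loaded lazily
  doc = HexaPDF::Document.open(path)
  accessibility_report(doc.catalog, doc.trailer.info)
end
```

Running `report_for_file` on the blank template, the filled form, and the combined output, then passing the results to `first_regression`, would show which step drops the structure tree.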

Acceptance Criteria

The generated 21-22 and 21-22a PDFs have a default language
The generated 21-22 and 21-22a PDFs have titles
The generated 21-22 and 21-22a PDFs have tagging

oddball-lindsay · 2024-12-12T17:18:27Z

This was flagged as launch-blocking in Staging Review:

Staging Review finding: Downloadable PDF isn't screen reader friendly #99005

I encourage you to check out that Staging Review ticket for additional context, but here are some highlights:

The cover sheet doesn't announce any heading semantics, and the form itself announces sections in a seemingly random order.

If it's not possible to generate an accessible PDF version, then the next best option would be to generate an accessible version in another file type (HTML, txt, something else) that could be presented as an optional download.

They also advise working with CAIA, which we are already doing.

Please close the related Staging Review ticket when this is addressed!

oddball-lindsay · 2024-12-12T18:09:31Z

The "Next steps" page may be causing the loss of accessibility; we will verify on Monday during a call with Jamie.

Another idea: a separate download for Next Steps.

opticbob · 2024-12-12T23:03:21Z

I'll meet on Monday the 16th with @coforma-jamie to discuss ways we can get the PDFs accessible.

I'll have ready:

Flattened and unflattened versions of just the filled form, before we combine it with the next steps page
A combined version of both of those files with the next steps page, using hexapdf to combine the pages

I'll try to research generating the next steps page with the proper tags in prawn and hexapdf.

opticbob · 2024-12-16T18:58:04Z

I met with @coforma-jamie and @jquispe-oddball this morning to look at some example files and discuss accessibility in general. All of the example files we tested were unflattened and editable. Some notes:

PDFs merged with hexapdf behaved better in Acrobat than those merged with combine_pdf. Files that were filled out had their data present in Acrobat when merged with hexapdf, and the file was still editable. Files merged with combine_pdf are editable in Acrobat, but you can't see any of the prefilled data.
Serving the user flattened non-editable files is the way to go if we can't get the underlying tagging of the template form to be maintained. A screen reader will have a much easier time navigating untagged text alone than untagged text plus untagged input fields.
Switching the order of the PDFs so that the filled out form was first and the next steps page was last, then combining them with combine_pdf, was not an improvement.
Jamie showed us the underlying tagging of a couple of different PDF products from the VA. The tagging of the form 10-10EZ was incredibly thorough. I'm going to try to talk to some other teams that are doing form filling of PDFs and see how their filled in forms check out in terms of accessibility.
hexapdf doesn't have direct support for writing semantic tags into a PDF document. You can use it to modify the structure tree of a PDF and it is possible we can do that to write some rudimentary tags. I'm going to do so for a simple next steps page to see if I can get H1, H2, and P tags to read in Acrobat.

opticbob · 2024-12-16T21:45:50Z

I've migrated the merging of PDFs over to hexapdf and will ask other VA engineers about the accessibility of their filled in PDFs in the morning. I'll also examine their PDF testing fixtures and start generating a PDF with just the form, without any merges, to see if that can get us most of the way there.

opticbob · 2024-12-17T22:00:42Z

I modified the code to write directly to the form with no next steps page and no PDF merge of any kind. With that change I generated fresh 2122 and 2122a files. @coforma-jamie looked over the files and they were a big step forward accessibility-wise.

Fields and content were annotated, but we did not confirm that all headings matched the content and were correct. For example, the NOTE directly under section I was tagged as an H3 on one of the files, which wasn't a good match for what that content was actually conveying. I'd guess the form's authors just wanted a way to make it bold. I would also guess that the annotations in our filled out files match the originals.

Jamie looked at the tab ordering on the versions I sent her and the originals, and they were the same. The original files direct from the VA have nonsensical tab order and our filled out versions duplicate that.

Our filled out versions do look like they have additional tags that are empty of content.

Recommendation:

I think we should remove the next steps PDF page and just fill out the form and return it to the user.

Both combine_pdf and hexapdf remove the form tagging from the form when merging PDFs, and as of right now our next steps page is untagged. Neither of the libraries we use for writing PDF files (prawn and hexapdf) supports directly tagging content written in the document. The next steps page does make sense if read top to bottom by a screen reader, but all the text would be treated the same; there would be no sense of headers vs paragraphs.

I remember a discussion about the possibility of removing the next steps page but I can't find it now. When I find it I'll link it here.

opticbob · 2024-12-18T18:03:05Z

I discussed this today with @coforma-jamie and @EvanAtCoforma . Some notes:

The reading order is nonsensical and shared with the underlying blank PDF.
The reading order in the underlying form isn't going to change and we can't manipulate it.
There are discrepancies in viewing the PDFs in different apps: some apps read the filled in data, some don't. This is widespread across the VA.
Depending on the quality/sophistication of the screen reader, the filled in text can be read very differently. Example.
The PDFs without the next steps page aren't accessible, but they are more accessible than they were when they were reviewed for staging review.

opticbob · 2024-12-18T19:12:05Z

After discussing the points above with the team we've got the following recommendation:

Remove the next steps page from the PDF and make it a separate non-required download link on the download page.

This allows us to get the more accessible version of the PDF form served to the user while also keeping that additional source of next steps information. This will require some additional frontend and backend work.

opticbob · 2024-12-18T22:05:42Z

We've got the go-ahead to separate the next steps page from the PDF. I'm starting on that now.

Reference.

opticbob · 2024-12-20T18:21:35Z

The PR to remove the next steps page has been approved and merged. Work to make the next steps page a separate download has started: #99525

oddball-lindsay added the accredited-representation-management-team label Nov 14, 2024

oddball-lindsay assigned opticbob Nov 14, 2024

oddball-lindsay added the backend and mvp labels Nov 14, 2024

oddball-lindsay added this to the ARM Development: Appoint a Representative 1.0 (MVP) milestone Nov 29, 2024

opticbob mentioned this issue Dec 17, 2024

97128 improve accessibility of generated pdf by removing next steps page department-of-veterans-affairs/vets-api#19937

Merged

9 tasks

opticbob mentioned this issue Dec 19, 2024

Create new next steps page downloads #99525

Open

6 tasks

opticbob closed this as completed Dec 20, 2024
