Improve accessibility of generated PDF #97128

Closed
4 of 11 tasks
oddball-lindsay opened this issue Nov 14, 2024 · 10 comments
Assignees
Labels
accredited-representation-management-team Accredited Representation Management team backend mvp Initial version of thing

Comments

@oddball-lindsay
Contributor

oddball-lindsay commented Nov 14, 2024

Background

We know that PDF accessibility is a larger issue across the VA.gov platform. While we may not be able to reach the gold standard, there are some things we can do to improve accessibility as much as we can.

These are some issues that were reported in our accessibility audit:

  • There’s no default language set on the PDF.
  • The PDF doesn’t have a title.
  • The generated PDF filled-out forms aren’t tagged, so a screen reader user would have no way of knowing if text on a page was a paragraph, a heading, in a table, etc. The screen reader and the PDF reader will make their “best attempt” to figure it out, but it could easily be wrong. Here’s an extreme example.
    • There will likely be accessibility issues after it’s tagged, that require fixing/improving the tags.
    • The form is laid out as a table, but isn’t tabular data. So someone would likely need to pull content out of any <table> tags and remediate.
    • The blank 21-22 does not have issues (it has a title, a language, and tagging) so something is going on with our generation to "flatten" the PDF and remove these details.

Some additional guidance from accessibility:

It's not dissimilar to an HTML page:

  • page language needs to be defined (English, Spanish, etc)
  • PDF needs a title (just like HTML needs a <title>)
  • all content needs to be accurately tagged (the tags themselves are a bit different, but the concept is the same as HTML)
  • color contrast needs to meet WCAG standards
  • needs to work with reflow / resizing (just like an HTML page!)
  • etc

Sec 508 has some guides on how to make accessible PDFs: https://www.section508.gov/create/pdfs/authoring-guides/

Tasks

  • Set a default language for generated 21-22 and 21-22a PDFs
  • Set a title for generated 21-22 and 21-22a PDFs (we can match the title of the downloaded PDF)
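
A minimal sketch of the two tasks above, assuming the generated file is opened with hexapdf (one of the libraries discussed in this thread). The helper name `apply_accessibility_metadata`, the title string, and the file names are hypothetical and not taken from the actual codebase. The helper only touches the catalog and the document information dictionary, so it does not add any tagging.

```ruby
# Sketch only: `doc` is assumed to behave like a HexaPDF::Document, i.e. it
# exposes `catalog` and `trailer.info` as hash-like PDF dictionaries.
def apply_accessibility_metadata(doc, title:, lang: 'en-US')
  # Document-level default language, used by assistive technology.
  doc.catalog[:Lang] = lang

  # Title stored in the document information dictionary.
  doc.trailer.info[:Title] = title

  # Ask viewers to display the title instead of the file name.
  doc.catalog[:ViewerPreferences] ||= {}
  doc.catalog[:ViewerPreferences][:DisplayDocTitle] = true

  doc
end

# Hypothetical usage (requires the hexapdf gem; file names are placeholders):
#   require 'hexapdf'
#   doc = HexaPDF::Document.open('filled_21-22.pdf')
#   apply_accessibility_metadata(doc, title: 'VA Form 21-22')
#   doc.write('filled_21-22_with_metadata.pdf')
```

Whether these entries survive later steps such as flattening or merging would still need to be checked on the final file.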

Tagging:

  • Generate PDFs at each stage of the process (combining pages, filling out data, creating the "next steps" cover page) to try pinpointing where the tags are being lost.
  • Connect with Jamie Klenetsky Fay, our CAIA accessibility partner, to assess the level of accessibility of the PDFs at each stage
  • Assess/address next steps based on conversation with Jamie
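
One rough way to compare the PDFs from each stage is to scan the raw bytes for the markers the audit called out. The sketch below is a heuristic, not a real PDF parser: it assumes the relevant dictionaries are stored uncompressed, it can report a false positive (for example, `/Title` also appears in bookmark entries), and the method names are hypothetical. It does not replace a review in Acrobat or with a screen reader.

```ruby
# Byte-level heuristics for the accessibility markers from the audit.
# Assumption: the catalog and info dictionaries are not inside compressed
# object streams; if they are, these patterns will not match.
MARKERS = {
  language: %r{/Lang\s*[(<]},
  title: %r{/Title\s*[(<]},
  struct_tree: %r{/StructTreeRoot\b},
  marked: %r{/Marked\s+true\b}
}.freeze

# Returns a hash such as { language: true, title: false, ... }.
def accessibility_markers(pdf_bytes)
  data = pdf_bytes.dup.force_encoding(Encoding::BINARY)
  MARKERS.transform_values { |pattern| data.match?(pattern) }
end

# Lists the markers present before a processing stage but missing after it.
def lost_markers(before_bytes, after_bytes)
  before = accessibility_markers(before_bytes)
  after = accessibility_markers(after_bytes)
  before.select { |name, present| present && !after[name] }.keys
end

# Hypothetical usage (file names are placeholders):
#   blank  = File.binread('blank_21-22.pdf')
#   filled = File.binread('filled_21-22.pdf')
#   puts lost_markers(blank, filled).inspect
```

Running `lost_markers` between consecutive stages (blank form, filled form, flattened form, merged file) gives a first hint about where the language, title, or structure tree disappears.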

Acceptance Criteria

  • The generated 21-22 and 21-22a PDFs have a default language
  • The generated 21-22 and 21-22a PDFs have titles
  • The generated 21-22 and 21-22a PDFs have tagging
@oddball-lindsay
Contributor Author

This was flagged as launch-blocking in Staging Review:

I encourage you to check out that Staging Review ticket for additional context, but here are some highlights

  • The cover sheet doesn't announce any heading semantics, and the form itself announces sections in a seemingly random order.
  • If it's not possible to generate an accessible PDF version, then the next best option would be to generate an accessible version in another file type (HTML, txt, something else) that could be presented as an optional download.

They also advise working with CAIA, which we are already doing.

Please close the related Staging Review ticket when this is addressed!

@oddball-lindsay
Contributor Author

oddball-lindsay commented Dec 12, 2024

The "Next steps" page may be causing the loss of accessibility. I will verify on Monday during a call with Jamie.

Another idea: a separate download for Next Steps.

@opticbob

I'll meet on Monday the 16th with @coforma-jamie to discuss ways we can get the PDFs accessible.

I'll have ready:

  • flattened and unflattened versions of just the filled form, before we combine it with the next steps page
  • A combined version of both of those files with the next steps page where we use hexapdf to combine the pages

I'll try to research generating the next steps page with the proper tags in prawn and hexapdf.

@opticbob

I met with @coforma-jamie and @jquispe-oddball this morning to look at some example files and discuss accessibility in general. All of the example files we tested were unflattened and editable. Some notes:

  • PDFs merged with hexapdf behaved better in Acrobat than PDFs merged with combine_pdf. Filled-out files merged with hexapdf showed their data in Acrobat and were still editable. Files merged with combine_pdf were editable in Acrobat, but none of the prefilled data was visible.
  • Serving the user flattened non-editable files is the way to go if we can't get the underlying tagging of the template form to be maintained. A screen reader will have a much easier time navigating just untagged text rather than untagged text and untagged input fields.
  • Switching the order of the PDFs so that the filled out form was first and the next steps page was last, then combining them with combine_pdf was not an improvement.
  • Jamie showed us the underlying tagging of a couple of different PDF products from the VA. The tagging of the form 10-10EZ was incredibly thorough. I'm going to try to talk to some other teams that are doing form filling of PDFs and see how their filled in forms check out in terms of accessibility.
  • hexapdf doesn't have direct support for writing semantic tags into a PDF document. You can use it to modify the structure tree of a PDF and it is possible we can do that to write some rudimentary tags. I'm going to do so for a simple next steps page to see if I can get H1, H2, and P tags to read in Acrobat.

@opticbob

I've migrated the merging of PDFs over to hexapdf and will ask other VA engineers about the accessibility of their filled in PDFs in the morning. I'll also examine their PDF testing fixtures and start generating a PDF with just the form, without any merges to see if that can get us most of the way there.

@opticbob

I modified the code to write directly to the form with no next steps page and no PDF merge of any kind. With that change I generated fresh 2122 and 2122a files. @coforma-jamie looked over the files and they were a big step forward accessibility-wise.

Fields and content were annotated but we did not confirm that all headings matched the content and were correct. For example, the NOTE directly under section I was tagged as an H3 on one of the files which wasn't a good match for what that content actually was conveying. I'd guess they just wanted a way to make it bold. I would guess that the annotations in our filled out files match the originals.

Jamie looked at the tab ordering on the versions I sent her and the originals, and they were the same. The original files direct from the VA have nonsensical tab order and our filled out versions duplicate that.

Our filled out versions do look like they have additional tags that are empty of content.

Recommendation:

I think we should remove the next steps PDF page and just fill out the form and return it to the user.

Both combine_pdf and hexapdf remove the form tagging from the form when merging PDFs, and as of right now our next steps page is untagged. Neither of the libraries we use for writing PDF files (prawn and hexapdf) supports directly tagging content written in the document. The next steps page does make sense if read top to bottom by a screen reader, but all the text would be treated the same; there would be no sense of headers vs paragraphs.

I remember a discussion about the possibility of removing the next steps page but I can't find it now. When I find it I'll link it here.

@opticbob

I discussed this today with @coforma-jamie and @EvanAtCoforma . Some notes:

  • The reading order is nonsensical and shared with the underlying blank PDF.
  • The reading order in the underlying form isn't going to change and we can't manipulate it.
  • There are discrepancies in viewing the PDFs in different apps: some apps read the filled-in data, some don't. This is widespread across the VA.
  • Depending on the quality/sophistication of the screen reader, the filled-in text can be read very differently. Example.
  • The PDFs without the next steps page aren't accessible, but they are more accessible than they were when they were reviewed for staging review.

@opticbob

After discussing the points above with the team we've got the following recommendation:

Remove the next steps page from the PDF and make it a separate non-required download link on the download page.

This allows us to get the more accessible version of the PDF form served to the user while also keeping that additional source of next steps information. This will require some additional frontend and backend work.

@opticbob

We've got the go ahead to separate the next steps page from the PDF. I'm starting on that now.

Reference.

@opticbob

The PR to remove the next steps page has been approved and merged. Work to make the next steps page a separate download has started: #99525
