feat: Add PDF/A support and source attachment to PDF. #1090

Draft
wants to merge 3 commits into
base: main

kesara
Member

@kesara kesara commented Jan 31, 2024

Fixes #1085

This change requires WeasyPrint >=61.0.

  • Generate a valid PDF/A-3B document.
  • Attach the source XML to the PDF file.

@kesara
Member Author

kesara commented Feb 9, 2024

With the updates from Kozea/WeasyPrint#1869, xml2rfc can generate a PDF file that conforms to the PDF/A-3B standard.

Sample file: https://t4.fq.nz/rfc9527.pdf

veraPDF output:

<?xml version="1.0" encoding="utf-8"?>
<report>
  <buildInformation>
    <releaseDetails id="core" version="1.24.1" buildDate="2023-06-22T10:38:00Z"></releaseDetails>
    <releaseDetails id="validation-model" version="1.24.1" buildDate="2023-06-22T11:37:00Z"></releaseDetails>
    <releaseDetails id="gui" version="1.24.1" buildDate="2023-06-22T14:19:00Z"></releaseDetails>
  </buildInformation>
  <jobs>
    <job>
      <item size="91087">
        <name>/docs/rfc9527.pdf</name>
      </item>
      <validationReport jobEndStatus="normal" profileName="PDF/A-3B validation profile" statement="PDF file is compliant with Validation Profile requirements." isCompliant="true">
        <details passedRules="144" failedRules="0" passedChecks="40904" failedChecks="0"></details>
      </validationReport>
      <duration start="1707442324695" finish="1707442325181">00:00:00.486</duration>
    </job>
  </jobs>
  <batchSummary totalJobs="1" failedToParse="0" encrypted="0" outOfMemory="0" veraExceptions="0">
    <validationReports compliant="1" nonCompliant="0" failedJobs="0">1</validationReports>
    <featureReports failedJobs="0">0</featureReports>
    <repairReports failedJobs="0">0</repairReports>
    <duration start="1707442324642" finish="1707442325193">00:00:00.551</duration>
  </batchSummary>
</report>
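
For anyone who wants to check a report like this automatically (for example in CI), here is a minimal sketch using only the Python standard library. It assumes the report has the same element and attribute names as the output above. The function name `summarize_verapdf_report` and the trimmed `SAMPLE_REPORT` are illustrative and are not part of xml2rfc or veraPDF.

```python
import xml.etree.ElementTree as ET


def summarize_verapdf_report(report_xml: str) -> list[dict]:
    """Summarize each <job> in a veraPDF XML report (layout as shown above)."""
    # Encode to bytes so the XML declaration's encoding is honored by the parser.
    root = ET.fromstring(report_xml.encode("utf-8"))
    summaries = []
    for job in root.iter("job"):
        report = job.find("validationReport")
        if report is None:
            continue  # no validation result recorded for this job
        details = report.find("details")
        summaries.append({
            "file": job.findtext("item/name"),
            "profile": report.get("profileName"),
            "compliant": report.get("isCompliant") == "true",
            "failed_rules": (
                int(details.get("failedRules", "0")) if details is not None else None
            ),
        })
    return summaries


# Trimmed copy of the report above, used only to exercise the function.
SAMPLE_REPORT = """<?xml version="1.0" encoding="utf-8"?>
<report>
  <jobs>
    <job>
      <item size="91087">
        <name>/docs/rfc9527.pdf</name>
      </item>
      <validationReport jobEndStatus="normal" profileName="PDF/A-3B validation profile" statement="PDF file is compliant with Validation Profile requirements." isCompliant="true">
        <details passedRules="144" failedRules="0" passedChecks="40904" failedChecks="0"></details>
      </validationReport>
    </job>
  </jobs>
</report>
"""

if __name__ == "__main__":
    for summary in summarize_verapdf_report(SAMPLE_REPORT):
        print(summary)
```

A CI step could fail the build whenever any entry has `compliant` set to `False` or a non-zero `failed_rules` count.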

Poppler pdfdetach output:

pdfdetach -list rfc9527.pdf
1 embedded files
1: rfc9527.xml
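
The attachment check can be scripted in the same way. The sketch below parses listing text in the format shown above (`N: filename` lines after a count line). That format is an assumption taken from this one sample, and other Poppler versions may print it differently. The helper name `parse_pdfdetach_listing` is hypothetical.

```python
import re


def parse_pdfdetach_listing(listing: str) -> list[str]:
    """Extract embedded file names from `pdfdetach -list` output text."""
    names = []
    for line in listing.splitlines():
        # Entry lines look like "1: rfc9527.xml"; the count line has no colon after the number.
        match = re.match(r"^\s*\d+:\s+(.*\S)\s*$", line)
        if match:
            names.append(match.group(1))
    return names


# Listing copied from the output above.
LISTING = """1 embedded files
1: rfc9527.xml
"""

if __name__ == "__main__":
    print(parse_pdfdetach_listing(LISTING))
```

Comparing the returned names against the expected source file name (here `rfc9527.xml`) would confirm that the XML was attached.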

@ajeanmahoney
Collaborator

I looked at https://t4.fq.nz/rfc9527.pdf, and I can save and open the attached XML file.
Some observations:

  • The WeasyPrint-generated PDF is smaller (89K) than the pdfaPilot PDF (202K).
  • The Document Properties dialog box shows the following differences (if not listed here, then the properties match):
    • List of authors:
      • WeasyPrint-generated PDF: semicolon-delimited and not in quotes
      • pdfaPilot-generated PDF: comma-delimited and in double quotes
    • Keywords:
      • WeasyPrint-generated PDF: contains the abstract in double quotes
      • pdfaPilot-generated PDF: empty
    • Created and Modified:
      • WeasyPrint-generated PDF: empty
      • pdfaPilot-generated PDF: has timestamps

rfc9527_properties_screenshot.pdf
rfc9527_weasy_2_properties_screenshot.pdf

Successfully merging this pull request may close these issues.

Generate an XML source file embedded PDF/A-3 PDF
2 participants