Links are lost after combining PDFs #341

Closed
vekunz opened this issue Jan 27, 2020 · 10 comments
Comments

@vekunz

vekunz commented Jan 27, 2020

Hi, I use pdf-lib to combine multiple PDFs. One of the PDFs contains internal links, such as a table of contents, that point to other pages of the same PDF. The problem is that these links are lost after combining the PDFs with pdf-lib.
Is there a way to preserve the links?

My code:

const pdfDoc = await PDFDocument.create();
for (const file of files) {
    // Copy every page of the source document into the merged document.
    const pages = await pdfDoc.copyPages(file, file.getPageIndices());

    for (const page of pages) {
        pdfDoc.addPage(page);
    }
}

Edit: I found out that the links are saved as "Named Destinations" in the PDF. The PDF is version 1.4. One option would be to add the destinations myself after merging, but then I need a way to add them to the PDF manually.

@Hopding
Owner

Hopding commented Feb 9, 2020

Hello @vekunz!

As you noted, the links do not work after the pages are merged because the links reference Named Destinations. Named Destinations are stored under the /Dests entry of the document's catalog. Unfortunately, the current page copying code does not copy anything from the donor document that isn't accessible from the page via a chain of indirect references. And most of the resources listed under the catalog are not accessible in this way.
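To make the reachability point concrete, here is a toy sketch of a PDF object graph. It is an illustration only and does not reflect pdf-lib's internal data structures: the `ObjectGraph` type, the object names, and `reachableFrom` are all hypothetical, and details such as a page's /Parent entry are left out.

```typescript
// Toy model of a PDF object graph: object id -> ids it references.
// Illustration only; not pdf-lib's internal representation.
type ObjectGraph = Record<string, string[]>;

// Collect every object reachable from `start` by following references.
function reachableFrom(graph: ObjectGraph, start: string): Set<string> {
  const seen = new Set<string>();
  const stack = [start];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (seen.has(id)) continue;
    seen.add(id);
    for (const next of graph[id] ?? []) stack.push(next);
  }
  return seen;
}

// Hypothetical donor document: the catalog points at the page tree and at
// /Dests, while the page only points at its contents, font and link annotation.
const donor: ObjectGraph = {
  catalog: ['pages', 'dests'],
  pages: ['page1'],
  dests: ['page1'],
  page1: ['contents', 'font', 'linkAnnot'],
  linkAnnot: [], // refers to its destination by *name*, not by reference
  contents: [],
  font: [],
};

// Everything a page-based copy could discover when starting from the page.
const copied = reachableFrom(donor, 'page1');
console.log(copied.has('linkAnnot')); // true
console.log(copied.has('dests')); // false
```

In this model the link annotation travels with the page, but the table that resolves its destination name does not, which matches the symptom reported above.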

This limitation has come up before in #159 and #218. I would like to see this issue resolved, but haven't had any time to work on it. I'd be open to discussing a solution with anybody interested in implementing a fix for copying catalog entries between documents!

@SteffenLanger

Hi @Hopding,

Thanks for your great work!

I'd like to support you in copying catalog entries between documents. I'm new to the internal workings of PDF but am a quick learner. I started researching the format and feel like I've got a good overview.

Since you know about pdf-lib best, do you have any suggestions for implementing this feature? My first (uneducated) guess would be:

  1. Find the catalog entries in the original document.
  2. Copy all catalog entries related to links to the new document.

@Hopding
Owner

Hopding commented Sep 24, 2021

Added this to the roadmap for tracking: #998.

@oleteacher

Wonderful lib! I know this is an old, closed issue, but links still do not work after a merge in the latest release. Hoping for support in the future.

@rajashree23

#1609

Are there any updates on getting internal links to work?

@Ludevik

Ludevik commented Apr 24, 2024

This is how I post-process multiple documents after using copyPages.

import { PDFArray, PDFDict, PDFDocument, PDFName, PDFRef } from 'pdf-lib';

function getLinksPDFName(): PDFName {
  return PDFName.of('Dests');
}

// Map each source page's ref tag to the ref of the corresponding copied page.
// Assumes the target holds the pages of all sources, in the same order.
function mapSourceToTargetPages(
  sources: PDFDocument[],
  destination: PDFDocument,
): Record<string, PDFRef> {
  const result: Record<string, PDFRef> = {};
  const sourcePages = sources.flatMap(source => source.getPages());
  const destinationPages = destination.getPages();
  for (let i = 0; i < sourcePages.length; i++) {
    result[sourcePages[i].ref.tag] = destinationPages[i].ref;
  }
  return result;
}

export function copyLinks(sources: PDFDocument[], target: PDFDocument): void {
  const targetLinksDict = PDFDict.withContext(target.context);
  sources
    .map(source => source.context.lookupMaybe(source.catalog.get(getLinksPDFName()), PDFDict))
    // Skip sources without a Dests dictionary (the type guard narrows to PDFDict).
    .filter((links): links is PDFDict => links != null)
    .forEach(links =>
      links.entries().forEach(([destName, destValue]) => targetLinksDict.set(destName, destValue)),
    );
  const pagesMapping = mapSourceToTargetPages(sources, target);
  // Point each destination at the copied page instead of the source page.
  (targetLinksDict.values() as PDFArray[]).forEach(array => {
    const currentPageRef = array.get(0) as PDFRef;
    array.set(0, pagesMapping[currentPageRef.tag]);
  });

  const destinationDestsRef = target.context.register(targetLinksDict);
  target.catalog.set(getLinksPDFName(), destinationDestsRef);
}

How it works:

  • copy entries from Dests from all sources to new dictionary
  • fix references (links) in target Dests because copied pages have different PDFRef
  • register dictionary in the target's context
  • set Dests reference in target's catalog to dictionary
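The "fix references" step can also be sketched on plain data, without pdf-lib. This is a hedged illustration rather than the code from the comment above: the `Dest` tuple, `remapDests`, and the sample tags are hypothetical stand-ins for destination arrays and page refs. The sketch differs from the posted code in two ways that are design choices, not requirements: it builds a new table instead of mutating the source's arrays, and it skips destinations whose page has no counterpart in the target.

```typescript
// Toy stand-in for a destination array such as [pageRef /XYZ left top zoom].
// The first element is the page reference's tag; the rest is passed through untouched.
type Dest = [pageTag: string, ...rest: (string | number | null)[]];

// Return a new name -> destination table with page tags rewritten via `mapping`.
// Destinations whose page was not copied are skipped rather than set to undefined.
function remapDests(
  dests: Record<string, Dest>,
  mapping: Map<string, string>,
): Record<string, Dest> {
  const out: Record<string, Dest> = {};
  for (const [name, [pageTag, ...rest]] of Object.entries(dests)) {
    const targetTag = mapping.get(pageTag);
    if (targetTag === undefined) continue; // page not present in the target
    out[name] = [targetTag, ...rest];
  }
  return out;
}

// Hypothetical sample data.
const sourceDests: Record<string, Dest> = {
  intro: ['3 0 R', 'XYZ', 0, 792, null],
  appendix: ['9 0 R', 'Fit'], // this page was not copied
};
const pageMapping = new Map([['3 0 R', '1 0 R']]);

const remapped = remapDests(sourceDests, pageMapping);
console.log(Object.keys(remapped)); // only 'intro' survives
console.log(remapped.intro[0]); // '1 0 R'
console.log(sourceDests.intro[0]); // '3 0 R' (the source table is left unmodified)
```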

@FiveOFive

Thanks for sharing @Ludevik. This has been very helpful. One issue I did encounter is that my sources seem to have duplicate ref.tag values between them. Creating a single mapping of all the source documents to the target was overwriting the duplicates, so I refactored to only map one source document at a time.

import { PDFArray, PDFDict, PDFDocument, PDFName, PDFRef } from 'pdf-lib';

const LINKS_PDF_NAME = PDFName.of('Dests');

export function copyLinks(sources: PDFDocument[], target: PDFDocument) {
  const targetLinksDict = PDFDict.withContext(target.context);

  let currentTargetPage = 0;
  for (const source of sources) {
    // Build a fresh mapping per source so equal ref tags in different sources cannot collide.
    const { mapping, targetPage } = mapSourceToTargetPages(source, target, currentTargetPage);
    currentTargetPage = targetPage;

    // lookupMaybe returns undefined (not null) when the source has no Dests entry.
    const links = source.context.lookupMaybe(source.catalog.get(LINKS_PDF_NAME), PDFDict);
    if (links) {
      links.entries().forEach(([destName, destValue]) => {
        const currentRef = (destValue as PDFArray).get(0) as PDFRef;
        (destValue as PDFArray).set(0, mapping[currentRef.tag]);
        targetLinksDict.set(destName, destValue);
      });
    }
  }

  const destinationDestsRef = target.context.register(targetLinksDict);
  target.catalog.set(LINKS_PDF_NAME, destinationDestsRef);
}

function mapSourceToTargetPages(
  source: PDFDocument,
  target: PDFDocument,
  startingTargetPage: number,
): { mapping: Record<string, PDFRef>; targetPage: number } {
  const result: Record<string, PDFRef> = {};
  const targetPages = target.getPages();
  let currentTargetPage = startingTargetPage;
  const sourcePages = source.getPages();

  for (let i = 0; i < sourcePages.length; i++) {
    result[sourcePages[i].ref.tag] = targetPages[currentTargetPage].ref;
    currentTargetPage++;
  }

  return { mapping: result, targetPage: currentTargetPage };
}
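The tag collision described above can be reproduced on plain objects. Each PDF numbers its objects independently, so two unrelated files can both have a page whose ref tag is, say, `3 0 R`. The sketch below is an assumption-laden illustration: `RefLike`, `PageLike`, `mapOneSource`, and the sample tags are hypothetical stand-ins that only mirror the `ref.tag` shape used in the code above.

```typescript
// Minimal structural stand-ins for a page and its ref (only `ref.tag` is used).
interface RefLike { tag: string }
interface PageLike { ref: RefLike }

// Build a source-tag -> target-ref mapping for ONE source document whose pages
// were appended to the target starting at index `offset`.
function mapOneSource(
  sourcePages: PageLike[],
  targetPages: PageLike[],
  offset: number,
): Map<string, RefLike> {
  const mapping = new Map<string, RefLike>();
  sourcePages.forEach((p, i) => mapping.set(p.ref.tag, targetPages[offset + i].ref));
  return mapping;
}

const page = (tag: string): PageLike => ({ ref: { tag } });

// Two hypothetical sources that happen to use the same object number for a page.
const sourceA = [page('3 0 R'), page('7 0 R')];
const sourceB = [page('3 0 R')];
const target = [page('1 0 R'), page('2 0 R'), page('4 0 R')];

// One shared mapping: sourceA's entry for '3 0 R' is overwritten by sourceB's.
const shared = new Map<string, RefLike>();
[...sourceA, ...sourceB].forEach((p, i) => shared.set(p.ref.tag, target[i].ref));
console.log(shared.get('3 0 R')?.tag); // '4 0 R' (sourceA's first page is lost)

// One mapping per source keeps the two '3 0 R' entries apart.
const mapA = mapOneSource(sourceA, target, 0);
const mapB = mapOneSource(sourceB, target, sourceA.length);
console.log(mapA.get('3 0 R')?.tag); // '1 0 R'
console.log(mapB.get('3 0 R')?.tag); // '4 0 R'
```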

@Ludevik

Ludevik commented Jul 22, 2024

@FiveOFive Nice fix. We don't have such a case, so I didn't encounter the issue.

@cknightdevelopment

Is it possible to see the calling of these functions in context? Having trouble knowing what to pass for sources vs target, etc.? Thanks!

@Ludevik

Ludevik commented Oct 14, 2024

There's not much to add. sources and target are of type PDFDocument. You can either create a new PDFDocument or open an existing one and then pass them to the function. In my scenario I use copyPages to combine multiple PDF documents, which breaks the links in the target PDF, so I use copyLinks to post-process it and restore the links in the target PDF, e.g.:

const source1Pdf = await PDFDocument.load(source1PdfBytes);
const source2Pdf = await PDFDocument.load(source2PdfBytes);

const target = await PDFDocument.create();

// copyPages is called on the target and takes the source plus the page indices to copy.
const source1Pages = await target.copyPages(source1Pdf, source1Pdf.getPageIndices());
const source2Pages = await target.copyPages(source2Pdf, source2Pdf.getPageIndices());

source1Pages.forEach(page => target.addPage(page));
source2Pages.forEach(page => target.addPage(page));

// Restore the named destinations once all pages are in the target.
copyLinks([source1Pdf, source2Pdf], target);
