PDF Download #969

Open
cooltrooper opened this issue Sep 18, 2018 · 29 comments
Labels
difficulty: advanced Issues that are complex, e.g. large scoping for long-term maintainability. help wanted Asking for outside help and/or contributions to this particular issue or PR. proposal This issue is a proposal, usually non-trivial change

Comments

@cooltrooper

🚀 Feature

The ability for users to download an offline version of the document.

This would be ideal as a PDF with title page and table of contents.

Have you read the Contributing Guidelines on issues?

Yes

Motivation

I'm interested in using this as a documentation solution for client products, but some clients require a PDF for print.

Pitch

A download button could be added to the top of the side nav; clicking it would download the file.

@haraldur12
Contributor

Could we use something like react-pdf ? I might be able to take a look at it.

@cooltrooper
Author

cooltrooper commented Sep 20, 2018

Sorry, I may have explained this poorly. I believe react-pdf renders PDFs on the webpage. What I have in mind is a way to export your documents to PDF as an additional output format.

For example, Sphinx has PDF generation via LaTeX.

@haraldur12
Contributor

We could also achieve this with the html2canvas module, I think. There are a few ways to do that; however, I am not sure which one would be optimal. If there is interest in it, I could take a look.

@Zwitty

Zwitty commented Oct 18, 2018

It might be possible to generate PDFs during the build using something like pandoc and then serve them with the document.

@cooltrooper
Author

Concatenate the Markdown files and convert them to PDF with pandoc?

@endiliey endiliey added the feature This is not a bug or issue with Docusausus, per se. It is a feature request for the future. label May 17, 2019
@tusharf5
Contributor

We could use something like markdown-pdf to create PDF documents from the Markdown files during the build process and serve them. The URL for each PDF would be the same as the document's URL but with the extension ".pdf". The same would apply to translated documents.

We could also make this feature optional and turned off by default, controlled by a siteConfig key. I can look into it if it's still a requested feature.
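
A minimal sketch of the URL scheme described above. The `pdf` key in siteConfig and the helper name `pdfUrlFor` are hypothetical (no such key exists in Docusaurus); the sketch also assumes doc URLs may end in ".html" or a trailing slash.

```javascript
// Hypothetical helper, not part of Docusaurus: derive the PDF URL for a doc page.
// `siteConfig.pdf` is an assumed opt-in flag, so the feature is off by default.
function pdfUrlFor(docUrl, siteConfig = {}) {
  if (!siteConfig.pdf) {
    return null; // feature disabled: no PDF link is produced
  }
  // Drop a trailing slash or ".html" so "/docs/intro.html" becomes "/docs/intro"
  const base = docUrl.replace(/\/+$/, '').replace(/\.html$/, '');
  return `${base}.pdf`;
}

console.log(pdfUrlFor('/docs/en/intro.html', { pdf: true }));
console.log(pdfUrlFor('/docs/ko/intro', { pdf: true }));
console.log(pdfUrlFor('/docs/en/intro.html'));
```

Because the mapping only rewrites the path suffix, translated documents get their PDF next to the localized page without extra configuration.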

@endiliey endiliey added the help wanted Asking for outside help and/or contributions to this particular issue or PR. label Jul 18, 2019
@sourabhxyz

Since I wanted this feature, I implemented it using Travis. See 1, 2, 3. Of course, you would need to change it according to your needs. As might be evident from one of these files, I am pushing generated pdf files here.

@BenHadman

> Since I wanted this feature, I implemented it using Travis. See 1, 2, 3. Of course, you would need to change it according to your needs. As might be evident from one of these files, I am pushing generated pdf files here.

Could you please go into more detail on how I could implement this?
Sorry, I'm new to React and web dev in general.

@sourabhxyz

> > Since I wanted this feature, I implemented it using Travis. See 1, 2, 3. Of course, you would need to change it according to your needs. As might be evident from one of these files, I am pushing generated pdf files here.
>
> Could you please go into more detail on how I could implement this?
> Sorry, I'm new to React and web dev in general.

Apologies for the late reply. If you want to understand my approach, start by learning about Travis and how to convert Markdown to PDF using pandoc. Essentially, whenever I make a commit to my Docusaurus site, my Travis script converts the Markdown files listed in sidebars.json to PDF, merges them, and pushes the result to my other repository.

@reflectively

Would love to see this feature!

@dheerajmpai

Actually React-pdf can do the job. We need to work on it

@braco

braco commented Feb 20, 2020

Lack of this feature, or single-page output, is why my company has to move away from Docusaurus.

@yangshun
Contributor

@braco thanks for the feedback. What will you be moving to?

@kohheepeace
Contributor

kohheepeace commented Mar 5, 2020

@yangshun Hi, I implemented a Node script that generates a PDF file from the docs.

This is the generated PDF: https://drive.google.com/file/d/19P3qSwLLUHYigrxH3QXIMXmRpTFi4pKB/view

Asking

  • I'd like to ask whether my approach is okay before sending a PR.
  • I'd like to know of any requirements for this PDF feature (design, etc.).

Approach

I imitated the mdx-deck approach.

With this approach, the author needs to generate the PDF manually on their local machine and then serve it.

  1. Start the Docusaurus project at localhost
  2. Run a Node script to generate the PDF
  3. Serve the generated PDF file from the static folder
  4. Add a link to the PDF file on the client side so that readers can access it

*Discussion about this approach: jxnblk/mdx-deck#141

Demo

  1. Run the Docusaurus OSS site at http://localhost:3000/
  2. Create hoge.js at the root of the OSS project
    • NOTE: this code currently works only with the Docusaurus OSS project
    • Please install "puppeteer", "hummus", and "memory-streams"

hoge.js

const puppeteer = require('puppeteer');
const { PDFRStreamForBuffer, createWriterToModify, PDFStreamForResponse } = require('hummus');
const { WritableStream } = require('memory-streams');
const fs = require('fs');

// Merge several in-memory PDFs into one by appending their pages to the first
const mergePdfBlobs = (pdfBlobs) => {
  const outStream = new WritableStream();
  const [firstPdfRStream, ...restPdfRStreams] = pdfBlobs.map(pdfBlob => new PDFRStreamForBuffer(pdfBlob));
  const pdfWriter = createWriterToModify(firstPdfRStream, new PDFStreamForResponse(outStream));

  restPdfRStreams.forEach(pdfRStream => pdfWriter.appendPDFPagesFromPDF(pdfRStream));

  pdfWriter.end();
  outStream.end();

  return outStream.toBuffer();
};

const generatedPdfBlobs = [];

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  let nextPageUrl = 'http://localhost:3000/docs/introduction';

  while (nextPageUrl) {
    await page.goto(nextPageUrl, {waitUntil: 'networkidle2'});

    // Read the "next" pagination link before the page content is replaced below;
    // $eval throws when the link is missing, which marks the last page
    try {
      nextPageUrl = await page.$eval('.pagination-nav__item--next > a', (element) => {
        return element.href;
      });
    } catch (e) {
      nextPageUrl = null;
    }

    // Keep only the <article> element so the navbar and sidebar are left out
    const html = await page.$eval('article', (element) => {
      return element.outerHTML;
    });

    await page.setContent(html);
    await page.addStyleTag({url: 'http://localhost:3000/styles.css'});
    await page.addScriptTag({url: 'http://localhost:3000/styles.js'});
    const pdfBlob = await page.pdf({path: "", format: 'A4', printBackground: true, margin : {top: 20, right: 15, left: 15, bottom: 20}});

    generatedPdfBlobs.push(pdfBlob);
  }
  await browser.close();

  const mergedPdfBlob = mergePdfBlobs(generatedPdfBlobs);
  fs.writeFileSync('hoge-final.pdf', mergedPdfBlob);
})();
  3. Run hoge.js. This creates hoge-final.pdf in the project root:
node hoge.js

Thanks!

@maxarndt

maxarndt commented Mar 5, 2020

> Demo
>
>   1. Run docusaurus oss site at http://localhost:3000/
>   2. Make hoge.js at root of oss project
>   • NOTE!: this code only works oss docusaurus project now
>   • please install "puppeteer", "hummus", "memory-streams"

@kohheepeace could you explain what "oss" stands for in this context? I'm new to Docusaurus and looking for a way to export docs as PDF, so I tried your demo code snippet. Unfortunately, the structure of my project, initialized with npx docusaurus-init, does not match your assumptions regarding the stylesheet and JS files, etc.

Thank you for sharing and working on this project!

@kohheepeace
Contributor

kohheepeace commented Mar 5, 2020

Hi @maxarndt, sorry for my poor explanation. "oss site" refers to https://github.com/facebook/docusaurus.

If you just created a site by running npx @docusaurus/init@next init my-website classic, you need to change nextPageUrl in hoge.js from http://localhost:3000/docs/introduction to http://localhost:3000/docs/doc1.

You need to specify the initial URL to start scraping from. Puppeteer then follows the pagination links automatically and generates the PDF. I hope it works in your environment!
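
The pagination logic in hoge.js can be sketched on its own, without a browser. In this sketch `collectPageUrls` and the `getNextUrl` callback are illustrative names, not part of hoge.js; with Puppeteer, the callback would wrap page.goto plus page.$eval on the "next" link. The cycle check and the page cap are additions that hoge.js does not have.

```javascript
// Follow "next" links from a start URL and return the pages in reading order.
// `getNextUrl` is a hypothetical callback: given a URL, it resolves to the
// next page's URL, or null when there is no next page.
async function collectPageUrls(startUrl, getNextUrl, maxPages = 1000) {
  const seen = new Set();
  const ordered = [];
  let url = startUrl;
  // Stop on a missing link, a repeated URL (cycle), or the safety cap
  while (url && !seen.has(url) && ordered.length < maxPages) {
    seen.add(url);
    ordered.push(url);
    url = await getNextUrl(url);
  }
  return ordered;
}

// Usage with a fake three-page site standing in for the browser
const fakeSite = {
  '/docs/doc1': '/docs/doc2',
  '/docs/doc2': '/docs/doc3',
  '/docs/doc3': null,
};
collectPageUrls('/docs/doc1', async (u) => fakeSite[u] ?? null).then((urls) => {
  console.log(urls.join('\n'));
});
```

Starting from a different initial URL only changes the first argument; the rest of the traversal is driven by whatever the "next" link on each page points to.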

@kohheepeace
Contributor

@maxarndt I made an npm package for generating PDFs.
https://github.com/KohheePeace/docusaurus-pdf

@jonsanjuan

Hi Kohhee. I was trying to install your npm package, and it failed during the install process with the output below. I am also rather new to Docusaurus and PDF translations, so thanks in advance:

make: *** [Release/obj.target/hummus/src/PDFDictionaryDriver.o] Error 1
gyp ERR! build error
gyp ERR! stack Error: make failed with exit code: 2
gyp ERR! stack at ChildProcess.onExit (/usr/local/lib/node_modules/npm/node_modules/node-gyp/lib/build.js:194:23)
gyp ERR! stack at ChildProcess.emit (events.js:321:20)
gyp ERR! stack at Process.ChildProcess._handle.onexit (internal/child_process.js:275:12)
gyp ERR! System Darwin 19.2.0
gyp ERR! command "/usr/local/Cellar/node/13.8.0/bin/node" "/usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js" "build" "--fallback-to-build" "--module=/usr/local/lib/node_modules/hummus/binding/hummus.node" "--module_name=hummus" "--module_path=/usr/local/lib/node_modules/hummus/binding" "--napi_version=5" "--node_abi_napi=napi"
gyp ERR! cwd /usr/local/lib/node_modules/hummus
gyp ERR! node -v v13.8.0
gyp ERR! node-gyp -v v5.1.0
gyp ERR! not ok
node-pre-gyp ERR! build error
node-pre-gyp ERR! stack Error: Failed to execute '/usr/local/Cellar/node/13.8.0/bin/node /usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js build --fallback-to-build --module=/usr/local/lib/node_modules/hummus/binding/hummus.node --module_name=hummus --module_path=/usr/local/lib/node_modules/hummus/binding --napi_version=5 --node_abi_napi=napi' (1)
node-pre-gyp ERR! stack at ChildProcess.<anonymous> (/usr/local/lib/node_modules/hummus/node_modules/node-pre-gyp/lib/util/compile.js:83:29)
node-pre-gyp ERR! stack at ChildProcess.emit (events.js:321:20)
node-pre-gyp ERR! stack at maybeClose (internal/child_process.js:1026:16)
node-pre-gyp ERR! stack at Process.ChildProcess._handle.onexit (internal/child_process.js:286:5)
node-pre-gyp ERR! System Darwin 19.2.0
node-pre-gyp ERR! command "/usr/local/Cellar/node/13.8.0/bin/node" "/usr/local/lib/node_modules/hummus/node_modules/.bin/node-pre-gyp" "install" "--fallback-to-build"
node-pre-gyp ERR! cwd /usr/local/lib/node_modules/hummus
node-pre-gyp ERR! node -v v13.8.0
node-pre-gyp ERR! node-pre-gyp -v v0.10.3
node-pre-gyp ERR! not ok
Failed to execute '/usr/local/Cellar/node/13.8.0/bin/node /usr/local/lib/node_modules/npm/node_modules/node-gyp/bin/node-gyp.js build --fallback-to-build --module=/usr/local/lib/node_modules/hummus/binding/hummus.node --module_name=hummus --module_path=/usr/local/lib/node_modules/hummus/binding --napi_version=5 --node_abi_napi=napi' (1)
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! hummus@1.0.108 install: node-pre-gyp install --fallback-to-build $EXTRA_NODE_PRE_GYP_FLAGS
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the hummus@1.0.108 install script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

@kohheepeace
Contributor

@jonsanjuan Thanks for your report! I think it is better to discuss this in my repo. Could you open an issue there?

@jonsanjuan

will do

@yangshun yangshun added the difficulty: advanced Issues that are complex, e.g. large scoping for long-term maintainability. label Jun 5, 2020
@ritu-sehrawat

Guys, I am not able to convert static pages created with Docusaurus to PDF. Can you please help?

@cmdcolin
Contributor

cmdcolin commented Jul 7, 2020

I really like @kohheepeace's solution, especially because it can render MDX components in a way that pure pandoc + Markdown cannot. I managed to get it running very quickly, which is great.

I think pandoc might also be a good option, but it is a bit harder to set up.

If you simply run

pandoc *.md -o output.pdf

the result looks reasonably good and has a classic "academic paper" LaTeX font, but the pages are not in the docs' reading order (the shell expands *.md alphabetically). I wanted the output to follow the sidebars.json ordering from Docusaurus.

I made this small bash script:

# make_pdf.sh
for i in $(node read_sidebar.js); do
  # Trim off the header of each Markdown doc: tail -n +5 starts output at line 5,
  # i.e. it skips the first 4 lines. Otherwise pandoc gets confused and takes
  # the title from the last element in the list
  tail -n +5 "$i"
done | pandoc title.md - --toc -o output.pdf

Note the title.md on the last line: it is passed to pandoc directly rather than through the loop, since I don't want its header stripped off.

In addition to everything the Docusaurus docs already have, this adds a special title.md with just the following contents.

The title.md file:

---
title: Your Product
authors: Optional author list
abstract: Subtext for your product
---

Optionally more content that goes before the TOC  here

read_sidebar.js looks like this:

// read_sidebar.js
const fs = require('fs')

const sidebar = JSON.parse(fs.readFileSync('../sidebars.json', 'utf8'))

// Walk the sidebar tree depth-first and collect the doc ids in order.
// Objects and arrays are descended into; only string leaves become file names.
function readTree(tree, ret = []) {
  for (const elt in tree) {
    if (typeof tree[elt] === 'object') {
      readTree(tree[elt], ret)
    } else if (tree[elt]) {
      ret.push(tree[elt] + '.md')
    }
  }
  return ret
}
console.log(readTree(sidebar).join('\n'))

This just reads sidebars.json and outputs the list of files to read, in the order in which they are specified in sidebars.json.

If you have substructure in your sidebar, such as different sections, I recommend something like the sidebars.json below. In each section, the first file (e.g. user_guide, config_guide) has a top-level # header and the rest of the files use ## headers. Note that if you have a sidebars.js, you can just convert it to JSON and use require.resolve('sidebars.json') instead of the JS file.

{
  "someSidebar": {
    "User guide": [
      "user_guide",
      "user_navigation"
    ],
    "Configuration guide": [
      "config_guide",
      "config_stuff"
    ],
    "Developer resources": [
      "developer_guide",
      "developer_code_organization"
    ]
  }
}

This produces a nice table of contents in which each section of sidebars.json also gets a section in the PDF table of contents.

Example from our docs https://jbrowse.org/jb2/jbrowse2.pdf (from https://jbrowse.org/jb2/docs/)

@Josh-Cena Josh-Cena added proposal This issue is a proposal, usually non-trivial change and removed feature This is not a bug or issue with Docusausus, per se. It is a feature request for the future. labels Oct 30, 2021
@Josh-Cena
Collaborator

Hi @kohheepeace very cool tool! Do you have plans to turn that into a Docusaurus plugin? I believe you just need to hook into postBuild to read the HTML files. Also, you can choose to wrap the doc footer component and inject a "download PDF" button. (Or leave that to the user) If that is done, we will probably close this as resolved by community.
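
A minimal sketch of what such a plugin could look like. The plugin name, the helper `listHtmlFiles`, and the output file pdf-pages.json are illustrative; only the plugin shape (a function returning an object with `name`) and the `postBuild` lifecycle with its `outDir` property come from the Docusaurus plugin API. The sketch only collects the built HTML files and leaves the actual PDF rendering out.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively list the .html files under a directory.
function listHtmlFiles(dir) {
  return fs.readdirSync(dir, { withFileTypes: true }).flatMap((entry) => {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) return listHtmlFiles(full);
    return entry.name.endsWith('.html') ? [full] : [];
  });
}

// Hypothetical plugin: runs after the static site has been written to outDir.
function pdfPlugin(context, options) {
  return {
    name: 'docusaurus-plugin-pdf-sketch',
    async postBuild({ outDir }) {
      const pages = listHtmlFiles(outDir);
      // A real plugin would render these pages to PDF here (e.g. with Puppeteer);
      // this sketch only records which pages it found.
      fs.writeFileSync(path.join(outDir, 'pdf-pages.json'), JSON.stringify(pages, null, 2));
    },
  };
}

module.exports = pdfPlugin;
```

Hooking into postBuild means the plugin works on the final HTML, so MDX components are already rendered by the time the pages are read.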

@kohheepeace
Contributor

@Josh-Cena thanks for your proposal. I'm currently busy with another project, so I can't implement this any time soon. docusaurus-pdf and mr-pdf are MIT licensed, so please feel free to fork them or create a completely separate project 👍

@ar-to

ar-to commented Jan 30, 2022

Here is an interesting solution built into the CI process. I did not try it, but it depends on Prince, which costs almost $4k per license!

I also found https://www.npmjs.com/package/docusaurus-plugin-papersaurus?activeTab=readme, which I did test. It looked promising until no PDFs were generated, as I described in my issue to that project here.

PDF generation is pretty important to have in a documentation project like this. I'm a bit surprised it's not among the first features. This is my first time testing Docusaurus, and one of my requirements is to generate a PDF, so I guess I'll have to try a method outside of Docusaurus. If I can get something working, I'll try to create a plugin.

@slorber
Collaborator

slorber commented Aug 17, 2023

Someone recently submitted this package to our community resources: https://github.com/jean-humann/docs-to-pdf

@alishefaee

For anyone desperate for documentation tools with advanced PDF export, I found doxygen to be stable and mature.

@intergalacticmammoth

Is this feature fully stalled or is anyone planning to work on it?

@JonZeolla

JonZeolla commented Nov 18, 2024

@intergalacticmammoth this worked for me; minor update to @kohheepeace's comment above. Requires memory-streams, muhammara, and puppeteer:

const puppeteer = require('puppeteer');
const { PDFRStreamForBuffer, createWriterToModify, PDFStreamForResponse } = require('muhammara');
const { WritableStream } = require('memory-streams');
const fs = require('fs');

const mergePdfBlobs = (pdfBlobs) => {
  const outStream = new WritableStream();
  const [firstPdfRStream, ...restPdfRStreams] = pdfBlobs.map(pdfBlob => new PDFRStreamForBuffer(pdfBlob));
  const pdfWriter = createWriterToModify(firstPdfRStream, new PDFStreamForResponse(outStream));

  restPdfRStreams.forEach(pdfRStream => pdfWriter.appendPDFPagesFromPDF(pdfRStream));

  pdfWriter.end();
  outStream.end();

  return outStream.toBuffer();
};

let generatedPdfBlobs = [];

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  let page = await browser.newPage();
  let nextPageUrl = 'http://localhost:3000/docs/introduction';
  const outputFileName = 'output.pdf';

  while (nextPageUrl) {
    console.log('Processing page:', nextPageUrl);

    await page.goto(nextPageUrl, { waitUntil: 'networkidle2' });

    try {
      nextPageUrl = await page.$eval('a.pagination-nav__link--next', (element) => element.href);
    } catch (e) {
      console.log(`No next page found. Saving to ${outputFileName}...`);
      nextPageUrl = null;
    }

    try {
      await page.waitForSelector('article');
      let html = await page.$eval('article', (element) => element.outerHTML);
      await page.setContent(html);
    } catch (e) {
      console.warn('Article not found on this page, skipping.');
      continue;
    }

    await page.addStyleTag({ url: 'http://localhost:3000/styles.css' });
    await page.addScriptTag({ url: 'http://localhost:3000/main.js' });

    const pdfBlob = await page.pdf({
      path: "",
      format: 'A4',
      printBackground: true,
      margin: { top: 20, right: 15, left: 15, bottom: 20 },
    });

    generatedPdfBlobs.push(pdfBlob);
  }

  await browser.close();

  const mergedPdfBlob = mergePdfBlobs(generatedPdfBlobs);
  fs.writeFileSync(outputFileName, mergedPdfBlob);

  console.log(`Saved merged PDF to ${outputFileName}`);
})();
