Commit 719ec6c

Updates docs to include section for converting files.
1 parent 5796732 commit 719ec6c

3 files changed

+139
-3
lines changed

docs/converting-files.rst

Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
1+
.. include:: header.rst
2+
3+
.. _ConvertingFiles:
4+
5+
==============================
6+
Converting Files
7+
==============================
8+
9+
10+
11+
Files to PDF
12+
~~~~~~~~~~~~~~~~~~
13+
14+
:ref:`Document types supported by PyMuPDF<HowToOpenAFile>` can be converted to |PDF| with the :meth:`Document.convert_to_pdf` method. This method returns the converted document as a bytes object, which |PyMuPDF| can then open as a new |PDF| and save to disk.
15+
16+
17+
18+
**Example**
19+
20+
.. code-block:: python
21+
22+
import pymupdf
23+
24+
xps = pymupdf.open("input.xps")
25+
pdfbytes = xps.convert_to_pdf()
26+
pdf = pymupdf.open("pdf", pdfbytes)
27+
pdf.save("output.pdf")
28+
29+
30+
31+
PDF to SVG
32+
~~~~~~~~~~~~~~~~~~
33+
34+
Because an SVG file cannot contain multiple pages, each page must be exported as a separate SVG file.
35+
36+
To get an SVG representation of a page, use the :meth:`Page.get_svg_image` method, which returns the SVG as a string.
37+
38+
**Example**
39+
40+
.. code-block:: python
41+
42+
import pymupdf
43+
44+
doc = pymupdf.open("input.pdf")
45+
page = doc[0]
46+
47+
# Convert page to SVG
48+
svg_content = page.get_svg_image()
49+
50+
# Save to file
51+
with open("output.svg", "w", encoding="utf-8") as f:
52+
f.write(svg_content)
53+
54+
doc.close()
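
The example above exports only the first page. A minimal sketch of exporting every page to its own file might look like the following; the helper names ``svg_path_for_page`` and ``export_pages_as_svg`` and the ``<name>-page-NNN.svg`` naming scheme are assumptions for illustration, not part of |PyMuPDF|.

```python
from pathlib import Path


def svg_path_for_page(pdf_path, page_number, out_dir):
    """Build the output path for one page (hypothetical naming scheme).

    page_number is 0-based, as in PyMuPDF; the file name uses a 1-based,
    zero-padded number so that files sort in page order.
    """
    stem = Path(pdf_path).stem
    return Path(out_dir) / f"{stem}-page-{page_number + 1:03d}.svg"


def export_pages_as_svg(pdf_path, out_dir):
    """Write one SVG file per page and return the list of paths written."""
    # Imported here so the path helper above works without PyMuPDF installed.
    import pymupdf

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    doc = pymupdf.open(pdf_path)
    try:
        for page in doc:
            target = svg_path_for_page(pdf_path, page.number, out_dir)
            # get_svg_image() returns the SVG markup as a string.
            target.write_text(page.get_svg_image(), encoding="utf-8")
            written.append(target)
    finally:
        doc.close()
    return written
```

Calling ``export_pages_as_svg("input.pdf", "svg-output")`` would then be expected to produce one file per page in the ``svg-output`` directory.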
55+
56+
57+
PDF to Markdown
58+
~~~~~~~~~~~~~~~~~
59+
60+
By utilizing the :doc:`PyMuPDF4LLM API <pymupdf4llm/api>`, we can convert a PDF to a Markdown representation.
61+
62+
**Example**
63+
64+
.. code-block:: python
65+
66+
import pymupdf4llm
67+
import pathlib
68+
69+
md_text = pymupdf4llm.to_markdown("test.pdf")
70+
print(md_text)
71+
72+
pathlib.Path("4llm-output.md").write_bytes(md_text.encode())
73+
74+
75+
PDF to DOCX
76+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
77+
78+
Use the pdf2docx_ library, which builds on |PyMuPDF|, to convert documents from |PDF| to **DOCX** format.
79+
80+
81+
82+
**Example**
83+
84+
.. code-block:: python
85+
86+
from pdf2docx import Converter
87+
88+
pdf_file = 'input.pdf'
89+
docx_file = 'output.docx'
90+
91+
# convert pdf to docx
92+
cv = Converter(pdf_file)
93+
cv.convert(docx_file) # all pages by default
94+
cv.close()
95+
96+
97+
.. include:: footer.rst

docs/how-to-open-a-file.rst

Lines changed: 37 additions & 0 deletions
@@ -11,16 +11,53 @@ Opening Files
1111

1212
.. _Supported_File_Types:
1313

14+
1415
Supported File Types
1516
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1617

18+
|
19+
20+
PyMuPDF
21+
"""""""""
22+
1723
|PyMuPDF| can open files other than just |PDF|.
1824

1925
The following file types are supported:
2026

2127
.. include:: supported-files-table.rst
2228

2329

30+
----
31+
32+
33+
PyMuPDF Pro
34+
"""""""""""""""
35+
36+
|PyMuPDF Pro| can open Office files.
37+
38+
The following file types are supported:
39+
40+
.. list-table::
41+
:header-rows: 1
42+
43+
* - **DOC/DOCX**
44+
- **XLS/XLSX**
45+
- **PPT/PPTX**
46+
- **HWP/HWPX**
47+
* - .. image:: images/icons/icon-docx.svg
48+
:width: 40
49+
:height: 40
50+
- .. image:: images/icons/icon-xlsx.svg
51+
:width: 40
52+
:height: 40
53+
- .. image:: images/icons/icon-pptx.svg
54+
:width: 40
55+
:height: 40
56+
- .. image:: images/icons/icon-hangul.svg
57+
:width: 40
58+
:height: 40
59+
60+
2461

2562
How to Open a File
2663
~~~~~~~~~~~~~~~~~~~~~

docs/recipes.rst

Lines changed: 5 additions & 3 deletions
@@ -10,6 +10,11 @@
1010

1111
how-to-open-a-file.rst
1212

13+
----
14+
15+
.. toctree::
16+
17+
converting-files.rst
1318

1419
----
1520

@@ -18,21 +23,18 @@
1823

1924
recipes-text.rst
2025

21-
2226
----
2327

2428
.. toctree::
2529

2630
recipes-images.rst
2731

28-
2932
----
3033

3134
.. toctree::
3235

3336
recipes-annotations.rst
3437

35-
3638
----
3739

3840
.. toctree::
