How to include image in Page's to_html or to_xhtml method? #69

Open
LazyGeniusMan opened this issue May 19, 2023 · 1 comment

LazyGeniusMan commented May 19, 2023

When I try converting a page that has an image to HTML or XHTML, the image is not included. With this code:

fn main() {
    use mupdf::{Document, Page};
    use std::fs;

    let doc: Document = Document::open("C:\\Users\\LazyGeniusMan\\Downloads\\mupdf\\test.epub").unwrap();
    let page: Page = doc.load_page(341).unwrap();
    let html: String = page.to_html().unwrap();

    // fs::write returns a Result; unwrap it so a failed write is not silently ignored.
    fs::write("C:\\Users\\LazyGeniusMan\\Downloads\\mupdf\\rs-test.html", html).unwrap();
}

I got this result:
[screenshot]

There should be an image above the "Figure 10.3" text.

I tried to do the same thing in PyMuPDF with this code:

import fitz

doc = fitz.Document('C:\\Users\\LazyGeniusMan\\Downloads\\mupdf\\test.epub')
page = doc[331] # the page index is somehow different for the same page I want
html = page.get_text("html")

with open("C:\\Users\\LazyGeniusMan\\Downloads\\mupdf\\py-test.html", "w") as file:
    file.write(html)

I got this result:
[screenshot]

The image is included in base64 format.
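
To compare the two outputs without opening them in a browser, a small standard-library check can count the base64-embedded images in each HTML file. This is only a sketch: the helper name `embedded_image_types` is made up for illustration, and it assumes the images appear as `<img>` tags whose `src` is a `data:image/...;base64,` URI, which is what the PyMuPDF result above suggests.

```python
import re
import sys

# Matches <img ... src="data:image/<subtype>;base64,..."> with either quote style.
DATA_URI_IMG = re.compile(
    r'<img\b[^>]*\bsrc\s*=\s*["\']data:image/([a-zA-Z0-9.+-]+);base64,',
    re.IGNORECASE,
)


def embedded_image_types(html):
    """Return the image subtypes (e.g. 'png') of all base64-embedded <img> tags."""
    return [m.group(1).lower() for m in DATA_URI_IMG.finditer(html)]


if __name__ == "__main__":
    # Usage: python check_images.py rs-test.html py-test.html
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            kinds = embedded_image_types(fh.read())
        print(f"{path}: {len(kinds)} embedded image(s) {kinds}")
```

Running it against both `rs-test.html` and `py-test.html` should make the difference between the two outputs explicit, assuming both files are UTF-8 encoded.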

I also tried doing the same thing with the mutool convert CLI and got the same result, but there is an option that needs to be enabled. I can't find any way to set this option in the to_html method of this crate. The options in mutool look like this:

Text output options:
        inhibit-spaces: don't add spaces between gaps in the text
        preserve-images: keep images in output
        preserve-ligatures: do not expand ligatures into constituent characters
        preserve-whitespace: do not convert all whitespace into space characters
        preserve-spans: do not merge spans on the same line
        dehyphenate: attempt to join up hyphenated words
        mediabox-clip=no: include characters outside mediabox
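
For reference, the CLI invocation described above could be assembled roughly as follows. This is a sketch rather than the exact command used in this report: the helper name `mutool_html_command` and the file names are hypothetical, and it assumes the usual `mutool convert` flags (`-F` for the output format, `-o` for the output file, `-O` for a comma-separated option list). The block only builds and prints the command, it does not run it.

```python
import shlex


def mutool_html_command(src, dest, pages=None, options=("preserve-images",)):
    """Build a `mutool convert` argument list that writes HTML with the given text options."""
    cmd = ["mutool", "convert", "-F", "html", "-o", dest]
    if options:
        # Text output options are passed as one comma-separated -O value.
        cmd += ["-O", ",".join(options)]
    cmd.append(src)
    if pages:
        # Optional mutool page selection, e.g. "5" or "1-3".
        cmd.append(pages)
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the resulting command line.
    print(shlex.join(mutool_html_command("test.epub", "cli-test.html")))
```

Passing the returned list to `subprocess.run(..., check=True)` would execute it, provided the `mutool` binary is on `PATH`.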

messense (Owner) commented

Sorry, this project is not actively maintained at the moment, but I'm happy to accept pull requests to fix this if anyone is up for it.