get_processed_image gives reversed byte order #50
Comments
Hi @Victor-N-Suadicani, thanks for raising this. Yes, this sounds a lot like #9, doesn't it? Pdfium internally uses BGR8 byte ordering throughout; the upstream It may be necessary to do some byte flipping manually within Can you attach the sample PDF you are working with? I'm curious as to what might have caused the error you mentioned, but perhaps it will become clear upon examining your PDF. |
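To illustrate the kind of manual byte flipping mentioned above, here is a minimal sketch in plain Rust. It is not the code that ended up in the crate: `swap_red_and_blue` is a hypothetical helper, and it assumes a tightly packed pixel buffer with no padding at the end of each row, which may not hold for every bitmap Pdfium returns.

```rust
/// Hypothetical helper (not part of pdfium-render): swaps the first and third
/// byte of every pixel in place, turning BGR8 into RGB8, or BGRA8 into RGBA8
/// when `bytes_per_pixel` is 4.
///
/// Assumption: the buffer is tightly packed, i.e. rows carry no stride padding.
fn swap_red_and_blue(pixels: &mut [u8], bytes_per_pixel: usize) {
    assert!(bytes_per_pixel >= 3, "need at least three channels per pixel");

    // chunks_exact_mut skips any trailing bytes that do not form a whole pixel.
    for pixel in pixels.chunks_exact_mut(bytes_per_pixel) {
        pixel.swap(0, 2);
    }
}
```

Calling it on a buffer holding one pure-blue BGR pixel (`[255, 0, 0]`) with `bytes_per_pixel = 3` leaves `[0, 0, 255]`, which is the same pixel expressed in RGB order. A mostly red image showing up as blue is exactly what you would see if this swap were skipped.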
I can't share the PDF unfortunately but I get the same error on the image-test.pdf you have yourself in the test directory. Also interestingly the images there seem to turn very tiny after extraction... |
Ok, that's fine. I'm going to work on this tomorrow. |
Ok. There's at least three problems here :/
Created I'll continue tomorrow with the third problem noted above, with a view to pushing all changes by the end of tomorrow. |
Added new |
Does |
No idea! My initial interest is in seeing whether You're not the only person to encounter this problem. Based on the tiny hint at https://groups.google.com/g/pdfium/c/V-H9LpuHpPY, I looked at the transformation matrices of every image object in This suggests that this is indeed a bug in Pdfium, and you may wish to open a bug upstream. I am open to implementing a work-around in I think at this point I'm going to push the changes I've made so far, and post my test code here, so you can play around with it and have a think about what you want to do. |
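As background on why the transformation matrices are worth inspecting: in PDF, an image is drawn into a 1 x 1 unit square, and the object's matrix `[a b c d e f]` scales and places that square on the page. The sketch below derives the placed size in points from such a matrix. `placed_size` is a hypothetical helper written for illustration, not a pdfium-render or Pdfium function, and it says nothing about the pixel dimensions of the embedded image itself.

```rust
/// Hypothetical helper: given the a, b, c, d components of a PDF
/// transformation matrix [a b c d e f], returns the (width, height) in points
/// that the image's unit square occupies on the page.
///
/// The unit vector (1, 0) maps to (a, b) and (0, 1) maps to (c, d), so the
/// lengths of those two vectors are the placed width and height. The
/// translation components e and f do not affect size and are omitted.
fn placed_size(a: f64, b: f64, c: f64, d: f64) -> (f64, f64) {
    (a.hypot(b), c.hypot(d))
}
```

For an unrotated matrix such as `[200 0 0 100 e f]` this gives a placed size of 200 x 100 points. Comparing that figure against the pixel dimensions of the extracted bitmap is one way to judge whether an extracted image is suspiciously small.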
In fact, since this Spawned new issue #52 to handle the upstream issue. Implemented unit tests for |
You can experiment with my changes to My sample code is basically the same as yours, but for completeness, here it is:

use pdfium_render::prelude::*;

fn main() -> Result<(), PdfiumError> {
    let bindings =
        Pdfium::bind_to_library(Pdfium::pdfium_platform_library_name_at_path("./"))
            .or_else(|_| Pdfium::bind_to_system_library())?;

    let pdfium = Pdfium::new(bindings);

    let document = pdfium.load_pdf_from_file("image-test.pdf", None)?;

    for (page_n, page) in document.pages().iter().enumerate() {
        for (object_n, object) in page.objects().iter().enumerate() {
            match &object {
                PdfPageObject::Text(_) => println!("Got Text object"),
                PdfPageObject::Path(_) => println!("Got Path object"),
                PdfPageObject::Image(image_obj) => {
                    println!("Got Image object");

                    let image = image_obj.get_processed_image(&document)?;

                    image
                        .save(format!(
                            "{page_n:0>3}_{object_n:0>3}-processed.jpeg"
                        ))
                        .map_err(|_| PdfiumError::ImageError)?;
                }
                PdfPageObject::Shading(_) => println!("Got Shading object"),
                PdfPageObject::FormFragment(_) => println!("Got FormFragment object"),
                PdfPageObject::Unsupported(_) => println!("Got Unsupported object"),
            }
        }
    }

    Ok(())
}
Thanks for the quick work! Using the latest master, the color problem has definitely been fixed. I'm able to extract many more images now as well, though I still run into an error later on in my own PDF (image-test.pdf works however). The biggest problem I think is that the images extracted in this way are quite small for some PDFs, much smaller than the highest possible resolution as far as I can tell. |
Ok, that's good news about the colors. Pdfium returning the wrong image size may be related to #52; in any case, it is definitely a problem in Pdfium, rather than |
As there have been no further comments on this, and I believe the original problem has been fixed, I am closing this issue. Let's continue discussion about your upstream problems in #52. Fix for reversed byte order released as part of 0.7.21. Published to crates.io. |
So I tried the following:
This worked on the first image of a PDF I tried it on, after which I got an error: Error: Pdfium(PdfiumLibraryInternalError(Unknown)). But the important point is that the image that did manage to get extracted had the colours all wrong - a mostly red image turned blue. I'm guessing this is due to #9 and that this doesn't use the set_reverse_byte_order flag?