Add relaxed mode (ignores things like false byte offsets in xref table) #41

Open
manfredlotz opened this issue Oct 6, 2018 · 5 comments

Comments

@manfredlotz

Found another error for http://mirrors.ibiblio.org/CTAN/macros/latex/contrib/ksp-thesis/ksp-thesis.pdf which gives:

Custom { kind: InvalidData, error: StringError("Not a valid PDF file (xref_and_trailer).\nMismatch { message: "expect repeat at least 1 times, found 0 times", position: 267986 }") }

@J-F-Liu
Owner

J-F-Liu commented Oct 7, 2018

The startxref value in this file is wrong; it should be 267985.

@manfredlotz
Author

pdfinfo doesn't notice it. All PDF viewers on my system are forgiving and don't complain.

I have more PDF documents on my local disk that show this error.

So, what to do? Perhaps lopdf should have a relaxed parsing mode that accepts such things?
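
For illustration, here is a rough sketch of how a relaxed mode might tolerate a startxref value that is off by a byte or two, as in the file above. This is not lopdf code: `is_xref_keyword`, `locate_xref`, and the `window` parameter are hypothetical names, and the sketch assumes a classic cross-reference table that starts with the `xref` keyword (it does not handle cross-reference streams).

```rust
/// Hypothetical helper (not part of lopdf): true if the `xref` keyword
/// starts at `pos` and is not just the tail of `startxref`.
fn is_xref_keyword(buf: &[u8], pos: usize) -> bool {
    let matches = buf.get(pos..pos + 4) == Some(&b"xref"[..]);
    // Short-circuit keeps the slice below in bounds: it is only evaluated
    // when `pos + 4 <= buf.len()`.
    matches && !(pos >= 5 && buf[pos - 5..pos] == *b"start")
}

/// Hypothetical helper (not part of lopdf): return the offset of the `xref`
/// keyword at `stated`, or the nearest one within `window` bytes of it.
fn locate_xref(buf: &[u8], stated: usize, window: usize) -> Option<usize> {
    if is_xref_keyword(buf, stated) {
        return Some(stated);
    }
    // Try the closest candidates first: stated-1, stated+1, stated-2, ...
    for delta in 1..=window {
        if delta <= stated && is_xref_keyword(buf, stated - delta) {
            return Some(stated - delta);
        }
        if is_xref_keyword(buf, stated + delta) {
            return Some(stated + delta);
        }
    }
    None
}
```

A strict mode would correspond to `window = 0`, so the same function could serve both modes. Keeping the window small limits the risk of latching onto an unrelated `xref` keyword elsewhere in the file.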

@misos1

misos1 commented Oct 24, 2018

It is a little annoying that it prints these things to stdout with no way to turn this off, even in release mode.

Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 37958).\nMismatch { message: "expect repeat at least 1 times, found 0 times", position: 37958 }") }

Yes, this PDF is not 100% correct: references to some objects point one character before the actual object, at a newline character. But PDF viewers have learned to forgive these things.

At the least, it would be better to use Rust logging for these things, like:

	Err(err) => {
		warn!("{:?}", err); // or error!
	}
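
As a sketch of how the off-by-one object offsets described above could be forgiven: a relaxed reader might skip any PDF white-space bytes found at the stated offset before trying to parse the object header. This is only an illustration, not how lopdf works; `is_pdf_whitespace` and `relaxed_object_start` are hypothetical names, and the digit check assumes the object begins with its object number (as in `12 0 obj`).

```rust
/// The six white-space bytes defined by the PDF specification:
/// NUL, tab, line feed, form feed, carriage return, space.
fn is_pdf_whitespace(b: u8) -> bool {
    matches!(b, 0x00 | 0x09 | 0x0A | 0x0C | 0x0D | 0x20)
}

/// Hypothetical helper (not part of lopdf): starting at `offset`, skip
/// white-space and return the position where the object header should begin.
/// Returns None if `offset` is out of range or no digit follows.
fn relaxed_object_start(buf: &[u8], offset: usize) -> Option<usize> {
    let rest = buf.get(offset..)?;
    let skipped = rest.iter().take_while(|&&b| is_pdf_whitespace(b)).count();
    let start = offset + skipped;
    // An indirect object starts with its object number, so expect a digit.
    match buf.get(start) {
        Some(b) if b.is_ascii_digit() => Some(start),
        _ => None,
    }
}
```

With an offset that points at the newline just before `12 0 obj`, this returns the position of the `1`; an offset that lands in unrelated data still fails, so the parser can report the error as before.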

@ghost

ghost commented Mar 4, 2019

I vote for making the error logging configurable and for adding a relaxed parsing mode.
I am seeing a similar error while attempting to convert some PDFs to text using pdf-extract: (Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 16845).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 16995 }")). I can also open the PDF that throws this error with evince on a GNU/Linux distribution.
Should we break these out into two separate issues?
See #46 for the configurable logging.
So, perhaps consider adding a relaxed parsing option, or making it the default?
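
A small aid for reading the mismatch message above: the numbers 101 and 115 look like byte values, and decoding them as ASCII gives `e` (the first byte of `endobj`) and `s`. The helper below is a hypothetical illustration, not part of lopdf or pdf-extract, and the message alone does not say which keyword the `s` belongs to.

```rust
/// Hypothetical helper: render the byte values from a
/// "expect: N, found: M" mismatch message as characters.
fn describe_mismatch(expected: u8, found: u8) -> String {
    format!("expected {:?}, found {:?}", expected as char, found as char)
}
```

Calling `describe_mismatch(101, 115)` yields `expected 'e', found 's'`.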

@Heinenen changed the title from "Not a valid PDF file (xref_and_trailer)" to "Add relaxed mode (ignores things like false byte offsets in xref table)" on Aug 11, 2024
@Heinenen
Collaborator

println! was replaced by logging some time ago, so the original issue has been "solved".
I renamed the issue to better fit the remaining problem that came up in the discussion.
