Add relaxed mode (ignores things like false byte offsets in xref table) #41

manfredlotz · 2018-10-06T17:25:46Z

Found another error for http://mirrors.ibiblio.org/CTAN/macros/latex/contrib/ksp-thesis/ksp-thesis.pdf which gives:

Custom { kind: InvalidData, error: StringError("Not a valid PDF file (xref_and_trailer).\nMismatch { message: "expect repeat at least 1 times, found 0 times", position: 267986 }") }

J-F-Liu · 2018-10-07T01:49:57Z

The startxref value in this file is wrong; it should be 267985.

manfredlotz · 2018-10-07T03:45:37Z

pdfinfo doesn't notice it. All PDF viewers on my system are forgiving and don't complain.

Locally on my hard disk I have more pdf documents showing this error.

So, what to do? Perhaps lopdf should have a relaxed parsing mode in which such things are accepted?
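
For illustration only, here is a minimal sketch of how a relaxed mode could tolerate a slightly wrong startxref value. `locate_xref` and its `window` parameter are hypothetical names, not part of lopdf's API, and the sketch assumes a classic `xref` table rather than a cross-reference stream.

```rust
/// Hypothetical helper (not lopdf's API): returns the offset of the `xref`
/// keyword, tolerating a `startxref` value that is off by a few bytes.
/// Assumption: the file uses a classic cross-reference table.
pub fn locate_xref(buf: &[u8], declared: usize, window: usize) -> Option<usize> {
    const KEYWORD: &[u8] = b"xref";

    // Strict case: the keyword sits exactly where `startxref` says it does.
    if buf.get(declared..).map_or(false, |rest| rest.starts_with(KEYWORD)) {
        return Some(declared);
    }

    // Relaxed case: inspect offsets within `window` bytes of the declared
    // one and pick the match closest to it.
    let lo = declared.saturating_sub(window);
    let hi = declared.saturating_add(window).min(buf.len());
    (lo..hi)
        .filter(|&pos| buf[pos..].starts_with(KEYWORD))
        // Ignore the "xref" that is merely the tail of the `startxref` keyword.
        .filter(|&pos| !buf[..pos].ends_with(b"start"))
        .min_by_key(|&pos| pos.abs_diff(declared))
}
```

A strict parser would correspond to calling this with `window` set to 0; a relaxed one might allow a handful of bytes of slack and log a warning whenever the returned offset differs from the declared one.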

misos1 · 2018-10-24T20:26:59Z

It is a little annoying that it prints these things to stdout with no way to turn this off, even in release mode.

Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 37958).\nMismatch { message: "expect repeat at least 1 times, found 0 times", position: 37958 }") }

Yes, this PDF is not 100% correct: the references to some objects point one character before the actual object, at a newline character. But PDF viewers have learned to forgive these things.
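
To make the idea concrete, here is a rough sketch of how a parser might forgive an object offset that lands on white space just before the object. `relaxed_object_start` and `is_pdf_whitespace` are hypothetical helpers, not lopdf functions; the white-space set is the one defined by the PDF specification.

```rust
/// PDF white-space characters: NUL, tab, line feed, form feed,
/// carriage return, and space.
fn is_pdf_whitespace(byte: u8) -> bool {
    matches!(byte, 0x00 | 0x09 | 0x0A | 0x0C | 0x0D | 0x20)
}

/// Hypothetical helper (not lopdf's API): advances an object offset past
/// any leading white space so that it lands on the object number.
pub fn relaxed_object_start(buf: &[u8], offset: usize) -> Option<usize> {
    // An offset beyond the end of the file cannot be repaired.
    let rest = buf.get(offset..)?;
    let skipped = rest.iter().take_while(|&&b| is_pdf_whitespace(b)).count();
    let start = offset + skipped;
    // An indirect object begins with its object number, so expect a digit.
    match buf.get(start) {
        Some(b) if b.is_ascii_digit() => Some(start),
        _ => None,
    }
}
```

With an offset that points at the newline before `12 0 obj`, this returns the position of the `1`; an offset that points at anything other than white space or a digit is still rejected.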

At the very least, it would be better to use Rust logging for these things, like:

	Err(err) => {
		warn!("{:?}", err); // or error!
	}

ghost · 2019-03-04T13:37:50Z

I vote for making the error logging configurable and for adding a relaxed parsing mode.
I am seeing a similar error: (Custom { kind: InvalidData, error: StringError("Not a valid PDF file (read object at 16845).\nMismatch { message: \"seq endobj expect: 101, found: 115\", position: 16995 }")) while attempting to convert some PDFs to text using pdf-extract. I can also open the PDF that throws this error with evince on a GNU/Linux distribution.
~~Should we break these out into 2 different issues?~~
See #46 for the configurable logging.
So, perhaps consider adding a relaxed parsing option / default?

Heinenen · 2024-08-11T18:15:19Z

println! was replaced by logging some time ago, so the original issue has been "solved".
I renamed the issue to better fit the remaining problem that came up in the discussion.

Heinenen changed the title ~~Not a valid PDF file (xref_and_trailer)~~ Add relaxed mode (ignores things like false byte offsets in xref table) Aug 11, 2024

Heinenen added the enhancement label Aug 11, 2024

Heinenen mentioned this issue Sep 4, 2024

15_EventMaxiumSpeed_Qualifying.PDF can't be loaded because of overly strict startxref parsing #318

Open

Heinenen mentioned this issue Sep 16, 2024

ToUnicode CMap error with 0.34.0 #319

Closed

Heinenen mentioned this issue Jan 12, 2025

added binary comment as attribute and for load and write. Binary Comment is gonna be important for pdf in A/2, A/3 format. #370

Merged
