Failed to initialize parser: Failed to read pdf: Failed to read xref table: Expected xref to start with 'xref' #16

Closed
dv47 opened this issue Sep 11, 2019 · 35 comments

@dv47

dv47 commented Sep 11, 2019

Some PDF documents do not contain an xref table, which causes this parsing error.
Attaching the PDF so you can reproduce the error.
example_012a.pdf

@phpdave11
Owner

Thank you for the bug report and sample PDF.

From the PDF specification:

Beginning with PDF 1.5, cross-reference information may be stored in a cross-reference stream instead of in a cross-reference table.

I will add support for reading cross-reference information from cross-reference streams.
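
For readers who want to check which kind of cross-reference section a given file uses, here is a minimal sketch. It assumes a well-formed file whose last `startxref` keyword is followed by a valid byte offset, and it treats anything at that offset that is not the `xref` keyword as a cross-reference stream object. The name `xrefKind` is made up for this illustration and is not part of gofpdi.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os"
	"strconv"
)

// xrefKind reports whether the cross-reference section that "startxref"
// points to is a classic table (it begins with the "xref" keyword) or
// something else, which this sketch assumes is a cross-reference stream
// object such as "12 0 obj".
func xrefKind(pdf []byte) (string, error) {
	// The last "startxref" keyword is followed by the byte offset of the
	// most recent cross-reference section.
	i := bytes.LastIndex(pdf, []byte("startxref"))
	if i < 0 {
		return "", errors.New("startxref keyword not found")
	}
	fields := bytes.Fields(pdf[i+len("startxref"):])
	if len(fields) == 0 {
		return "", errors.New("no offset after startxref")
	}
	off, err := strconv.Atoi(string(fields[0]))
	if err != nil || off < 0 || off >= len(pdf) {
		return "", errors.New("invalid startxref offset")
	}
	if bytes.HasPrefix(pdf[off:], []byte("xref")) {
		return "table", nil
	}
	return "stream", nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: xrefkind file.pdf")
		return
	}
	data, err := os.ReadFile(os.Args[1]) // os.ReadFile needs Go 1.16 or later
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	kind, err := xrefKind(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("cross-reference section:", kind)
}
```

A file reported as `stream` is the kind that triggers the "Expected xref to start with 'xref'" error in a parser that only understands classic tables.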

@tylerzika

@phpdave11 any idea when this support will be added?

@phpdave11
Owner

@tylerzika I will hopefully get to it by the end of the month. I'm just now reading up on how xref streams ought to be parsed. If you're able to add support for this sooner, feel free to submit a pull request.

@tylerzika

@phpdave11 any updates on this? I'll see if I can contribute..

@phpdave11
Owner

@tylerzika I started work on it in the xref-stream-support branch, but I haven’t been able to work on it lately.

@sankethpb

@phpdave11 could you please let us know when this would be available?

Thank you very much in advance

@sankethpb

@phpdave11 Sorry for asking again, your help would be very much appreciated.

Thank you

@tylerzika

@phpdave11 can you share the documentation you would use to fix this issue? I'm clueless on how I would solve it, but I'd like to contribute and try.

@phpdave11
Owner

This is fixed in gofpdi v1.0.9.

@phpdave11
Owner

I've updated the gofpdi dependency in my fork of gofpdf - jung-kurt/gofpdf is no longer maintained.

@tylerzika

tylerzika commented Feb 11, 2020

@phpdave11 I think there is a compiling error. My import statement:

import (
	"crypto/sha1"
	"crypto/subtle"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"github.com/phpdave11/gofpdf"
	"github.com/phpdave11/gofpdf/contrib/gofpdi"
	fpdi "github.com/phpdave11/gofpdi"
	"io/ioutil"
	"log"
	"net/http"
	"os"
	"strings"
	"time"
	"unicode/utf8"
)

Then I get this error:
[Screenshot: Screen Shot 2020-02-10 at 7 35 35 PM]

@tylerzika

@phpdave11 it looks like you changed some of the functions and their parameters?

@phpdave11
Owner

@tylerzika The external functions and parameters have not changed. I'm not seeing any compiler errors. Try updating the code with go get ./... - I just pushed a few changes.

@tylerzika

@phpdave11 when I do go get github.com/phpdave11/gofpdf/contrib/gofpdi I get the following error:

# github.com/phpdave11/gofpdf/contrib/gofpdi
../phpdave11/gofpdf/contrib/gofpdi/gofpdi.go:54:8: i.fpdi.SetSourceStream undefined (type *gofpdi.Importer has no field or method SetSourceStream)

@tylerzika

@phpdave11

Tylers-MacBook-Pro-2:label-api tyler$ go get ./...
# github.com/phpdave11/gofpdf/contrib/gofpdi
../phpdave11/gofpdf/contrib/gofpdi/gofpdi.go:54:8: i.fpdi.SetSourceStream undefined (type *gofpdi.Importer has no field or method SetSourceStream)

@phpdave11
Owner

@tylerzika can you share your go.mod and go.sum files?

@rorycl

rorycl commented Feb 12, 2020

Please see issue #23 against v1.0.9

@tylerzika

@phpdave11

go.mod

module github.com/phpdave11/gofpdi

go 1.12

go.mod

module github.com/jung-kurt/gofpdf

go 1.12

require (
	github.com/boombuler/barcode v1.0.0
	github.com/phpdave11/gofpdi v1.0.9
	github.com/ruudk/golang-pdf417 v0.0.0-20181029194003-1af4ab5afa58
	golang.org/x/image v0.0.0-20190910094157-69e4b8554b2a
)

replace gofpdf => ./

go.sum

github.com/boombuler/barcode v1.0.0 h1:s1TvRnXwL2xJRaccrdcBQMZxq6X7DvsMogtmJeHDdrc=
github.com/boombuler/barcode v1.0.0/go.mod h1:paBWMcWSl3LHKBqUq+rly7CNSldXjb2rDl3JlRe0mD8=
github.com/davecgh/go-spew v1.1.0 h1:ZDRjVQ15GmhC3fiQ8ni8+OwkZQO4DARzQgrnXU1Liz8=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/jung-kurt/gofpdf v1.0.0/go.mod h1:7Id9E/uU8ce6rXgefFLlgrJj/GYY22cpxn+r32jIOes=
github.com/phpdave11/gofpdi v1.0.7 h1:k2oy4yhkQopCK+qW8KjCla0iU2RpDow+QUDmH9DDt44=
github.com/phpdave11/gofpdi v1.0.7/go.mod h1:vBmVV0Do6hSBHC8uKUQ71JGW+ZGQq74llk/7bXwjDoI=
github.com/phpdave11/gofpdi v1.0.9/go.mod h1:r/fO8a9KSCrpwwTaqEx3amFJ6IHjfvAq7w1GP0XYRcg=
github.com/pkg/errors v0.8.1 h1:iURUrRGxPUNPdy5/HRSm+Yj6okJ6UtLINN0Q9M4+h3I=
github.com/pkg/errors v0.8.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/ruudk/golang-pdf417 v0.0.0-20181029194003-1af4ab5afa58 h1:nlG4Wa5+minh3S9LVFtNoY+GVRiudA2e3EVfcCi3RCA=
github.com/ruudk/golang-pdf417 v0.0.0-20181029194003-1af4ab5afa58/go.mod h1:6lfFZQK844Gfx8o5WFuvpxWRwnSoipWe/p622j1v06w=
github.com/stretchr/testify v1.2.2 h1:bSDNvY7ZPG5RlJ8otE/7V6gMiyenm9RtJ7IUVIAoJ1w=
github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs=
golang.org/x/image v0.0.0-20190902063713-cb417be4ba39 h1:4dQcAORh9oYBwVSBVIkP489LUPC+f1HBkTYXgmqfR+o=
golang.org/x/image v0.0.0-20190902063713-cb417be4ba39/go.mod h1:FeLwcggjj3mMvU+oOTbSwawSJRM1uh48EjtB4UJZlP0=
golang.org/x/image v0.0.0-20190910094157-69e4b8554b2a h1:gHevYm0pO4QUbwy8Dmdr01R5r1BuKtfYqRqF0h/Cbh0=
golang.org/x/image v0.0.0-20190910094157-69e4b8554b2a/go.mod h1:FeLwcggjj3mMvU+oOTbSwawSJRM1uh48EjtB4UJZlP0=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=

@tylerzika

tylerzika commented Feb 17, 2020

@phpdave11 I deleted the src files and did a go get to download them again. The error went away. But I'm still getting this error:

http: panic serving [::1]:54121: Failed to initialize parser: Failed to read pdf: Failed to read xref table: Unsupported field size in cross-reference stream dictionary - only tested with /W [1 2 1]
goroutine 5 [running]:
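
For context on the `/W` message: the `/W` array in a cross-reference stream dictionary gives the byte width of each of the three fields in every entry, so `[1 2 1]` is only one possible layout. The sketch below decodes entries for arbitrary widths. It is an illustration, not gofpdi's code: it assumes the stream data has already been decompressed and un-predicted, and the names `xrefEntry`, `readField` and `decodeXrefEntries` are made up here.

```go
package main

import (
	"errors"
	"fmt"
)

// xrefEntry is one row of a cross-reference stream: a type field followed
// by two fields whose meaning depends on the type.
type xrefEntry struct {
	Type, Field2, Field3 uint64
}

// readField interprets b as a big-endian unsigned integer.
// An empty slice yields 0.
func readField(b []byte) uint64 {
	var v uint64
	for _, c := range b {
		v = v<<8 | uint64(c)
	}
	return v
}

// decodeXrefEntries splits data into rows of w[0]+w[1]+w[2] bytes and
// decodes each row according to the widths in w (the /W array).
func decodeXrefEntries(data []byte, w [3]int) ([]xrefEntry, error) {
	rowLen := 0
	for _, n := range w {
		if n < 0 || n > 8 {
			return nil, errors.New("field width out of range for uint64")
		}
		rowLen += n
	}
	if rowLen == 0 || len(data)%rowLen != 0 {
		return nil, errors.New("data length is not a multiple of the row length")
	}
	entries := make([]xrefEntry, 0, len(data)/rowLen)
	for off := 0; off < len(data); off += rowLen {
		row := data[off : off+rowLen]
		// Per the PDF specification, a zero-width type field means type 1.
		e := xrefEntry{Type: 1}
		if w[0] > 0 {
			e.Type = readField(row[:w[0]])
		}
		e.Field2 = readField(row[w[0] : w[0]+w[1]])
		e.Field3 = readField(row[w[0]+w[1]:])
		entries = append(entries, e)
	}
	return entries, nil
}

func main() {
	// Two entries encoded with /W [1 3 2].
	data := []byte{1, 0, 0, 16, 0, 0, 2, 0, 0, 5, 0, 7}
	entries, err := decodeXrefEntries(data, [3]int{1, 3, 2})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range entries {
		fmt.Printf("%+v\n", e)
	}
}
```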

@phpdave11
Owner

@tylerzika this has been fixed in gofpdi v1.0.11.

Originally posted by @phpdave11 in #25 (comment)

@tylerzika

tylerzika commented Feb 21, 2020

@phpdave11 this update is fantastic! It's working well with a lot of my company's PDFs we are importing. There is one that is still having problems. It imports successfully, but one of the PDF's pages is almost completely blank. On the second page, all of the page's content appears to be squashed at the top. It's attached. We had to blot out some personal information, so I hope you can still see what we've seen with the original.
7007327.pdf

@0x-1

0x-1 commented Feb 27, 2020

With v1.0.11:

Failed to initialize parser: Failed to read pdf: Failed to read xref table: Failed to read prev xref: Unsupported /DecodeParms - only tested with /Columns 4 /Predictor 12
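
For context on the `/DecodeParms` message: predictor values of 10 and above select the PNG predictors, where every row of decompressed data starts with a filter-type byte followed by `/Columns` bytes of filtered data. The sketch below reverses all five PNG filter types for any column count. It is an illustration rather than gofpdi's code, and it assumes one byte per pixel (the defaults `/Colors 1` and `/BitsPerComponent 8`). The names `pngUnpredict` and `paeth` are made up here.

```go
package main

import (
	"errors"
	"fmt"
)

// paeth picks whichever of left (a), up (b) or upper-left (c) is closest
// to a + b - c, as defined by the PNG specification.
func paeth(a, b, c byte) byte {
	p := int(a) + int(b) - int(c)
	abs := func(x int) int {
		if x < 0 {
			return -x
		}
		return x
	}
	pa, pb, pc := abs(p-int(a)), abs(p-int(b)), abs(p-int(c))
	switch {
	case pa <= pb && pa <= pc:
		return a
	case pb <= pc:
		return b
	default:
		return c
	}
}

// pngUnpredict undoes PNG row filtering on already-inflated data.
// Each input row is 1 filter byte plus `columns` data bytes.
func pngUnpredict(data []byte, columns int) ([]byte, error) {
	if columns <= 0 {
		return nil, errors.New("columns must be positive")
	}
	rowLen := columns + 1
	if len(data)%rowLen != 0 {
		return nil, errors.New("data length is not a multiple of the row length")
	}
	out := make([]byte, 0, len(data)/rowLen*columns)
	prev := make([]byte, columns) // the row above the first row is all zeros
	for off := 0; off < len(data); off += rowLen {
		filter := data[off]
		if filter > 4 {
			return nil, fmt.Errorf("unknown PNG filter type %d", filter)
		}
		cur := make([]byte, columns)
		copy(cur, data[off+1:off+rowLen])
		for i := range cur {
			var left, upLeft byte
			if i > 0 {
				left, upLeft = cur[i-1], prev[i-1]
			}
			up := prev[i]
			switch filter {
			case 1: // Sub
				cur[i] += left
			case 2: // Up (what /Predictor 12 advertises)
				cur[i] += up
			case 3: // Average
				cur[i] += byte((int(left) + int(up)) / 2)
			case 4: // Paeth
				cur[i] += paeth(left, up, upLeft)
			}
		}
		out = append(out, cur...)
		prev = cur
	}
	return out, nil
}

func main() {
	// Two rows of 4 columns, both using the Up filter (type 2).
	in := []byte{2, 1, 0, 16, 0, 2, 0, 0, 4, 0}
	out, err := pngUnpredict(in, 4)
	fmt.Println(out, err)
}
```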

@phpdave11
Owner

phpdave11 commented Feb 27, 2020

@phpdave11 this update is fantastic! It's working well with a lot of my company's PDFs we are importing. There is one that is still having problems. It imports successfully, but one of the PDF's pages is almost completely blank. On the second page, all of the page's content appears to be squashed at the top. It's attached. We had to blot out some personal information, so I hope you can still see what we've seen with the original.
7007327.pdf

@tylerzika I'm able to reproduce this issue. I will open a separate issue for this problem.

@phpdave11
Owner

With v1.0.11:

Failed to initialize parser: Failed to read pdf: Failed to read xref table: Failed to read prev xref: Unsupported /DecodeParms - only tested with /Columns 4 /Predictor 12

@0x-1 would you mind posting the PDF that causes this error in a new issue?

@0x-1

0x-1 commented Feb 27, 2020

Sure thing. Thanks for the patch so far.

@rajat-sr

I'm getting the same error as @0x-1

Failed to initialize parser: Failed to read pdf: Failed to read xref table: Failed to read prev xref: Unsupported /DecodeParms - only tested with /Columns 4 /Predictor 12

Does anyone have a solution to this?

@rorycl

rorycl commented Dec 28, 2020 via email

@rajat-sr

Thanks @rorycl

@wangsirun

This problem still exists.

@aohanhongzhi

aohanhongzhi commented Mar 8, 2022

This problem still exists.

GOROOT=/opt/eric/go #gosetup
GOPATH=/home/iris/Project/Go #gosetup
/opt/eric/go/bin/go build -o /tmp/GoLand/___1go_build_go_study_m_pdf__1_ -gcflags all=-N -l /home/iris/Project/Go/go-study/pdf/import_pdf_writer.go #gosetup
/opt/eric/GoLand-2020.1/plugins/go/lib/dlv/linux/dlv --listen=127.0.0.1:43965 --headless=true --api-version=2 --check-go-version=false --only-same-user=false exec /tmp/GoLand/___1go_build_go_study_m_pdf__1_ --
API server listening at: 127.0.0.1:43965
panic: Failed to initialize parser: Failed to read pdf: Failed to read xref table: Unsupported /DecodeParms - only tested with /Columns <= 4 and /Predictor <= 12

goroutine 1 [running]:
github.com/phpdave11/gofpdi.(*Importer).SetSourceFile(0xc000074040, {0x5afc03, 0x35})
        /home/iris/Project/Go/pkg/mod/github.com/phpdave11/gofpdi@v1.0.13/importer.go:71 +0x3c5
github.com/signintech/gopdf.(*GoPdf).ImportPage(0xc00010c000, {0x5afc03, 0x35}, 0x1, {0x5a6c12, 0x9})
        /home/iris/Project/Go/pkg/mod/github.com/signintech/gopdf@v0.10.8/gopdf.go:1033 +0x7c
main.main()
        /home/iris/Project/Go/go-study/pdf/import_pdf_writer.go:25 +0x385

Debugger finished with the exit code 0
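
The trace above shows the failure surfacing as a panic inside `SetSourceFile` rather than as a returned error. Until the parser handles such files, one way for a caller to avoid crashing is to convert the panic into an error with `recover`. This is a generic sketch: `safely` is a helper name made up for this illustration, and the closure in `main` only stands in for the real import call.

```go
package main

import "fmt"

// safely runs fn and converts any panic it raises into an error, so the
// caller can fall back (for example, by repairing the PDF and retrying).
func safely(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	err := safely(func() {
		// Stand-in for a call that panics on a PDF it cannot parse,
		// such as the ImportPage call in the trace above.
		panic("Failed to initialize parser: Failed to read pdf")
	})
	if err != nil {
		fmt.Println("import failed:", err)
	}
}
```

In an HTTP handler this keeps one bad upload from taking down the request with a stack trace. It does not make the PDF readable.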


@rorycl

rorycl commented Mar 8, 2022

Please see my comments above about using pdftk to fix xref tables.

I haven't tried it yet, but I wonder if using the pdfcpu module might work. Check out the watermark method at https://pdfcpu.io/core/watermark. If the cli tool works the API version https://pkg.go.dev/github.com/pdfcpu/pdfcpu/pkg/api#AddPDFWatermarksFile is probably what you want.

@lsymy

lsymy commented Apr 27, 2022

Saved my day, thanks @rorycl

@rorycl

rorycl commented Apr 27, 2022

That's great to hear @lsymy. Do you have an implementation example to share?

Cheers,
Rory

@0x-1

0x-1 commented Jun 22, 2022

[Screenshot]
pdfcpu doesn't seem to like this one either. Tried it with api.Optimize(in, out, default), which causes this error.

edit: Disabled validation. It no longer errors, but the resulting byte slice can't be opened when processing the file afterwards.

Can confirm that unipdf can fix the file like pdftk does. But it always adds a watermark when saving without a subscription.
pdftk is the way to go here, provided its basic features are free, such as opening and saving the file once before processing.

@rorycl

rorycl commented Feb 18, 2023

@phpdave11 : any news on a possible fix for this?

Most of the time gofpdi works great, but the xref table problem causing a panic is affecting the users of rm2pdf; see rorycl/rm2pdf#4.

Using pdftk to rebuild the pdf fixes the issue in most cases. I guess pdftk rebuilds the xref table.
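
For anyone wanting to automate that workaround, here is a minimal sketch that shells out to pdftk before importing. It assumes a `pdftk` binary is on the PATH and uses the plain `pdftk <in> output <out>` form, which reads the file and writes a fresh copy. The names `pdftkArgs` and `rebuildWithPdftk` are made up for this illustration.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// pdftkArgs builds the arguments for "pdftk <in> output <out>".
func pdftkArgs(in, out string) []string {
	return []string{in, "output", out}
}

// rebuildWithPdftk asks pdftk to rewrite the PDF at in to out. The rewritten
// file can then be handed to the importer instead of the original.
func rebuildWithPdftk(in, out string) error {
	cmd := exec.Command("pdftk", pdftkArgs(in, out)...)
	if msg, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("pdftk failed: %w: %s", err, msg)
	}
	return nil
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: rebuild in.pdf out.pdf")
		return
	}
	if err := rebuildWithPdftk(os.Args[1], os.Args[2]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("wrote", os.Args[2])
}
```

As noted above, this fixes the issue in most cases rather than all of them, so the import step should still be prepared for a failure on the rewritten file.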

Thanks for any response.
