feat: properly treat errors from recording out of bounds positions in very large files #290

galkahana · 2024-12-07T19:41:11Z

The library's PDF output is currently limited to ~10 GB.
This is due to a limitation of PDF when writing regular xrefs: 10 digits are used to represent a position, so anything larger than what 10 digits can represent is simply out of bounds. It seems I always had a check on object start for whether the object being written already exceeds this position, but never bothered propagating the result. A recent issue (#289) attempts to write very large PDF files, and this neglected failure produces bad artifacts.
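To make the limit concrete, here is a minimal sketch of the bounds check, assuming the classic 20-byte xref entry layout (10-digit offset, 5-digit generation, `n`, 2-byte end of line). `kMaxXrefOffset`, `FitsInClassicXref` and `FormatXrefEntry` are hypothetical names for illustration, not functions from the library.

```cpp
#include <cstdio>
#include <string>

// Largest byte offset that ten decimal digits can hold (just under 10 GB).
constexpr long long kMaxXrefOffset = 9999999999LL;

// Hypothetical helper: true if the offset can be written in a classic xref entry.
bool FitsInClassicXref(long long offset) {
    return offset >= 0 && offset <= kMaxXrefOffset;
}

// Hypothetical helper: formats an in-use entry as "nnnnnnnnnn ggggg n\r\n".
// Returns an empty string when the offset or generation is out of bounds,
// so the caller can propagate a failure instead of writing a corrupt entry.
std::string FormatXrefEntry(long long offset, int generation) {
    if (!FitsInClassicXref(offset) || generation < 0 || generation > 65535) {
        return std::string();
    }
    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "%010lld %05d n\r\n", offset, generation);
    return std::string(buffer);
}
```

An object starting at byte 10,000,000,000 or later has no valid representation here, which is the point where a failure needs to be reported.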

PDF 1.5 and higher provide a method to write xrefs as xref streams and, more importantly, to set the length of the position descriptors, so you can go beyond what 10 digits allow if you need to create larger files. See towards the end for plans to support this.
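As a rough illustration of why xref streams lift the limit: their entries are binary fields whose byte widths are declared in the stream's `/W` array, and the offset field is stored high-order byte first. The sketch below picks a width for the largest offset in a file and encodes an offset at that width. Both function names are hypothetical and this is not how the library does (or will do) it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: smallest number of bytes that can hold maxOffset.
// This would be the middle value of a /W array such as [1 n 2].
int BytesNeededForOffset(std::uint64_t maxOffset) {
    int bytes = 1;
    while (maxOffset > 0xFF) {
        maxOffset >>= 8;
        ++bytes;
    }
    return bytes;
}

// Hypothetical helper: encodes value as a big-endian field of `width` bytes.
// Bits that do not fit in `width` bytes are dropped, so pick the width first.
std::vector<std::uint8_t> EncodeBigEndian(std::uint64_t value, int width) {
    std::vector<std::uint8_t> out(static_cast<std::size_t>(width));
    for (int i = width - 1; i >= 0; --i) {
        out[static_cast<std::size_t>(i)] = static_cast<std::uint8_t>(value & 0xFF);
        value >>= 8;
    }
    return out;
}
```

Five bytes already cover offsets up to about 1 TiB, and eight bytes cover anything a 64-bit file position can express.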

Now, I knew this would be a terrible one to fix, given that it's such basic, low-level functionality in the library, directly affecting things like starting a new object in either fashion. Still, what needs fixing will be fixed.

Most of the methods were able to properly propagate status without breaking the API. I sometimes had to convert a void function to return a status, but other than that (which doesn't in itself break existing usages, though it may be good to start consulting the result) things seem OK.

One area where this change affects the convenience of the API is the content context. It's really nice to be able to write output without having to check status, and this kinda ruins it.
So, to keep things simple, I coded it so that you can write multiple commands without consulting the returned status and, when done, check the status of the context to learn whether there was a failure. So not each time, but rather after a sequence of commands. Use AbstractContentContext::GetCurrentStatusCode() to get the current status. It can only go from eSuccess to eFailure, so once it does there's no point in continuing.
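The "check once after a sequence" idea can be sketched with a small stand-in class. Everything in the `sketch` namespace is hypothetical and only mimics the pattern. It is not the library's `AbstractContentContext`, and the choice to ignore writes after a failure is an assumption of this sketch.

```cpp
#include <string>

namespace sketch {

enum class Status { Success, Failure };

// Stand-in for a content context with a sticky status (hypothetical).
class StickyContext {
public:
    explicit StickyContext(long long maxPosition) : mMaxPosition(maxPosition) {}

    // Appends a command. The status can only move from Success to Failure;
    // once failed, later writes are ignored (an assumption of this sketch).
    StickyContext& Write(const std::string& command) {
        if (mStatus == Status::Failure) {
            return *this;
        }
        const long long size = static_cast<long long>(command.size());
        if (mPosition + size > mMaxPosition) {
            mStatus = Status::Failure;
            return *this;
        }
        mPosition += size;
        return *this;
    }

    Status GetCurrentStatus() const { return mStatus; }
    long long Position() const { return mPosition; }

private:
    long long mMaxPosition;
    long long mPosition = 0;
    Status mStatus = Status::Success;
};

}  // namespace sketch
```

With the real context the equivalent usage would be to issue a run of drawing commands and then call `GetCurrentStatusCode()` once, stopping as soon as it reports `eFailure`.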

Do note that even with all these options to check status, existing behavior stays the same. Only if you manage to exceed 10 GB or so will the new behavior properly propagate the inability to represent positions of that size. You are probably not doing that now if your files do not come out corrupted. The only difference is that instead of getting an error on PDFWriter::EndPDF() you'll get it earlier, at the point where the library notices that it has exceeded what the xref can represent.

While it's nice to get an early warning, it'd be nicer to be able to create very large files (well, at least until we get to long long, which is how file sizes are represented here). I'm thinking of taking care of this by providing 1.5-style xref streams. I've been dealing with those since adding file modification support years ago. I just need to provide a method to write them on regular files as well, and provide the option to choose them (and the byte size). "I'll think positively about doing that" is about what I'm willing to say at this point ;).

Commit 1354e7b: feat: properly treat errors from recording out of bounds positions in very large files

galkahana merged commit 015bd5e into master Dec 7, 2024
7 checks passed

galkahana deleted the galk.safe_xref_writing.check_status_when_starting_object branch December 8, 2024 20:22
