feat: properly treat errors from recording out of bounds positions in very large files #290
The library's current PDF output is limited to ~10GB.
This is due to a limitation of PDF when writing regular xrefs: byte positions are represented with 10 digits, so anything larger than what 10 digits can represent is simply out of bounds. It seems I always had a check, at object start, for whether the object being written already exceeds this position, but I never bothered propagating this fact. There's a recent issue about attempting to write very large PDF files (#289), and this neglected failure produces bad artifacts.
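For context, here's a minimal sketch (not the library's actual code; the helper name is hypothetical) of the arithmetic behind the limit: a classic xref entry stores an object's byte offset in a fixed 10-digit field, so the largest recordable position is 9,999,999,999 bytes, roughly 10GB.

```cpp
#include <iostream>

// Largest byte offset a classic 10-digit xref entry can hold.
const long long MAX_CLASSIC_XREF_POSITION = 9999999999LL;

// Hypothetical helper: can this write position still be recorded in a regular xref?
bool CanRecordInClassicXref(long long inWritePosition)
{
    return inWritePosition <= MAX_CLASSIC_XREF_POSITION;
}

int main()
{
    std::cout << "max classic xref position: " << MAX_CLASSIC_XREF_POSITION
              << " bytes (~" << MAX_CLASSIC_XREF_POSITION / (1024.0 * 1024.0 * 1024.0)
              << " GiB)\n";
    std::cout << std::boolalpha
              << CanRecordInClassicXref(5LL * 1024 * 1024 * 1024) << "\n"    // ~5GB: fine
              << CanRecordInClassicXref(11LL * 1024 * 1024 * 1024) << "\n";  // ~11GB: out of bounds
    return 0;
}
```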
PDF 1.5 and higher provide a method to write xrefs as xref streams and, more importantly, to set the length of the position descriptors, so you can use more than 10 digits if you need to create larger files. See towards the end for plans to support this.
Now, I knew this would be a terrible one to fix, given that it's such basic, low-level functionality in the library, which directly affects things like starting a new object in either fashion. Still, what needs fixing will be fixed.
Most of the methods were able to properly propagate status without breaking the API. I sometimes had to convert a void function to return a status, but other than that (which doesn't in itself break existing usages, though it may be a good idea to start consulting the result) things seem OK.
One area where this change affects the convenience of the API is the content context. It's really nice to be able to write output without having to check status, and this somewhat ruins it.
So, to keep things simple, I coded it so that you can write multiple commands without consulting the returned status, and when done check the status of the context to learn whether there was a failure or not. So, not after each command, but rather after a sequence of commands. Use AbstractContentContext::GetCurrentStatusCode() to get the current status. It can only go from eSuccess to eFailure, so once it does there's no point in continuing. See the sketch below.
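A minimal usage sketch, assuming the usual PDFHummus page/content-context flow (StartPageContentContext, EndPageContentContext, WritePageAndRelease, with a PDFWriter that already had StartPDF called on it); the drawing operators are just placeholders, and cleanup on failure is omitted for brevity:

```cpp
#include "PDFWriter.h"
#include "PDFPage.h"
#include "PDFRectangle.h"
#include "PageContentContext.h"

// Draws a simple filled rectangle on a new page, checking status only once.
static PDFHummus::EStatusCode DrawSomething(PDFWriter& pdfWriter)
{
    PDFPage* page = new PDFPage();
    page->SetMediaBox(PDFRectangle(0, 0, 595, 842));

    PageContentContext* cxt = pdfWriter.StartPageContentContext(page);

    // Write a sequence of commands without checking each one individually.
    cxt->q();
    cxt->re(100, 100, 200, 200);
    cxt->f();
    cxt->Q();

    // Consult the accumulated status once; it only ever moves from eSuccess to eFailure,
    // e.g. when a write position becomes too large to record in the xref.
    PDFHummus::EStatusCode status = cxt->GetCurrentStatusCode();

    if (status == PDFHummus::eSuccess)
        status = pdfWriter.EndPageContentContext(cxt);
    if (status == PDFHummus::eSuccess)
        status = pdfWriter.WritePageAndRelease(page);
    return status;
}
```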
Do note that with all these options to check status, things behave the same as before. It is only if you manage to exceed ~10GB that the new behavior properly propagates the inability to represent positions of that size. You're probably not doing that now if your files don't come out corrupted. The only difference is that instead of getting an error on PDFWriter::EndPDF(), you'll get it earlier, at the point where the library notices that it exceeded what can be represented by the xref.
While it's nice to get an early warning, it'd be nicer to be able to create very large files (well, at least up to the limits of long long, which is how file sizes are represented here). I'm thinking of taking care of this by providing PDF 1.5-style xref streams. I've been dealing with those since adding file modification support years ago; I just need to provide a method to write them on regular files as well, and provide the option to choose them (and the byte size). I'll think positively about doing that, which is about all I'm willing to say at this point ;).