Chunks and renumbering #18

Merged
merged 4 commits into from
Oct 4, 2023

laurmaedje
Member

This PR introduces the concept of chunks. A chunk is a free-standing collection of indirect objects that can be written to independently of the main writer and later added to it (or to another chunk). This makes it possible to write two things at once. The main PDF writer derefs to a chunk, so it exposes all of the chunk's methods.
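To illustrate the shape of this design, here is a minimal, self-contained sketch. It is a toy model, not the crate's actual code: the fields `buf` and `offsets`, the methods `indirect` and `extend`, and the helper `example` are all assumptions made up for this illustration.

```rust
use std::ops::{Deref, DerefMut};

/// Toy stand-in for an indirect object ID (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Ref(i32);

/// A free-standing collection of indirect objects (toy model).
#[derive(Default)]
struct Chunk {
    /// Serialized objects.
    buf: Vec<u8>,
    /// Pairs of (object ID, byte offset of that object in `buf`).
    offsets: Vec<(Ref, usize)>,
}

impl Chunk {
    /// Write one indirect object with a raw, pre-serialized body.
    fn indirect(&mut self, id: Ref, body: &str) {
        self.offsets.push((id, self.buf.len()));
        let text = format!("{} 0 obj\n{}\nendobj\n\n", id.0, body);
        self.buf.extend_from_slice(text.as_bytes());
    }

    /// Append all objects of another chunk, shifting their offsets.
    fn extend(&mut self, other: &Chunk) {
        let base = self.buf.len();
        self.buf.extend_from_slice(&other.buf);
        self.offsets
            .extend(other.offsets.iter().map(|&(id, off)| (id, base + off)));
    }
}

/// The main writer wraps a chunk and derefs to it, so every `Chunk`
/// method is also available on `Pdf`.
#[derive(Default)]
struct Pdf {
    chunk: Chunk,
}

impl Deref for Pdf {
    type Target = Chunk;
    fn deref(&self) -> &Chunk {
        &self.chunk
    }
}

impl DerefMut for Pdf {
    fn deref_mut(&mut self) -> &mut Chunk {
        &mut self.chunk
    }
}

/// Write to the main writer and to a side chunk, then merge them.
fn example() -> Pdf {
    let mut pdf = Pdf::default();
    let mut side = Chunk::default();
    pdf.indirect(Ref(1), "<< /Type /Catalog >>"); // resolved through DerefMut
    side.indirect(Ref(2), "(written independently)");
    pdf.extend(&side);
    pdf
}
```

Calling `example()` yields a writer holding both objects. The side chunk never borrowed the main writer, which is what allows the two to be filled at the same time.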

Moreover, this adds a renumbering routine to chunks. This routine remaps the IDs of indirect objects and indirect references in a chunk after it has already been written. It does that with just enough knowledge of the PDF syntax to find indirect references and skip "false positives" (such as (2 0 R), where reference-like text appears inside a string).
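The idea can be sketched roughly as follows. This is a simplified illustration and not the routine in `src/renumber.rs`: the function names are hypothetical, and the sketch assumes the input contains no stream data, comments, or hex strings that happen to hold reference-like text.

```rust
/// True for bytes that can be part of a number, keyword, or name.
fn is_regular(b: u8) -> bool {
    !b.is_ascii_whitespace() && !b"()<>[]{}/%".contains(&b)
}

/// Index just past the run of ASCII digits starting at `i`.
fn digits_end(data: &[u8], mut i: usize) -> usize {
    while i < data.len() && data[i].is_ascii_digit() {
        i += 1;
    }
    i
}

/// Index just past the run of whitespace starting at `i`.
fn skip_ws(data: &[u8], mut i: usize) -> usize {
    while i < data.len() && data[i].is_ascii_whitespace() {
        i += 1;
    }
    i
}

/// Index just past the literal string that opens at `start`, honoring
/// backslash escapes and unescaped balanced parentheses.
fn skip_string(data: &[u8], start: usize) -> usize {
    let mut depth = 0usize;
    let mut i = start;
    while i < data.len() {
        match data[i] {
            b'\\' => i += 1, // also skip the escaped byte
            b'(' => depth += 1,
            b')' => {
                depth -= 1;
                if depth == 0 {
                    return i + 1;
                }
            }
            _ => {}
        }
        i += 1;
    }
    data.len()
}

/// After an object number ending at `i`: does `<gen> R` or `<gen> obj` follow?
fn is_indirect(data: &[u8], i: usize) -> bool {
    let gen_start = skip_ws(data, i);
    let gen_end = digits_end(data, gen_start);
    let kw = skip_ws(data, gen_end);
    if gen_start == i || gen_end == gen_start || kw == gen_end {
        return false;
    }
    let rest = &data[kw..];
    [&b"R"[..], &b"obj"[..]].iter().any(|word| {
        rest.starts_with(word) && rest.get(word.len()).map_or(true, |&c| !is_regular(c))
    })
}

/// Remap object numbers in `N G obj` headers and `N G R` references.
fn renumber(input: &[u8], map: impl Fn(i32) -> i32) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        let b = input[i];
        if b == b'(' {
            // Copy literal strings verbatim: no remapping inside them.
            let end = skip_string(input, i);
            out.extend_from_slice(&input[i..end]);
            i = end;
        } else if b.is_ascii_digit() && (i == 0 || !is_regular(input[i - 1])) {
            // A number token starts here; remap it only if it heads a reference.
            let end = digits_end(input, i);
            let id = std::str::from_utf8(&input[i..end])
                .ok()
                .and_then(|s| s.parse::<i32>().ok());
            match id {
                Some(id) if is_indirect(input, end) => {
                    out.extend_from_slice(map(id).to_string().as_bytes())
                }
                _ => out.extend_from_slice(&input[i..end]),
            }
            i = end;
        } else {
            out.push(b);
            i += 1;
        }
    }
    out
}
```

For example, `renumber(b"[2 0 R] (2 0 R)", |id| id + 10)` rewrites the array entry to `12 0 R` but leaves the string alone. Because the keyword check requires a preceding object and generation number, the `obj` inside `endobj` is not mistaken for an object header.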

The main motivation for renumbering is to be able to create a stable, memoizable chunk for an SVG in Typst and then only copy it over to the main PDF during writing, patching up IDs. However, it could also be useful for including one PDF file in another without any ID collisions. For that to work, we'd need to add a routine that parses a full PDF file into a Chunk (which shouldn't be too hard since the top-level syntax is not that complex).

To keep the naming consistent, this PR also renames the main PdfWriter to Pdf (in line with Content and the new Chunk). The Writer suffix is no longer used anywhere in the crate. It also makes the written strings a bit nicer (motivated by being able to test unescaped balanced parentheses in the renumbering) and refactors the integration tests into unit tests. Finally, it removes the unnecessary (and unexported) Type trait.

Review comment on src/renumber.rs (outdated, resolved):
We of course do not want to be triggered by "endobj" somewhere in the object.
laurmaedje merged commit 441dc20 into main on Oct 4, 2023
laurmaedje deleted the chunks branch on October 4, 2023 10:46