Adding annotations to the PDF to link back its content to its source. #2192
Hello!
Before anything, a bit of context: this PR is a work in progress and is not ready to be merged as is. It will need more work before it can be added to the main branch, as discussed beforehand with @liZe and @grewn0uille. The idea behind this first draft is to let WeasyPrint embed metadata in the PDF for each HTML element with an id attribute that it converts, by adding new /Annot PDF objects that PDF readers can then access.
What it has allowed me to do so far is this:
On the left, you have a webpage; and on the right, you have the PDF produced by this fork of WeasyPrint, previewed with PDF.js. A few event listeners were added to bidirectionally "synchronize" the two visualisations. This is just a proof-of-concept, but from there we basically have what we need to build powerful interfaces that take into account the content of the PDF as semantic data that can be linked back to its source.
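To make the annotation idea above a bit more concrete, here is a minimal, standard-library-only sketch of what one such /Annot object could look like for an element with an id. It is not the code from this PR: the function names, the /Square subtype and the use of /NM to carry the id are assumptions chosen for illustration, and the sketch assumes plain ASCII ids. It only shows the two steps involved: converting a CSS box (pixels, top-left origin) into a PDF /Rect (points, bottom-left origin), and serializing an annotation dictionary.

```python
PX_TO_PT = 0.75  # 96 CSS px per inch, 72 PDF points per inch


def css_box_to_pdf_rect(x, y, width, height, page_height_px):
    """Convert a top-left-origin CSS box (px) to a PDF /Rect (pt).

    PDF user space has its origin at the bottom-left of the page,
    so the vertical axis has to be flipped.
    """
    left = x * PX_TO_PT
    right = (x + width) * PX_TO_PT
    top = (page_height_px - y) * PX_TO_PT
    bottom = (page_height_px - y - height) * PX_TO_PT
    return [left, bottom, right, top]


def escape_pdf_string(text):
    """Escape the characters that are special in a PDF literal string."""
    return (
        text.replace('\\', '\\\\')
        .replace('(', '\\(')
        .replace(')', '\\)')
    )


def annotation_for_element(element_id, box, page_height_px):
    """Serialize a hypothetical annotation dictionary for one element.

    Assumption: the element id is stored in /NM (the annotation name)
    and the subtype is /Square; the actual PR may do this differently.
    """
    rect = css_box_to_pdf_rect(*box, page_height_px)
    rect_str = ' '.join(f'{value:g}' for value in rect)
    name = escape_pdf_string(element_id)
    return (
        f'<< /Type /Annot /Subtype /Square '
        f'/Rect [{rect_str}] /NM ({name}) >>'
    )


if __name__ == '__main__':
    # Element "intro" at (100, 200), 300x50 px, on a 1000 px tall page.
    print(annotation_for_element('intro', (100, 200, 300, 50), 1000))
```

A viewer-side script could then read the annotation name and rectangle back from each page and look up the element with the same id in the source document, which is the kind of link the synchronized preview relies on.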
We talked about adding a PDF variant for debugging that could be enabled through an option like --pdf-variant debug. For now, nothing has been done in this direction; the code I propose here is simply hardcoded into WeasyPrint's default behaviour. I guess it will also need some cleanup, as I'm not sure I understood the spec correctly.
Anyway, I'd be really interested in working with you on this and going in a direction that suits the philosophy of the project. If you feel I could be of help, please share your thoughts here so that we can discuss the best way to proceed and how I could contribute further!
I can also share the code of the interface I'm building on request, even though it is not ready to be made fully public yet, so don't hesitate to ask :)
Thanks for the great job!