What is the expected behavior? (add screenshot)
To be able to read the highlighted text via the API itself, instead of relying on third-party tools.
What went wrong? (add screenshot)
Not sure. So far, it seems to be one or more of the following:
the API functionality is there and I simply can't find it
it was never designed for such a use case in the first place
comments/highlights/annotations are meant to be rendered, not accessed
If PDF.js doesn't support such a feature, and likely never will, any pointers toward a PDF library that does would be highly appreciated. If it does, please forgive my oversight. The API page of the project isn't the most helpful resource in its current state, and blind searches for "annotation" and "highlight" in the api.js file didn't add much clarity either, unfortunately.
Unfortunately, the PDF specification doesn't make the highlighted text part of the annotation data.
For example for the highlight annotation on page 1:
So, as far as I can tell, the only thing you can do is get the quadPoints from the annotation, get the text layer (which contains the coordinates of the text), and then find the text that corresponds to the quadPoints.
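The matching step described above could look roughly like the sketch below. It is not part of pdf.js: the helper names (quadsToBoxes, itemBox, overlapRatio, textUnderHighlight) and the minOverlap threshold are hypothetical. It assumes quadPoints is a flat array of 8 numbers per quadrilateral (the shape pdf.js exposes has differed between versions, so it may need converting first), that text items look like the entries of getTextContent().items (str, transform, width, height), and that the text is not rotated.

```javascript
// Turn a flat quadPoints array (assumed: 8 numbers per quad, x/y pairs)
// into axis-aligned bounding boxes. Point order does not matter here.
function quadsToBoxes(quadPoints) {
  const boxes = [];
  for (let i = 0; i + 7 < quadPoints.length; i += 8) {
    const xs = [quadPoints[i], quadPoints[i + 2], quadPoints[i + 4], quadPoints[i + 6]];
    const ys = [quadPoints[i + 1], quadPoints[i + 3], quadPoints[i + 5], quadPoints[i + 7]];
    boxes.push({
      minX: Math.min(...xs), maxX: Math.max(...xs),
      minY: Math.min(...ys), maxY: Math.max(...ys),
    });
  }
  return boxes;
}

// Approximate box of a text item: transform[4], transform[5] is taken as the
// origin of the run, with width/height in the same units (unrotated text only).
function itemBox(item) {
  const x = item.transform[4];
  const y = item.transform[5];
  return { minX: x, maxX: x + item.width, minY: y, maxY: y + item.height };
}

// Fraction of the item's box area that lies inside the highlight box.
function overlapRatio(box, quad) {
  const area = (box.maxX - box.minX) * (box.maxY - box.minY);
  if (area <= 0) return 0;
  const w = Math.max(0, Math.min(box.maxX, quad.maxX) - Math.max(box.minX, quad.minX));
  const h = Math.max(0, Math.min(box.maxY, quad.maxY) - Math.max(box.minY, quad.minY));
  return (w * h) / area;
}

// Join the strings of all text items that are mostly covered by some quad.
function textUnderHighlight(quadPoints, textItems, minOverlap = 0.5) {
  const quads = quadsToBoxes(quadPoints);
  return textItems
    .filter((item) => item.str.trim() !== "")
    .filter((item) => quads.some((q) => overlapRatio(itemBox(item), q) >= minOverlap))
    .map((item) => item.str)
    .join(" ");
}

// Synthetic data for illustration only, not taken from HowtoReadPaper.pdf.
const demoItems = [
  { str: "How to Read a Paper", transform: [12, 0, 0, 12, 100, 700], width: 120, height: 12 },
  { str: "S. Keshav", transform: [10, 0, 0, 10, 100, 680], width: 50, height: 10 },
];
const demoQuad = [98, 714, 222, 714, 98, 698, 222, 698];
console.log(textUnderHighlight(demoQuad, demoItems)); // prints "How to Read a Paper"
```

With real data, the inputs would come from page.getAnnotations() (entries whose subtype is "Highlight") and page.getTextContent(). Note that this returns whole text items only, so a highlight covering part of an item yields either the full item or nothing, depending on the threshold.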
FYI, there is almost no chance that we add this feature to pdf.js, unless you can demonstrate that it could be useful in the Firefox context.
As explained in #17509 (comment) the PDF file-format wasn't really created with such a use-case in mind, since the text-content of the document is completely separate from the annotations.
PDF file:
Given HowtoReadPaper.pdf - the goal is to read the first and only "How to Read a Paper" text, previously highlighted.
Configuration:
Steps to reproduce the problem:
1. Run npm init or similar.
2. Add pdfjs to the list of packages.
3. Call getDocument.
4. Iterate over numPages with getPage and getAnnotations.
5. Get any[] at the end of the last call.
6. With JSON.stringify, receive the "sample" from below.
sample
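The reproduction steps above can be sketched as follows. The helper name collectAnnotations is hypothetical; it only assumes that its argument behaves like the document object resolved by pdf.js's getDocument(...).promise, that is, it has numPages and getPage(n), and each page has getAnnotations().

```javascript
// Walk every page of a loaded PDF document and gather its annotations.
// Pages in pdf.js are numbered starting from 1.
async function collectAnnotations(pdfDocument) {
  const result = [];
  for (let pageNumber = 1; pageNumber <= pdfDocument.numPages; pageNumber++) {
    const page = await pdfDocument.getPage(pageNumber);
    const annotations = await page.getAnnotations();
    result.push({ pageNumber, annotations });
  }
  return result;
}

// Usage sketch (needs the pdf.js package; the import path depends on the
// installed version, so it is left out here):
//   const pdfDocument = await getDocument("HowtoReadPaper.pdf").promise;
//   const perPage = await collectAnnotations(pdfDocument);
//   console.log(JSON.stringify(perPage, null, 2));
```

The serialized output contains the annotation data for each page, but not the highlighted text itself.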