Improving annotations API to get access to all annotations stored in the PDF #5283
Comments
cc @HeXXiiiZ
Do we have any resolutions for this?
Somebody has to implement it. :-)
I see that it's not assigned - do we know if anyone is working on it? I have a personal interest in the annotations API describing form elements more completely and have been digging around in the AcroForms section of the PDF spec and pdf.js code lately. I might be able to contribute.
I'm working on the annotation layer to refactor it (see https://github.com/mozilla/pdf.js/commits/master/src/core/annotation.js for an idea of the kind of patches I make), but I'm not touching the API for that. Feel free to work on this issue and create a PR once you have a working version to get early feedback.
@mitar By the way, doesn't https://github.com/mozilla/pdf.js/blob/master/src/display/api.js#L743 at least partially do what you want? Maybe you were already familiar with it; in that case you can ignore my comment.
No, this is what this ticket is about. I listed above the limitations of the current API. It does not give you access to all annotations and all their properties. The API seems to provide only the things that pdf.js renders or uses itself; annotations that other apps add are not available, even though they appear to be part of the normal PDF standard.
This is something I have worked on for a long time. I have a modified pdf.js version that supports this feature here (sorry, the fork is a mess), but it's not based on the current pdf.js version. I use it in zotfile to extract highlighted text from PDF files. It would be great if the API supported getting the annotation text directly, but I am not sure whether that's in the scope of pdf.js.
Hello, I'm also interested in this. What kind of annotations can PDF.js extract? E.g. can the comment text be extracted too? Can the colour be extracted?
At this point in time we support a lot more Annotation types, compared to when this issue was opened (7 years ago), and for any unsupported types we'll return "generic" Annotation-data. Hence all Annotations should now, at least to some extent, be accessible through the API (and we cannot return arbitrary unverified data for unsupported Annotations). Given the age of this issue, and that #5283 (comment) mentions a bunch of different things, it doesn't seem useful to keep this open any more. If there are still specific issues encountered, please open a new issue for each problem observed; see also https://github.com/mozilla/pdf.js/blob/master/.github/CONTRIBUTING.md
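For anyone landing on this thread later, here is a minimal sketch of reading annotation data through the API mentioned above. It assumes `pdf` behaves like the document object that `getDocument(...).promise` resolves to (a `numPages` property and a `getPage()` method) and that each page exposes `getAnnotations()`. The helper names `summarizeAnnotation` and `collectAnnotations` are made up for illustration, and the fields read here (`subtype`, `rect`, `color`, `contentsObj`/`contents`) are assumptions about what a given pdf.js version returns; check the actual objects for your version.

```javascript
// Reduce one annotation-data object to a few fields that are commonly present.
// Assumption: comment text lives in `contentsObj.str` (newer builds) or
// `contents` (older builds); either may be missing for a given annotation.
function summarizeAnnotation(pageNumber, data) {
  return {
    page: pageNumber,
    id: data.id ?? null,
    subtype: data.subtype ?? "Unknown",
    rect: data.rect ?? null,
    // Colour may arrive as a typed array; copy it into a plain array.
    color: data.color ? Array.from(data.color) : null,
    contents: data.contentsObj?.str ?? data.contents ?? "",
  };
}

// Walk every page (page numbers are 1-based) and gather the summaries.
async function collectAnnotations(pdf) {
  const results = [];
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const annotations = await page.getAnnotations();
    for (const data of annotations) {
      results.push(summarizeAnnotation(pageNumber, data));
    }
  }
  return results;
}
```

With a real document you would first load it, for example `const pdf = await pdfjsLib.getDocument(url).promise;`, and then call `collectAnnotations(pdf)`. Annotation types that pdf.js does not support are described above as coming back as "generic" data, so expect some entries to carry little more than a subtype and a rectangle.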
Currently, support for annotations is incomplete. Along with missing support for some types of annotations, there is also incomplete support for accessing information about annotations through the API in the first place.
I am interested in using PDF.js to convert annotations stored in PDFs into the Open Annotation standard, used by the new W3C Web Annotation Working Group. For that, I would be interested in having a PDF.js API which would return all annotations and highlights stored in the PDF. Even if PDF.js does not support rendering them, they could at least be returned for consumption through the API.
In particular, the issues I am observing (see this example PDF) are: