
WARNING: Unsupported annotation subtype: /'Square' #46

Closed
john-catalano opened this issue Nov 17, 2021 · 15 comments


@john-catalano

john-catalano commented Nov 17, 2021

How can I get pdfannots to return any value at all (page number, etc.) for annotations of subtype /'Square'?

Here's why: the markup annotation used by the iOS marker/highlighter tool is (apparently) reported by pdfannots as subtype /'Square', although my PDF app, PDF Expert, reports this annotation type as "Rectangle".

For this use case, I simply need pdfannots to return at least a page number for any type of annotation in my PDF.

@0xabu
Owner

0xabu commented Nov 17, 2021

Do you have a sample annotated PDF you can share?

@0xabu
Owner

0xabu commented Nov 17, 2021

I took a quick look at the spec. A "Square" annotation (which despite the name is really a rectangle) could probably be treated as a more limited form of highlight where we capture any text inside the rectangle. Would that make sense for your use-case?
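To make the idea concrete: a PDF `Rect` is usually written as [llx lly urx ury] in page coordinates, so "text inside the rectangle" reduces to a bounding-box containment test. The sketch below is a hypothetical helper, not pdfannots code; it assumes a normalized Rect and that text boxes have already been extracted into lines of the form `x0 y0 x1 y1 text`.

```shell
# Hypothetical helper (not part of pdfannots): print the text of boxes that lie
# entirely inside a Square annotation's Rect.
# Usage: boxes_in_rect LLX LLY URX URY < boxes.txt
# Each input line is assumed to be: x0 y0 x1 y1 text...
boxes_in_rect() {
  awk -v llx="$1" -v lly="$2" -v urx="$3" -v ury="$4" '
    # Keep a box only if all four of its edges fall within the Rect.
    $1 + 0 >= llx + 0 && $2 + 0 >= lly + 0 && $3 + 0 <= urx + 0 && $4 + 0 <= ury + 0 {
      # Rebuild the text from field 5 onward (runs of spaces collapse to one).
      text = $5
      for (i = 6; i <= NF; i++) text = text " " $i
      print text
    }
  '
}
```

Note that a containment test only captures what the Rect covers, so it will pick up everything inside that region, whether or not it is visibly marked up.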

@john-catalano
Author

john-catalano commented Nov 17, 2021

Do you have a sample annotated PDF you can share?

Yes. The attached PDF contains both "Highlights" and the Square annotation type that Apple uses with its highlighter-style annotation palette on all iOS devices.
Case for changing Capture to Curate.pdf

Here's the palette, FYI:
[Screenshot of the iOS markup palette]

@john-catalano
Author

I took a quick look at the spec. A "Square" annotation (which despite the name is really a rectangle) could probably be treated as a more limited form of highlight where we capture any text inside the rectangle. Would that make sense for your use-case?

Yes, it would. I would absolutely prefer to also have the text within the Square returned. :-)

@0xabu
Owner

0xabu commented Nov 17, 2021

The PDF you uploaded contains a single Square annotation, but the "Rect" property for that annotation (which defines the region it covers on the page) begins at "E - Express" and continues until the end of "... you could pluck bits of". It includes the twitter link and the paragraph below it that are not actually highlighted in the pale yellow colour, so I don't think capturing all that text is what you want or what the program intended. I also noticed that other programs (e.g. the Chrome PDF reader, and SumatraPDF) don't interact with these annotations in the same way they do with true Highlight annotations, so I'm at a bit of a loss for what to do about them.

@0xabu
Owner

0xabu commented Nov 17, 2021

Your annotations also have extra information in some proprietary fields under AAPL:AKExtras, especially AAPL:AKAnnotationV2 which appears to be a base64-encoded binary property list. I'd be impressed if any non-Apple tools can handle this.
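For anyone curious, a quick way to confirm that such a value really is a binary property list is to base64-decode it and look for the `bplist00` magic bytes that binary plists normally begin with. This is a sketch only: `is_binary_plist` is a hypothetical name, it assumes GNU coreutils `base64` (`-d` to decode), and extracting the string from the PDF is left out.

```shell
# Hypothetical helper: succeed if the base64 string in $1 decodes to data that
# starts with "bplist00", the header of a binary property list.
# Assumes GNU coreutils base64 (-d to decode).
is_binary_plist() {
  magic=$(printf '%s' "$1" | base64 -d 2>/dev/null | head -c 8)
  [ "$magic" = "bplist00" ]
}
```

Actually interpreting the decoded plist would still need a plist parser, which is where the real difficulty lies.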

@0xabu
Owner

0xabu commented Nov 17, 2021

Going back to your original request, perhaps it's best if we treat "Square" as another type of note, so pdfannots can tell you that it's there on a certain page, and can emit the contents (text comment associated with the annotation) if it has any, but doesn't attempt to capture text covered by the annotation.

@john-catalano
Author

Of course. It's Apple, so I understand. Having pdfannots recognize the existence of the annotation, along with any notes, as you suggested, is a good solution.

0xabu added a commit that referenced this issue Nov 17, 2021
Per issue #46, Apple tools support "highlighting" PDFs where the highlights
are emitted as a Square annotation with a custom appearance that renders the
markup. Figuring out the affected text would be a major undertaking, but
with this change we (1) recognise the existence of the annotation rather than
emitting an "unsupported annotation" warning, and (2) capture the contents
(text note) of the annotation if any.
@0xabu
Owner

0xabu commented Nov 17, 2021

Square annotations are now recognised. However, depending on the output format, you may not see anything (or may just get a warning) if there is no actual "content" text. They'll always be there in the JSON format (--format=json), regardless.
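Because the JSON output always includes Square entries, a rough way to see what pdfannots found is to tally the `"type"` fields. This sketch assumes the indented `"type": "..."` layout pdfannots uses for `--format=json`; `count_annotation_types` is a hypothetical name, and a proper JSON tool would be more robust than grep.

```shell
# Hypothetical helper: read pdfannots --format=json output on stdin and print
# "COUNT TYPE" for each annotation type, e.g. "1 Square".
# Assumes each annotation has a field written as  "type": "Highlight"
count_annotation_types() {
  grep -o '"type": "[A-Za-z]*"' |
    sed 's/^"type": "\(.*\)"$/\1/' |
    sort | uniq -c |
    awk '{ print $1, $2 }'
}
```

Usage would be along the lines of `pdfannots --format=json file.pdf | count_annotation_types`.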

0xabu closed this as completed Nov 17, 2021
@john-catalano
Author

Great. How can I install the version that has this change?

I understand this change is not yet part of a release, so I cannot use "pip install pdfannots".

I've tried a few things after downloading the zip, but I've been unsuccessful.

@0xabu
Owner

0xabu commented Nov 18, 2021

Per #35, I think you can do:

pip install 'git+https://github.com/0xabu/pdfannots.git'

(but I haven't tested it myself.)

@john-catalano
Author

That seems to have installed the update because I no longer get the error message.

But I don't get any information about the existence of the Square annotation either. Using the file I provided, what output should I expect?

@0xabu
Owner

0xabu commented Nov 18, 2021

That's what I alluded to: if there is no comment, there's nothing meaningful to output as Markdown, but the JSON output includes it.

$ pdfannots tests/issue46.pdf
## Highlights

 * Page #1: "C – Curate"

 * Page #1: "This was a novel idea at the time"

$ pdfannots --no-group tests/issue46.pdf
 * Page #1 Highlight: "C – Curate"

WARNING: Square annotation at page #1 (54.948,539.534) has neither text nor a comment; skipped
 * Page #1 Highlight: "This was a novel idea at the time"

$ pdfannots --format=json tests/issue46.pdf
[
  {
    "type": "Highlight",
    "page": 1,
    "start_xy": [
      63.2125,
      602.7762
    ],
    "text": "C \u2013 Curate",
    "author": "Mobile User",
    "created": "2021-11-17T21:25:39"
  },
  {
    "type": "Square",
    "page": 1,
    "start_xy": [
      54.94776,
      539.5335
    ],
    "created": "2021-11-17T21:25:58"
  },
  {
    "type": "Highlight",
    "page": 1,
    "start_xy": [
      63.2125,
      392.7762
    ],
    "text": "This was a novel idea at the time",
    "author": "Mobile User",
    "created": "2021-11-17T21:26:38"
  }
]
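Since the Markdown output skips a Square annotation that has no comment, a script that only needs page numbers can pull them from the JSON instead. A minimal awk sketch, assuming the one-field-per-line layout shown above, where each object's `"type"` line precedes its `"page"` line; `square_pages` is a hypothetical name:

```shell
# Hypothetical helper: print "Page #N" for every Square annotation found in
# pdfannots --format=json output read from stdin.
square_pages() {
  awk '
    # Remember whether the object we are inside is a Square annotation.
    /"type":/ { is_square = ($0 ~ /"type": "Square"/) }
    # Its "page" line follows; keep only the digits and report it.
    /"page":/ && is_square {
      page = $0
      gsub(/[^0-9]/, "", page)
      print "Page #" page
    }
  '
}
```

For example, `pdfannots --format=json tests/issue46.pdf | square_pages` should report the Square annotation's page, assuming the output keeps that layout.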

@john-catalano
Author

pdfannots --no-group

Yes... that's what I was missing. I see it now. Thank you.

@john-catalano
Author

john-catalano commented Nov 18, 2021

For the sake of completeness, this is the check that now works for me. I use it in Hazel to check whether a PDF has any annotations in it. If it does, I add a Finder tag so that I can easily find the file (on my Mac, iPad, or iPhone) for further follow-up. I have a separate script, using pdfannots, to pull the highlighted text for further distillation.

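# Succeed (exit 0) if the PDF passed as $1 has a Highlight or Square annotation.
# -q keeps grep quiet; only its exit status matters.
if pdfannots --format=json "$1" | grep -q -e "Highlight" -e "Square"
then
   exit 0
else
   exit 1
fi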
