tweak README, add demo/banner image
0xabu committed Feb 20, 2022
1 parent 3075620 commit 0e68261
Showing 2 changed files with 15 additions and 14 deletions.
README.md: 29 changes (15 additions, 14 deletions)
@@ -4,10 +4,12 @@
 [![PyPI version](https://img.shields.io/pypi/v/pdfannots)](https://pypi.org/project/pdfannots/)

 This program extracts annotations (highlights, comments, etc.) from a PDF file,
-and formats them in a variety of ways. It is primarily intended for use in
-reviewing submissions to scientific conferences/journals.
+and formats them as Markdown or exports them to JSON. It is primarily intended
+for use in reviewing submissions to scientific conferences/journals.

-For the default markdown format, the output is as follows:
+![Sample/demo of pdfannots extracting Markdown from an annotated PDF](doc/demo.png)
+
+For the default Markdown format, the output is as follows:

 * Highlights without an attached comment are output first, as
 "highlights" with just the highlighted text included. Note that
@@ -25,11 +27,11 @@ For the default markdown format, the output is as follows:
 of this is to easily separate formatting or grammatical corrections
 from more substantial comments about the content of the document.

-For each annotation, the page number is given, along with the
-associated (highlighted/underlined) text, if any. Additionally, if the
-document embeds outlines (aka bookmarks), such as those generated by
-the LaTeX hyperref package, they are printed to help identify to which
-section in the document the annotation refers.
+For each annotation, the page number is given, along with the associated
+(highlighted/underlined) text, if any. Additionally, if the document embeds
+outlines (aka bookmarks), such as those generated by the LaTeX
+[hyperref](https://ctan.org/pkg/hyperref) package, they are printed to help
+identify to which section in the document the annotation refers.


 ### Usage
@@ -47,8 +49,8 @@ options and invocation.
 ### Known issues and limitations

 * While it is generally reliable, pdfminer (the underlying PDF parser) is
-less accurate than other tools (Poppler's pdftotext) at extracting text
-from a PDF. It has been known to fail in several different ways:
+not infallible at extracting text from a PDF. It has been known to fail
+in several different ways:

 * Sometimes it misses or misplaces individual characters, resulting in
 annotations with some or all of the text missing (in the latter case,
@@ -83,8 +85,8 @@ options and invocation.

 1. I'd like to change how the output is formatted.

-Some minor tweaks (e.g.: word wrap, skipping sections) can be accomplished
-via command-line arguments.
+Some minor tweaks (e.g.: word wrap, skipping or reordering output sections)
+can be accomplished via command-line arguments.

 All of the output comes from the relevant `Printer` subclass; more elaborate
 changes can be accomplished there. Pull requests to introduce new output
@@ -95,5 +97,4 @@ options and invocation.
 I hope that it was a constructive review, and that the annotations
 helped the reviewer give you more detailed feedback so you can improve
 your paper. This is, after all, just a tool, and it should not be an
-excuse for reviewer sloppiness. Note that I am not the only user of
-this script.
+excuse for reviewer sloppiness.
Binary file added doc/demo.png