Google Books .pdf document equivalence problem #7884

chrisaldrich · 2023-03-12T23:26:07Z

I've noticed on a couple of .pdf documents from Google books that their fingerprints, lack thereof, or some other glitch in creating document equivalency all seem to clash creating orphans.

For example, the downloadable .pdf of Geyer's Stationer 1904 found at
https://www.google.com/books/edition/Geyer_s_Stationer/L507AQAAMAAJ?hl=en&gbpv=0 currently has 109 orphaned annotations caused by this issue.

See also a specific annotation on this document: https://hypothes.is/a/vNmUHMB3Ee2VKgt4yhjofg

robertknight · 2023-03-13T07:16:16Z

I've noticed on a couple of .pdf documents from Google books that their fingerprints, lack thereof, or some other glitch in creating document equivalency all seem to clash creating orphans.

For debugging issues like this, the fingerprint is displayed under Help => About this version.

I don't see a PDF download option on that page. Instead there is a "Preview unavailable" message. This might be related to us being in different locations.

I was, however, able to reproduce your problem by searching for two different freely available books ("Great Expectations" and "Oliver Twist") and clicking the "Download PDF" link for the first freely available result in each case. The two books had different URLs and content but the same fingerprint value ("ca474facea1eb6917376bd8394b060ad"). This looks like an MD5 hash of some value, but I don't know what. The browser console gives some more info about how the PDF was created:

PDF ca474facea1eb6917376bd8394b060ad [1.4 Google Books PDF Converter (rel 3 12/12/14) / -] (PDF.js: 2.14.137)
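
To make the failure mode concrete, here is a minimal sketch of how one could flag fingerprints that are shared by PDFs served from different URLs. The `ObservedPdf` shape and the `findSharedFingerprints` helper are hypothetical names for illustration only; they are not part of the Hypothesis client or PDF.js.

```typescript
// Hypothetical record of a PDF that was opened: where it came from and
// the fingerprint reported for it (e.g. under Help => About this version).
interface ObservedPdf {
  url: string;
  fingerprint: string;
}

// Group URLs by fingerprint and return only the fingerprints that map to
// more than one distinct URL. Note that the same document can legitimately
// live at several URLs, so a hit is a candidate for review, not proof of
// a collision.
function findSharedFingerprints(
  pdfs: ObservedPdf[]
): { [fingerprint: string]: string[] } {
  const urlsByFingerprint: { [fingerprint: string]: string[] } = {};
  for (const pdf of pdfs) {
    if (!urlsByFingerprint[pdf.fingerprint]) {
      urlsByFingerprint[pdf.fingerprint] = [];
    }
    const urls = urlsByFingerprint[pdf.fingerprint];
    if (urls.indexOf(pdf.url) === -1) {
      urls.push(pdf.url);
    }
  }

  const shared: { [fingerprint: string]: string[] } = {};
  for (const fingerprint of Object.keys(urlsByFingerprint)) {
    if (urlsByFingerprint[fingerprint].length > 1) {
      shared[fingerprint] = urlsByFingerprint[fingerprint];
    }
  }
  return shared;
}
```

If two unrelated books both report "ca474facea1eb6917376bd8394b060ad", this helper returns both URLs under that one key, which is the situation that leads to annotations from one book being attached to the other.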

The PDF converter release date mentioned (12/12/14) is quite old, so the issue may have been fixed in a more recent version. It would be worth checking some newer publications to see whether they are still affected. If the problem still exists, it would be worth reporting to Google, since the fingerprint is a standard part of the PDF specification, where it is called the "File ID".

As for workarounds, we could perhaps do something like checking for known-bad PDF generation tools and substituting some other fingerprint. This would break existing annotation links though.
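
A rough sketch of what such a workaround might look like follows. Everything here is an assumption for illustration: the `PdfInfo` shape, the list of "bad" producers, the use of a text sample from the document as the input for the substitute value, and the simple non-cryptographic hash. It is not how the Hypothesis client currently computes fingerprints.

```typescript
// Hypothetical description of a loaded PDF.
interface PdfInfo {
  fingerprint: string; // File ID as reported by the PDF viewer
  producer: string | null; // producer string from the PDF metadata
  contentSample: string; // assumed: some text extracted from the document
}

// Assumed deny-lists. The producer pattern and fingerprint below are taken
// from the console output in this thread; real lists would need curation.
const KNOWN_BAD_PRODUCERS: RegExp[] = [/^Google Books PDF Converter/];
const KNOWN_SHARED_FINGERPRINTS: string[] = ["ca474facea1eb6917376bd8394b060ad"];

function isFingerprintTrusted(info: PdfInfo): boolean {
  if (KNOWN_SHARED_FINGERPRINTS.indexOf(info.fingerprint) !== -1) {
    return false;
  }
  const producer = info.producer;
  if (producer !== null && KNOWN_BAD_PRODUCERS.some((re) => re.test(producer))) {
    return false;
  }
  return true;
}

// Simple 32-bit string hash (djb2, XOR variant), rendered as 8 hex digits.
// A stand-in only: a real substitute would want a stronger hash.
function hashHex(text: string): string {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Return the original fingerprint when it looks trustworthy, otherwise a
// substitute derived from the document's content sample.
function effectiveFingerprint(info: PdfInfo): string {
  if (isFingerprintTrusted(info)) {
    return info.fingerprint;
  }
  return "substitute-" + hashHex(info.contentSample);
}
```

The trade-off noted above still applies: any annotation already stored against the original fingerprint would no longer match the substituted value, so existing links would need some form of migration or fallback lookup.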

leedenison added the bug label May 19, 2023

leedenison mentioned this issue Jul 5, 2023

Refactor Document Equivalence hypothesis/product-backlog#1505

Open

mkdir-washington-edu added the document equivalence label Oct 2, 2024
