Content search not working for druid:vq627fg9932 and 18 additional druids #349

Open
caaster opened this issue Nov 5, 2021 · 2 comments

@caaster

caaster commented Nov 5, 2021

On 11/4/21, 2:55 PM, "purl-feedback on behalf of feedback@purl.stanford.edu" <purl-feedback-bounces@lists.stanford.edu on behalf of feedback@purl.stanford.edu> wrote:

Name: Emma Frothingham
Email: emma.frothingham@stanford.edu
Comment re: druid:vq627fg9932
Noticing this and some other transcripts are having some OCR issues. When you do a text search (I've been
using "interview" since I know it's in the front matter) it will take you to the correct page, but the text
that's highlighted is incorrect.
Also noticing this on the following DRUIDs: qk681gt0202, cf701vv6058, hd885gx4798, fp554yc9826, ky320sk0660,
rm620vw2642, cy927bt6518, cb635cj8418, zy935dw5016, tv486ws8292, st067nr2181, yq520nc5196, vc245bc2056,
fr376sh7526, cc039qv8268, wg274vq1218, mt661sw7493, wd772fk6025

In a discussion with @anarchivist, they indicated that this appears to be either a bug in ABBYY or a bug in content search. For druid:vq627fg9932, the page size in the ABBYY-generated ALTO XML differs from the actual page size recorded in the technicalMetadata, which is why the hit highlighting is displayed in the wrong place.

@calavano, do you have any thoughts here? Mark says this is not related to ALTO 3.1.

@anarchivist

To clarify, for https://purl.stanford.edu/vq627fg9932, for example, the image size reported for the first page is:

  • 1700x2200 in the public XML and by the IIIF server
  • 3400x4400 in the ALTO file
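The ALTO dimensions are exactly twice the image dimensions on both axes, so word coordinates taken from the ALTO file land in the wrong place on the delivered image. A minimal sketch of mapping a highlight box from the ALTO page space onto the image, assuming ALTO coordinates are in pixels and the image was resized uniformly per axis after OCR. `Box` and `scale_box` are hypothetical names for illustration, not part of content search, and the word position is made up rather than read from the real file.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """A rectangle in ALTO-style coordinates (HPOS, VPOS, WIDTH, HEIGHT)."""
    hpos: float
    vpos: float
    width: float
    height: float


def scale_box(box: Box, alto_size: tuple[int, int], image_size: tuple[int, int]) -> Box:
    """Map a box from the ALTO page's coordinate space onto the delivered image.

    Assumption: ALTO coordinates are pixels and the image was resized by a
    constant factor per axis after OCR (e.g. 3400x4400 -> 1700x2200).
    """
    alto_w, alto_h = alto_size
    img_w, img_h = image_size
    sx = img_w / alto_w  # horizontal scale factor
    sy = img_h / alto_h  # vertical scale factor
    return Box(box.hpos * sx, box.vpos * sy, box.width * sx, box.height * sy)


# Page 1 of vq627fg9932: ALTO reports 3400x4400, the image is 1700x2200.
# Hypothetical word position, not taken from the actual ALTO file.
word = Box(hpos=800, vpos=1200, width=400, height=60)
print(scale_box(word, (3400, 4400), (1700, 2200)))
# Box(hpos=400.0, vpos=600.0, width=200.0, height=30.0)
```

Without this correction, a highlight drawn at the raw ALTO coordinates would sit at twice the intended offset, which matches the symptom Emma described: the right page, but the wrong text highlighted.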

I haven't looked at all the druids provided by Emma, but I'm seeing the same thing on qk681gt0202.

@andrewjbtw

These items were accessioned around 2019, and while I haven't checked all of them, it does not look like the files have been changed since accessioning. So they've had mismatched ALTO and image dimensions for a couple of years. Since the SDR accessioning process doesn't change image size, I wonder if something happened during the OCR process: a set of images was OCR'd, and then the images were resized after OCR but before deposit into SDR?

Looking at more recent oral history accessions, I don't see the same image dimension mismatches, so it doesn't look like a current, ongoing problem.
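Checking the remaining druids (or other accessions) by hand is tedious, so a comparison like the one below could be scripted. This is a sketch under stated assumptions: `alto_page_sizes` and `size_mismatches` are hypothetical helpers, the sample ALTO string is made up, and looking up the image dimensions (e.g. from technicalMetadata or a IIIF `info.json`) is left out.

```python
import xml.etree.ElementTree as ET


def alto_page_sizes(alto_xml: str) -> list[tuple[int, int]]:
    """Return (WIDTH, HEIGHT) for every <Page> element in an ALTO document."""
    root = ET.fromstring(alto_xml)
    sizes = []
    for el in root.iter():
        # Compare the local tag name only, so any ALTO namespace version matches.
        if el.tag.rsplit("}", 1)[-1] != "Page":
            continue
        width, height = el.get("WIDTH"), el.get("HEIGHT")
        if width is None or height is None:
            continue  # skip pages that don't declare a size
        sizes.append((round(float(width)), round(float(height))))
    return sizes


def size_mismatches(alto_xml: str, image_size: tuple[int, int]) -> list[dict]:
    """Report ALTO pages whose declared size differs from the image size.

    image_size is assumed to come from wherever the delivered image's
    dimensions are recorded; that lookup is not shown here.
    """
    img_w, img_h = image_size
    report = []
    for alto_w, alto_h in alto_page_sizes(alto_xml):
        if (alto_w, alto_h) != (img_w, img_h):
            report.append({
                "alto": (alto_w, alto_h),
                "image": (img_w, img_h),
                # The same ratio on both axes (e.g. 2.0) suggests a post-OCR resize.
                "ratio": (alto_w / img_w, alto_h / img_h),
            })
    return report


# Hypothetical single-page ALTO file mirroring page 1 of vq627fg9932.
sample = (
    '<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">'
    '<Layout><Page ID="P1" WIDTH="3400" HEIGHT="4400"/></Layout></alto>'
)
print(size_mismatches(sample, (1700, 2200)))
# [{'alto': (3400, 4400), 'image': (1700, 2200), 'ratio': (2.0, 2.0)}]
```

An empty result for a page means its ALTO and image dimensions agree. A consistent ratio across every page of an item would fit the theory that the images were resized after OCR.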

3 participants