I would like to get access to a raw OCR fragment #80

Open
altomator opened this issue Mar 1, 2017 · 3 comments

Comments

@altomator

altomator commented Mar 1, 2017

Description

Some use cases need to get access to information stored in the OCR format:

For these use cases, getting access to the raw OCR objects (or reference to the...) from the IIIF annotation layer would be useful.

@benwbrum

benwbrum commented Mar 1, 2017

From the perspective of an OCR correction platform, I (the correction tool) would like to:

  • See the types of OCR resources associated with a manifest (plaintext, HOCR, ALTO, DjVu)
  • See the types of OCR resources associated with a canvas (ditto).

@tomcrane

tomcrane commented Mar 2, 2017

So far, people have been using seeAlso to link from canvas to ALTO:

"seeAlso": {
  "@id": "http://wellcomelibrary.org/service/alto/b22014068/0?image=11",
  "format": "text/xml",
  "profile": "http://www.loc.gov/standards/alto/v3/alto.xsd",
  "label": "METS-ALTO XML"
}
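
On the consuming side, a correction tool could pick these links up roughly as follows. This is only a minimal sketch, assuming a Presentation 2.x canvas that has already been parsed into a dict. The helper names `ocr_links` and `is_alto` are hypothetical, the example canvas `@id` is made up, and recognising ALTO by looking for "/alto/" in the profile URI is a heuristic, not something the spec defines.

```python
def ocr_links(canvas):
    """Return (url, format, profile) tuples for each seeAlso entry on a canvas."""
    see_also = canvas.get("seeAlso", [])
    # Presentation 2.x allows a single value or a list; normalise to a list.
    if not isinstance(see_also, list):
        see_also = [see_also]
    links = []
    for entry in see_also:
        if isinstance(entry, str):
            # A bare URI carries no format or profile information.
            links.append((entry, None, None))
        else:
            links.append((entry.get("@id"), entry.get("format"), entry.get("profile")))
    return links


def is_alto(link):
    """Heuristic (an assumption): a profile URI containing '/alto/' means ALTO."""
    _, _, profile = link
    return profile is not None and "/alto/" in profile.lower()


# Hypothetical canvas carrying the seeAlso block quoted above.
canvas = {
    "@id": "http://example.org/iiif/book1/canvas/p1",
    "@type": "sc:Canvas",
    "seeAlso": {
        "@id": "http://wellcomelibrary.org/service/alto/b22014068/0?image=11",
        "format": "text/xml",
        "profile": "http://www.loc.gov/standards/alto/v3/alto.xsd",
        "label": "METS-ALTO XML",
    },
}
alto_links = [link for link in ocr_links(canvas) if is_alto(link)]
```

Running the same function over every canvas in a manifest would give the per-canvas list of OCR resource types asked for above, as long as publishers fill in `format` and `profile` consistently.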

The Newspaper working group have some guidelines around this - https://www.slideshare.net/kestlund/newspapers-iiif-and-alto

This could also be modelled as a service.

@altomator
Author

My concern is that getting from a text annotation to the right element in the OCR file is not straightforward (by matching on the geometrical information?):

{
  "@id": "http://dams.llgc.org.uk/iiif/3320863/annotation/5014243419640",
  "@type": "oa:Annotation",
  "motivation": "sc:painting",
  "resource": {
    "@type": "cnt:ContentAsText",
    "format": "text/plain",
    "chars": "NEWS."
  },
  "on": "http://dams.llgc.org.uk/iiif/3320860/canvas/3320863#xywh=5014,2434,196,40"
}
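
To illustrate what the geometrical route would involve, here is a rough sketch that parses the `xywh` fragment and looks for ALTO `String` elements with a matching box. It assumes the ALTO coordinates are in pixels at the same scale as the canvas (ALTO files can also use other measurement units, which would need converting first). The function names, the tolerance value and the sample ALTO document are all made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_xywh(on):
    """Extract (x, y, w, h) from a canvas URI ending in '#xywh=x,y,w,h'."""
    _, _, fragment = on.partition("#")
    if not fragment.startswith("xywh="):
        raise ValueError("no xywh fragment in %r" % on)
    x, y, w, h = (int(v) for v in fragment[len("xywh="):].split(","))
    return x, y, w, h


def find_strings(alto_xml, box, tolerance=2):
    """Return ALTO String elements whose box matches `box` within `tolerance`."""
    root = ET.fromstring(alto_xml)
    matches = []
    for el in root.iter():
        # Ignore the namespace, which differs between ALTO versions.
        if el.tag.rsplit("}", 1)[-1] != "String":
            continue
        try:
            candidate = tuple(
                float(el.get(k)) for k in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
            )
        except (TypeError, ValueError):
            continue  # missing or non-numeric coordinates
        if all(abs(a - b) <= tolerance for a, b in zip(candidate, box)):
            matches.append(el)
    return matches


# Invented sample, not the real ALTO behind the annotation above.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String ID="Str_001" CONTENT="NEWS." HPOS="5014" VPOS="2434" WIDTH="196" HEIGHT="40"/>
    <String ID="Str_002" CONTENT="LOCAL" HPOS="5230" VPOS="2434" WIDTH="180" HEIGHT="40"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

box = parse_xywh(
    "http://dams.llgc.org.uk/iiif/3320860/canvas/3320863#xywh=5014,2434,196,40"
)
hits = find_strings(SAMPLE_ALTO, box)
```

Even in this toy form the client has to fetch and scan the whole ALTO file, and it can return zero or several candidates when boxes were rounded or rescaled, which is why a direct reference would be preferable.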

I suppose that for this specific use case (getting access to the XML stuff), we need another annotation list that references external XML segments (http://iiif.io/api/presentation/2.1/#segments):

{
  "@context": "http://iiif.io/api/presentation/2/context.json",
  "@id": "http://example.org/iiif/book1/annotation/anno1",
  "@type": "oa:Annotation",
  "motivation": "sc:painting",
  "resource":{
    "@id": "http://example.org/iiif/book1/res/alto.xml#xpointer(//String[@id='Str_001'])",
    "@type": "dctypes:Text",
    "format": "application/alto+xml"
  },
  "on": "http://example.org/iiif/book1/canvas/p1#xywh=100,100,500,300"
}
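
A client could then dereference such a resource directly, without any geometric matching. The following is a sketch under stated assumptions: it only understands the single XPointer shape used above (`//Name[@attr='value']`), the `resolve_segment` name and the caller-supplied `fetch` function are hypothetical, and the sample XML is invented. Note that ALTO itself spells the attribute `ID`, so the sketch compares attribute names case-insensitively, which is a guess at the intent of `@id`.

```python
import re
import xml.etree.ElementTree as ET

# Only the one XPointer shape used in the example annotation is recognised.
XPOINTER = re.compile(r"^xpointer\(//(\w+)\[@(\w+)='([^']*)'\]\)$")


def resolve_segment(resource_id, fetch):
    """Return the element a '#xpointer(//Name[@attr='value'])' URI points at.

    `fetch` is a caller-supplied function mapping a document URL to XML text.
    Returns None when no element matches.
    """
    url, _, fragment = resource_id.partition("#")
    match = XPOINTER.match(fragment)
    if match is None:
        raise ValueError("unsupported fragment: %r" % fragment)
    name, attr, value = match.groups()
    root = ET.fromstring(fetch(url))
    for el in root.iter():
        # Compare local names only, so any ALTO namespace version works.
        if el.tag.rsplit("}", 1)[-1] != name:
            continue
        # Attribute names are compared case-insensitively (ALTO writes ID).
        for key, val in el.attrib.items():
            if key.lower() == attr.lower() and val == value:
                return el
    return None


# Invented sample standing in for http://example.org/iiif/book1/res/alto.xml
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String ID="Str_001" CONTENT="NEWS." HPOS="100" VPOS="100" WIDTH="500" HEIGHT="300"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

element = resolve_segment(
    "http://example.org/iiif/book1/res/alto.xml#xpointer(//String[@id='Str_001'])",
    fetch=lambda url: SAMPLE,  # stand-in for an HTTP GET
)
```

With the real element in hand, the tool has access to everything the OCR recorded for that word, not just the plain text carried by the painting annotation.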
