structMap[@TYPE=OCR-D-LOGICAL] / FULLDOWNLOAD #154
debug: smLink
@@ -160,6 +199,83 @@ encodings of the same page.
</mets:structMap>
```

## OCR-D structMap
It does not make sense to put this section after Grouping files by page: the latter should be integrated into the former as a subsection!
Here is my proposal for a new document structure:
Requirements on handling METS/PAGE
1 Metadata
1.1 Unique @ID for the document processed
2 Images
2.1 Pixel density of images must be explicit and high enough
2.2 No multi-page images
2.3 Image coordinates
2.4 If in PAGE then in METS
3 File Group mets:fileGrp
3.1 @USE
syntax
Examples
4 File mets:file
4.1 @ID
syntax
Examples
4.2 @MIMETYPE
syntax
Examples
Examples (Media Type for PAGE XML)
5 Grouping files by page mets:structMap
Example
5.1 @TYPE
syntax
Example
6 Range of pages mets:structLink
Example
7 Paths
7.1 Always use URL or relative filenames
Example
8 Recording processing information in METS
In that case, I would:
- subsume 8 (processing information) under 1 (metadata)
- abandon 2.3 (frankly, I don't know why this resides here and not just in PAGE.md)
- replace 2.3 with a general note about original/derived images (what is now in PAGE.md, but including new language from Alternative image same folder #164)
But I wonder: where in that outline did Fulldownload go? Is it still subsumed under 4.1 for you? (We discussed this elsewhere: then you cannot make these subsections self-contained.)
> - abandon 2.3 (frankly, I don't know why this resides here and not just in PAGE.md)
> - replace 2.3 with a general note about original/derived images (what is now in PAGE.md, but including new language from Alternative image same folder #164)

That's right, I think 2.3 is better placed in PAGE.md.

> - subsume 8 (processing information) under 1 (metadata)

That's a good proposal.

> But I wonder: where in that outline did Fulldownload go? Is it still subsumed under 4.1 for you? (We discussed this elsewhere: then you cannot make these subsections self-contained.)

Yes, Fulldownload is a section under 4.1.
Requirements on handling METS/PAGE
1 Metadata
1.1 Recording processing information in METS
1.2 Unique @ID for the document processed
2 Images
2.1 Pixel density of images must be explicit and high enough
2.2 No multi-page images
2.3 If in PAGE then in METS
3 File Group mets:fileGrp
3.1 @USE
syntax
Examples
3.2 @USE="FULLDOWNLOAD_..."
Examples
4 File mets:file
4.1 @ID
syntax
Examples
4.2 @MIMETYPE
syntax
Examples
Examples (Media Type for PAGE XML)
5 Grouping files by page mets:structMap
Example
5.1 @TYPE
syntax
Example
6 Range of pages mets:structLink
Example
7 Paths
7.1 Always use URL or relative filenames
Example
Is this resolved?
> Is this resolved?

No, AFAICS it is not. The section Fulldownload should still be part of the file ID syntax (differentiating between page-local and document-global naming schemes, but not trying to formulate this "self-contained"). Also, the section about grouping files by structMap should come below the fileGrp and file ID sections.
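
One way to see why the structMap section reads more naturally after the fileGrp and file ID sections: in METS, each `mets:fptr` inside the structMap points to a `mets:file` through its `@FILEID`, so the grouping only makes sense once file IDs are defined. The following is a minimal sketch, not part of the spec text. The file group names, file IDs, page IDs, and the helper `files_by_page` are all hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical minimal METS document; every ID and USE value is invented.
METS_XML = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp USE="OCR-D-IMG">
      <mets:file ID="OCR-D-IMG_0001"/>
      <mets:file ID="OCR-D-IMG_0002"/>
    </mets:fileGrp>
    <mets:fileGrp USE="OCR-D-OCR">
      <mets:file ID="OCR-D-OCR_0001"/>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div TYPE="physSequence">
      <mets:div TYPE="page" ID="PHYS_0001">
        <mets:fptr FILEID="OCR-D-IMG_0001"/>
        <mets:fptr FILEID="OCR-D-OCR_0001"/>
      </mets:div>
      <mets:div TYPE="page" ID="PHYS_0002">
        <mets:fptr FILEID="OCR-D-IMG_0002"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def files_by_page(mets_xml):
    """Map each page div ID to the file IDs its mets:fptr elements reference."""
    root = ET.fromstring(mets_xml)
    # File IDs come from the fileSec; the structMap can only refer to these.
    known_ids = {f.get("ID") for f in root.iterfind(".//mets:file", NS)}
    pages = defaultdict(list)
    for struct_map in root.iterfind("mets:structMap[@TYPE='PHYSICAL']", NS):
        for div in struct_map.iterfind(".//mets:div[@TYPE='page']", NS):
            for fptr in div.iterfind("mets:fptr", NS):
                file_id = fptr.get("FILEID")
                if file_id not in known_ids:
                    raise ValueError(f"fptr references unknown file ID: {file_id}")
                pages[div.get("ID")].append(file_id)
    return dict(pages)


if __name__ == "__main__":
    for page_id, file_ids in files_by_page(METS_XML).items():
        print(page_id, file_ids)
```

The check against `known_ids` is the dependency in question: a reader needs the file ID rules before the page grouping can be validated or even described.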
(Or did you want to do all that in a separate PR, or just wait for the merge with master?)
I think also 7.1 could be 1.3 instead.
Yes!
Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
I would agree with @bertsky to implement these changes:
- replace 2.3 with a general note about original/derived images (what is now in PAGE.md, but including new language from #164)
- The section Fulldownload should still be part of the file ID syntax (differentiating between page-local and document-global naming scheme, but not trying to formulate this "self-contained").
- Also, the section about grouping files by structMap should come below fileGrp and file ID sections.
- I think also 7.1 could be 1.3 instead.